Spatial Clustering of Point Data: Spearfish Example

This example uses the Partitioning Around Medoids (PAM) algorithm (Kaufman and Rousseeuw, 2005) to divide a set of point observations into k clusters, based only on their spatial coordinates. The same approach can be extended to any type of geographic data, such as terrain attributes.
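PAM looks for k representative observations (medoids) that minimize the total distance from every observation to its nearest medoid. The sketch below is not PAM itself, which uses a BUILD phase followed by iterative SWAP moves; it is a brute-force search over every possible set of k medoids, feasible only for a handful of points, that makes the objective explicit. The function name `brute_kmedoids` and the toy coordinates are hypothetical, chosen for illustration.

```r
# Hypothetical helper: exhaustive k-medoids search (NOT the PAM algorithm).
# Tries every k-subset of observations as medoids and keeps the subset with
# the smallest total distance from each point to its nearest medoid.
brute_kmedoids <- function(xy, k) {
  d <- as.matrix(dist(xy))        # Euclidean distance matrix
  sets <- combn(nrow(xy), k)      # each column is one candidate medoid set
  cost <- apply(sets, 2, function(m) sum(apply(d[, m, drop = FALSE], 1, min)))
  best <- sets[, which.min(cost)]
  list(medoids = best,            # row indices of the chosen medoids
       clustering = apply(d[, best, drop = FALSE], 1, which.min),
       cost = min(cost))
}

# Two tight, well-separated groups of three points (invented coordinates)
xy <- data.frame(x = c(0, 0, 1, 10, 10, 11), y = c(0, 1, 0, 10, 11, 10))
res <- brute_kmedoids(xy, k = 2)
res$medoids   # rows 1 and 4
res$cost      # total distance to nearest medoid
```

The number of candidate medoid sets grows as choose(n, k), so an exhaustive search like this is impractical for the full bugsites data; that is why a heuristic such as PAM is used instead.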

For a simple comparison of some of the partitioning-style clustering algorithms in R, see this page of demos. For a more complete listing of clustering approaches in R, see the Cluster Task View.

References

  • Kaufman, L. & Rousseeuw, P.J. (2005). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience.

Export the x,y coordinates of the bugsites from GRASS (see the attached file at the bottom of the page).

# export bugsites
v.out.ascii in=bugsites out=bugsites.xy

Load this text file into an R session. A simple map can be made by plotting the x,y coordinates.

# read in ascii file, and assign column names
x <- read.table('bugsites.xy', sep="|")
names(x) <- c('easting', 'northing', 'cat')
# subset original object, return only x,y cols
y <- data.frame(x[, 1:2])
row.names(y) <- x$cat
# simple plot of x,y data
plot(y, pch=3)
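If the attached file is not at hand, the parsing steps can be checked on a few inline records. This sketch assumes bugsites.xy holds one pipe-delimited easting|northing|cat record per line, as implied by sep="|" above; the three records are invented for illustration and are not real bugsites.

```r
# Hypothetical records in the easting|northing|cat layout (values invented)
txt <- c("590000|4920000|1",
         "591000|4921500|2",
         "600500|4915000|3")
# same parsing steps as above, reading from a character vector instead of a file
x <- read.table(text = txt, sep = "|")
names(x) <- c("easting", "northing", "cat")
y <- data.frame(x[, 1:2])
row.names(y) <- x$cat   # keep the GRASS category numbers as row names
str(y)
```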

Use the stepFlexclust function to help choose the number of hard classes; 5 clusters looks like a good start.

# load the cluster and flexclust packages
library(cluster)
library(flexclust)
# figure out a good number of clusters: use a range of 2 to 10 clusters, with 20 reps each
s <- stepFlexclust(y, k=2:10, nrep=20)
plot(s)
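stepFlexclust fits k-centroids models (kcca) rather than PAM, so a PAM-based cross-check can be useful. The sketch below swaps in a different technique, the average silhouette width reported by pam() in silinfo$avg.width, and picks the k with the largest value. The three 3x3 grids of points are synthetic, built so that 3 clusters is the obvious answer; on the bugsites data, y would take the place of pts.

```r
library(cluster)

# Synthetic data: three tight 3x3 grids centered far apart
g <- expand.grid(dx = c(-0.5, 0, 0.5), dy = c(-0.5, 0, 0.5))
centers <- list(c(0, 0), c(10, 0), c(0, 10))
pts <- do.call(rbind, lapply(centers, function(ctr)
  data.frame(easting = g$dx + ctr[1], northing = g$dy + ctr[2])))

# Average silhouette width of a PAM fit for each candidate k
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(pts, k)$silinfo$avg.width)
best_k <- ks[which.max(avg_sil)]
best_k
```

Silhouette widths close to 1 indicate points that sit well inside their own cluster; treat the result as one more hint alongside the stepFlexclust plot, not a definitive answer.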

Perform hard classification (clustering) with the PAM algorithm, and plot the results.

# 5 clusters is a good number
y.pam <- pam(y, 5, stand=TRUE)
# add the clustering vector back to the original dataframe
y$cluster <- y.pam$clustering
# plot the clusters by color
plot(y$easting, y$northing, col=y$cluster, main="Bugsites Spatial Clustering, 5 classes",
     cex=0.5, pch=16, xlab="Easting", ylab="Northing")
# add the medoids; they are in the same order as the cluster numbers
points(y.pam$medoids, pch=15, col=1:5, cex=1.25)
# connect the original points to their medoids with line segments:
for(i in 1:5)
{
segments(x0=y.pam$medoids[i, 1], y0=y.pam$medoids[i, 2], x1=y$easting[y$cluster == i], y1=y$northing[y$cluster == i], col=i, lty=3)
}
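Because PAM's cluster centers are medoids, each one is an actual observation, which is why every dotted segment in the plot ends on a data point. The sketch below checks this on synthetic data: id.med gives the row indices of the medoids, and the distance from each point to its assigned medoid (the length of its dotted segment) is zero for the medoids themselves. The grid coordinates are invented, and stand is left at its default (FALSE) so distances stay in map units.

```r
library(cluster)

# Synthetic data: three tight 3x3 grids centered far apart
g <- expand.grid(dx = c(-0.5, 0, 0.5), dy = c(-0.5, 0, 0.5))
centers <- list(c(0, 0), c(10, 0), c(0, 10))
pts <- do.call(rbind, lapply(centers, function(ctr)
  data.frame(easting = g$dx + ctr[1], northing = g$dy + ctr[2])))

p <- pam(pts, 3)
# the medoids are rows of the input data
med_rows <- as.matrix(pts[p$id.med, ])
# distance from each point to its own medoid (one dotted segment each)
xy <- as.matrix(pts)
seg_len <- sqrt(rowSums((xy - p$medoids[p$clustering, , drop = FALSE])^2))
# mean segment length per cluster
tapply(seg_len, p$clustering, mean)
```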

Prepare the data for export to text, and save the clustered data.

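# add the cluster number to the original dataframe (if not already done)
y$cluster <- y.pam$clustering
# keep the original GRASS category numbers, stored as row names
y$orig_cat <- as.numeric(row.names(y))
# save as a space-delimited text file with a single header line
write.table(y, file='bugsites.clust', row.names=FALSE)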
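The GRASS import that follows uses skip=1 and a single-space separator because of what write.table produces by default: one quoted header line, then space-separated values. This sketch writes a hypothetical two-row version of the clustered data frame to a temporary file and prints its lines; the coordinates, category numbers and cluster labels are invented.

```r
# Hypothetical two-point version of the clustered data frame
y <- data.frame(easting = c(590000, 591000), northing = c(4920000, 4921500))
row.names(y) <- c(12, 57)              # stand-in GRASS category numbers
y$cluster <- c(1L, 2L)                 # stand-in cluster labels
y$orig_cat <- as.numeric(row.names(y))
# same call as above, but written to a temporary file
f <- tempfile(fileext = ".clust")
write.table(y, file = f, row.names = FALSE)
out <- readLines(f)
print(out)   # header line first, then one line per point
```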

Load the clustered data into GRASS as a new set of points called 'bclust'. For each cluster, extract those points and compute a convex hull.

# load clustered points into GRASS; skip=1 skips the header line written by R
v.in.ascii in=bugsites.clust out=bclust fs=" " columns='x double, y double, cluster integer, orig_cat integer' skip=1
# zoom to the full extent of the Spearfish dataset
g.region rast=elevation.dem
# there are 5 clusters: extract each one, compute its convex hull, and display it
for x in `seq 1 5`
do
  v.extract --o in=bclust where="cluster=$x" out=bclust_$x
  v.hull --o in=bclust_$x out=bclust_hull_$x
  d.vect bclust_hull_$x type=boundary fcol=none width=2 col=white
done
# draw the clustered points on top of the hulls
d.vect bclust icon=basic/box fcol=black col=black size=6
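For a quick preview in R before moving back to GRASS, base R's chull() (package grDevices) returns the indices of the points that lie on the convex hull of a 2D point set. The sketch below applies it per cluster; it is an R-side analogue of the v.extract/v.hull loop above, not part of the original workflow, and the points and cluster labels are invented.

```r
# Hypothetical clustered points: a unit square with one interior point
# (cluster 1) and a triangle (cluster 2)
pts <- data.frame(
  x = c(0, 1, 1, 0, 0.5, 10, 12, 11),
  y = c(0, 0, 1, 1, 0.5, 10, 10, 12),
  cluster = c(1, 1, 1, 1, 1, 2, 2, 2)
)
# split by cluster, then keep only the hull vertices of each group
hulls <- lapply(split(pts, pts$cluster),
                function(d) d[chull(d$x, d$y), c("x", "y")])
sapply(hulls, nrow)   # number of hull vertices per cluster
```

Each element of hulls can be passed to polygon() to outline the cluster on the plot made earlier.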

Attachment:

bugsites.xy