Spatial Clustering of Point Data: Spearfish Example

This example uses the 'Partitioning Around Medoids (PAM)' algorithm (Kaufman and Rousseeuw, 2005) to divide a number of point observation into k clusters, based on their spatial attributes only. An extension of this concept can be applied to any type of geographic data, such as terrain attributes.

For a simple comparison of some of the partitioning-style clustering algorithms in R, see this page of demos. For a more complete listing of clustering approaches in R, see the Cluster Task View.

References

  • Kaufman, L. & Rousseeuw, P.J. Finding Groups in Data An Introduction to Cluster Analysis Wiley-Interscience, 2005

Export xy coordinates for the bugsites from GRASS See attached file at bottom of page.

# export bugsites
v.out.ascii in=bugsites out=bugsites.xy

Load this text file into an R session A simple map can be made by plotting the xy coordinates.

# read in ascii file, and assign column names
x <- read.table('bugsites.xy', sep="|")
names(x) <- c('easting', 'northing', 'cat')

# subset original object, return only x,y cols
y <- data.frame(x[,1:2])
row.names(y) <- x$cat

# simple plot of x,y data
plot(y, pch=3)

Use the stepFlexclust function to determine an optimal number of hard classes 5 clusters looks like a good start.

# load cluster package
library(cluster)
library(flexclust)

# figure out a good number of clusters use a range of 2 to 10 clusters, with 20 reps each
s <- stepFlexclust(y, k=2:10, nrep=20)
plot(s)

Perform hard classification (clustering) with the PAM algorithm, and plot the results

# 5 clusters in a good number
y.pam <- pam(y, 5, stand=TRUE)

# add the clustering vector back to the original dataframe
y$cluster <- y.pam$clustering

# plot the clusters by color
plot(y$easting, y$northing, col=y$cluster, main="Bugsites Spatial Clustering, 5 classes", cex=0.5, pch=16, xlab="Easting", ylab="Northing")

# add the medoids, they are in the same order as the clustering vector
points(y.pam$medoids, pch=15, col=1:5, cex=1.25)

# connect the original points to the centroids with line segments:
for(i in 1:5)
{
segments(x0=y.pam$medoids[i,][1], y0=y.pam$medoids[i,][2], x1=y$easting[y$cluster == i], y1=y$northing[y$cluster ==i], col=i, lty=3)
}

Prepare the data for export to text, and save the clustered data

# add the cluster number to the original dataframe
y$cluster <- y.pam$clustering
y$orig_cat <- as.numeric(row.names(y))

# save as a text file and quit
write.table(y, file='bugsites.clust', row.names=FALSE)

Load clustered data into GRASS as a new set of points called 'bclust' For each cluster, extract those points, and compute a convex hull.

# load clustered points into GRASS
v.in.ascii in=bugsites.clust out=bclust fs=" " columns='x double, y double, cluster integer, orig_cat integer' skip=1

# zoom to the full extent of the Spearfish dataset
g.region rast=elevation.dem

# there are 5 clusters: show them all, and compute convex hulls
for x in `seq 1 5`
do v.extract --o in=bclust where="cluster=$x" out=bclust_$x
v.hull --o in=bclust_$x out=bclust_hull_$x
d.vect bclust_hull_$x type=boundary fcol=none width=2 col=white
d.vect bclust icon=basic/box fcol=black col=black size=6
done

Attachment:

bugsites.xy