Spatial Clustering of Point Data: Spearfish Example
This example uses the 'Partitioning Around Medoids (PAM)' algorithm (Kaufman and Rousseeuw, 2005) to divide a number of point observation into k clusters, based on their spatial attributes only. An extension of this concept can be applied to any type of geographic data, such as terrain attributes.
For a simple comparison of some of the partitioning-style clustering algorithms in R, see this page of demos. For a more complete listing of clustering approaches in R, see the Cluster Task View.
References
- Kaufman, L. & Rousseeuw, P.J. Finding Groups in Data An Introduction to Cluster Analysis Wiley-Interscience, 2005
Export xy coordinates for the bugsites from GRASS See attached file at bottom of page.
# export bugsitesv.out.ascii in=bugsites out=bugsites.xyX
Load this text file into an R session A simple map can be made by plotting the xy coordinates.
# read in ascii file, and assign column namesx <- read.table('bugsites.xy', sep="|")names(x) <- c('easting', 'northing', 'cat')# subset original object, return only x,y colsy <- data.frame(x[,1:2])row.names(y) <- x$cat# simple plot of x,y dataplot(y, pch=3)X
Use the stepFlexclust function to determine an optimal number of hard classes 5 clusters looks like a good start.
# load cluster packagelibrary(cluster)library(flexclust)# figure out a good number of clusters use a range of 2 to 10 clusters, with 20 reps eachs <- stepFlexclust(y, k=2:10, nrep=20)plot(s)X
Perform hard classification (clustering) with the PAM algorithm, and plot the results
# 5 clusters in a good numbery.pam <- pam(y, 5, stand=TRUE)# add the clustering vector back to the original dataframey$cluster <- y.pam$clustering# plot the clusters by colorplot(y$easting, y$northing, col=y$cluster, main="Bugsites Spatial Clustering, 5 classes", cex=0.5, pch=16, xlab="Easting", ylab="Northing")# add the medoids, they are in the same order as the clustering vectorpoints(y.pam$medoids, pch=15, col=1:5, cex=1.25)# connect the original points to the centroids with line segments:for(i in 1:5){segments(x0=y.pam$medoids[i,][1], y0=y.pam$medoids[i,][2], x1=y$easting[y$cluster == i], y1=y$northing[y$cluster ==i], col=i, lty=3)}X
Prepare the data for export to text, and save the clustered data
# add the cluster number to the original dataframey$cluster <- y.pam$clusteringy$orig_cat <- as.numeric(row.names(y))# save as a text file and quitwrite.table(y, file='bugsites.clust', row.names=FALSE)X
Load clustered data into GRASS as a new set of points called 'bclust' For each cluster, extract those points, and compute a convex hull.
# load clustered points into GRASSv.in.ascii in=bugsites.clust out=bclust fs=" " columns='x double, y double, cluster integer, orig_cat integer' skip=1# zoom to the full extent of the Spearfish datasetg.region rast=elevation.dem# there are 5 clusters: show them all, and compute convex hullsfor x in `seq 1 5`do v.extract --o in=bclust where="cluster=$x" out=bclust_$xv.hull --o in=bclust_$x out=bclust_hull_$xd.vect bclust_hull_$x type=boundary fcol=none width=2 col=whited.vect bclust icon=basic/box fcol=black col=black size=6doneX
Attachment:
Software
- General Purpose Programming with Scripting Languages
- LaTeX Tips and Tricks
- PostGIS: Spatially enabled Relational Database Sytem
- PROJ: forward and reverse geographic projections
- GDAL and OGR: geodata conversion and re-projection tools
- R: advanced statistical package
- GRASS GIS: raster, vector, and imagery analysis
- Generic Mapping Tools: high quality map production