Spatial Clustering of Point Data: Spearfish Example
This example uses the 'Partitioning Around Medoids (PAM)' algorithm (Kaufman and Rousseeuw, 2005) to divide a number of point observation into k clusters, based on their spatial attributes only. An extension of this concept can be applied to any type of geographic data, such as terrain attributes.
For a simple comparison of some of the partitioning-style clustering algorithms in R, see this page of demos. For a more complete listing of clustering approaches in R, see the Cluster Task View.
References
- Kaufman, L. & Rousseeuw, P.J. Finding Groups in Data An Introduction to Cluster Analysis Wiley-Interscience, 2005
Export xy coordinates for the bugsites from GRASS See attached file at bottom of page.
# export bugsites v.out.ascii in=bugsites out=bugsites.xy
Load this text file into an R session A simple map can be made by plotting the xy coordinates.
# read in ascii file, and assign column names x <- read.table('bugsites.xy', sep="|") names(x) <- c('easting', 'northing', 'cat') # subset original object, return only x,y cols y <- data.frame(x[,1:2]) row.names(y) <- x$cat # simple plot of x,y data plot(y, pch=3)
Use the stepFlexclust function to determine an optimal number of hard classes 5 clusters looks like a good start.
# load cluster package library(cluster) library(flexclust) # figure out a good number of clusters use a range of 2 to 10 clusters, with 20 reps each s <- stepFlexclust(y, k=2:10, nrep=20) plot(s)
Perform hard classification (clustering) with the PAM algorithm, and plot the results
# 5 clusters in a good number y.pam <- pam(y, 5, stand=TRUE) # add the clustering vector back to the original dataframe y$cluster <- y.pam$clustering # plot the clusters by color plot(y$easting, y$northing, col=y$cluster, main="Bugsites Spatial Clustering, 5 classes", cex=0.5, pch=16, xlab="Easting", ylab="Northing") # add the medoids, they are in the same order as the clustering vector points(y.pam$medoids, pch=15, col=1:5, cex=1.25) # connect the original points to the centroids with line segments: for(i in 1:5) { segments(x0=y.pam$medoids[i,][1], y0=y.pam$medoids[i,][2], x1=y$easting[y$cluster == i], y1=y$northing[y$cluster ==i], col=i, lty=3) }
Prepare the data for export to text, and save the clustered data
# add the cluster number to the original dataframe y$cluster <- y.pam$clustering y$orig_cat <- as.numeric(row.names(y)) # save as a text file and quit write.table(y, file='bugsites.clust', row.names=FALSE)
Load clustered data into GRASS as a new set of points called 'bclust' For each cluster, extract those points, and compute a convex hull.
# load clustered points into GRASS v.in.ascii in=bugsites.clust out=bclust fs=" " columns='x double, y double, cluster integer, orig_cat integer' skip=1 # zoom to the full extent of the Spearfish dataset g.region rast=elevation.dem # there are 5 clusters: show them all, and compute convex hulls for x in `seq 1 5` do v.extract --o in=bclust where="cluster=$x" out=bclust_$x v.hull --o in=bclust_$x out=bclust_hull_$x d.vect bclust_hull_$x type=boundary fcol=none width=2 col=white d.vect bclust icon=basic/box fcol=black col=black size=6 done
Attachment:
Software
- General Purpose Programming with Scripting Languages
- LaTeX Tips and Tricks
- PostGIS: Spatially enabled Relational Database Sytem
- PROJ: forward and reverse geographic projections
- GDAL and OGR: geodata conversion and re-projection tools
- R: advanced statistical package
- GRASS GIS: raster, vector, and imagery analysis
- Generic Mapping Tools: high quality map production