A Visualization of Soil Taxonomy Down to the Subgroup Level
Sep 29, 2010, metroadmin

It turns out that you can generate a quasi-numerical distance between soil profiles classified according to Soil Taxonomy (or any other hierarchical system) using Gower's generalized dissimilarity metric. For example, taxonomic distances computed from subgroup membership are based on the number of levels (order, suborder, greatgroup, and subgroup) at which two profiles match or differ. This approach allows for the derivation of a quasi-numerical classification system from Soil Taxonomy, but it is severely limited by the fact that each split in the hierarchy is given equal weight. In other words, a mismatch at the soil order level adds exactly as much to the dissimilarity as a mismatch at the subgroup level. Clearly this is not ideal.
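To make the equal-weight problem concrete, here is a sketch of Gower's coefficient for this case, assuming the four taxonomic levels are treated as nominal variables with equal weights and no missing values. Profiles i and j are compared level by level (k = 1 for order through k = 4 for subgroup), and the dissimilarity is the fraction of levels at which they disagree:

$$
d(i,j) = \frac{1}{4} \sum_{k=1}^{4} d_{ij}^{(k)}, \qquad
d_{ij}^{(k)} =
\begin{cases}
0 & \text{if } x_{ik} = x_{jk} \\
1 & \text{otherwise}
\end{cases}
$$

For example, Mollic Albaqualfs vs. Typic Albaqualfs differ only at the subgroup level, so d = (0 + 0 + 0 + 1)/4 = 0.25; Typic Albaqualfs vs. Typic Duraqualfs differ at the greatgroup and subgroup levels, so d = 0.5; and an Alfisol compared with a profile from any other order differs at all four levels, so d = 1. Every level contributes the same 0.25 step, regardless of how deep in the hierarchy the split occurs.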
Gower's generalized dissimilarity metric is conveniently implemented in the cluster package for R, via the daisy() function. I have posted some related material in the past, but left out some of the details regarding which clustering algorithms produce the most useful dendrograms. Divisive clustering (diana() in the same package), which works top-down by repeatedly splitting the full set into smaller groups, best represents the step-wise splits within the hierarchy of Soil Taxonomy, as expressed in terms of pair-wise dissimilarities. Code examples are below, along with the data used to generate the figure of California subgroups. Discontinuities in the figure below are caused by errors in the underlying data, e.g. mismatches between soil order and suborder membership.
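A minimal sketch of the same daisy() + diana() pipeline on a hand-built four-profile table may help before working with the full data set. The first three rows reuse subgroups that appear in the head(s) output of the attached data; the Mollisol row (Typic Haploxerolls) is an illustrative addition, not a record taken from ca-data.csv. The names toy, m, dv, and h are hypothetical.

```r
library(cluster)

# hand-built toy table (hypothetical); the Mollisol row is illustrative only
toy <- data.frame(
  soilorder  = c('Alfisols', 'Alfisols', 'Alfisols', 'Mollisols'),
  suborder   = c('Aqualfs', 'Aqualfs', 'Aqualfs', 'Xerolls'),
  greatgroup = c('Albaqualfs', 'Albaqualfs', 'Duraqualfs', 'Haploxerolls'),
  subgroup   = c('Mollic Albaqualfs', 'Typic Albaqualfs',
                 'Typic Duraqualfs', 'Typic Haploxerolls'),
  stringsAsFactors = TRUE
)

# Gower's coefficient on factors: fraction of the four levels that differ
d <- daisy(toy, metric = 'gower')
m <- as.matrix(d)
print(round(m, 2))

# divisive clustering: diana() starts from all profiles and splits top-down
dv <- diana(d)
h <- as.hclust(dv)
h$labels <- as.character(toy$subgroup)

# cutting into two groups separates the Mollisol from the three Alfisols
print(cutree(h, k = 2))
```

With this table the two Albaqualfs subgroups sit 0.25 apart, each Albaqualfs subgroup is 0.5 from Typic Duraqualfs, and every Alfisol is 1.0 from the Mollisol, so the first divisive split falls at the soil order boundary.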
Figure: Subgroups from California
Implementation in R (source data is attached at the bottom of the page)
```r
# required packages
library(ape)
library(cluster)
library(RColorBrewer)

# read in the saved data, keeping text columns as factors
# (R >= 4.0 no longer converts strings to factors by default)
s <- read.csv('ca-data.csv', stringsAsFactors = TRUE)

# remove records that are missing data
s <- na.omit(s)

# inspect the data
str(s)
# 'data.frame': 619 obs. of 4 variables:
#  $ soilorder : Factor w/ 10 levels "Alfisols","Andisols",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ suborder  : Factor w/ 50 levels "Albolls","Andepts",..: 3 3 3 3 3 3 3 3 3 3 ...
#  $ greatgroup: Factor w/ 154 levels "Albaqualfs","Albaquults",..: 1 1 34 52 52 52 106 106 106 114 ...
#  $ subgroup  : Factor w/ 573 levels "Abruptic Argiduridic Durixerolls",..: 263 339 366 265 382 461 15 182 420 14 ...

head(s)
#   soilorder suborder  greatgroup             subgroup
# 1  Alfisols  Aqualfs  Albaqualfs    Mollic Albaqualfs
# 2  Alfisols  Aqualfs  Albaqualfs     Typic Albaqualfs
# 3  Alfisols  Aqualfs  Duraqualfs     Typic Duraqualfs
# 4  Alfisols  Aqualfs Endoaqualfs   Mollic Endoaqualfs
# 5  Alfisols  Aqualfs Endoaqualfs    Typic Endoaqualfs
# 6  Alfisols  Aqualfs Endoaqualfs  Udollic Endoaqualfs

# Gower's dissimilarity across the four taxonomic levels
d <- daisy(s, metric = 'gower')

# divisive hierarchical clustering, converted to hclust, then to an ape phylo object
d.h <- as.hclust(diana(d))
d.h$labels <- as.character(s$subgroup)
p <- ladderize(as.phylo(d.h))

# one color per soil order, indexed by factor level
s.order <- factor(s$soilorder)
cols <- brewer.pal(n = nlevels(s.order), 'Paired')

# fan dendrogram with tips colored by soil order
plot.phylo(p, cex = 0.5, font = 1, type = 'fan', no.margin = TRUE, show.tip.label = FALSE)
tiplabels(pch = 16, cex = 1, col = cols[as.numeric(s.order)])

# legend entries follow the same factor-level order as the colors
legend('bottomleft', legend = levels(s.order), pt.bg = cols, pt.cex = 1.5, pch = 21, cex = 1.15, bty = 'n')
```
Figure: Subgroups from the lower 48 states
Attachment: ca-data.csv