AQP / soilDB Demo: Dueling Dendrograms
Mar 14, 2012 metroadminUPDATE 2013-04-08: This functionality is now available in the sharpshootRpackage.
Previously, soil profile comparison methods from the aqp package only took into account horizon-level attributes. As of last week the profile_compare() function can now accommodate horizon and site-level attributes. In other words, it is now possible to compute pair-wise dissimilarity between soil profiles using a combination of horizon-level properties (soil texture, pH, color, etc.) and site-level properties (surface slope, vegetation, soil taxonomy, etc.)-- continuous, categorical, or boolean.
An example is presented below which is based on the loafercreek sample data set included with the soilDB package. Be sure to use the latest version of soilDB, 0.5-5 or later. Dissimilarity matrices created from horizon and site+horizon data are compared by placing their respective dendrograms back-to-back. Code from the ape package is used to facilitate dendrogram plotting, manipulation, and indexing. Blue line segments connect matching nodes from each dendrogram. Soil profiles with paralithic contact are marked with orange squares for clarity.
Image: Dueling Dendrograms
Example Code
# setup environmentlibrary(soilDB)library(cluster)library(ape)# load sample dataset from soilDB packagedata(loafercreek)# compare profiles with hz-level data only: need at least 2 variablesd.hz <- profile_compare(loafercreek, vars=c('clay', 'total_frags_pct'), k=0, max_d=100, rescale.result=TRUE)# hz and site level variables: need at least 2 hz-level and 2 site-level# character-class variables should be manually converted to factors firstd.site.hz <- profile_compare(loafercreek, vars=c('clay', 'total_frags_pct', 'paralithic.contact', 'slope'), k=0, max_d=100, rescale.result=TRUE)# verify: both dissimilarity matrices have been rescaled to [0,1]range(d.hz) ; range(d.site.hz)# cluster via divisive hierarchical algorithm# convert to 'phylo' classphylo.hz <- as.phylo(as.hclust(diana(d.hz)))phylo.site.hz <- as.phylo(as.hclust(diana(d.site.hz)))# generate vector of labels for site-level clustering criteria# paralithic contact = 1, missing = 2paralithic.contact.code <- as.integer(loafercreek$paralithic.contact)+1# setup some geometry for connecting lines / arrowsarrow.left <- 0.1arrow.right <- 0.9arrow.length <- 0.1# setup plot layoutlo <- layout(matrix(c(1,3,2), ncol=3), widths=c(1, 1, 1)) # check with: layout.show(lo)# plot hz-level dissimilarities on left-hand sideplot(phylo.hz, cex=0.75, font=1, no.margin=TRUE, direction='rightwards', label.offset=0.015)mtext('Horizon Data', side=3, line=-1.5, font=2, cex=0.75)# save results of phylo environmentp.left <- get("last_plot.phylo", envir = .PlotPhyloEnv)# annotate pedons with paralithic contacttiplabels(pch=15, col=c('orange', NA)[paralithic.contact.code])# plot site+hz-level dissimilarities on right-hand sideplot(phylo.site.hz, cex=0.75, font=1, direction='leftwards', no.margin=TRUE, label.offset=0.015)mtext('Horizon + Site Data', side=3, line=-1.5, font=2, cex=0.75)# save results of phylo environmentp.right <- get("last_plot.phylo", envir = .PlotPhyloEnv)# annotate pedons with paralithic contacttiplabels(pch=15, col=c('orange', NA)[paralithic.contact.code])# get ordering of pedon IDs on each side# left-hand sideleft.new_order <- sapply(1:p.left$Ntip, function(i) which(as.integer(p.left$yy[1:p.left$Ntip]) == i))# right-hand sideright.new_order <- sapply(1:p.right$Ntip, function(i) which(as.integer(p.right$yy[1:p.right$Ntip]) == i))# re-order right-hand side nodes so that they match the ordering of left-side nodesleft.right.converstion.order <- right.new_order[match(left.new_order, right.new_order)]# setup new plot regionpar(mar=c(0,0,0,0))plot(0, 0, type='n', xlim=c(0,1), ylim=range(p.left$yy), axes=FALSE, xpd=FALSE)# plot connecting segmentssegments(x0=arrow.left, y0=p.left$yy[left.new_order], x1=arrow.right,y1=p.right$yy[left.right.converstion.order], col='RoyalBlue')# plot helper arrowsarrows(x0=arrow.left, y0=p.left$yy[left.new_order], x1=arrow.left-arrow.length, y1=p.left$yy[left.new_order], col='RoyalBlue', code=2, length=0.05)arrows(x0=arrow.right, y0=p.right$yy[right.new_order], x1=arrow.right+arrow.length, y1=p.right$yy[right.new_order], col='RoyalBlue', code=2, length=0.05)# legend: top-centerlegend('top', legend=c('Paralithic Contact'), pch=15, col='orange', bty='n')X