Scaling Soil Survey

Background and Justification

The National Cooperative Soil Survey (NCSS) produces one of the most comprehensive, high-resolution environmental datasets in the U.S., with generalized soil resource information for the entire nation and detailed data for nearly 80% of the country (Soil Survey Staff, 2006a). Recent developments in computing power, coupled with the increasing availability of geographic information systems (GIS) on the desktop, have made possible new and exciting applications of digital soil survey products. In many instances, however, the scale at which soil survey databases are produced and the use of "representative" values do not meet stakeholder needs (National Research Council, 1993; Brasher and Benham, 1996). Specifically, users have noted a need for basic statistical measures (mean, variance, confidence intervals, etc.) on reported soil properties (Brown, 1988; Brown and Huddleston, 1991), along with some measure of spatial variability (Brubaker and Hallmark, 2001; Burrough, 2001). As a result, digital soil survey data are often used incorrectly, misinterpreted, or not used at all. Clearly, new products are needed that can meet the modern demands placed on soil survey. We intend to develop a set of methods for creating these products by scaling (up and down) existing soil survey information, with an assessment of uncertainty. To do so, we will evaluate digital proxies for soil-forming factors and draw on historic pedon data (3000+ points), statistical models, and landscape-segmentation approaches.

The NCSS has developed two digital soil survey products, representing two levels of generalization. The most detailed soil survey information, the Soil Survey Geographic Database (SSURGO), is compiled at a scale of 1:12,000 to 1:24,000 from point observations and mental models of soil formation (Soil Survey Staff, 2006a). Generalized soils information is provided in the U.S. General Soil Map (STATSGO), a 1:250,000 scale interpretation of detailed soil survey data, with inferences about natural conditions where soils information is absent (Soil Survey Staff, 2006b). Both products make use of the "map unit" concept, in which repeating combinations of soil types (usually expressed as Soil Series name or phase) are identified and delineated as polygons on a map (Soil Survey Staff, 2005).

Despite the obvious utility of digital soil survey databases, there are inherent drawbacks to using these products. The high degree of variability associated with soil bodies prohibits the mapping of soils at a 1:1 scale; hence map units are used to depict patterns of soil distribution. Map unit composition differs depending on the scale of the soil survey and when it was produced. On average, SSURGO map units in California consist of 4 soil types (components), whereas STATSGO map units average 14 components. Map unit composition adds considerable complication and subjectivity to any analysis because the physical location of components within a map unit delineation is not explicitly documented. Furthermore, STATSGO is often unreliable because it is based on assemblages of dominant soil series from SSURGO. The concepts behind SSURGO legends have changed through time, and STATSGO often encompasses a wide range of geographic settings, particularly in mountainous terrain. SSURGO data are incomplete in several mountainous regions of California and are missing for many public lands (BLM and National Forests). Where SSURGO data are available in these regions, they are often mapped at coarser resolutions than agricultural lands, too coarse for many site-specific operations such as restoration projects, urban uses, and some modeling efforts. Several workers have emphasized the importance of recognizing both scale-dependent and scale-invariant processes in natural systems when developing environmental models (Beven, 1995; Bloschl, 2001; Schoorl and Veldkamp, 2006), a difficult task with existing fixed-scale soil survey products. Moreover, no clear linkage exists between the development of SSURGO and STATSGO, so the ability to work across multiple scales is impossible to evaluate.

What would a 25th, 50th, and 75th percentile soil profile look like?

I have mentioned the AQP package in previous entries. One of the functions in this package generates aggregate soil profile data from a collection of soil profiles that are related by some factor: common lithology, common landscape position, and so on. Typically the mean or median (50th percentile) is used to generate a new aggregate profile that is representative of the original collection. Extending this idea, I thought that it would be interesting to generate aggregate profiles that are representative of the 25th and 75th percentiles as well. For the sake of clarity, let's call these three new profiles (25th, 50th, and 75th percentiles) Q25, Q50, and Q75. A 10 cm slicing interval was used as the basis upon which soil properties were aggregated.
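To make the slicing idea concrete, here is a minimal Python sketch of slice-wise percentile aggregation. It is not the AQP implementation: the function names, the midpoint-sampling rule (each slice takes the value of the horizon containing its midpoint), the linear-interpolation percentile, and the clay values are all assumptions made for illustration.

```python
import math

def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight

def slice_profile(horizons, max_depth, step=10):
    """Value at the midpoint of each depth slice, or None below the profile."""
    sliced = []
    for top in range(0, max_depth, step):
        midpoint = top + step / 2
        value = None
        for h_top, h_bottom, h_value in horizons:
            if h_top <= midpoint < h_bottom:
                value = h_value
        sliced.append(value)
    return sliced

def aggregate(profiles, max_depth, step=10, probs=(25, 50, 75)):
    """Per-slice percentiles plus the fraction of profiles contributing."""
    sliced = [slice_profile(p, max_depth, step) for p in profiles]
    rows = []
    for top, column in zip(range(0, max_depth, step), zip(*sliced)):
        present = [v for v in column if v is not None]
        quantiles = [percentile(present, p) for p in probs] if present else None
        rows.append((top, quantiles, len(present) / len(profiles)))
    return rows

# Hypothetical clay contents (%), given as (top_cm, bottom_cm, clay) horizons.
profiles = [
    [(0, 20, 10.0), (20, 50, 20.0)],
    [(0, 30, 14.0), (30, 50, 26.0)],
    [(0, 10, 12.0), (10, 30, 18.0)],
]
result = aggregate(profiles, max_depth=50)
for top, quantiles, fraction in result:
    print(top, quantiles, round(fraction, 2))
```

Each row holds the top depth of a slice, the Q25/Q50/Q75 values for that slice, and the fraction of profiles that reach it. Reading one column of the quantile lists down the rows gives one aggregate profile.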

Figure: Aggregate Profiles

Using the soil profiles P001 - P005, I generated the aggregate profiles Q25, Q50, and Q75 (see figure above). Horizon names are annotated, along with the clay content of each horizon in parentheses. Soil colors are based on field-described, dry colors. The colors and clay contents of Q25 represent those values in the 25th percentile of the original collection, i.e. darker colors and lower clay contents. The horizon names assigned to Q25, Q50, and Q75 are based on the horizon name that was most common within each depth slice. A blue line separates those slices within the aggregated profiles that were derived from at least 50% of the original collection. Slices below 70 cm are excluded as they were derived only from P001. In most cases the median (Q50) aggregate profile would be the most useful description of the "central tendency" for a collection of soil profiles. Who knows, maybe it could be used in place of an official series description to describe a map unit. Or it could be used within the context of a general soils map, instead of listing all of the series within an aggregate map unit.
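The horizon-naming rule and the 50% cutoff described above could be sketched roughly as follows. This is only an illustration in Python, not the code behind the figure: the function name, the tie-breaking behavior, and the horizon names in the example are assumptions.

```python
from collections import Counter

def modal_names(sliced_names):
    """Most common horizon name per depth slice, with contributing fraction.

    sliced_names: one list per profile, holding a horizon name for each
    depth slice, or None where the profile does not reach that depth.
    """
    n_profiles = len(sliced_names)
    summary = []
    for names in zip(*sliced_names):
        present = [name for name in names if name is not None]
        # Ties are broken by first occurrence, an arbitrary choice.
        name = Counter(present).most_common(1)[0][0] if present else None
        summary.append((name, len(present) / n_profiles))
    return summary

# Hypothetical horizon names on 10 cm slices for three profiles.
sliced_names = [
    ["A", "A", "Bt", "Bt", "C"],
    ["A", "Bt", "Bt", None, None],
    ["A", "A", "Bw", None, None],
]
summary = modal_names(sliced_names)
# Keep only slices supported by at least half of the profiles.
reliable = [name for name, fraction in summary if fraction >= 0.5]
print(summary)
print(reliable)
```

In this made-up collection the two deepest slices come from a single profile, so they fall below the cutoff in the same way that the slices below 70 cm do in the figure.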

So that was interesting. Let's work on a different set of soil profile data, where we have measurements of clay content, CEC, and pH. What happens when we put the Q25, Q50, and Q75 aggregate profiles into a numerical profile classification alongside the original profiles? In other words, what if we numerically describe the pair-wise dissimilarity between all of the profiles in the collection, taking into account soil depth, several soil properties (clay content, CEC, pH), and how those properties vary with depth? The resulting dissimilarity matrix would describe the degree of similarity between any pair of soil profiles. Visualized as a dendrogram, it might look something like the following:

Figure: Profile Quantiles

 
where the numerical dissimilarity between profiles is proportional to the height at which they are connected on the "tree". Horizon names illustrate clay content / CEC / pH. Note that the aggregate profiles have not been truncated according to the number of profiles contributing to their calculation, which results in a fairly meaningless set of depth slices near the bottom of these profiles.
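For readers who want a feel for what a depth-aware pair-wise dissimilarity involves, here is one possible Python sketch. It is not the algorithm used to build the dendrogram: the range scaling of each property (in the spirit of a Gower distance), the rule that a slice present in only one profile counts as maximally dissimilar, the function names, and the property values are all assumptions for illustration.

```python
def property_ranges(profiles):
    """Range (max - min) of each property over every slice of every profile."""
    slices = [s for p in profiles.values() for s in p if s is not None]
    # Fall back to 1.0 when a property is constant, to avoid dividing by zero.
    return [max(column) - min(column) or 1.0 for column in zip(*slices)]

def profile_dissimilarity(a, b, ranges):
    """Mean slice-wise dissimilarity between two sliced profiles (0 to 1)."""
    total, count = 0.0, 0
    for slice_a, slice_b in zip(a, b):
        if slice_a is None and slice_b is None:
            continue  # neither profile reaches this depth
        if slice_a is None or slice_b is None:
            total += 1.0  # assumption: soil vs. no soil is maximally dissimilar
        else:
            diffs = [abs(x - y) / r for x, y, r in zip(slice_a, slice_b, ranges)]
            total += sum(diffs) / len(diffs)
        count += 1
    return total / count if count else 0.0

def dissimilarity_matrix(profiles):
    """Dictionary keyed by (name, name) holding every pair-wise dissimilarity."""
    ranges = property_ranges(profiles)
    return {
        (i, j): profile_dissimilarity(profiles[i], profiles[j], ranges)
        for i in profiles
        for j in profiles
    }

# Hypothetical (clay %, CEC, pH) values on two depth slices per profile.
profiles = {
    "P1": [(10.0, 8.0, 6.0), (20.0, 12.0, 6.5)],
    "P2": [(10.0, 8.0, 6.0), (30.0, 16.0, 7.0)],
    "P3": [(30.0, 16.0, 7.0), None],
}
d = dissimilarity_matrix(profiles)
print(d[("P1", "P2")], d[("P1", "P3")])
```

A matrix like this could then be handed to any hierarchical clustering routine to draw a dendrogram. Aggregate profiles such as Q25, Q50, and Q75 would simply be added as extra entries alongside the original profiles.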

Alternatively, we could extract the pair-wise dissimilarity between each profile and the median profile (Q50). If we have adequately and unbiasedly sampled the landscape (a mighty large if!), this metric might tell us something about how much each profile deviates from the central tendency of soils within the sampled region. For the figure above, the normalized percent similarity between all profiles and Q50 is:

  1   2   3   4   5   6   7   8   9  10 Q25 Q75 
 21   4  12  27   0  32  46  36  36  29  36  10 

 
where profile "7" is the most similar to Q50. Interesting. Look for more examples like this within the AQP manual pages and vignette.
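The exact normalization behind the percent similarity values is not spelled out here. One plausible reading, assumed purely for illustration, is that each dissimilarity to Q50 is divided by the largest one and subtracted from 1, so that the least similar profile scores 0. A small Python sketch under that assumption, with a hypothetical function name and made-up dissimilarities:

```python
def percent_similarity(distances):
    """Convert dissimilarities to a reference profile into percent similarity.

    Assumed normalization: 100 * (1 - d / max(d)), so the most dissimilar
    profile scores 0 and a profile identical to the reference scores 100.
    """
    largest = max(distances.values())
    if largest == 0:
        return {name: 100 for name in distances}
    return {
        name: round(100 * (1 - value / largest))
        for name, value in distances.items()
    }

# Hypothetical dissimilarities between three profiles and Q50.
to_q50 = {"A": 2.0, "B": 4.0, "C": 1.0}
similarity = percent_similarity(to_q50)
print(similarity)
```

Under this scheme the scores are relative to the collection at hand, so adding or removing a profile can change every value.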