Analysis of SSURGO Data in PostGIS: An Overview

 
News
It looks like the NCSS has constructed a web-based, SQL interface to their main database. This new tool was recently highlighted in issue 40 of the NCSS Newsletter and looks like a promising new tool with good documentation. A listing of "web services" offered by the NCSS is presented on this page.

 
Overview
The analysis of SSURGO data is complicated by a series of hierarchical, one-to-many tabular relationships between spatial and attribute data. Understanding the structure of the SSURGO database is critical for correct interpretation and aggregation of soil properties. Before undertaking any SSURGO-based analysis please take some time to become familliar with the details on the product, including intended uses, minimum mapping units sizes, and other important details. In addition, becoming familliar with the SSURGO metadata is a must. The metadata page includes detailed descriptions of table structure, column names and units, as well as important information on the sources of much of the tabular data included in a SSURGO database.

SSURGO data can be downloaded by survey area from The Soil Datamart, with spatial data delivered in shapefile format and attribute data delivered as plain text. Unfortunately, an M.S. Access database template is required to utilize SSURGO attribute data as delivered from the Soil Datamart. For assistance with this procedure please see the NRCS SSURGO page. Using this approach, most analysis of SSURGO must be done survey-by-survey within a GIS environment. For general instructions see this document.

Several online applications allow for simple interaction with the SSURGO database without the need for a GIS or RDBMS knowledge. Some examples include:

 
An Open Source Approach to SSURGO
We have developed an alternate approach to working with SSURGO data using PostGIS, a spatially-enabled version of the popular and open source relational database management system Postgresql, to store all spatial and tabular data for 138 survey areas. This approach facillitates rapid access, analysis, and aggregation of over half a million soil polygons. SQL (structured query language) is used to directly interact with both soil spatial and attribute data. If other forms of spatial data are imported into PostGIS (such as landcover, climatic data, etc.) nearly all spatial and attribute analysis can be done entirely from PostGIS. A series of examples illustrating common tasks will be presented in the following pages.

 
General Approach to Working with SSURGO (also outlined in this document) (more ideas on map unit composition)

  1. Identify the soil properties that are to be included in some analysis
  2. Decide on the appropriate form of aggregation to be used to summarize horizons
    • depth weighted (i.e. to calculate average percent clay through horizons)
    • top 1m (i.e. to summarize shrink-swell capacity of the topsoil)
    • top horizon (i.e. to summarize surface organic carbon)
    • profile sum (i.e. to calculate total water holding capacity)
    • most limiting (i.e. to calculate the most limiting hydraulic conductivity within the soil profile)
  3. Aggregate horizon data: after filtering NULL values and weights
  4. Decide on the appropriate form of aggregation to be used to summarize components
    • component percent weighted (i.e. this will include information from each component, weighted by their estimated percentage of the entire map unit)
    • largest component (i.e. this usually result in the selection of the 'dominant' component, however when there are multuple components with the same estimated percentage ties will occur)
    • major component flag (i.e. components flagged as a 'major component' by the NRCS represent dominant soil types within a map unit. note that there are sometimes multiple components flagged as 'major components'.)
    • dominant condition (i.e. this is usually used for categorical data like hydric conditions. aggregation is performed by selecting the most frequent condition within a map unit)
  5. Aggregation of component data: after filtering NULL values and weights
  6. join the above aggregated data (2x aggregation process for horizon data) to the map unit polygons
  7. See the diagram at the bottom of this page for a graphical summary

SSURGO discontinuity example: Boundary between Glenn and Colusa counties, illustrating differences in soil survey vintage.SSURGO discontinuity example: Boundary between Glenn and Colusa counties, illustrating differences in soil survey vintage.

 
Notes on the Format of SSURGO

  • Adjacent surveys may have been composed by different individuals, and may be of widely different vintages. Any given survey must comply with basic standards, but older surveys reflect a more generalized approach than more modern surveys. The figure to the right illustrates such differences.
  • Polygons represent a repeating pattern of legend entries: groups of map-able soil concepts called map units
  • Map unit data is stored in the mapunit table, and is referenced by the field mukey
  • Pre-aggregated map unit data is stored in the muaggatt table, and is referenced by the field mukey
  • Map units are comprised of multiple, unmapped soil types called 'components'
  • Component data is stored in the component table, and is referenced by the field cokey
  • Soil components (or soil type) are associated with multiple horizons
  • Horizon data is stored in the chorizon table, and is referenced by the field cokey
  • Since there is a 1:many:many (mapunit:component:horizon) relationship between spatial and horizon-level soil property data two aggregation steps are required in order to produce a thematic map
  • This diagram illustrates the hierarchy of scales involved in soil survey information.

SSURGO Database Structure DiagramSSURGO Database Structure Diagram

Logistics: Getting Connected and Executing Queries

 
Building and Saving SQL code snippets
A simple text editor is the best environment for working on (and saving!) your SQL queries. Linux users are encouraged to use either 'Kate' or 'Kwrite'. Windows users should look into Notetab Light or SciTE. Mac users should check out TextWrangler.

 
Connecting to the database
In general, the simplest way to interact with our composite soil survey database is by connecting to the best with an SSH client. Mac/Linux users have this functionality built-in. Windows users will need to use something like Putty, or Xming. Once an ssh connection with the beast has been setup, you can connect to the database with the following:

psql -U soil ssurgo_combined

 
where psql is the postgresql client program, -U soil means connect as the user called "soil", and ssurgo_combined is the name of the actual database.

 
You can quit from the database shell using \q (followed by enter). Typing \? (followed by enter) will give a list of commands available in the psql shell.

 
Query structure and SQL
Numerous resources exisit for learning about SQL. See the attached PDF presentation at the bottom of this page for some SSURGO-related examples. I would recommend looking at the first couple chapters from the PostgreSQL book by Douglas and Douglas (the big purple book on the shelf). For an interactive learning approach SQL Zoo seems like a good start. In general most queries will have the format:

SELECT column_x, column_y, column_z
FROM table_x
WHERE column_x = 'something'
-- optional
GROUP BY column_x
ORDER BY column_x ; -- semi-colon denotes end of SQL statement

 
A simple query: column selection and filtering
Once connected it is possible to issue queries to the database. The results of a query are normally returned to the screen. For example, asking for the horizon boundaries and horizon names from the componenent identified with the given cokey (467038:646635) might look like this:

-- the query
SELECT cokey, hzname, hzdept_r, hzdepb_r
FROM chorizon
WHERE cokey = '467038:646635' ;

-- the results
     cokey     | hzname | hzdept_r | hzdepb_r
---------------+--------+----------+----------
 467038:646635 | Ap     |        0 |       18
 467038:646635 | Bw     |       18 |       41
 467038:646635 | Bk1    |       41 |       69
 467038:646635 | Bk2    |       69 |      109
 467038:646635 | Bk3    |      109 |      145
 467038:646635 | Bk4    |      145 |      183

 
A more complicated query: column calculation and aggregation
Compute the total water holding capacity for a given component, identified by (467038:646635). Note that several operations are being performed on the data:

  1. compute the available water holding capacity (in centemeters) for each horizon [green text]
  2. compute the profile depth (in centemeters) for each horizon [green text]
  3. aggregate the profile data by summing AHC and depth of each horizon [red text]
-- the query
select cokey, sum( (hzdepb_r - hzdept_r) * awc_r) as component_whc, sum( (hzdepb_r - hzdept_r)  ) as depth 
from chorizon 
where cokey = '467038:646635'
group by cokey ;

-- the results
     cokey     | component_whc | depth 
---------------+---------------+-------
 467038:646635 |         27.91 |   183

 
Note that it is also possible to save the results of a query into a new table. This is the usual strategy when performing any analysis that returns geometry (soil polygons, etc.) as geometry data cannot be visualized in a text display. Tables created in this manner can be exported from the database for map creation or further analysis. Spatial tables can be viewed directly by applications like QGIS or Mapserver. A simple method of accessing PostGIS data is not yet possible with ArcGIS.

Checking Type Locations

 
Just Checking

-- NAD27 to NAD83 
echo 119d7\'4\"W 36d23\'13\"N | cs2cs +proj=latlong +datum=NAD27 +to +proj=latlong +datum=NAD83 -f "%.6f"
-119.118718     36.386894 0.000037

 

-- the relevent SRIDs

 srid |                     proj4text                      
------+----------------------------------------------------
 4267 | +proj=longlat +ellps=clrk66 +datum=NAD27 +no_defs
 4269 | +proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs
 4326 | +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs


-- NAD27 type location point
\SET pt Transform(SetSRID(ST_MakePoint(-119.1177777, 36.38694444), 4267), 9001)

-- find intersecting map unit, and the NAD83 coords
SELECT AsText(Transform(:pt, 4269)) AS NAD83_pt, mapunit_poly.mukey, muname
FROM mapunit_poly
JOIN mapunit
ON mapunit_poly.mukey = mapunit.mukey
AND st_intersects(wkb_geometry, :pt) ;

                 nad83_pt                  | mukey  |                         muname                        
-------------------------------------------+--------+--------------------------------------------------------
 POINT(-119.118718055326 36.3868941884307) | 467194 | Grangeville sandy loam, drained, 0 TO 2 percent slopes

 
 
-- 250 m buffer around NAD27 type location coords
\SET box st_buffer(Transform(SetSRID(ST_MakePoint(-119.1177777, 36.38694444), 4267), 9001), 250)
 
-- select overlapping map units and their associated percent overlap
SELECT mapunit_poly.mukey, muname,
round((ST_Area(ST_Intersection(wkb_geometry, :box)) / ST_Area(:box)) * 100) AS area_pct
FROM mapunit_poly
JOIN mapunit
ON mapunit_poly.mukey = mapunit.mukey
AND st_intersects(wkb_geometry, :box)
ORDER BY area_pct DESC ;


 mukey  |                         muname                         | area_pct
--------+--------------------------------------------------------+----------
 467194 | Grangeville sandy loam, drained, 0 TO 2 percent slopes |       70
 463596 | Grangeville silt loam, drained                         |       12
 467206 | Riverwash                                              |       11
 467210 | Tujunga loamy sand, 0 TO 2 percent slopes              |        7

Getting Parent Material Data out of SSURGO

 
Parent material data is stored within the copm and copmgrp tables. The copm table can be linked to the copmgrp table via the 'copmgrpkey' field, and the copmgrp table can be linked to the component table via the 'cokey' field. The following queries illustrate these table relationships, and show one possible strategy for extracting the parent material information associated with the largest component of each map unit.

 
Several of the example queries are based on this map unit:

 
Query copmgrp table

SELECT *
FROM copmgrp
WHERE cokey = '461573:631329' ;

 areasymbol | pmgroupname | rvindicator |     cokey     |  copmgrpkey  
------------+-------------+-------------+---------------+--------------
 ca011      | alluvium    | Yes         | 461573:631329 | 461573:78270

 
Query copm table

SELECT *
FROM copm
WHERE copmgrpkey = '461573:78270' ;

 areasymbol | pmorder | pmmodifer | pmgenmod |  pmkind  | pmorigin |  copmgrpkey  |    copmkey    
------------+---------+-----------+----------+----------+----------+--------------+---------------
 ca011      |         |           |          | Alluvium |          | 461573:78270 | 461573:101001

 
Join parent material tables

SELECT cokey, pmgroupname, rvindicator, pmkind
FROM copmgrp
LEFT JOIN copm USING (copmgrpkey)
WHERE cokey = '461573:631329' ;

     cokey     | pmgroupname | rvindicator |  pmkind  
---------------+-------------+-------------+----------
 461573:631329 | alluvium    | Yes         | Alluvium

 
Join component table to parent material tables

SELECT mukey, cokey, comppct_r, taxorder, taxsuborder, taxgrtgroup, taxsubgrp, compname, pm.pmgroupname, pm.rvindicator, pm.pmkind
FROM
component
LEFT JOIN
        (
        -- get parent material data for each cokey
        SELECT cokey, pmorder, pmmodifer, pmgenmod, pmgroupname, rvindicator, pmkind
        FROM copmgrp
        LEFT JOIN copm USING (copmgrpkey)
        ) AS pm
USING (cokey)
WHERE mukey = '461573' ;

 mukey  |     cokey     | comppct_r | taxorder  | taxsuborder | taxgrtgroup  |      taxsubgrp      | compname | pmgroupname | rvindicator |  pmkind  
--------+---------------+-----------+-----------+-------------+--------------+---------------------+----------+-------------+-------------+----------
 461573 | 461573:631329 |        90 | Alfisols  | Xeralfs     | Palexeralfs  | Typic Palexeralfs   | Hillgate | alluvium    | Yes         | Alluvium
 461573 | 461573:631330 |         5 | Alfisols  | Xeralfs     | Haploxeralfs | Typic Haploxeralfs  | Arbuckle | alluvium    | Yes         | Alluvium
 461573 | 461573:631331 |         2 | Entisols  | Fluvents    | Xerofluvents | Mollic Xerofluvents | Arand    | alluvium    | Yes         | Alluvium
 461573 | 461573:631332 |         1 |           |             |              |                     | Unnamed  | alluvium    | Yes         | Alluvium
 461573 | 461573:631333 |         2 | Mollisols | Xerolls     | Haploxerolls | Pachic Haploxerolls | Westfan  | alluvium    | Yes         | Alluvium

 
Extract component data and corresponding parent material data for the largest component of each map unit

CREATE TEMP TABLE temp_component_data AS
SELECT DISTINCT ON (mukey) mukey, cokey, comppct_r, taxorder, taxsuborder, taxgrtgroup, taxsubgrp, compname, pm.pmgroupname, pm.rvindicator, pm.pmkind, pm.pmorder, pm.pmmodifer, pm.pmgenmod
FROM
component
LEFT JOIN
        (
        -- get parent material data for each cokey
        SELECT cokey, pmorder, pmmodifer, pmgenmod, pmgroupname, rvindicator, pmkind
        FROM copmgrp
        LEFT JOIN copm USING (copmgrpkey)
        ) AS pm
USING (cokey)
WHERE component.areasymbol IN ('ca624', 'ca628')
AND majcompflag = 'Yes'
ORDER BY mukey, comppct_r DESC ;

-- add an index so that joining with geometry is faster
CREATE INDEX temp_component_data_mukey_idx ON temp_component_data (mukey) ;
VACUUM ANALYZE temp_component_data;

 
Combine with geometry and prepare for export

-- combine with geometry
CREATE TABLE temp_combined_data AS
SELECT ogc_fid, wkb_geometry, temp_component_data.*
FROM mapunit_poly
LEFT JOIN temp_component_data USING (mukey)
WHERE mapunit_poly.areasymbol IN ('ca624', 'ca628') ;

 
Export to shapefile

# export
pgsql2shp -k -f data.shp -h the_host_machine -P the_username -u the_password -g wkb_geometry ssurgo_combined temp_combined_data

 
Remove temporary tables

-- clean-up
DROP TABLE temp_combined_data ;

Learning by Example: Common Queries

 
Selection, Filtering and Sorting

  • Selecting a summary of horizon thickness, sand, silt, clay and AWC for a given component (chorizon table).
    SELECT  hzname AS name, hzdepb_r - hzdept_r AS thickness, sandtotal_r AS sand, silttotal_r AS silt, claytotal_r AS clay, awc_r AS awc
    FROM chorizon
    WHERE cokey = '467038:646635'
    ORDER BY hzdept_r;

    Query Results
    name | thickness | sand | silt | clay | awc
    --------+-----------+------+------+------+------
    Ap | 18 | 35 | 37 | 28 | 0.16
    Bw | 23 | 35 | 40 | 25 | 0.15
    Bk1 | 28 | 35 | 40 | 25 | 0.16
    Bk2 | 40 | 35 | 40 | 25 | 0.16
    Bk3 | 36 | 35 | 40 | 25 | 0.16
    Bk4 | 38 | 60 | 25 | 15 | 0.13
  • Select some common fields from the component table:
     SELECT cokey, majcompflag, comppct_r, wei, weg, tfact
    FROM component
    WHERE mukey = '467038'
    ORDER BY comppct_r DESC;

    Query Results
    cokey | majcompflag | comppct_r | wei | weg | tfact
    ---------------+-------------+-----------+-----+-----+-------
    467038:646635 | Yes | 85 | 48 | 6 | 5
    467038:646636 | No | 5 | | |
    467038:646638 | No | 4 | | |
    467038:646640 | No | 2 | | |
    467038:646639 | No | 2 | | |
    467038:646637 | No | 2 | | |
  • Create a new table 'yolo_comp' using the component table, for a specific survey area (Yolo County), with components sorted by their component percentages.
    CREATE TABLE yolo_comp AS
    SELECT * FROM component
    WHERE areasymbol = 'ca113'
    ORDER BY mukey, comppct_r DESC ;

     
    The resulting table can be copied to a CSV file (in the current working directory) like this: \copy yolo_comp TO 'yolo_comp.csv' CSV HEADER

     
    If you are done with the table, remove it with the following SQL: DROP TABLE yolo_comp ;

Identifying the Largest Components

 
Overview:
Two methods for the selection of the largest components (based on the comppct_r column) within map units. This approach to selecting a single component per map unit is appropriate in situations where a single representative feature or property is sought. The partitioning of components within a map unit is closely related to the map unit type: complex, association, consociation or an "undifferentiated group". A breakdown of the number of components per each map unit type is summarized by the following query (26400 map units / 45971 major components):

SELECT mapunit.mukind, round(avg(n_components)) AS avg, min(n_components), max(n_components)
FROM mapunit JOIN
        (
        SELECT mapunit.mukey, count(component.mukey) AS n_components
        FROM
        mapunit JOIN component
        ON mapunit.mukey = component.mukey
        WHERE component.majcompflag = 'Yes' AND mapunit.mukind IS NOT NULL
        GROUP BY mapunit.mukey
        ) AS a
ON a.mukey = mapunit.mukey
GROUP BY mapunit.mukind ;

Query Results
mukind | avg | min | max
------------------------+-----+-----+-----
Consociation | 1 | 1 | 3
Complex | 2 | 1 | 4
Association | 3 | 1 | 4
Undifferentiated group | 2 | 1 | 4

 
Method 1: Filtering component percentages with the DISTINCT keyword.

  1. Use of the SELECT DISTINCT ON operator. Note that the ordering is done before the DISTINCT filter is applied, resulting in the largest (by component percentage) component. Note that results from ties are unpredictable.
    • SELECT DISTINCT ON (mukey) mukey, cokey, comppct_r, compname, taxorder
      FROM component
      WHERE areasymbol = 'ca654' AND majcompflag = 'Yes'
      ORDER BY mukey, comppct_r DESC ;

      Query Results
      mukey | cokey | comppct_r | compname | taxorder
      ---------+----------------+-----------+------------+-----------
      1487066 | 1487066:638252 | 80 | Auberry | Alfisols
      1487067 | 1487067:638274 | 80 | Auberry | Alfisols
      1487068 | 1487068:638383 | 85 | Crouch | Mollisols
      1487069 | 1487069:632318 | 85 | Cajon | Entisols
      1487070 | 1487070:632372 | 85 | Excelsior | Entisols
      1487071 | 1487071:632519 | 85 | Kimberlina | Entisols
      1487072 | 1487072:632642 | 85 | Nord | Mollisols
      1487076 | 1487076:632832 | 85 | Wasco | Entisols
      1487077 | 1487077:632866 | 85 | Whitewolf | Entisols
      464171 | 464171:640646 | 85 | Academy | Alfisols
    • We can inspect possible ties or other sources of error by looking at the smallest components.

      SELECT * FROM
              (
              SELECT DISTINCT ON (mukey) mukey, cokey, comppct_r, compname, taxorder
              FROM component
              WHERE areasymbol = 'ca654' AND majcompflag = 'Yes'
              ORDER BY mukey, comppct_r DESC
              ) AS a
      ORDER BY a.comppct_r ASC ;

      Query Results
      mukey | cokey | comppct_r | compname | taxorder
      ---------+----------------+-----------+---------------------+-------------
      464432 | 464432:1380237 | 10 | Ramona | Alfisols
      464184 | 464184:640686 | 30 | Ahwahnee | Alfisols
      464211 | 464211:640756 | 30 | Auberry | Alfisols
      464210 | 464210:640752 | 35 | Auberry | Alfisols
      464529 | 464529:641600 | 35 | Vista | Inceptisols
      464530 | 464530:641604 | 35 | Vista | Inceptisols
      464531 | 464531:641609 | 35 | Rock outcrop |

 
Finding the largest component (based on comppct_r) per mukey

  1. Identify how many map units there are in a given region or survey area. In this case San Joaquin County; areasymbol = 'ca077'.
    SELECT count(mukey) FROM mapunit WHERE areasymbol = 'ca113';

    Query Results
    count
    -------
    186
  2. Identify which map units contain components where a conventional, select largest component percentage type approach will not work. This commonly happens when there are several major components with the same comppct_r value, within the same map unit. Any returned mukey values need to be evaluated.
    SELECT component.mukey, count(component.mukey)
    FROM component
            JOIN
            (
            SELECT mukey, max(comppct_r) AS comppct FROM component WHERE majcompflag = 'Yes' AND areasymbol = 'ca077' GROUP BY mukey
            ) AS a
    ON component.comppct_r = a.comppct AND component.mukey = a.mukey
    GROUP BY component.mukey
    HAVING count(component.mukey) > 1 ;

    Query Results
    mukey | count
    --------+-------
    462043 | 2
    462068 | 2
  3. Evaluation of the two "trouble" map units identified above. Note the use of the where mukey in ('462043', '462068') syntax.
    SELECT mukey, cokey, comppct_r, compname, taxorder
    FROM component
    WHERE majcompflag = 'Yes' AND mukey IN ('462043', '462068')
    ORDER BY mukey, comppct_r DESC ;

    Query Results
    mukey | cokey | comppct_r | compname | taxorder
    --------+---------------+-----------+------------+-----------
    462043 | 462043:634006 | 45 | Dumps |
    462043 | 462043:634007 | 45 | Tailings |
    462068 | 462068:634159 | 30 | Honker | Alfisols
    462068 | 462068:634160 | 30 | Vallecitos | Alfisols
    462068 | 462068:634161 | 25 | Gonzaga | Mollisols

Profile water storage as calculated from SSURGO

 
Overview
This example illustrates a component-percent weighted mean water storage, based on total profile storage values. The comppct_r column is used as a weight to adjust the influence of the profile water storage values for each component: larger weights (larger components) result in more influence. When performing weighted means, use care in the selection of an appropriate weighting parameter. A selection of a weighting variable not direclty related to the property which is being averaged can result in an effect known as Simpson's paradox.

 
Some background on Soil Water Holding Capacity and Irrigation Management.

 
The general formula for computing profile water storage is defined as follows. Note that the awc_r column in the SSURGO database (chorizon table) is pre-adjusted to compensate for coarse fragments (see relevant SSURGO metadata below).

 
Calculate the water storage within a given component
component_whc = profile_sum_awc = sum( (hzdepb_r - hzdept_r) * awc_r )

 
Calculate the weighted mean value of the profile water storage, for a given map unit
weighted mean = sum(weights * x) / sum(weights)
mapunit_whc = sum( comppct_r * component_whc ) / sum( comppct_r )
Note: weights corresponding to non-soil components must be filtered out

 
From the SSURGO 2.1 table definitions:

Column Label: AWC - Representative Value (awc_r)
The amount of water that an increment of soil depth, inclusive of fragments, can store that is available to plants. AWC is expressed as a volume fraction, and is commonly estimated as the difference between the water contents at 1/10 or 1/3 bar (field capacity) and 15 bars
(permanent wilting point) tension and adjusted for salinity, and fragments.
  1. Reduce to 1:1 relationship with cokey by aggrigating AWC horizon data in the chorizon table, and linking to the component table.
     SELECT mukey, compname, comppct_r, a.* FROM component
    JOIN  
            (
            SELECT cokey, sum( (hzdepb_r - hzdept_r) * awc_r) AS component_whc, sum((hzdepb_r - hzdept_r)) AS depth
            FROM chorizon  WHERE areasymbol = 'ca113'
            GROUP BY cokey
            ) AS a
    ON component.cokey = a.cokey
    WHERE component.areasymbol = 'ca113'
    ORDER BY mukey ;

    Query Results
    mukey | compname | comppct_r | cokey | component_whc | depth
    --------+---------------------+-----------+----------------+---------------+-------
    459225 | Millsholm | 30 | 459225:624008 | 6.82 | 58
    459225 | Dibble | 55 | 459225:624007 | 12.36 | 86
    459226 | Millsholm | 40 | 459226:624013 | 6.82 | 58
    459226 | Dibble | 45 | 459226:624012 | 12.36 | 86
    459227 | Millsholm | 30 | 459227:624019 | 6.82 | 58
    459227 | Dibble | 60 | 459227:624018 | 12.36 | 99
    ...
  2. Reduce to 1:1 relationship with mukey by aggreigating based on component percent:
    SELECT mukey,
    -- note that weights from non-soil components must be removed
    -- otherwise, weighted mean values will be too low
    SUM(comppct_r * component_whc) / SUM(comppct_r) AS comppct_weighted_whc_cm
    FROM component
    JOIN
            (
            SELECT cokey, sum( (hzdepb_r - hzdept_r) * awc_r) AS component_whc,
            sum((hzdepb_r - hzdept_r)) AS depth
            FROM chorizon
            WHERE areasymbol = 'ca113'
            GROUP BY cokey
            ) AS a
    USING (cokey)
    WHERE component.areasymbol = 'ca113'
    -- filter out components that are missing soils data
    AND a.component_whc IS NOT NULL
    GROUP BY mukey ;

    Query Results
    mukey | comppct_weighted_whc_cm
    --------+-------------------------
    459225 | 10
    459226 | 10
    459227 | 11
    ...
  3. Link to polygons and setup access permissions:
    -- create the new table with both geometry and attributes
    CREATE TABLE yolo_whc AS
    SELECT  ogc_fid, wkb_geometry AS wkb_geometry, b.mukey, b.comppct_weighted_whc_cm
    FROM mapunit_poly
    -- use LEFT JOIN to include non-soil polygons in the result set
    -- alternatively use JOIN to ignore non-soil polygons
    LEFT JOIN
            (
            SELECT mukey,
            -- note that weights from non-soil components must be removed
            -- otherwise, weighted mean values will be too low
            SUM(comppct_r * component_whc) / SUM(comppct_r) AS comppct_weighted_whc_cm
            FROM component
            JOIN
              (
              SELECT cokey, sum( (hzdepb_r - hzdept_r) * awc_r) AS component_whc,
              sum((hzdepb_r - hzdept_r)) AS depth
              FROM chorizon
              WHERE areasymbol = 'ca113'
              GROUP BY cokey
              ) AS a
            USING (cokey)
            WHERE component.areasymbol = 'ca113'
            -- filter out components that are missing soils data
            AND a.component_whc IS NOT NULL
            GROUP BY mukey
            ) AS b
    -- JOIN constraint
    USING (mukey)
    -- optional constraint to limit geometry search in mapunit_poly table
    WHERE mapunit_poly.areasymbol = 'ca113' ;

     
    Create indexes and register the new geometry:

    -- create attribute and spatial index:
    CREATE UNIQUE INDEX yolo_whc_idx ON yolo_whc (ogc_fid) ;
    CREATE INDEX yolo_whc_spatial_idx ON yolo_whc USING gist (wkb_geometry gist_geometry_ops);

    -- register in geometry_columns table:
    INSERT INTO geometry_columns VALUES ('','public','yolo_whc','wkb_geometry',2,9001,'POLYGON');

    -- optional:
    -- fix permissions
    -- GRANT SELECT on table yolo_whc to [user] ;

    -- cleanup
    vacuum analyze yolo_whc  ;

    Yolo County Water Holding Capacity Map: Profile total water holding capacity as computed by component percentage weighted average.Yolo County Water Holding Capacity Map: Profile total water holding capacity as computed by component percentage weighted average.

  4. Dump to shapefile with ogr2ogr:
     ogr2ogr ywhc.shp PG:"dbname=ssurgo_combined user=xxxx password=xxxx" yolo_whc

     
    More examples of exporting data can be found here.

Soil Texture Example

 
Premise

Compute a series of weighted-average soil texture fractions (sand, silt, clay), for every component, of every map unit in Yolo County. These values will be further weighted by the spatial distribution of each map unit.

CREATE TABLE yolo_wt_mean_texture AS
-- join with polygons, and compute areas weights
SELECT mapunit_poly.mukey,
sum(ST_Area(wkb_geometry)) / (SELECT ST_Area(wkb_geometry) FROM mapunit_bound_poly WHERE areasymbol = 'ca113') AS area_wt,
sand, silt, clay
FROM
mapunit_poly
JOIN
        (
        -- compute component percent weighted mean
        SELECT mukey,
        sum(comppct_r * sand) / sum(comppct_r) AS sand,
        sum(comppct_r * silt) / sum(comppct_r) AS silt,
        sum(comppct_r * clay) / sum(comppct_r) AS clay
        FROM
        component
        JOIN
                (
                -- compute hz thickness weighted mean
                SELECT cokey,
                sum((hzdepb_r - hzdept_r) * sandtotal_r) / sum(hzdepb_r - hzdept_r) AS sand,
                sum((hzdepb_r - hzdept_r) * silttotal_r) / sum(hzdepb_r - hzdept_r) AS silt,
                sum((hzdepb_r - hzdept_r) * claytotal_r) / sum(hzdepb_r - hzdept_r) AS clay
                FROM chorizon
                WHERE sandtotal_r IS NOT NULL
                AND silttotal_r IS NOT NULL
                AND claytotal_r IS NOT NULL
                AND areasymbol = 'ca113'
                GROUP BY cokey
                ) AS co_agg
        ON component.cokey = co_agg.cokey
        GROUP BY component.mukey
        ) AS mu_agg
ON mapunit_poly.mukey = mu_agg.mukey
GROUP BY mapunit_poly.mukey, sand, silt, clay;

 
Simple Visualization with R

# dump the data:
# echo 'select * from yolo_wt_mean_texture' |  psql -A -F "," -U xxx -h xxx dbname > temp/yolo_texture.csv
#
# remember to trim off the last line!

# load some libs:
library(plotrix)

# read in the data
x <- read.csv('yolo_texture.csv')

# simple soil texture, with symbol size weighted by area weight
soil.texture(x[,3:5], cex=sqrt(50*x$area_wt), pch=16, col.symbol=rgb(65,105,225, alpha=100, max=255),
show.lines=T, show.names=T, col.lines='black', col.names='black', main='Yolo County Soil Textures')

triax.points(cbind(weighted.mean(x$sand, x$area_wt), weighted.mean(x$silt, x$area_wt), weighted.mean(x$clay, x$area_wt)),
col.symbols='orange', pch=16, cex=2)

Yolo County Common Soil Textures: Blue symbols mark common soil textures and their relative size denotes abundance. The orange symbol marks the weighted average soil texture for all of Yolo County.Yolo County Common Soil Textures: Blue symbols mark common soil textures and their relative size denotes abundance. The orange symbol marks the weighted average soil texture for all of Yolo County.

Identification of Dated Surfaces via Soil Series

East Side Alluvial FormationsEast Side Alluvial Formations

 
Overview
A simple association between dated landforms and soil series name [1] was used to extract soil polygons from a composite soil survey database.

Soil Series Associated Formation Approximate Age (1000 yrs ago)
Redding Laguna 1600 - 2000 kya
Corning Laguna 1600 - 2000 kya
Keyes Laguna 1600 - 2000 kya
Whitney Turlock Lake 500 - 700 kya
Montpellier Turlock Lake 500 - 700 kya
Rocklin Turlock Lake 500 - 700 kya
Snelling Riverbank 100 - 300 kya
San Joaquin Riverbank 100 - 300 kya
Exiter Riverbank 100 - 300 kya
Madera Riverbank 100 - 300 kya
Hanford Modesto 10 - 40 kya
Grangeville Holocene < 10 kya
  1. Create a soil series - dated landform lookup table:

    -- create a lookup table
    CREATE TABLE dated_landforms (
    soil_series varchar(20),
    formation varchar(20),
    approx_age varchar(30)
    );
    -- populate table
    INSERT INTO dated_landforms VALUES ('Redding','Laguna','1600 - 2000 kya') ;
    INSERT INTO dated_landforms VALUES ('Corning','Laguna','1600 - 2000 kya') ;
    INSERT INTO dated_landforms VALUES ('Keyes','Laguna','1600 - 2000 kya') ;
    INSERT INTO dated_landforms VALUES ('Whitney','Turlock Lake','500 - 700 kya') ;
    INSERT INTO dated_landforms VALUES ('Montpellier','Turlock Lake','500 - 700 kya') ;
    INSERT INTO dated_landforms VALUES ('Rocklin','Turlock Lake','500 - 700 kya') ;
    INSERT INTO dated_landforms VALUES ('Snelling','Riverbank','100 - 300 kya') ;
    INSERT INTO dated_landforms VALUES ('San Joaquin','Riverbank','100 - 300 kya') ;
    INSERT INTO dated_landforms VALUES ('Exiter','Riverbank','100 - 300 kya') ;
    INSERT INTO dated_landforms VALUES ('Madera','Riverbank','100 - 300 kya') ;
    INSERT INTO dated_landforms VALUES ('Hanford','Modesto','10 - 40 kya') ;
    INSERT INTO dated_landforms VALUES ('Grangeville','Holocene','< 10 kya') ;
  2. Select map units (mukey) by suitible series concepts, associated with major components. Note that the DISTINCT ON (mukey) ... ORDER BY mukey, comppct_r DESC pattern can be used to select the largest component for each map unit. Pattern matching is used to safely join variants with the soil series names in our look-up table: i.e. 'San Joaquin variant' will match 'San Joaquin'.

    -- keep only the largest formation (by total component percent) within each map unit key
    SELECT DISTINCT ON (mukey) mukey, formation, sum(comppct_r) AS formation_pct
    FROM component
    JOIN
    dated_landforms
    -- use pattern matching to also include variants
    ON compname ~~* lower(soil_series || '%') = 't'
    -- subset to select survey areas on the east side of the valley
    AND component.areasymbol IN ('ca654', 'ca651', 'ca649', 'ca644', 'ca648', 'ca077')
    AND majcompflag = 'Yes'
    -- combine components of the same formation
    GROUP BY mukey, formation
    -- use in conjunction with the DISTINCT statement to keep only the largest
    ORDER BY mukey, formation_pct DESC;
  3. Create a new classified table called east_side_all. This query involves 6 survey areas, 11423 polygons, and requires about 11 seconds to complete.

    CREATE TABLE east_side_all AS
    -- select geom column and the feature id
    -- along with our formation names
    SELECT wkb_geometry AS wkb_geometry, ogc_fid, a.*
    FROM mapunit_poly
    JOIN
    (
    SELECT DISTINCT ON (mukey) mukey, formation, sum(comppct_r) AS formation_pct
    FROM component
    JOIN
    dated_landforms
    ON compname ~~* lower(soil_series || '%') = 't'
    AND component.areasymbol IN ('ca654', 'ca651', 'ca649', 'ca644', 'ca648', 'ca077')
    AND majcompflag = 'Yes'
    GROUP BY mukey, formation
    ORDER BY mukey, formation_pct DESC
    ) AS a
    ON mapunit_poly.mukey = a.mukey
    -- limit polygon selection by survey area
    AND mapunit_poly.areasymbol IN ('ca654', 'ca651', 'ca649', 'ca644', 'ca648', 'ca077') ;
  4. Index, register geometry, and setup permissions
     -- indexing
    CREATE INDEX east_side_all_fid_idx ON east_side_all  (ogc_fid) ;
    CREATE INDEX east_side_all_gis_idx ON east_side_all USING gist (wkb_geometry gist_geometry_ops) ;
    &nbsp;
    -- register geometry
    INSERT INTO geometry_columns VALUES ('','public','east_side_all','wkb_geometry',2,9001,'POLYGON');
    &nbsp;
    -- permissions
    GRANT SELECT ON TABLE east_side_all TO soil  ;
  5. Tabulate acreage for each formation. (approximately 1/3 second to complete)
     
    SELECT * FROM
            (
            SELECT round(sum(area(wkb_geometry)) * 0.000247) AS area_ac, formation
            FROM east_side_all
            GROUP BY formation
            ) AS a
    ORDER BY a.area_ac DESC;

    area_ac formation
    295868 Riverbank
    253424 Modesto
    151149 Turlock Lake
    121085 Laguna
    96981 Holocene

    (5 rows)

  6. Dump to local file in SHP format
     ogr2ogr east_side_all.shp PG:"dbname=ssurgo_combined user=xxxx password=xxxx host=xxxx" east_side_all

 
References:

  1. Smith, D.W. & Verrill, W.L. Witham, C.; Bauder, E.; Belk, D.; Ferren Jr., W. & Ornduff, R. (ed.) Vernal Pool-Soil-Landform Relationships in the Central Valley, California 1998, 15-23

Seasonally Wet Soils and Shrink-Swell Potential

PostGIS: ssurgo example- wet polygons
Example 1 map
PostGIS: ssurgo example- shrink swell potential
Example 2 map

 
Example 1: The location of seasonaly wet soils via two methods: hydricrating and USDA Soil Taxonomy interpretation.

-- optionally link polygons here...
-- compute the percent of each map unit that contains components that may be seasonally wet
SELECT mukey, wet_flag, sum(comppct_r) AS pct_mu
FROM
        (
        SELECT mukey, cokey, comppct_r,
        -- slightly less restrictive than hydricrating alone
        CASE WHEN ((taxclname ~~* '%aqu%') = 't') OR (hydricrating = 'Yes') then 1 ELSE 0 END AS wet_flag
        FROM component  
        -- subset to Yolo County
        WHERE areasymbol = 'ca113'
        ) AS yolo_wet_components
-- only keep map unit with some component that meets the criteria
WHERE wet_flag = 1
-- aggregate by map unit, wet_flag
GROUP BY yolo_wet_components.mukey, yolo_wet_components.wet_flag
ORDER BY mukey;

 

 
Example 1.1 Extract a list of wet components, and sum area based on component (series) name

-- compute estimated, total acreage of seasonally wet map units in Yolo Co.
SELECT mapunit.musym, mapunit.mukey, mapunit.muname,
round(sum(ST_Area(wkb_geometry) * (yolo_mu_data.pct_mu::numeric/100.0) * 0.000247)::numeric, 2) AS wet_area_ac
FROM
mapunit_poly
JOIN
        (
        -- compute the percent of each map unit that contains components that may be seasonally wet
        SELECT mukey, wet_flag, sum(comppct_r) AS pct_mu
        FROM
                (
                SELECT mukey, cokey, comppct_r,
                -- slightly less restrictive than hydricrating alone
                CASE WHEN ((taxclname ~~* '%aqu%') = 't') OR (hydricrating = 'Yes') then 1 ELSE 0 END AS wet_flag
                FROM component  
                -- subset to Yolo County
                WHERE areasymbol = 'ca113'
                ) AS yolo_wet_components
        -- only keep map unit with some component that meets the criteria
        WHERE wet_flag = 1
        -- aggregate by map unit, wet_flag
        GROUP BY yolo_wet_components.mukey, yolo_wet_components.wet_flag
        ) AS yolo_mu_data
ON mapunit_poly.mukey = yolo_mu_data.mukey
JOIN mapunit
ON mapunit.mukey = yolo_mu_data.mukey
-- aggregate by map unit
GROUP BY mapunit.mukey, mapunit.muname, mapunit.musym
ORDER BY wet_area_ac DESC;

 musym | mukey  |                             muname                             | wet_area_ac 
-------+--------+----------------------------------------------------------------+-------------
 Sc    | 459268 | Sacramento clay                                                |    35710.90
 Mf    | 459244 | Marvin silty clay loam                                         |    18877.18
 Sg    | 459272 | Sacramento soils, flooded                                      |    12273.51
 Cn    | 459219 | Clear Lake soils, flooded                                      |    11665.84
 Cc    | 459216 | Capay soils, flooded                                           |    11029.83
 Sv    | 459288 | Sycamore complex, drained                                      |     8410.41
 St    | 459286 | Sycamore silty clay loam, drained                              |     6900.77
 Ck    | 459218 | Clear Lake clay                                                |     6737.88
 Sa    | 459266 | Sacramento silty clay loam                                     |     6014.48
 Sp    | 459283 | Sycamore silt loam, drained                                    |     5537.18
 Sw    | 459289 | Sycamore complex, flooded                                      |     5041.96
 Wb    | 459301 | Willows clay                                                   |     4950.34
 Ss    | 459285 | Sycamore silty clay loam                                       |     4859.59
 Pb    | 459254 | Pescadero silty clay, saline-alkali                            |     4700.77
 So    | 459282 | Sycamore silt loam                                             |     3953.42
 Rh    | 459262 | Riverwash                                                      |     3698.88
 Pc    | 459255 | Pescadero soils, flooded                                       |     3589.01
 Tb    | 459292 | Tyndall very fine sandy loam                                   |     3299.03
 Ob    | 459252 | Omni silty clay                                                |     3174.44

[...]

 
Example 2: Classify soils according to shrink-swell capacity of the top 1 meter of soil, weighted by horizon thickness and component percent.

-- set a lower boundary for the query
\SET lwr_bdy 100

-- add a class label, based on NRCS guidelines
SELECT mapunit.musym, mapunit.muname, mapunit.muacres,
round(mu_wt_lep::numeric, 2) AS lep,
CASE WHEN mu_wt_lep < 3 THEN 'Low'
WHEN mu_wt_lep >= 3 AND mu_wt_lep < 6 THEN 'Moderate'
WHEN mu_wt_lep >= 6 AND mu_wt_lep < 9 THEN 'High'
WHEN mu_wt_lep >= 9 THEN 'Very High'
END AS lep_class
FROM
        (
        -- compute map unit lep, weighted by component percent, to set depth
        SELECT component.mukey,
        sum(component.comppct_r * co_wt_mean_lep) / sum(component.comppct_r) AS mu_wt_lep
        FROM
                (
                -- compute a horizon-thickness weighted mean lep to a set depth
                SELECT cokey, sum(thick * lep_r) / sum(thick) AS co_wt_mean_lep
                FROM
                        (
                        -- compute horizon thickness, but only to a set depth
                        SELECT cokey, hzdept_r, hzdepb_r, lep_r,
                        CASE WHEN hzdepb_r > :lwr_bdy THEN (:lwr_bdy - hzdept_r)
                        ELSE (hzdepb_r - hzdept_r) END AS thick
                        FROM chorizon
                        WHERE areasymbol = 'ca113'
                        AND lep_r IS NOT NULL
                        AND hzdept_r <= :lwr_bdy
                        ) AS hz_lep
                GROUP BY cokey
                ) AS co_lep
        JOIN component
        ON co_lep.cokey = component.cokey
        GROUP BY mukey
        ) AS mu_lep
JOIN mapunit
ON mu_lep.mukey = mapunit.mukey
ORDER BY muacres DESC, lep DESC;

 musym |                              muname                              | muacres | lep  | lep_class 
-------+------------------------------------------------------------------+---------+------+-----------
 Ya    | Yolo silt loam                                                   |   39698 | 2.52 | Low
 Sc    | Sacramento clay                                                  |   34886 | 7.50 | High
 Ca    | Capay silty clay                                                 |   33465 | 7.50 | High
 MrG2  | Millsholm rocky loam, 15 to 75 percent slopes, eroded            |   30118 | 1.50 | Low
 Rg    | Rincon silty clay loam                                           |   24580 | 6.36 | High
 BrA   | Brentwood silty clay loam, 0 to 2 percent slopes                 |   23045 | 7.50 | High
 CtD2  | Corning gravelly loam, 2 to 15 percent slopes, eroded            |   22080 | 5.34 | Moderate
 Mf    | Marvin silty clay loam                                           |   20970 | 6.60 | High
 DaF2  | Dibble clay loam, 30 to 50 percent slopes, eroded                |   18612 | 7.11 | High
 SmE2  | Sehorn-Balcom complex, 15 to 30 percent slopes, eroded           |   17794 | 6.17 | High
 TaA   | Tehama loam, 0 to 2 percent slopes                               |   16622 | 3.75 | Moderate
 BdF2  | Balcom-Dibble complex, 30 to 50 percent slopes, eroded           |   16405 | 5.73 | Moderate
 SmD   | Sehorn-Balcom complex, 2 to 15 percent slopes                    |   16117 | 6.50 | High
 BaF2  | Balcom silty clay loam, 30 to 50 percent slopes, eroded          |   12637 | 4.50 | Moderate
 Sg    | Sacramento soils, flooded                                        |   12258 | 6.27 | High
 Cn    | Clear Lake soils, flooded                                        |   11666 | 6.92 | High
 SmF2  | Sehorn-Balcom complex, 30 to 50 percent slopes, eroded           |   11226 | 6.33 | High
 Cc    | Capay soils, flooded                                             |   11030 | 7.50 | High
 Sv    | Sycamore complex, drained                                        |    9241 | 4.18 | Moderate
 Ms    | Myers clay                                                       |    8938 | 7.50 | High
 PfF2  | Positas gravelly loam, 30 to 50 percent slopes, eroded           |    7920 | 5.34 | Moderate
 St    | Sycamore silty clay loam, drained                                |    7839 | 4.50 | Moderate
 Ck    | Clear Lake clay                                                  |    6946 | 7.50 | High
 Ra    | Reiff very fine sandy loam                                       |    6847 | 1.50 | Low

[...]