Open Source Software Tools for Soil Scientists

 
Abstract:
A collection of articles, methods, and examples of how open source software tools can assist with research in the field of soil science. Additional links to methods in environmental sciences within the realm of statistics, interpolation, morphometrics, and more.

 
UCD Resources:
Garrett and I are leading a seminar-style course covering some of this material, Fall quarter, 2006.

 
Articles:

 
Reference Material

General Purpose Programming with Scripting Languages

 
Some initial hints of use of various scripting languages.

 
Python

 
Ruby

Simple Python Interface to USGS TNM Elevation Service

 
Premise
Wanted a simpler way to access the USGS seamless elevation look-up service. Python seemed like a logical start. Note that the response from the USGS webservice is not correctly identified as valid XML by the python XML-parser. Therefore there is a small amount of scrubbing used to coerce the response into valid XML. Comments on why this is, or is not, a good idea are welcome.

 
Update It looks like the USGS service does not accept POST-style requests. I have made some small changes to the script below.

 
Example Listing

# nice CLI interface
import sys
from optparse import OptionParser

# url access
import urllib, urllib2

# simple XML parsing
from xml.dom import minidom

# CSV file parsing
import csv


# define service URL
base_url = 'http://gisdata.usgs.gov/XMLWebServices/TNM_Elevation_Service.asmx/getElevation?X_Value=%f&Y_Value=%f&Elevation_Units=meters&Source_Layer=-1&Elevation_Only=1'


#process command line (optparse)
parser = OptionParser()
parser.add_option("-f", "--file", dest="infile", help="input csv file containing WGS84 (lon,lat,site_id) record", metavar="FILE")


# process args
(options, args) = parser.parse_args()

#require an input file
if not options.infile:
        print "ERROR: must supply an input file!"
        sys.exit(1)


#open input file
try:
        infile = options.infile
        reader = csv.reader(open(infile, "rb"))
except:
        print "ERROR: Cannot open: " + infile
        sys.exit(1)


# read the csv file
for row in reader:
        lon = float(row[0])
        lat = float(row[1])
        site_id = row[2]
       
        # the base URL of the web service
        url = 'http://gisdata.usgs.gov/XMLWebServices/TNM_Elevation_Service.asmx/getElevation'
       
        # url GET args
        values = {'X_Value' : lon,
        'Y_Value' : lat,
        'Elevation_Units' : 'meters',
        'Source_Layer' : '-1',
        'Elevation_Only' : '1', }
       
        # make some fake headers, with a user-agent that will
        # not be rejected by bone-headed servers
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        headers = {'User-Agent' : user_agent}
       
        # encode the GET arguments
        data = urllib.urlencode(values)
       
        # make the URL into a qualified GET statement:
        get_url = url + '?' + data
       
        # make the request: note that by ommitting the url arguments
        # we force a GET request, instead of a POST
        req = urllib2.Request(url=get_url, headers=headers)
        response = urllib2.urlopen(req)
        the_page = response.read()
       
        # convert the HTML back into plain XML
        for entity, char in (('lt', '<'), ('gt', '>'), ('amp', '&')):
                the_page = the_page.replace('&%s;' % entity, char)

        # clean some cruft... XML won't parse with this stuff in there...
        the_page = the_page.replace('<string xmlns="http://gisdata.usgs.gov/XMLWebServices/">', '')
        the_page = the_page.replace('<?xml version="1.0" encoding="utf-8"?>\r\n', '')
        the_page = the_page.replace('</string>', '')
        the_page = the_page.replace('<!-- Elevation Values of -1.79769313486231E+308 (Negative Exponential Value) may mean the data source does not have values at that point.  --> <USGS_Elevation_Web_Service_Query>', '')
       
        # parse the cleaned XML
        dom = minidom.parseString(the_page)
        children = dom.getElementsByTagName('Elevation_Query')[0]
       
        # extract the interesting parts
        elev = float(children.getElementsByTagName('Elevation')[0].firstChild.data)
        data_source = children.getElementsByTagName('Data_Source')[0].firstChild.data
       
        # print to stdout
        print "%f,%f,%s,%f,%s" % (lon, lat, site_id, elev, data_source)

 
Incantation

python elev_lookup.py -f ca_cities.txt

 
Input File CA cities in WGS84 (lon,lat,id) records

-122.32,37.78,Alameda NAS
-120.53,41.48,Alturas
-124.1,40.98,Arcata
-119.05,35.43,Bakersfield
-121.45,39.13,Beale AFB
-116.95,33.93,Beaumont
-116.62,35.28,Bicycle Lk
-116.68,34.27,Big Bear Apt
-118.6,37.60,Bishop
-120.7,39.28,Blue Canyon
-114.72,33.62,Blythe
-118.37,34.20,Burbank

 
Output

-122.320000,37.780000,Alameda NAS,2.123548,NED Contiguous U. S. 1/3W arc second elevation data
-120.530000,41.480000,Alturas,1331.056396,NED Contiguous U. S. 1/3W arc second elevation data
-124.100000,40.980000,Arcata,63.749836,NED Contiguous U. S. 1/3W arc second elevation data
-119.050000,35.430000,Bakersfield,148.473618,NED Contiguous U. S. 1/3W arc second elevation data
-121.450000,39.130000,Beale AFB,29.950783,NED Contiguous U. S. 1/3W arc second elevation data
-116.950000,33.930000,Beaumont,792.840881,NED Contiguous U. S. 1/3W arc second elevation data
-116.620000,35.280000,Bicycle Lk,711.765869,NED Contiguous U. S. 1/3W arc second elevation data
-116.680000,34.270000,Big Bear Apt,1697.037720,NED Contiguous U. S. 1/3W arc second elevation data
-118.600000,37.600000,Bishop,2209.536865,NED Contiguous U. S. 1/3W arc second elevation data
-120.700000,39.280000,Blue Canyon,1603.887573,NED Contiguous U. S. 1/3W arc second elevation data
-114.720000,33.620000,Blythe,120.626007,NED Contiguous U. S. 1/3W arc second elevation data
-118.370000,34.200000,Burbank,224.287033,NED Contiguous U. S. 1/3W arc second elevation data
-117.350000,33.300000,Camp Pendlet,21.895588,NED Contiguous U. S. 1/3W arc second elevation data

Text processing with the UNIX toolbox

 
Working with DBF files

  • Extract the text contents of a DBF file with dbfxtrct, select out the 3rd and 2nd columns with awk, keep only unique rows with sort and uniq, and finally saving to a new file called 'ca_fips-name_list.txt' with shell redirection.
    # dbfxtrct utility
    dbfxtrct -ica_counties_aea83.dbf | awk -F"," '{print $3","$2}' | sort -g | uniq > ca_fips-name_list.txt
  • Other DBF tools include:

    • dbf2mysql
    • dbf2pg
    • dbfadd
    • dbfcreate
    • dbfdump
    • dbfutil1
    • dbfxtrct

Graphviz

http://www.graphviz.org/

 
Example DOT Code:

digraph soil {
        size="8,8";
        node [fontsize=12, color = lightblue2, style = filled];
       
        "solar radiation budget" [color = yellow];
        "Cs log(As)" [color = yellow];
        "landscape position" [color = yellow];
        "aspect" [color = yellow];
        "geology" [color = yellow];
        "VEG" [color = yellow];
        "landscape / landform age" [color = red];
       
        subgraph cluster0 {
                "surrounding terrain";
                "precip";
                "landscape position";
                "aspect";
                "local weather";
                "VEG";
                "solar radiation budget";
                "Cs log(As)";
                "soil moisture status";
                "xy water routing";
                "energy flux";
                "erosional / depositional surface";
        }

        subgraph cluster1 {
                "geology";
                "parent material";
                "soil texture";
                "infiltration rate";
                "leaching";
                "weathering rate";
                "landscape / landform age";
                "vertical soil development";
                "soil properties";
                "taxonomy";
        }
       
        "erosional / depositional surface" -&gt; "weathering rate"      ;
        "parent material" -&gt; "landscape / landform age" -&gt; "soil texture";
        "parent material" -&gt; "weathering rate";
        "geology" -&gt; "parent material" -&gt; "soil texture";
       
        "landscape position" -&gt; "xy water routing";
        "local weather" -&gt; "solar radiation budget";
        "solar radiation budget" -&gt; "soil moisture status" -&gt; "VEG";
        "landscape position" -&gt; "aspect" -&gt; "solar radiation budget";
        "landscape position" -&gt; "Cs log(As)" -&gt; "erosional / depositional surface";
        "surrounding terrain" -&gt; "solar radiation budget";
        "surrounding terrain" -&gt; "landscape position";
        "surrounding terrain" -&gt; "precip" -&gt; "Cs log(As)";
        "Cs log(As)" -&gt; "xy water routing";
        "xy water routing" -&gt; "soil moisture status";
        "xy water routing" -&gt; "energy flux" [style = dotted];
        "VEG" -&gt; "energy flux";
        "soil moisture status" -&gt; "energy flux" [style = dotted];
        "energy flux" -&gt; "weathering rate";
        "soil texture" -&gt; "infiltration rate";
        "soil texture" -&gt; "weathering rate";
        "infiltration rate" -&gt; "weathering rate";
        "infiltration rate" -&gt; "leaching" -&gt; "vertical soil development";

        "weathering rate" -&gt; "vertical soil development";
        "landscape / landform age" -&gt; "vertical soil development" [style = dotted];
        "vertical soil development" -&gt; "soil properties";
        "vertical soil development" -&gt; "taxonomy" [style = dotted];
        "soil properties" -&gt; "taxonomy";
}

 
Example DOT Code:

# if you have a recent copy of graphviz:
# save directly to PDF
dot -Tpdf -o output.pdf input.dot

Graphviz Example OutputGraphviz Example Output

LaTeX Tips and Tricks

A collection of links to documents which I have found helpful in working with the LaTeX document preparation system. An internal link to all documents related to LaTeX can be accessed here.

 
Getting Started

 
CV Creation

 
Journal Submission

 
Collaborative Writing Ideas

Customizing BibTeX

 
Premise:

See attached files for a bibliography style (.bst) file that is compatible with the SSSAJ (Soil Science Society of America Journal) style guidelines.

 
General Notes:

  • when citing named chapters in a book, use the incollection entry type
  • when using makebst get the style as close as you can to the target, then tweak the resulting .bst file

 
Documentation:

http://www.andy-roberts.net/misc/latex/latextutorial3.html

 
A Custom Bib Style:

latex makebst

Latex to ODT conversion tips

 
Conversion of a .tex document to open document format (RTF)

latex2rtf file.tex

 
Conversion of a .tex document to open document format (ODT)

mk4ht oolatex file.tex

 
Some notes:

  1. disable hyperref package to prevent odd bug in citation listings
  2. input document should be latex-safe, i.e. eps figures
  3. need a working java installation
  4. commands like \onehalfspacing or begin{singlespace}...end{singlespace} may not work (setspace package)

 
Conversion of PDF, EPS, or Postscript figures to EMF (enhanced metafile format) for windows apps.

Note that these instructions are for unix-like systems. EMF support can be added to pstoedit via libEMF. The libEMF library contains several windows-specific coding practices, along with some sloppy use of include files. Instructions for compiling with a modern version of GCC (4.x) can be found here.

  1. Some related pages
    • http://www.physik.tu-cottbus.de/~george/unix/emf4unix.html
    • http://boshoff.za.net/linux/archives/2003_08.html
  2. download, hack, and compile the libEMF code (details)
  3. download, configure, and install pstoedit, note that you may have to manually define where the libEMF header and lib files are:
    ./configure --with-libemf-include=/usr/local/include/libEMF/ \
    --with-libemf-lib=/usr/local/lib/
  4. run ldconfig as root to refresh the library cache
  5. [Finally] Convert a PDF file into an OpenOffice-compatible EMF file (well sort of...)
    pstoedit -pta -f emf:"-OO -p -drawbb" file.pdf file.emf
  6. It seems like figures are clipped to smaller-than-expected bounding boxes unless the special flag -drawbb is used. One side-effect of this however is a box drawn around your figure. Also, the -pta flag is needed to ensure quality text placement. Even though the resulting EMF files may look odd on a Linux machine, they appear to work as expected in Windows.
  7. 

Making better looking tables with \ctable

 
Premise:
The standard Latex table environment can be difficult to extend, especially when one wants to use modern constructs such as table foot notes, etc. The ctable package is a convenient approach to solving this problem. A complete worked example of a table typset using the \ctable command is presented below. Note that \ctable is a command, and therefore does not allow blank newline characters. One way to maintain readability is to trick Latex by adding a comment character to all blank lines. Link to PDF manual for ctable.

Example of table produced with ctableExample of table produced with ctable

 
Code used to produce the example above. This example was produced with TexLive.

%
% start the table: note that we cannot have extra newline characters in the ctable defs
\ctable[
	cap     = {logistic regression parameters},
	% 
	caption = {Logistic regression model (M3) parameters. Coefficients, standard error, z-values, and p-values are included for each term used in the model. A separate slope and intercept term was fit to each geologic class.},
	% 
	label   = {aspect_effect:table:glm_model_properties},
	%
	%
]{lccc}{
	%
	\tnote[$\ast$]{Marginal p-values are used to determine whether each term is significantly different than 0.}
	%
}{ \FL
% 
Model Term & Value & Std. Error & Marginal p-values\tmark[$\ast$]      \ML
% 
Intercept & & &  \\  \cmidrule(r){1-1} 
% 
\;\;  andesite         &          1.146e+01   &   6.920e+00     &   0.09781  \\
\;\;  clastic\_volcanic   &        3.471e+00   &   2.234e+00    &       0.12024  \\
\;\;  coarse\_sedimentary  &       4.446e+00   &   2.390e+00    &    0.06285  \\
\;\;  felsic\_intrusive     &      5.384e+00   &   2.714e+00    &      0.04725  \\
\;\;  fine\_sedimentary   &        8.527e+00   &   4.013e+00      &   0.03361  \\
\;\;  rhyolite       &            1.075e+01   &   4.071e+00    &      0.00828 \\
\;\;  tuff              &         1.657e+01   &   9.116e+03    &      0.99855   \\
% 
% 
\\
Slope & & & \\   \cmidrule(r){1-1} 
% 
\;\;  andesite       &      -1.356e-03   &   8.448e-04     &  0.10858  \\ 
\;\;  clastic\_volcanic    &  -3.588e-04   &   2.851e-04     &   0.20825  \\
\;\;  coarse\_sedimentary  &  -8.165e-04   &   3.336e-04   &   0.01438  \\
\;\;  felsic\_intrusive    &   -6.352e-04    &  3.503e-04    &     0.06977  \\
\;\;  fine\_sedimentary    &   -1.122e-03   &   5.139e-04   &    0.02897  \\
\;\;  rhyolite       &       -1.361e-03  &    4.873e-04   &   0.00522  \\
\;\;  tuff         &         6.155e-14   &   1.200e+00   &   1.00000
% 
%  
\LL}

PostGIS: Spatially enabled Relational Database Sytem

PostGIS is a set of extensions to the relational database management system PostgreSQL, which provide access to spatial constructs, operators, and functions. In other words, PostGIS is a spatially-aware version of Postgresql. Paul Ramsey of Refractions Research has written up a nice summary of how were are using PostGIS.

 
Tuning Tips:

  1. Adjust kernel resources to allow larger memory allocation: (documentation)
  2. Increase the shared_buffers to a larger number (16000)
  3. See attached PDF at the bottom of this page for excellent ideas from Kevin Neufeld

 
PostGIS Syntax Examples:

  1. PostGIS Spatial SQL cheat-sheet
  2. Official Documentation
  3. PostGIS Wiki

 
PostGIS Presentations:

  1. Tips for the PostGIS Power User (Kevin Neufeld, Refractions Research)
  2. PostGIS Spatial Database: Introduction and Case Studies (Paul Ramsey)
  3. Transitioning Low Earth Orbit Satellite Archive Data from Informix (Geodetic DataBlade) to PostgreSQL (PostGIS) (Churngwei Chu, NASA/SSAI)
  4. Intro to PostgreSQL/PostGIS and MapServer (Arnulf Christ)

Importing and Exporting

 

Tips

  • Attribute data stored with a shapefile (the .dbf file) can only use column names shorter than 14 characters. If you try and export a postGIS table that has column names longer than 14 characters, they will be truncated when converted to a shapefile.
  • If a column in postGIS is defined as numeric, ogr2ogr will not correctly interpret the values in this column: i.e. floating point values less than 1 will be truncated to 0. In order to avoid this problem always define your colums as double precision or float. For dynamically generated tables, cast affected columns to a defined numeric precision: select colum_a::numeric(7,3) where '7' refers to the total number of digits in the column, and '3' refers to the number of decimal places.

 

Geometry and attribute data to GIS file format

 
GDAL/OGR tools This approach allows simultaneous conversion of coordinate systems, but is less flexible with respect to generation of new tables in PostGIS.

  • Import
     
    ogr2ogr -f "PostgreSQL" PG:'dbname=ssurgo_combined user=xxxx password=xxxx host=postgis.server.edu' input_file.shp
  • Export
     
    ogr2ogr output_file.shp PG:'dbname=ssurgo_combined user=xxxx password=xxxx host=postgis.server.edu' tablename

     
    Note that tables must be correctly 'registered' in the geometry_columns table for this to work:
    INSERT INTO geometry_columns VALUES ('','public','tablename','wkb_geometry',2,SRID,'geomtype');

 
PostGIS Loader/Dumper This approach is the simplest, but does not allow on-the-fly conversion of coordinate systems.

  • Import
     
    shp2pgsql -s SRID -c -g wkb_geometry -I shapefile.shp schema.table | psql -U username -h host database

     
    Note that SRID is the PostGIS 'spatial ref. sys. id' (see the spatial_ref_sys table). See the manual page for shp2pgsql for a complete list of arguments and their meanings.

  • Export
     
    pgsql2shp -f shapefile.shp -h host -u username -P password -k -g wkb_geometry database schema.table

     
    See the manual page for pgsql2shp for a complete list of arguments and their meanings.

 
Where tablename is your newly created table, SRID is the SRID (spatial reference ID) for the geometry in this table, and geomtype is the type of geometry: POINT, LINE, POLYGON, etc.

 

Attribute data to text format

 
CSV format, from within the psql client
\copy tablename TO 'filename.csv' CSV HEADER

 
CSV format, via psql client
echo "select column_list from table_list " | psql --tuples --no-align  -F ","  database > file.csv

 
Tabular data to HTML format, via psql client See output below:
echo "select column_list from table_list " | psql --html database > file.html

 
HTML output from psql

area compname
132472.230854819 Hilmar variant
322819.967391312 Oneil
362729.418301135 Carranza
431948.171760353 Tuff rockland
448784.927049035 Gravel pits
500763.225267798 Snelling variant
518860.954990617 Foster
571640.132661382 Alamo
648973.748756059 Toomes
924327.631201791 Dumps

(10 rows)

Example Spatial SQL Operations on Point, Line and Polygon Geometry Types

  • Points
    • Example
    • Example
    • Example
  • Lines
    • Example
    • Example
    • Example
  • Polygons
    • Example
    • Example
    • Example
  • Mixed
    • Example
    • Example
    • Example

Affine Transformation Operations in PostGIS

 
Overview
The ST_Affine() function from PostGIS is useful for manipulating geometries, but requires the elements of a transformation matrix. This page documents progress on automating the computation of the transformation matrix by least-squares (Bruce Rindahl) via SQL. This would allow a pure PostGIS solution to computing and applying affine transformations to geometry data.

 
An open-source algorithm for computing the transformation matrix
Example code from GRASS (v.transform) was used as a template.

 
Approach
Compute transformation matrix based on a table of control points, stored as numbers.

 
Evaluation of results
Comparable to output from a similar analysis done in R, and the original algorithm as implemented in v.transform (GRASS).

Case Study: Fixing Bad TIGER Line data with R and PostGIS

Example of bad Tiger data in Stanislaus County: Red lines are the original road network, green lines are the corrected road network.Example of bad Tiger data in Stanislaus County: Red lines are the original road network, green lines are the corrected road network.

 
The Problem
The US Census does a nice job of collecting all sorts of geographic and demographic information every 10 years. This data is available free of charge in the rather complex and soon to be replaced TIGER/LINE format. While this data covers the entire US down to the local road level, there are numerous errors and even extreme cases of coordinate-shift. Here is an example from Stanislaus County, California. The original TIGER data (red lines) are offset several hundred meters from the imagery. While it is not clear what may have caused the problem, it can be fixed without much effort using an affine transformation. We do not have the transformation matrix, however it can be 'fit' to a set of control points by several methods. The general form of the affine transform can be conveniently represented in homogeneous coordinates as:


Affine Matrix Form: new coordinates on the left-hand side, old coordinates on the right-hand sideAffine Matrix Form in homogeneous coordinates:New coordinates on the left-hand side, old coordinates on the right-hand side. The transformation matrix is the 3x3 matrix in the center.

 
The Solution

  • transform using GRASS and a set of control points with v.transform. This is the natural choice if the data is already in GRASS.
  • Using the same control points, compute the transformation matrix in R, then apply in PostGIS with the Affine() function. This is the ideal approach for our scenario. Note that the format of the 'fitted' transformation matrix returned by R is in a slightly different format:


    Transformation Matrix from RTransformation Matrix from R

We first need a set of control points, good and bad coordinates. This can be accomplished in several ways, we used the d.where command in GRASS:

# load road subset:
v.in.ogr dsn=PG:'dbname=tiger2005se user=xxx password=xxx host=xxx' layer=stan_roads out=s
#
# pick up some control points: note that they will be interleaved with this approach
# odd line numbers are "bad" coordinates
# even line numbers are "good" coordinates
d.where > stan.c_points
#
# we can separate the "bad" and "good" points in R...

Computing the transformation matrix can be done with a simple regression between 'good' and 'bad' coordinates in R. Note that this approach was suggested by Prof. Brian Ripley on the R-help mailing list.

 
Compute the Affine Transformation Matrix in R

## read in control points
## the good and bad points are interleaved
## odd numbers are 'bad' points
## s <- read.table('stan.c_points')
&nbsp;
## s.bad <- s[seq(from=1, to=nrow(s), by=2), ]
## s.good <- s[seq(from=2, to=nrow(s), by=2), ]
##
## make a composite dataframe
## x,y = bad data
## nx,ny = good data
## c <- data.frame(x=s.bad$V1, y=s.bad$V2, nx=s.good$V1, ny=s.good$V2)
##
## save a local copy in the format that v.transform uses
## write.table(c, file='grass_control_pts.txt', row.names=F, col.names=F)
##
&nbsp;
## from now on just use the simplified format:
c <- read.table('grass_control_pts.txt')
names(c) <- c('x','y','nx','ny')

&nbsp;
## compute transformation matrix, and check model fit:
l <- lm(cbind(nx,ny) ~ x + y, data=c)
##
## see at bottom of page -->
summary(l)

&nbsp;
## convert to affine transform matrix to the form needed by PostGIS:
## transpose the regression coefficient matrix:
t(coef(l))
 (Intercept)           x          y
nx    5017.082  1.00231639 0.00918962
ny  -28138.395 -0.01352029 0.99740077


&nbsp;
## check graphically:
x <- seq(-2080000, -2040000, by=1000)
y <- seq(-20000, 5000, by=2000)
d <- expand.grid(x=x, y=y)
p <- predict(l, d)

&nbsp;
par(mfcol=c(1,2), pty='s')
plot(y ~ x, data=c, main='Control Points', cex=.4, pch=16)
points(ny ~ nx, data=c, col='red', cex=.4, pch=16)
## arrows(c$x, c$y, c$nx, c$ny, len=0.05, col='gray')

&nbsp;
plot(d, cex=0.2, main='Shifted Grid')
## arrows(d$x, d$y, p[,1], p[,2], len=0.05, col='gray')
points(p, col='red', cex=.2)

&nbsp;
## sample transformation
print(predict(l, data.frame(x=-2078417.35893968, y=-14810.57043808)), digits=10)
            nx           ny
 -2078350.804 -14809.67334

Establishing the transformation based on control points: Red points represent where the coordinates should be. Black points are the original and incorect coordinates.Establishing the transformation based on control points: Red points represent where the coordinates should be. Black points are the original and incorect coordinates.

 
Check Affine Transform Results in PostGIS

-- matrix format:

-- R:
-- | xoff a b |
-- | yoff d e |

-- postgis wants:
-- ST_Affine(geom, a, b, d, e, xoff, yoff)

-- as a query:
SELECT asText(
ST_Affine(
'POINT(-2078417.35893968 -14810.57043808)',
1.00231639, 0.00918962, -0.01352029, 0.99740077, 5017.082, -28138.395
)
) ;

                   astext                  
--------------------------------------------
 POINT(-2078350.80564006 -14809.6639251817)

-- This is the same as what R says!

 
Perform Affine Transformation in PostGIS

--
-- Step 1. Alter the geometry of the bad data in-place
-- note that we have a backup in 'stan_roads'
--
UPDATE roads SET wkb_geometry =  ST_Affine(wkb_geometry, 1.00231639, 0.00918962, -0.01352029, 0.99740077, 5017.082, -28138.395 ) WHERE module = 'TGR06099' ;

 
Regression Diagnostics

Response nx :

Call:
lm(formula = nx ~ x + y, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-207.088  -23.856    8.614   21.245  161.610 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept) 5.017e+03  1.369e+03    3.666  0.00079 ***
x           1.002e+00  6.654e-04 1506.386  < 2e-16 ***
y           9.190e-03  9.419e-04    9.756 1.20e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 52.25 on 36 degrees of freedom
Multiple R-Squared:     1,      Adjusted R-squared:     1 
F-statistic: 1.322e+06 on 2 and 36 DF,  p-value: < 2.2e-16 


Response ny :

Call:
lm(formula = ny ~ x + y, data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-39.835 -18.459  -4.556  15.311  94.226 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.814e+04  7.409e+02  -37.98   <2e-16 ***
x           -1.352e-02  3.602e-04  -37.54   <2e-16 ***
y            9.974e-01  5.099e-04 1956.22   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 28.28 on 36 degrees of freedom
Multiple R-Squared:     1,      Adjusted R-squared:     1 
F-statistic: 2.187e+06 on 2 and 36 DF,  p-value: < 2.2e-16

Comparision with output from v.transform

 
First the output from R:
Looking at the residuals from the regression model used to map bad coordinates (x,y) to good coordinates (nx,ny):

## we made the linear model object 'l' above
## extract as dataframe
## residuals are computed for x and y separately
r <- as.data.frame(resid(l))

&nbsp;
## compute vector product of the (x,y) residuals:
c$resid <- sqrt(r$nx^2 + r$ny^2)
print(c)

          x          y       nx         ny      resid
1  -2078417 -14810.570 -2078314 -14838.378  46.617600
2  -2078743 -16057.955 -2078636 -16081.790  62.041274
3  -2077261 -16435.348 -2077170 -16463.156  40.905132
4  -2076709 -14405.369 -2076606 -14433.177  29.406399
5  -2074179 -15830.248 -2074084 -15901.558  33.111981
6  -2073850 -15707.435 -2073763 -15798.554  37.362736
7  -2073450 -13873.171 -2073359 -13920.712  21.623235
8  -2072359 -15204.613 -2072276 -15323.138  38.997678
9  -2072545 -14402.596 -2072450 -14513.219  32.918889
10 -2072189 -16022.434 -2072098 -16129.106  33.834074
11 -2071991 -16856.058 -2071928 -16942.976   6.277554
12 -2068407 -12999.396 -2068296 -13133.170   6.579285
13 -2069870 -12613.813 -2069764 -12731.848   2.357631
14 -2067635 -13188.253 -2067517 -13337.765  11.604519
15 -2066931 -13377.110 -2066809 -13518.753  22.719625
16 -2067411 -15084.692 -2067313 -15190.924  41.907273
17 -2066795 -18714.093 -2066741 -18846.019  14.541358
18 -2066384 -17080.538 -2066299 -17212.464  26.717495
19 -2068634 -19742.339 -2068580 -19835.464  27.483654
20 -2053326 -16930.710 -2053276 -17226.351  65.746074
21 -2051797 -17321.500 -2051899 -17579.762 227.516944
22 -2068307   2826.921 -2068066   2638.276  12.587853
23 -2067543   2648.205 -2067328   2449.631  37.729747
24 -2067126   4276.510 -2066904   4081.246  46.774630
25 -2066748   4170.604 -2066527   4001.816  59.509843
26 -2066068   2292.295 -2065860   2094.699  46.681553
27 -2065337   2107.872 -2065126   1900.397  43.386956
28 -2064606   1913.570 -2064378   1692.922  26.460389
29 -2064199   3558.561 -2063961   3356.401  47.742696
30 -2037464   6512.455 -2037076   5864.398  50.762994
31 -2036722   6825.682 -2036338   6199.227  22.699467
32 -2036876   6366.642 -2036498   5742.888  22.120176
33 -2040225   7150.180 -2039706   6575.029 161.631199
34 -2041064   7144.779 -2040732   6569.629  26.657582
35 -2044702 -15548.033 -2044564 -16024.903  23.844817
36 -2043992 -15723.521 -2043824 -16223.282  48.063840
37 -2043790 -14907.119 -2043611 -15383.990  34.844851
38 -2040616 -14820.445 -2040453 -15349.974  21.196233
39 -2039595 -15081.427 -2039485 -15588.263  47.263287

 
The Root-Mean-Square-Error (RMSE) for the fitted transform (in meters) is:

apply( (fitted(l) - c[, c('nx', 'ny')])^2  , 2, function(x) sqrt(mean(x)) )
      nx       ny
50.19954 27.17333

&nbsp;
#converting to the vector sum of RMSE (as reported in v.transform):
sqrt(50.19954^2 + 27.17333^2)
57.08225

 
The output from v.transform on the same set of control points:

Transformation Matrix
| xoff a b |
| yoff d e |
-------------------------------------------
5301.399323 1.002469 0.009172 
-28155.882288 -0.013530 0.997547 
-------------------------------------------

 
full output including the residuals:

               CHECK MAP RESIDUALS
                Current Map                  New Map
 POINT      X coord    Y coord  |        X coord   Y coord    |      residuals

  1.   -2078417.36    -14810.57 |  -2078314.07      -14838.38 |        46.81
  2.   -2078743.11    -16057.95 |  -2078635.85      -16081.79 |        62.22
  3.   -2077261.34    -16435.35 |  -2077169.97      -16463.16 |        41.05
  4.   -2076709.16    -14405.37 |  -2076605.87      -14433.18 |        29.59
  5.   -2074178.76    -15830.25 |  -2074083.67      -15901.56 |        33.21
  6.   -2073849.93    -15707.44 |  -2073762.78      -15798.55 |        37.42
  7.   -2073449.80    -13873.17 |  -2073358.68      -13920.71 |        21.62
  8.   -2072358.86    -15204.61 |  -2072275.89      -15323.14 |        39.02
  9.   -2072544.55    -14402.60 |  -2072449.73      -14513.22 |        32.97
 10.   -2072188.97    -16022.43 |  -2072098.11      -16129.11 |        33.87
 11.   -2071991.43    -16856.06 |  -2071928.22      -16942.98 |         6.27
 12.   -2068406.55    -12999.40 |  -2068296.38      -13133.17 |         6.60
 13.   -2069870.19    -12613.81 |  -2069763.96      -12731.85 |         2.33
 14.   -2067635.38    -13188.25 |  -2067517.35      -13337.76 |        11.63
 15.   -2066931.10    -13377.11 |  -2066809.13      -13518.75 |        22.74
 16.   -2067411.11    -15084.69 |  -2067312.75      -15190.92 |        41.93
 17.   -2066795.16    -18714.09 |  -2066740.84      -18846.02 |        14.64
 18.   -2066383.87    -17080.54 |  -2066298.50      -17212.46 |        26.74
 19.   -2068634.37    -19742.34 |  -2068580.05      -19835.46 |        27.53
 20.   -2053326.48    -16930.71 |  -2053275.51      -17226.35 |        66.09
 21.   -2051797.30    -17321.50 |  -2051899.25      -17579.76 |       227.91
 22.   -2068307.24      2826.92 |  -2068065.64        2638.28 |        12.41
 23.   -2067542.73      2648.21 |  -2067327.61        2449.63 |        37.44
 24.   -2067125.72      4276.51 |  -2066903.98        4081.25 |        46.40
 25.   -2066748.43      4170.60 |  -2066526.69        4001.82 |        59.12
 26.   -2066067.79      2292.29 |  -2065860.31        2094.70 |        46.35
 27.   -2065336.69      2107.87 |  -2065125.92        1900.40 |        43.07
 28.   -2064605.58      1913.57 |  -2064378.35        1692.92 |        26.16
 29.   -2064199.15      3558.56 |  -2063961.13        3356.40 |        47.43
 30.   -2037464.39      6512.45 |  -2037075.56        5864.40 |        50.66
 31.   -2036721.82      6825.68 |  -2036338.39        6199.23 |        22.54
 32.   -2036875.74      6366.64 |  -2036497.71        5742.89 |        21.95
 33.   -2040224.67      7150.18 |  -2039706.23        6575.03 |       161.54
 34.   -2041064.45      7144.78 |  -2040732.32        6569.63 |        26.74
 35.   -2044701.68    -15548.03 |  -2044564.34      -16024.90 |        23.64
 36.   -2043992.10    -15723.52 |  -2043824.24      -16223.28 |        47.60
 37.   -2043789.90    -14907.12 |  -2043610.60      -15383.99 |        34.35
 38.   -2040615.94    -14820.44 |  -2040453.30      -15349.97 |        20.77
 39.   -2039594.70    -15081.43 |  -2039485.02      -15588.26 |        47.85


  Number of points: 39
  Residual mean average: 57.082951

Higher Order Transformations

 
... Continuing from the initial example session in R ...

An affine transformation is based on a linear mapping between two coordinate-spaces. Testing for non-linearity (i.e. higher order model terms) can be a useful diagnostic in choosing the simpler affine transform.

 
Compute the difference between the good and bad coordinates

## compute vector difference and save back to original data frame:
c$sqdiff <- with(c, sqrt((nx-x)^2 + (ny-y)^2))

 
Generate two linear models:

## generate simple model
l.simple <- lm(sqdiff ~ nx + ny, data=c)
## model with higher order terms (up to 3rd-order):
## note to self: poly() computes orthagonal polynomials
l.poly <- lm(sqdiff ~ poly(nx, 3) + poly(ny, 3), data=c)

&nbsp;
## summarize the complex model:
summary(l.poly)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    302.38       3.97  76.174  < 2e-16 ***

poly(nx, 3)1  1091.17      35.82  30.466  < 2e-16 ***
poly(nx, 3)2   165.59      32.78   5.051 1.71e-05 ***
poly(nx, 3)3   -51.21      28.74  -1.782   0.0843 .  

poly(ny, 3)1   417.35      31.97  13.056 2.30e-14 ***
poly(ny, 3)2   -18.50      30.04  -0.616   0.5425    
poly(ny, 3)3    19.41      35.49   0.547   0.5882    

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

 
Is one model significantly better than the other?

## check via RMSE:
## simple model
sqrt( mean((fitted(l.simple) - c$sqdiff)^2) )
36.54747

&nbsp;
## model with 3rd-order polynomial
sqrt( mean((fitted(l.poly) - c$sqdiff)^2) )
22.45530

&nbsp;
## the anova function can compare nested models:
anova(l.simple, l.poly)

Analysis of Variance Table

Model 1: sqdiff ~ nx + ny
Model 2: sqdiff ~ poly(nx, 3) + poly(ny, 3)
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
1     36 52093                                  
2     32 19665  4     32428 13.192 1.865e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 
Check visually:

## re-make grid of points to predict on:
## and save two versions to 'stack-up'
x <- seq(-2080000, -2040000, by=1000)
y <- seq(-20000, 5000, by=2000)
d.simple <- expand.grid(x=x, y=y)
d.poly <- expand.grid(x=x, y=y)

&nbsp;
## and predict the trend in differences:
d.simple$diff <- predict(l.simple, data.frame(nx=d$x, ny=d$y))
d.poly$diff <- predict(l.poly, data.frame(nx=d$x, ny=d$y))

&nbsp;
## stack for grouped plotting
library(lattice)
library(RColorBrewer)
d.combined <- make.groups(simple=d.simple, poly=d.poly)

&nbsp;
## plot:
wireframe(diff ~ x * y | which, data=d.combined, scales=list(arrows=TRUE\), drape=TRUE\, col.regions=brewer.pal(n=9, name='Oranges'), cuts=8, col='white')

Testing for linearity: Two visualizations of the deviance between coordinates positions, at the control point locations.Testing for linearity: Two visualizations of the deviance between coordinates positions, at the control point locations.

 
Conclusions:
It seems that a second order term was only warranted along the x-direction. The more complex model based on 3rd-order polynomials results in a significantly lower RMSE (about 10 meters lower), and is shown to be a better descriptor of variance in the test of nested models.

At the map scale in which the corrected data will be presented, the extra accuracy suggested by the more complex model (coordinate transformation function) is negligible. This allows for the simpler model, which can be directly used by the convenient ST_Affine() function in PostGIS for the heavy-lifting.

Transformation parameters query

The following is a proof of concept query showing how a PostgreSQL query could give the transformation parameters required for an affine transformation. The source of the procedure is from GRASS at transform.c.

Note that the v.transform code expects the input file (control points) to be in the form ax ay bx by, where 'a' is the starting coordinate system (the bad coordinates in the previous example) and 'b' is the target system (the good coordinates). The

 
SQL version is as follows:

 
Sample Session

  1. CREATE TABLE link (
        gid integer NOT NULL,
        a_x double precision,
        a_y double precision,
        b_x double precision,
        b_y double precision
    );
    -- load data into table
    -- see attached file at bottom of page
    -- possibly use the file 'grass_control_pts.csv' ???
    COPY link FROM 'control_points.txt' ;
  2. CREATE TEMPORARY TABLE cc AS
    SELECT
    (SELECT count(b_x) FROM link) AS cc00,
    (SELECT sum(b_x) FROM link) AS cc01,
    (SELECT sum(b_y) FROM link) AS cc02,
    sum(l1.b_x * l2.b_x) AS cc11,
    sum(l1.b_x * l2.b_y) AS cc12,
    sum(l1.b_y * l2.b_y) AS cc22
     FROM link l1 , link l2 WHERE l1.gid = l2.gid;
  3. CREATE TEMPORARY TABLE aa AS
    SELECT
    (SELECT sum(a_y) FROM link) AS aa0,
    sum(l1.a_y * l2.b_x) AS aa1,
    sum(l1.a_y * l2.b_y) AS aa2
     FROM link l1 , link l2 WHERE l1.gid = l2.gid;
  4. CREATE TEMPORARY TABLE bb AS
    SELECT
    (SELECT sum(a_x) FROM link) AS bb0,
    sum(l1.a_x * l2.b_x) AS bb1,
    sum(l1.a_x * l2.b_y) AS bb2
     FROM link l1 , link l2 WHERE l1.gid = l2.gid;
  5. CREATE TEMPORARY TABLE det AS
    SELECT
    cc00*cc11*cc22 +
    cc01*cc12*cc02 +
    cc02*cc01*cc12 -
    cc00*cc12*cc12 -
    cc01*cc01*cc22 -
    cc02*cc11*cc02 AS det FROM cc;
  6. CREATE TEMPORARY TABLE inv_cc AS
    SELECT
    (cc11*cc22-cc12*cc12)/det AS cc00,
    (cc12*cc02-cc01*cc22)/det AS cc01,
    (cc01*cc12-cc11*cc02)/det AS cc02,
    (cc00*cc22-cc02*cc02)/det AS cc11,
    (cc01*cc02-cc00*cc12)/det AS cc12,
    (cc00*cc11-cc01*cc01)/det AS cc22
    FROM cc,det;
  7. SELECT
    (aa0*cc00+aa1*cc01+aa2*cc02) AS b0,
    (aa0*cc01+aa1*cc11+aa2*cc12) AS b1,
    (aa0*cc02+aa1*cc12+aa2*cc22) AS b2,

    (bb0*cc00+bb1*cc01+bb2*cc02) AS b3,
    (bb0*cc01+bb1*cc11+bb2*cc12) AS b4,
    (bb0*cc02+bb1*cc12+bb2*cc22) AS b5
    FROM aa,bb,inv_cc;

This query requires a table called link with the following fields - gid (primary key), a_x, a_y , b_x, b_y. The 'a' values are the 'from' coordinates and the 'b' values are the 'to' coordinates. Using the attached control points the result of this query is:

        b0        |         b1          |        b2         |        b3        |        b4        |         b5          
------------------+---------------------+-------------------+------------------+------------------+---------------------
 -28138.394850347 | -0.0135202854235867 | 0.997400773420259 | 5017.08164289594 | 1.00231638907948 | 0.00918961946271679

 
These are the exact results from 'R'

User defined Function in PL/pgSQL to compute the transformation parameters

Based on the results of the proof of concept example developed previously, a single function was developed in the procedural language for the PostgreSQL database system called PL/pgSQL. The only input parameter to this procedure is a text string that results in a table in the following format:

gid (integer), from_x (float), from_y (float), to_x (float), to_y (float)

Note the table must have the above fields but they can be in any order and can have additional fields. The gid field must be unique for each record.

The use of a SQL query found to be the simplest way to avoid the difficulties in programing the procedure without the need for temporary tables. An added benefit is the control point data can be in almost any format as long as it can be arranged in the format specified above. For example if the "from" points are in a geometry column (the_geom) in a table called from_pts and the corresponding "to" points are in a similar table called to_pts with a common attribute called "link_id", then the query would be:

SELECT from_pts.link_id AS gid, x(from_pts.the_geom) AS from_x, y(from_pts.the_geom) AS from_y, x(to_pts.the_geom) AS to_x, y(to_pts.the_geom) AS to_y FROM from_pts, to_pts WHERE from_pts.link_id = to_pts.link_id

 
Other table layouts and queries are possible depending on the manner in which the control points are collected.

 
The following is the SQL code to add a new procedure called trans_param() into a PostGIS database:

-- Function: trans_param(IN sql text, OUT a double precision, OUT b double precision, OUT d double precision, OUT e double precision, OUT xoff double precision, OUT yoff double precision)

-- DROP FUNCTION trans_param(IN sql text, OUT a double precision, OUT b double precision, OUT d double precision, OUT e double precision, OUT xoff double precision, OUT yoff double precision);

CREATE OR REPLACE FUNCTION trans_param(IN sql text, OUT a double precision, OUT b double precision, OUT d double precision, OUT e double precision, OUT xoff double precision, OUT yoff double precision) AS
$BODY$
DECLARE
        cc_row record;

        cc_det float;

        inv_cc00 float;
        inv_cc01 float;
        inv_cc02 float;
        inv_cc11 float;
        inv_cc12 float;
        inv_cc22 float;

BEGIN
        EXECUTE 'SELECT
        count(a.to_x) as cc00,
        sum(a.to_x) as cc01,
        sum(a.to_y) as cc02,
        sum(a.to_x * b.to_x) as cc11,
        sum(a.to_x * b.to_y) as cc12,
        sum(a.to_y * b.to_y) as cc22,
        sum(a.from_y) as aa0,
        sum(a.from_y * b.to_x) as aa1,
        sum(a.from_y * b.to_y) as aa2,
        sum(a.from_x) as bb0,
        sum(a.from_x * b.to_x) as bb1,
        sum(a.from_x * b.to_y) as bb2
        from ('
|| sql || ') a,(' || sql|| ') b WHERE a.gid = b.gid' INTO cc_row;


        SELECT INTO cc_det
                cc_row.cc00*cc_row.cc11*cc_row.cc22 +
                cc_row.cc01*cc_row.cc12*cc_row.cc02 +
                cc_row.cc02*cc_row.cc01*cc_row.cc12 -
                cc_row.cc00*cc_row.cc12*cc_row.cc12 -
                cc_row.cc01*cc_row.cc01*cc_row.cc22 -
                cc_row.cc02*cc_row.cc11*cc_row.cc02;

        SELECT INTO inv_cc00 (cc_row.cc11*cc_row.cc22-cc_row.cc12*cc_row.cc12)/cc_det;
        SELECT INTO inv_cc01 (cc_row.cc12*cc_row.cc02-cc_row.cc01*cc_row.cc22)/cc_det;
        SELECT INTO inv_cc02 (cc_row.cc01*cc_row.cc12-cc_row.cc11*cc_row.cc02)/cc_det;
        SELECT INTO inv_cc11 (cc_row.cc00*cc_row.cc22-cc_row.cc02*cc_row.cc02)/cc_det;
        SELECT INTO inv_cc12 (cc_row.cc01*cc_row.cc02-cc_row.cc00*cc_row.cc12)/cc_det;
        SELECT INTO inv_cc22 (cc_row.cc00*cc_row.cc11-cc_row.cc01*cc_row.cc01)/cc_det;

        SELECT INTO a cc_row.bb0*inv_cc01+cc_row.bb1*inv_cc11+cc_row.bb2*inv_cc12;
        SELECT INTO b cc_row.bb0*inv_cc02+cc_row.bb1*inv_cc12+cc_row.bb2*inv_cc22;
        SELECT INTO d cc_row.aa0*inv_cc01+cc_row.aa1*inv_cc11+cc_row.aa2*inv_cc12;
        SELECT INTO e cc_row.aa0*inv_cc02+cc_row.aa1*inv_cc12+cc_row.aa2*inv_cc22;
        SELECT INTO xoff cc_row.bb0*inv_cc00+cc_row.bb1*inv_cc01+cc_row.bb2*inv_cc02;
        SELECT INTO yoff cc_row.aa0*inv_cc00+cc_row.aa1*inv_cc01+cc_row.aa2*inv_cc02;

END;

$BODY$
  LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION trans_param(IN sql text, OUT a double precision, OUT b double precision, OUT d double precision, OUT e double precision, OUT xoff double precision, OUT yoff double precision) OWNER TO postgres;

 
Currently there is no error checking in the code if the determinant is zero.

 
To use the procedure simply use: SELECT trans_param('my SQL text')

 
The data from this series of articles is stored in a table called link. The id of the points is gid and the "from" values are b_x and b_y. The "to" values are a_x and a_y. Thus the query is:

SELECT gid,b_x AS from_x, b_y AS from_y, a_x AS to_x, a_y AS to_y FROM link

 
The result of the procedure is:

SELECT * FROM trans_param(
'SELECT gid,b_x as from_x, b_y as from_y, a_x as to_x, a_y as to_y from link')

0.997546509279282;-0.00917177514909895;0.0135300872142122;1.00246938174737;-5301.39933295548;28155.8822879205

 
Which matches the results returned from GRASS and R. Additional queries will be developed to give a table of residuals, the RMS error and the actual transformation of geometry.

Control Points Table Format

This is a start for discussions to create a series of function to perform an affine transformation of a PostGIS data set using a table of control points.

The first step is what is the format or layout of the control points?

I don't think point geometries is necessarily a good idea. The points must have an exact 1 to 1 relationship. To assure this you either have to maintain absolute integrity on the keys between two tables or have two geometries in one table. Both would be a hassle. An interleaved table format also would give me problems because the queries get really difficult and what if one version used "good" points in the odd rows and the next one put them in the even ones? In addition, if you use a seguential id and add one row "good" then add a "bad" one, then realize the bad one is really bad and delete it then the gid's will not be odd/even anymore in relation to "good" vs "bad". I think ESRI has the right idea here (God did I actually say that?). Look at the link table interface when you are georeferencing an image. It gives the id, XSource, YScource, Xmap, YMap, and the residual error. This could easily be done with the code - just one more query.

Given a table in this format a query could be done to give the RMS error. Also the table could be returned with the residual errors. When the user is happy then the table and the geometry could be input to transform the geometry.

Bruce

Analysis of SSURGO Data in PostGIS: An Overview

 
Note
Several of the pages documenting soil survey related queries have been taken down for a couple of days while I update the syntax.

 
News
It looks like the NCSS has constructed a web-based, SQL interface to their main database. This new tool was recently highlighted in issue 40 of the NCSS Newsletter and looks like a promising new tool with good documentation. A listing "web services" offered by the NCSS is listed on this page.

 
Overview
The analysis of SSURGO data is complicated by a series of hierarchical, one-to-many tabular relationships between spatial and attribute data. Understanding the structure of the SSURGO database is critical for correct interpretation and aggregation of soil properties. Before undertaking any SSURGO-based analysis please take some time to become familliar with the details on the product, including intended uses, minimum mapping units sizes, and other important details. In addition, becoming familliar with the SSURGO metadata is a must. The metadata page includes detailed descriptions of table structure, column names and units, as well as important information on the sources of much of the tabular data included in a SSURGO database.

SSURGO data can be downloaded by survey area from The Soil Datamart, with spatial data delivered in shapefile format and attribute data delivered as plain text. Unfortunately, an M.S. Access database template is required to utilize SSURGO attribute data as delivered from the Soil Datamart. For assist