Dylan

Aggregating SSURGO Data in R

Submitted by dylan on Thu, 2009-09-10 15:36.

 
Premise
SSURGO is a digital, high-resolution (1:24,000) soil survey database produced by the USDA-NRCS. It is one of the largest and most complete spatial databases in the world, and is available for nearly the entire USA at no cost. These data are distributed as a combination of geographic and text data, representing soil map units and their associated properties. Unfortunately, the text files do not come with column headers, so a template is required to make sense of the data. Alternatively, one can use an MS Access template to attach column names, generate reports, and perform other such tasks. CSV files can then be exported from the MS Access database for further use. A follow-up post with text file headers and a complete PostgreSQL database schema will contain details on implementing a SSURGO database without using MS Access.

If you happen to have some of the SSURGO tabular data that includes column names, the following R code may be of general interest for resolving the 1:many:many hierarchy of relationships required to make a thematic map.

 
This is the format we want the data to be in

    mukey     clay      silt      sand water_storage
   458581 20.93750 20.832237 20.861842     14.460000
   458584 43.11513 30.184868 26.700000     23.490000
   458593 50.00000 27.900000 22.100000     22.800000
   458595 34.04605 14.867763 11.776974     18.900000
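
A minimal sketch of one way to arrive at a table of this shape, assuming horizon-level and component-level tables that already have column names attached. The data frames (`horizons`, `components`) and all values are made up for illustration. The column names (`mukey`, `cokey`, `comppct_r`, `hzdept_r`, `hzdepb_r`, `claytotal_r`) follow the usual SSURGO conventions, but should be checked against your own export. Horizons are collapsed to components with a thickness-weighted mean, then components are collapsed to map units with a component-percentage weighted mean.

```r
# toy horizon data (hypothetical values): one row per horizon
horizons <- data.frame(
  cokey = c("c1", "c1", "c2", "c2", "c3"),
  hzdept_r = c(0, 20, 0, 30, 0),        # horizon top depth (cm)
  hzdepb_r = c(20, 100, 30, 100, 50),   # horizon bottom depth (cm)
  claytotal_r = c(10, 30, 20, 40, 25)   # clay content (%)
)

# toy component data (hypothetical values): one row per component
components <- data.frame(
  mukey = c("m1", "m1", "m2"),
  cokey = c("c1", "c2", "c3"),
  comppct_r = c(75, 25, 100)            # component percentage of the map unit
)

# step 1: horizon -> component, weighted by horizon thickness
horizons$thick <- horizons$hzdepb_r - horizons$hzdept_r
co.clay <- do.call(rbind, lapply(split(horizons, horizons$cokey), function(h) {
  data.frame(cokey = h$cokey[1], clay = weighted.mean(h$claytotal_r, w = h$thick))
}))

# step 2: component -> map unit, weighted by component percentage
co <- merge(components, co.clay, by = "cokey")
mu.clay <- do.call(rbind, lapply(split(co, co$mukey), function(x) {
  data.frame(mukey = x$mukey[1], clay = weighted.mean(x$clay, w = x$comppct_r))
}))
rownames(mu.clay) <- NULL

# one row per map unit, ready to join to the map unit polygons
mu.clay
```

Real SSURGO tables contain missing values, so in practice the weighted means would need some handling of NA (for example, dropping incomplete horizons before weighting). The same two steps can be repeated for silt, sand, and water storage.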

 
So we can make a map like this

Making Sense of Large Piles of Soils Information: Soil Taxonomy

Submitted by dylan on Wed, 2009-05-27 18:43.

Western Fresno Soil Hierarchy: partial view of the hierarchy within the US Soil Taxonomic system

 
Soil Data
Field and lab characterization of soil profile data results in the accumulation of a massive, multivariate, and three-dimensional data set. Classification is one approach to making sense of a large collection of this type of data. US Soil Taxonomy is the primary soil classification system used in the U.S.A. and many other countries. This system is hierarchical in nature, and makes use of the presence or absence of diagnostic soil features. A comprehensive discussion of Soil Taxonomy is beyond the scope of this post. A detailed review of Soil Taxonomy can be found in Buol, S. W.; Graham, R. C.; McDaniel, P. A. & Southard, R. J. Soil Genesis and Classification. Iowa State Press, 2003.

Simple Approach to Converting GRASS DB-backends

Submitted by dylan on Sat, 2009-05-23 21:32.

 
Premise:
The current default database back-end used by the GRASS vector model is DBF (as of GRASS 6.5); however, this is probably going to be changed (to SQLite) in GRASS 7. The DBF back-end works OK, but it tends to be very sensitive (i.e. it breaks) when reserved words occur in column names or portions of a query. Complex UPDATE statements don't work, and just about anything more complex than a simple SELECT statement usually results in an error. Switching to the SQLite (or PostgreSQL, etc.) back-end solves most of these problems.

Currently GRASS uses a single SQLite (file-based) database per mapset. This is convenient if you are interested in joining attribute tables between vectors, but it is not set in stone as the final approach that will be used by default in GRASS 7. Regardless, converting the back-end is a fairly simple matter. Taking the time to convert to an SQLite or PostgreSQL back-end will undoubtedly save you time and sanity if you find yourself working with vector+attribute data on a regular basis. Having access to a complete implementation of SQL can make extracting, summarizing, joining, and re-formatting (column names, types, etc.) tabular data much simpler than what is available in the DBF back-end. Also, there are several convenient graphical SQLite managers available, such as SQLite manager, SQLite data browser, and SQLite Admin.

Interesting R Packages

Submitted by dylan on Mon, 2009-04-27 15:47.

 
CHNOSZ: chemical thermodynamics library and activity diagram software

 


Checking Type Locations

Submitted by dylan on Mon, 2009-04-20 22:18.

 
Just Checking

# NAD27 to NAD83
echo 119d7\'4\"W 36d23\'13\"N | cs2cs +proj=latlong +datum=NAD27 +to +proj=latlong +datum=NAD83 -f "%.6f"
  

Comparison of Slope and Intercept Terms for Multi-Level Model II: Using Contrasts

Submitted by dylan on Tue, 2009-02-17 04:43.

Premise

Small update to a similar thread from last week, on the comparison of slope and intercept terms fit to a multi-level model. I finally figured out (thanks R-Help mailing list!) how to efficiently use contrasts in R. The C() function can be called within a model formula, to reset the base level of an un-ordered factor. The UCLA Stats Library has an extensive description of this topic here. This approach can be used to sequentially test for differences between slope and intercept terms from a multi-level model, by re-setting the base level of a factor. See example data and figure below.

Note that the multcomp package has a much more robust approach to this type of operation. Details below.

 
Example Multi-Level Data

# need these
library(lattice)
library(Design) # note: Design has since been superseded on CRAN by the rms package

# replicate an important experimental dataset
set.seed(10101010)
x <- rnorm(100)
y1 <- x[1:25] * 2 + rnorm(25, mean=1)
y2 <- x[26:50] * 2.6 + rnorm(25, mean=1.5)
y3 <- x[51:75] * 2.9 + rnorm(25, mean=5)
y4 <- x[76:100] * 3.5 + rnorm(25, mean=5.5)
d <- data.frame(x=x, y=c(y1,y2,y3,y4), f=factor(rep(letters[1:4], each=25)))

# plot
xyplot(y ~ x, groups=f, data=d,
auto.key=list(columns=4, title='Beard Type', lines=TRUE, points=FALSE, cex=0.75),
type=c('p','r'), ylab='Number of Pirates', xlab='Distance from Land')

Example Multi-Level Model II
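
A minimal sketch of the base-level idea, using the same kind of simulated data as above (regenerated here so the block runs on its own). The model object names `m.a` and `m.b` are arbitrary, and `lm()` is used here only for illustration. `C()` is called inside the formula with `contr.treatment` and a `base` argument, so that the intercept and `x` terms describe whichever level is chosen as the reference, and the remaining terms are differences from that level.

```r
# simulated data, same recipe as the example above
set.seed(10101010)
x <- rnorm(100)
y <- c(x[1:25] * 2 + rnorm(25, mean=1),
       x[26:50] * 2.6 + rnorm(25, mean=1.5),
       x[51:75] * 2.9 + rnorm(25, mean=5),
       x[76:100] * 3.5 + rnorm(25, mean=5.5))
d <- data.frame(x=x, y=y, f=factor(rep(letters[1:4], each=25)))

# default treatment contrasts: level 'a' is the base level
m.a <- lm(y ~ x * f, data=d)

# reset the base level to the 2nd level ('b') from within the formula
m.b <- lm(y ~ x * C(f, contr.treatment, base=2), data=d)

# '(Intercept)' and 'x' now describe group 'b'; the other rows are
# differences in intercept and slope relative to 'b'
summary(m.b)$coefficients

# the two models are the same fit, only the parameterization differs
all.equal(fitted(m.a), fitted(m.b))
```

Repeating the second call with `base=3` and `base=4` gives the remaining sets of comparisons. These are unadjusted t-tests, so the caveat about multiple comparisons and the multcomp package still applies.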


Aggregating Soil Survey Information: Available Water Holding Capacity

Submitted by dylan on Sat, 2009-01-31 23:53.

4km Grid of AWC: generated using PostGIS/GRASS, based on USDA-NCSS SSURGO data.

Horizon thickness-weighted mean AWC (available water holding capacity), aggregated to a 4km grid, based on the detailed (SSURGO) soil survey database. Each grid cell is the component percentage / area fraction weighted mean of profile AWC. The variation in AWC tracks several important parent material induced patterns, with lower AWC in residual soils formed on steep granitic terrain (south flank of the Sierra Nevada), and higher AWC in residual soils formed on the gentler slopes of meta-volcanic and meta-sedimentary terrain (central and northern flanks of the Sierra Nevada). The higher AWC values on the east side of the San Joaquin Valley correspond with the characteristically finer soils formed from coast range alluvium. High AWC values of the Sacramento Valley correspond with the fine-textured soils derived from a mixture of coast range alluvium, and meta-volcanic/sedimentary alluvium from the Sierra Nevada.
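
As a rough illustration of the area-fraction weighting only, assume each grid cell has already been intersected with the map unit polygons, and that a single AWC value is available for each resulting piece. The data frame `pieces`, its column names, and all values are invented for this example; the actual map was generated with PostGIS/GRASS, not with this code.

```r
# invented intersection results: pieces of map unit polygons inside each grid cell
# (a 4km x 4km cell covers 1600 ha)
pieces <- data.frame(
  cell = c(1, 1, 1, 2, 2),
  area_ha = c(800, 600, 200, 1200, 400),
  awc_cm = c(10, 15, 20, 8, 16)
)

# area-fraction weighted mean AWC for each grid cell
cell.awc <- sapply(split(pieces, pieces$cell), function(p) {
  weighted.mean(p$awc_cm, w = p$area_ha)
})
cell.awc
```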


Scaling Soil Survey

Submitted by dylan on Sat, 2009-01-31 20:56.

Background and Justification

Comparison of Slope and Intercept Terms for Multi-Level Model

Submitted by dylan on Thu, 2009-01-29 18:23.

Premise

When the relationship between two variables is (potentially) dependent on a third, categorical variable, ANCOVA (analysis of covariance), or some variant, is commonly used. There are several approaches to testing for differences in slope/intercepts (in the case of a simple linear model) between levels of the stratifying variable. In R, the following formula notation is usually used to test for interaction between levels of a factor (f) and the relationship between two continuous variables x and y: y ~ x * f. A simple graphical exploration of this type of model can be done through examination of confidence intervals computed for slope and intercept terms, for each level of our grouping factor (f). An example based on a fictitious dataset is presented below. Note that this is a rough approximation for testing differences in slope/intercept within a multi-level model. A more robust approach would take into account that we are trying to make several pair-wise comparisons, i.e. something akin to Tukey's HSD. Something like this can be done with the multcomp package. For any real data set you should always consult a real statistician.

Example Multi-Level Model: each panel represents a model fit to y ~ x, for group f

 
Example Multi-Level Data

# need this for xyplot()
library(lattice)

# make some fake data:
x <- rnorm(100, mean=3, sd=6)
y <- x * runif(100, min=1, max=7) + runif(100, min=1.8, max=5)
d <- data.frame(x, y, f=rep(letters[1:10], each=10))

# check it out
xyplot(y ~ x | f, data=d, type=c('p','r'))
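
A minimal sketch of the confidence interval idea described in the premise, assuming a separate `lm()` fit within each level of f. The seed and the object names (`fits`, `slope.ci`, `int.ci`) are arbitrary choices for this example. The fake data are regenerated so the block runs on its own.

```r
# fake data, same recipe as above (the seed is arbitrary, for reproducibility only)
set.seed(1)
x <- rnorm(100, mean=3, sd=6)
y <- x * runif(100, min=1, max=7) + runif(100, min=1.8, max=5)
d <- data.frame(x, y, f=factor(rep(letters[1:10], each=10)))

# fit y ~ x separately within each level of f
fits <- lapply(split(d, d$f), function(i) lm(y ~ x, data=i))

# 95% confidence intervals for slope and intercept: one row per level of f
slope.ci <- t(sapply(fits, function(m) confint(m)['x', ]))
int.ci <- t(sapply(fits, function(m) confint(m)['(Intercept)', ]))
slope.ci
int.ci

# single model with an interaction term, for comparison
anova(lm(y ~ x * f, data=d))
```

Levels whose intervals do not overlap are candidates for a real difference in slope or intercept, but these intervals are not adjusted for the number of comparisons being made.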
