Submitted by dylan on Thu, 2009-09-10 15:36.
SSURGO is a digital, high-resolution (1:24,000), soil survey database produced by the USDA-NRCS. It is one of the largest and most complete spatial databases in the world; and is available for nearly the entire USA at no cost. These data are distributed as a combination of geographic and text data, representing soil map units and their associated properties. Unfortunately the text files do not come with column headers, so a template is required to make sense of the data. Alternatively, one can use an MS Access template to attach column names, generate reports, and other such tasks. CSV file can be exported from the MS Access database for further use. A follow-up post with text file headers, and complete PostgreSQL database schema will contain details on implementing a SSURGO database without using MS Access.
If you happen to have some of the SSURGO tabular data that includes column names, the following R code may be of general interest for resolving the 1:many:many hierarchy of relationships required to make a thematic map.
This is the format we want the data to be in
mukey clay silt sand water_storage
458581 20.93750 20.832237 20.861842 14.460000
458584 43.11513 30.184868 26.700000 23.490000
458593 50.00000 27.900000 22.100000 22.800000
458595 34.04605 14.867763 11.776974 18.900000
So we can make a map like this
Submitted by dylan on Mon, 2009-08-17 15:19.
Rapid urban and sub-urban expansion in the San Joaquin Valley have resulted in the loss of millions of acres of prime farmland in the last 100 years. Approximately 11% of class 1 (irrigated) land and 7% of class 2 land have already been paved over in the Fresno-Madera region (first image below). Recent projections in the expansion of urban areas in this region suggests that by 2085 an additional 28% of class 1 and 25% of class 2 land will be paved over (second image below). This is a preliminary summary-- details to follow.
Fresno Area Urban Areas vs Irrigated LCC: grey regions are current urban areas
Submitted by dylan on Thu, 2009-07-09 15:26.
I was recently asked to verify the coefficients of a linear model fit to sets of data, where each row of the input file was a "site" and each column contained the dependent variable through time (i.e. column 1 = time step 1, column 2 = time step 2, etc.). This format is cumbersome in that it cannot be directly fed into the R lm() function for linear model fitting. Furthermore, we needed the output formatted with columns containing slope, intercept, and R-squared values for each site (rows). All of the re-formatting, and model fitting can be done by hand, using basic R functions, however this task seemed like a good case study for the use of the reshape and plyr packages for R. The reshape package can be used to convert between "wide" and "long" format-- the first step in the example presented below. The plyr package can be used to split a data set into subsets (based on a grouping factor), apply an arbitrary function to the subset, and finally return the combined results in several possible formats. The original input data, desired output, and R code are listed below.
Submitted by dylan on Sat, 2009-07-04 23:15.
My wife asked me to come up with some graph paper for creating smocking patterns. After a couple of minutes playing around with R-base graphics functions, it occurred to me that several functions in the sp package would simplify grid-based operations. Some example functions, along with a simple approach to generating "interesting" patterns, are listed below.
Submitted by dylan on Fri, 2009-07-03 18:34.
A colleague of mine recently asked about computing basic summary statistics from grouped data in R. These are a couple examples that I suggested. Additional documentation for the plyr package can be found here.
Submitted by dylan on Thu, 2009-05-28 17:46.
Soil Survey Data
Online Querying of NRCS Soil Survey Data
Sometimes you are only interested in soils data for a single map unit, component, or horizon. In these cases downloading the entire survey from Soil Data Mart is not worth the effort. An online query mechanism would suffice. The NRCS provides a form-based, interactive querying mechanism and a SOAP-based analogue. These services allow soil data lookup from the current snapshot of all data stored in NASIS.
Submitted by dylan on Wed, 2009-05-27 18:43.
Western Fresno Soil Hierarchy: partial view of the hierarchy within the US Soil Taxonomic system
Field and lab characterization of soil profile data result in the accumulation of a massive, multivariate and three-dimensional data set. Classification is one approach to making sense of a large collection of this type of data. US Soil Taxonomy is the primary soil classification system used in the U.S.A and many other countries. This system is hierarchical in nature, and makes use on the presence or absence of diagnostic soil features. A comprehensive discussion of Soil Taxonomy is beyond the scope of this post. A detailed review of Soil Taxonomy can be found in Buol, S. W.; Graham, R. C.; McDaniel, P. A. & Southard, R. J. Soil Genesis and Classification Iowa State Press, 2003.
Submitted by dylan on Sat, 2009-05-23 21:32.
The current default database back-end used by the GRASS vector model is DBF (as of GRASS 6.5), however this is probably going to be changed (to SQLite) in GRASS 7. The DBF back-end works OK, however it tends to be very sensitive (i.e. breaks) when reserved words occur in column names or portions of a query. Complex UPDATE statements don't work, and just about anything more complex than a simple SELECT statement usually results in an error. Switching to the SQLite (or Postgresql, etc.) back-end solves most of these problems.
Currently GRASS uses a single SQLite (file-based) database per mapset-- convenient if you are interested in joining attribute tables between vectors; but not set-in-stone as the final approach that will be used by default in GRASS 7. Regardless, converting the back-end is a fairly simple matter. Finally, taking the time to convert to an SQLite or Postgresql back-end will undoubtably save you time and sanity if you ever find yourself working with vector+attribute data on a regular basis. Having access to a complete implementation of SQL can make extracting, summarizing, joining, and re-formatting (column names, types, etc.) tabular data much simpler than what is available in the DBF back-end. Also, there are several convenient graphical SQLite managers available, such as SQLite manager, SQLite data browser, and SQLite Admin.
Submitted by dylan on Mon, 2009-04-27 15:47.
CHNOSZ Chemical thermodynamics library and activity diagram software
Submitted by dylan on Tue, 2009-02-17 04:43.
Small update to a similar thread from last week, on the comparison of slope and intercept terms fit to a multi-level model. I finally figured out (thanks R-Help mailing list!) how to efficiently use contrasts in R. The C() function can be called within a model formula, to reset the base level of an un-ordered factor. The UCLA Stats Library has an extensive description of this topic here. This approach can be used to sequentially test for differences between slope and intercept terms from a multi-level model, by re-setting the base level of a factor. See example data and figure below.
Note that the multcomp package has a much more robust approach to this type of operation. Details below.
Example Multi-Level Data
# need these
# replicate an important experimental dataset
x <- rnorm(100)
y1 <- x[1:25] * 2 + rnorm(25, mean=1)
y2 <- x[26:50] * 2.6 + rnorm(25, mean=1.5)
y3 <- x[51:75] * 2.9 + rnorm(25, mean=5)
y4 <- x[76:100] * 3.5 + rnorm(25, mean=5.5)
d <- data.frame(x=x, y=c(y1,y2,y3,y4), f=factor(rep(letters[1:4], each=25)))
xyplot(y ~ x, groups=f, data=d,
auto.key=list(columns=4, title='Beard Type', lines=TRUE, points=FALSE, cex=0.75),
type=c('p','r'), ylab='Number of Pirates', xlab='Distance from Land')
Example Multi-Level Model II