Submitted by dylan on Mon, 2010-04-19 18:02.
Soil scientists routinely sample, characterize, and summarize patterns in soil properties in space, with depth, and through time. Invariably, some samples will be lost, or the funds required for complete characterization will run out. In these cases the scientist is left with a data table that contains holes (so to speak) in the rows/columns that are missing data. If the data are used within a regression, missing values in either the predictors or the response variable result in row-wise deletion-- even if 9 out of 10 variables are present. Furthermore, common multivariate methods (PCA, RDA, dissimilarity metrics, etc.) cannot effectively deal with missing data. The scientist is left with a few options: 1) row-wise deletion of cases missing any variable, 2) re-sampling or re-characterizing the missing samples, or 3) estimating the missing values from other variables in the dataset. This last option is called missing data imputation-- a broad topic with countless books and scientific papers written about it. Here is a fairly simple introduction to the topic. Fortunately for us non-experts, there is an excellent function (aregImpute()) in the Hmisc package for R.
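A minimal sketch of the aregImpute() workflow, using simulated data (the variable names and model below are illustrative, not from the original post):

library(Hmisc)

# simulate a small soil characterization table
set.seed(1)
d <- data.frame(clay = runif(100, 10, 50), ph = runif(100, 5, 8))
d$oc <- 2 - 0.02 * d$clay + 0.1 * d$ph + rnorm(100, sd = 0.2)

# knock out some values at random
d$oc[sample(100, 15)] <- NA
d$clay[sample(100, 10)] <- NA

# multiple imputation via additive regression, bootstrapping,
# and predictive mean matching
imp <- aregImpute(~ oc + clay + ph, data = d, n.impute = 10)

# extract the first set of imputed values as a completed data frame
completed <- as.data.frame(impute.transcan(imp, imputation = 1, data = d,
                                           list.out = TRUE, pr = FALSE))

# fit a regression that propagates imputation uncertainty
f <- fit.mult.impute(oc ~ clay + ph, lm, imp, data = d)
summary(f)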
Submitted by dylan on Fri, 2010-03-05 02:21.
GCS Model Grids
Recently finished some collaborative work with Vishal on visualizing climate change data for the SEI. This project was funded in part by the California Energy Commission, with additional technical support from the Google Earth Team. One of the final products was an interactive, multi-scale Google Earth application, based on PostGIS, PHP, and R. Interaction with the KMZ application results in several presentations of climate projections, fire risk projections, urban population growth projections, and other related information. Charts are dynamically generated from the PostGIS database, and returned to the web browser. In addition, an HTTP-based interface makes it simple to download CSV-formatted data directly from the CEC server. Some of our R code seemed like a good candidate for sharing, so I have posted a complete example below-- illustrating how to access climate projection data from the CEC server, a couple of custom functions for fancy lattice graphics, and more.
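The complete example is in the post; as a minimal sketch of the general pattern (the URL and column names below are placeholders, not the real CEC interface):

library(lattice)

# hypothetical CSV endpoint-- the real CEC server URL and its query
# parameters are not reproduced here
u <- 'http://example.com/climate-projections.csv'
x <- read.csv(u, stringsAsFactors = FALSE)

# assuming columns: year, scenario, tmax
xyplot(tmax ~ year | scenario, data = x, type = c('l', 'g'),
       as.table = TRUE, xlab = 'Year',
       ylab = 'Projected Maximum Temperature (C)')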
Submitted by dylan on Thu, 2010-03-04 18:22.
Another plyr example: quantiles (0.05, 0.25, 0.5, 0.75, 0.95) of DSC by temperature bin
There are plenty of good examples of how to use functions from the plyr package. Here is one more, demonstrating how to use ddply() with a custom function. Note that there are two places where the example function may blow up if you pass in poorly formatted or strange data: the calls to 1) t.test() and 2) quantile(). Also note the use of the transpose function, t(), for converting column-wise data into row-wise data-- suitable for inclusion in a data frame containing a single row.
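A condensed sketch of the idea (the data and summary function below are illustrative, not the original example):

library(plyr)

# simulated data: DSC measurements within temperature bins
set.seed(1)
d <- data.frame(temp.bin = sample(c('low', 'med', 'high'), 300, replace = TRUE),
                DSC = rnorm(300, mean = 10, sd = 2))

# custom summary function: t.test() will fail on groups with fewer than
# 2 observations, and quantile() will fail on all-NA input
f.summary <- function(i) {
  q <- quantile(i$DSC, probs = c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE)
  ci <- t.test(i$DSC)$conf.int
  # transpose column-wise quantiles into a single-row data frame
  data.frame(t(q), ci.lower = ci[1], ci.upper = ci[2], check.names = FALSE)
}

# apply by temperature bin
ddply(d, .(temp.bin), f.summary)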
Submitted by dylan on Tue, 2009-12-08 19:48.
I was recently asked to print out a fabric pattern that could be used to cover a sphere, about the size of a ping pong ball, for the purposes of re-creating a favorite cat toy (quite important). Thinking this over, I realized that this was basically a map projection problem-- and could probably be solved by scaling an interrupted sinusoidal projection to match the geometry of a ping pong ball. Below are some R functions, and examples of how this endeavor evolved. Thanks to Greg Snow for this helpful post on the R mailing list, describing how to preserve linear measurement when composing a figure in R. So far the pattern doesn't quite fit.
It looks like it was not the printer's fault-- I had used the wrong radius for a ping pong ball: 16mm instead of 19mm or 20mm (ping pong balls come in 38mm and 40mm diameters). Updated files are attached.
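A minimal sketch of the gore construction (not the attached files; the true-scale device sizing loosely follows Greg Snow's approach):

# draw n sinusoidal gores for a sphere of radius r, working in mm
sphere.gores <- function(r = 20, n = 8) {
  gore.width <- 2 * pi * r / n   # equatorial width of one gore
  gore.height <- pi * r          # pole-to-pole length of one gore
  phi <- seq(-pi / 2, pi / 2, length.out = 100)  # latitude
  half.width <- (gore.width / 2) * cos(phi)      # sinusoidal taper
  y <- r * (phi + pi / 2)                        # distance from south pole
  # margin-free plot: 1 plotting unit = 1 mm when the device is sized to match
  par(mar = c(0, 0, 0, 0))
  plot(0, 0, type = 'n', xlim = c(0, n * gore.width), ylim = c(0, gore.height),
       asp = 1, axes = FALSE, ann = FALSE)
  for (i in seq_len(n)) {
    xc <- (i - 0.5) * gore.width  # center line of the i-th gore
    polygon(c(xc - half.width, rev(xc + half.width)), c(y, rev(y)))
  }
}

# ping pong ball: 40mm diameter -> r = 20mm; size the PDF (in inches)
# so that plotting units come out as mm on paper
r <- 20 ; n <- 8
pdf('gores.pdf', width = (2 * pi * r) / 25.4, height = (pi * r) / 25.4)
sphere.gores(r = r, n = n)
dev.off()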
Submitted by dylan on Thu, 2009-09-10 15:36.
SSURGO is a digital, high-resolution (1:24,000) soil survey database produced by the USDA-NRCS. It is one of the largest and most complete spatial databases in the world, and is available for nearly the entire USA at no cost. These data are distributed as a combination of geographic and text data, representing soil map units and their associated properties. Unfortunately the text files do not come with column headers, so a template is required to make sense of the data. Alternatively, one can use an MS Access template to attach column names, generate reports, and perform other such tasks. CSV files can be exported from the MS Access database for further use. A follow-up post will include the text file headers and a complete PostgreSQL database schema, with details on implementing a SSURGO database without using MS Access.
If you happen to have some of the SSURGO tabular data that includes column names, the following R code may be of general interest for resolving the 1:many:many hierarchy of relationships required to make a thematic map.
This is the format we want the data to be in:
mukey clay silt sand water_storage
458581 20.93750 20.832237 20.861842 14.460000
458584 43.11513 30.184868 26.700000 23.490000
458593 50.00000 27.900000 22.100000 22.800000
458595 34.04605 14.867763 11.776974 18.900000
So we can make a map like this
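A rough sketch of the aggregation that produces a table like the one above (assuming the SSURGO component and chorizon tables have been loaded as data frames with their standard column names):

library(plyr)

# 1. depth-weighted mean properties within each component
co.props <- ddply(chorizon, .(cokey), function(i) {
  thick <- i$hzdepb_r - i$hzdept_r  # horizon thickness (cm)
  data.frame(
    clay = weighted.mean(i$claytotal_r, thick, na.rm = TRUE),
    silt = weighted.mean(i$silttotal_r, thick, na.rm = TRUE),
    sand = weighted.mean(i$sandtotal_r, thick, na.rm = TRUE),
    water_storage = sum(i$awc_r * thick, na.rm = TRUE)  # cm of water
  )
})

# 2. join back to the component table to recover mukey and comppct_r
co.props <- join(co.props, component[, c('mukey', 'cokey', 'comppct_r')],
                 by = 'cokey')

# 3. component-percentage-weighted mean properties for each map unit
mu.props <- ddply(co.props, .(mukey), summarise,
                  clay = weighted.mean(clay, comppct_r, na.rm = TRUE),
                  silt = weighted.mean(silt, comppct_r, na.rm = TRUE),
                  sand = weighted.mean(sand, comppct_r, na.rm = TRUE),
                  water_storage = weighted.mean(water_storage, comppct_r, na.rm = TRUE))

The resulting mu.props table can then be joined to the map unit polygons (by mukey) for thematic mapping.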
Submitted by dylan on Wed, 2009-05-27 18:43.
Western Fresno Soil Hierarchy: partial view of the hierarchy within the US Soil Taxonomic system
Field and lab characterization of soil profile data results in the accumulation of a massive, multivariate, three-dimensional data set. Classification is one approach to making sense of a large collection of this type of data. US Soil Taxonomy is the primary soil classification system used in the U.S.A. and many other countries. This system is hierarchical in nature, and makes use of the presence or absence of diagnostic soil features. A comprehensive discussion of Soil Taxonomy is beyond the scope of this post; a detailed review can be found in Buol, S.W., Graham, R.C., McDaniel, P.A. & Southard, R.J. (2003). Soil Genesis and Classification. Iowa State Press.
Submitted by dylan on Sat, 2009-05-23 21:32.
The current default database back-end used by the GRASS vector model is DBF (as of GRASS 6.5); however, this will probably change to SQLite in GRASS 7. The DBF back-end works OK, but it tends to be very sensitive (i.e. breaks) when reserved words occur in column names or in portions of a query. Complex UPDATE statements don't work, and just about anything more complex than a simple SELECT statement usually results in an error. Switching to the SQLite (or PostgreSQL, etc.) back-end solves most of these problems.
Currently GRASS uses a single SQLite (file-based) database per mapset-- convenient if you are interested in joining attribute tables between vectors, but not set in stone as the default approach for GRASS 7. Regardless, converting the back-end is a fairly simple matter, and taking the time to convert to an SQLite or PostgreSQL back-end will undoubtedly save you time and sanity if you ever find yourself working with vector + attribute data on a regular basis. Having access to a complete implementation of SQL makes extracting, summarizing, joining, and re-formatting (column names, types, etc.) tabular data much simpler than what is possible with the DBF back-end. Also, there are several convenient graphical SQLite managers available, such as SQLite Manager, SQLite Database Browser, and SQLite Admin.
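For example, switching a GRASS 6 mapset to the SQLite back-end and moving a single attribute table might look like this (vector and table names are illustrative):

# point the current mapset at a per-mapset SQLite database
db.connect driver=sqlite database='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite.db'
db.connect -p   # verify the new settings

# copy an attribute table from the old DBF back-end into SQLite
db.copy from_driver=dbf from_database='$GISDBASE/$LOCATION_NAME/$MAPSET/dbf' \
  from_table=soils to_driver=sqlite \
  to_database='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite.db' to_table=soils

# re-connect the vector to the copied table
v.db.connect -o map=soils table=soils driver=sqlite \
  database='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite.db'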
Submitted by dylan on Mon, 2009-04-27 15:47.
CHNOSZ: chemical thermodynamics library and activity diagram software