
# Blog

### Getting Parent Material Data out of SSURGO

#### Posted on May 28, 2010

Parent material data is stored within the `copm` and `copmgrp` tables. The `copm` table can be linked to the `copmgrp` table via the 'copmgrpkey' field, and the `copmgrp` table can be linked to the `component` table via the 'cokey' field. The following queries illustrate these table relationships, and show one possible strategy for extracting the parent material information associated with the largest component of each map unit.
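As a rough illustration of the joins described above, here is a minimal Python sketch that uses an in-memory SQLite database. It is not one of the queries from the full post. The join fields `copmgrpkey` and `cokey` come from the text; the other column names (`mukey`, `compname`, `comppct_r`, `pmgroupname`, `pmkind`) are assumed to match the SSURGO schema, and every row of data is invented.

```python
import sqlite3

# Toy stand-in for three SSURGO tables. Apart from cokey and copmgrpkey,
# the column names are assumptions, and all values are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE component (mukey TEXT, cokey TEXT, compname TEXT, comppct_r INTEGER);
CREATE TABLE copmgrp (cokey TEXT, copmgrpkey TEXT, pmgroupname TEXT);
CREATE TABLE copm (copmgrpkey TEXT, pmkind TEXT);
INSERT INTO component VALUES
  ('mu1', 'c1', 'Alpha', 60), ('mu1', 'c2', 'Beta', 40),
  ('mu2', 'c3', 'Gamma', 85), ('mu2', 'c4', 'Delta', 15);
INSERT INTO copmgrp VALUES
  ('c1', 'g1', 'alluvium'), ('c2', 'g2', 'colluvium'),
  ('c3', 'g3', 'residuum'), ('c4', 'g4', 'loess');
INSERT INTO copm VALUES
  ('g1', 'alluvium'), ('g2', 'colluvium'), ('g3', 'residuum'), ('g4', 'loess');
""")

# component -> copmgrp via cokey, copmgrp -> copm via copmgrpkey,
# keeping only the component with the largest percentage in each map unit.
query = """
SELECT c.mukey, c.compname, c.comppct_r, g.pmgroupname, p.pmkind
FROM component AS c
JOIN copmgrp AS g ON g.cokey = c.cokey
JOIN copm AS p ON p.copmgrpkey = g.copmgrpkey
WHERE c.comppct_r = (
  SELECT MAX(c2.comppct_r) FROM component AS c2 WHERE c2.mukey = c.mukey
)
ORDER BY c.mukey
"""
largest = con.execute(query).fetchall()
for row in largest:
    print(row)
```

Note that a map unit with tied component percentages would return more than one row here, so a real query would need a tie-breaking rule.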

Several of the example queries are based on this map unit:

### SQLite as an alternative to shapefiles, and some GPS fun in R

#### Posted on May 25, 2010

Finally made it out to Folsom Lake for a fine day of sailing and GPS track collecting. Once I was back in the lab, I downloaded the track data with gpsbabel, and was ready to import the data into GRASS.

```
# import GPX from GPS:
gpsbabel -t -i garmin -f /dev/ttyS0 -o gpx -F trip1.gpx
```

I was particularly interested in how fast we were able to get the boat going; however, my GPS does not keep track of its speed in the track log. Also, I had forgotten to set the GPS up for a constant time interval between track points. Dang. In order to compute a velocity between sequential points from the track log I would first need to do two things: 1) convert the geographic coordinates into projected coordinates, and 2) compute the associated time interval between points.
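The two steps can be sketched in a few lines of Python. This is only an illustration, not the GRASS/R workflow from the full post: it swaps in a simple local equirectangular projection on a spherical Earth for a proper map projection, and the three track points (timestamps and coordinates) are made up.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation)

def project(lat, lon, lat0):
    """Local equirectangular projection to metres; adequate for short tracks."""
    x = EARTH_RADIUS_M * math.radians(lon) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat)
    return x, y

def parse_time(stamp):
    """Parse a GPX-style UTC timestamp."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")

def segment_speeds(track):
    """Speed (m/s) between sequential (time, lat, lon) points."""
    lat0 = track[0][1]  # reference latitude for the projection
    result = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(track, track[1:]):
        x1, y1 = project(lat1, lon1, lat0)
        x2, y2 = project(lat2, lon2, lat0)
        dt = (parse_time(t2) - parse_time(t1)).total_seconds()
        if dt <= 0:
            continue  # skip duplicated or out-of-order points
        result.append(math.hypot(x2 - x1, y2 - y1) / dt)
    return result

# Hypothetical track points with an irregular time interval.
track = [
    ("2010-05-22T19:00:00Z", 38.700, -121.150),
    ("2010-05-22T19:00:20Z", 38.701, -121.150),
    ("2010-05-22T19:01:00Z", 38.701, -121.149),
]
speeds = segment_speeds(track)
for v in speeds:
    print(f"{v:.2f} m/s = {v * 1.943844:.2f} knots")
```

Because each distance is divided by its own time interval, the uneven spacing between track points does not matter.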

### Estimating Missing Data with aregImpute() {R}

#### Posted on April 19, 2010

Soil scientists routinely sample, characterize, and summarize patterns in soil properties in space, with depth, and through time. Invariably, some samples will be lost, or the funds required for complete characterization will run out. In these cases the scientist is left with a data table that contains *holes* (so to speak) in the rows/columns that are missing data. If the data are used within a regression, a missing value in any predictor or in the response variable results in row-wise deletion-- even if 9/10 variables are present. Furthermore, common multivariate methods (PCA, RDA, dissimilarity metrics, etc.) cannot effectively deal with missing data. The scientist is left with a few options: 1) row-wise deletion of cases missing any variable, 2) re-sampling or re-characterizing the missing samples, or 3) estimating the missing values from other variables in the dataset. This last option is called missing data imputation. This is a broad topic with countless books and scientific papers written about it. Here is a fairly simple introduction to the topic of imputation. Fortunately for us non-experts, there is an excellent function (`aregImpute()`) in the Hmisc package for R.
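To make the difference between options 1 and 3 concrete, here is a deliberately simple Python sketch. It performs a single regression imputation from one predictor, which is far cruder than what `aregImpute()` does and is not a substitute for it. The variable names and values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical (clay %, CEC) pairs; None marks a missing measurement.
rows = [(10.0, 8.0), (20.0, 13.0), (30.0, 18.0), (40.0, None), (50.0, 28.0)]

# Option 1: row-wise deletion keeps only the complete cases.
complete = [r for r in rows if None not in r]

# Option 3: estimate the missing value from the other variable,
# using a line fitted to the complete cases.
slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])
imputed = [(x, y if y is not None else slope * x + intercept) for x, y in rows]

print(len(complete), "complete rows out of", len(rows))
print(imputed)
```

A single deterministic fill like this understates uncertainty, which is one reason to prefer a multiple imputation approach for real analyses.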

### Converting Alpha-Shapes into SP Objects

#### Posted on April 19, 2010

Just read about a new R package called alphahull (paper) that sounds like it might be a good candidate for addressing this request regarding concave hulls. Below are some notes on computing alpha-shapes and alpha-hulls from spatial data and converting the results returned by `ashape()` and `ahull()` into SP-class objects. Note that the functions are attached at the bottom of the page. Be sure to read the license for the alphahull package if you plan to use it in your work.

Figure

### Accessing Climate Change Data and a Custom Panel Function for Filled Polygons

#### Posted on March 5, 2010

Recently finished some collaborative work with Vishal, related to visualizing climate change data for the SEI. This project was funded in part by the California Energy Commission, with additional technical support from the Google Earth Team. One of the final products was an interactive, multi-scale Google Earth application, based on PostGIS, PHP, and R. Interaction with the KMZ application results in several presentations of climate projections, fire risk projections, urban population growth projections, and other related information. Charts are dynamically generated from the PostGIS database, and returned to the web browser. In addition, an HTTP-based interface makes it simple to download CSV-formatted data directly from the CEC server. Some of our R code seemed like a good candidate for sharing, so I have posted a complete example below-- illustrating how to access climate projection data from the CEC server, a couple of custom functions for fancy lattice graphics, and more.

### Yet Another plyr Example

#### Posted on March 4, 2010

Figure:

There are plenty of good examples of how to use functions from the plyr package. Here is one more, demonstrating how to use `ddply` with a custom function. Note that there are two places where the example function may blow up if you pass in poorly formatted or strange data: calls to 1) `t.test()` and 2) `quantile()`. Also note the use of the *transpose* function, `t()`, for converting column-wise data into row-wise data-- suitable for inclusion into a dataframe containing a single row.
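For readers more at home in Python, the same split-apply-combine idea can be sketched with the standard library. This is an analogue of the pattern, not the plyr code from the post. The `summarize()` helper and the data are hypothetical, and the guard on small groups stands in for the kind of check that keeps a summary function from failing on strange input.

```python
import statistics
from collections import defaultdict

def summarize(values):
    """Custom per-group summary returning one row of results as a dict."""
    vals = [v for v in values if v is not None]
    if len(vals) < 2:
        # Too few observations for quantiles: return an empty summary
        # instead of raising an error.
        return {"n": len(vals), "mean": None, "q25": None, "q50": None, "q75": None}
    q25, q50, q75 = statistics.quantiles(vals, n=4, method="inclusive")
    return {"n": len(vals), "mean": statistics.mean(vals),
            "q25": q25, "q50": q50, "q75": q75}

# Hypothetical long-format records: (group, value).
records = [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("a", 5), ("b", 7)]

# Split the records by group.
groups = defaultdict(list)
for group, value in records:
    groups[group].append(value)

# Apply the custom function to each group and combine into one row per group.
summary = [{"group": g, **summarize(v)} for g, v in sorted(groups.items())]
for row in summary:
    print(row)
```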

### Numerical Integration/Differentiation in R: FTIR Spectra

#### Posted on February 23, 2010

Stumbled upon an excellent example of how to perform numerical integration in R. Below is an example of piece-wise linear and spline fits to FTIR data, and the resulting computed area under the curve. With a high density of points, it seems like the linear approximation is most efficient and sufficiently accurate. With very large sequences, it may be necessary to adjust the value passed to the `subdivisions` argument of `integrate()`. Strangely, larger values seem to solve problems encountered with large datasets...
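The piece-wise linear case amounts to the trapezoid rule. Here is a minimal Python sketch of it, not the R code from the post. It uses a made-up curve with a known integral in place of FTIR data, so the effect of point density on accuracy can be checked.

```python
def trapezoid_area(x, y):
    """Area under the piece-wise linear curve through the points (x, y)."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for x1, x2, y1, y2 in zip(x, x[1:], y, y[1:]))

def sample(n):
    """n evenly spaced samples of y = x^2 on [0, 1] (exact area is 1/3)."""
    xs = [i / (n - 1) for i in range(n)]
    return xs, [v * v for v in xs]

coarse_area = trapezoid_area(*sample(11))
dense_area = trapezoid_area(*sample(1001))

print(f"11 points:   error = {abs(coarse_area - 1 / 3):.2e}")
print(f"1001 points: error = {abs(dense_area - 1 / 3):.2e}")
```

The error shrinks with the square of the point spacing, which is consistent with the linear approximation being good enough when the points are dense.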

Figure:

### Visual Interpretation of Principal Coordinates (of) Neighbor Matrices (PCNM)

#### Posted on February 21, 2010

Principal Coordinates (of) Neighbor Matrices (PCNM) is an interesting algorithm, developed by D. Borcard and P. Legendre at the University of Montreal, for the multi-scale analysis of spatial structure. This algorithm is typically applied to a distance matrix, computed from the coordinates where some environmental data were collected. The resulting "PCNM vectors" are commonly used to describe variable degrees of *possible* spatial structure and its contribution to variability in other measured parameters (soil properties, species distribution, etc.)-- essentially a spectral decomposition of spatial connectivity. This algorithm has recently been updated and released as part of the PCNM package for R. Several other implementations of the algorithm exist; however, this seems to be the most up-to-date.

**Related Presentations and Papers on PCNM**

- http://biol09.biol.umontreal.ca/ESA_SS/Borcard_&_PL_talk.pdf
- Borcard, D. and Legendre, P. 2002. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153: 51-68.
- Borcard, D., Legendre, P., Avois-Jacquet, C. and Tuomisto, H. 2004. Dissecting the spatial structures of ecological data at all scales. Ecology 85(7): 1826-1832.

### Updates to SoilWeb

#### Posted on January 18, 2010

Figure:

Added color support to the mini-profiles used in graphical map unit summaries, the Google Earth interface, and the iPhone application. SSURGO doesn't contain soil color data, so colors (in Munsell notation) were extracted from the OSD database, and converted into RGB triplets. Using horizon information from the OSD database also results in much more realistic horizonation, as compared to what is stored in older SSURGO databases. Example of the Yolo series soil, from the Yolo County (1972) soil survey:

### Interesting use of levelplot() for time series data

#### Posted on January 16, 2010

Figure:

Several recent articles appeared on the R-bloggers feed aggregator that demonstrated an interesting visualization of time series data using color. This style of visualization was readily adapted for the time series data I regularly collect (soil moisture and temperature), and quickly implemented with the `levelplot()` function from the lattice package. I hadn't previously considered using a mixture of factor (categorical) and continuous variables within a call to `levelplot()`; however, the resulting figure was more useful than expected (see above). A single day's observation is represented by a colored strip (redder hues are higher temperature values, and lower soil moisture values), placed along the x-axis according to the date of that observation, and in a row defined by the location where that observation was collected. Paneling of the data can be used to represent a more complex hierarchy, such as sensor depth or landscape position. At the expense of *quantitative data retrieval* (which is better supported by scatter plots), *qualitative patterns* are quickly identified within the new graphic.
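The layout behind such a figure is a grid with one row per location and one column per day. Here is a minimal Python sketch of that data arrangement only; it does not draw anything and is not the lattice code from the post. The site names, dates, and temperatures are hypothetical.

```python
from datetime import date, timedelta

def to_grid(records):
    """Pivot (site, day, value) records into rows = sites, columns = days."""
    sites = sorted({site for site, _, _ in records})
    days = [day for _, day, _ in records]
    start, end = min(days), max(days)
    # One column per calendar day, so gaps in the record stay visible.
    columns = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    lookup = {(site, day): value for site, day, value in records}
    grid = [[lookup.get((site, day)) for day in columns] for site in sites]
    return sites, columns, grid

# Hypothetical daily soil temperature observations; one day is missing.
records = [
    ("ridge", date(2010, 1, 1), 5.0),
    ("ridge", date(2010, 1, 2), 6.0),
    ("ridge", date(2010, 1, 3), 7.0),
    ("swale", date(2010, 1, 1), 4.0),
    ("swale", date(2010, 1, 3), 4.5),
]
sites, columns, grid = to_grid(records)
for site, row in zip(sites, grid):
    print(site, row)
```

Each cell would then be mapped to a color, with missing days left blank.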