Open Source GIS and Mapping Ideas

Extracting an image chunk from a collection of Large MrSid Images

Submitted by dylan on Mon, 2012-06-04 17:29.

Recently needed to extract a small "chunk" from a collection of adjacent MrSid mosaics, each about 4Gb in size. Once again, GDAL came to the rescue, and saved much time and agony wile working with very large, compressed, and proprietary-format files. Two lessons learned:

  1. The GDAL VRT format can save a lot of time and effort by providing access to a collection of files without actually altering the originals.
  2. ArcGIS 9.x does not like BigTIFF files. When file sizes approach or exceed 4Gb, the HFA format is a nice alternative.

Have patience, subsetting a chunk out of 5 adjacent MrSid files (4Gb each) took about 7 hours. Fun experiment: extract sub-chunks from each of the constituent sid files and distribute across CPU cores.

Soil Series Query for SoilWeb

Submitted by dylan on Fri, 2011-09-16 16:12.

A map depicting the spatial distribution of a given soil series can be very useful when working on a new soil survey, updating an old one, or searching for specific soil characteristics. We have recently added a soil series query facility to SoilWeb, where results are returned in the form of a KML file. Two modes are currently supported:

  1. map unit based querying- only map units named for the given soil series are returned
  2. component based querying- map units containing components named for the given series are returned

For example, if someone was interested in the spatial distribution of the Amador soil series, they could use the Series Extent Mapping tool to get a quick description of which survey areas contain (and how many corresponding acres of) this series. For an even more detailed description of where the Amador series is mapped, one could use our new soil series query like this:

This is a preliminary version, and a subsequent post will contain links to a Google Earth file that can be used to simplify the query process. In most cases queries take about 1-5 seconds, which is quite fast considering: 1) either 730k component names or 275k map unit names are searched, 2) 35 million map unit polygons are filtered for the series in question, and, 3) bounding boxes for matching polygons are merged together-- all on-the-fly. Full text searches for map unit/component names are very fast thanks to advanced text indexing and searching algorithms implemented in PostgreSQL and spatial processing functions implemented in PostGIS. In the final version, the location of the official series description (OSD) will be included in query results.

Attached at the bottom of the page is a KMZ demo showing sample output from the two query modes. Screen shots from the demo are posted below.

Soil Series Query Results: Amador Series: blue regions: map units dominated by the Amador series; red regions: map units that contain at least one component of the Amador series.Soil Series Query Results: Amador Series: blue regions: map units dominated by the Amador series; red regions: map units that contain at least one component of the Amador series.

Terrain Classification Experiment 2: GRASS, R, and the raster package

Submitted by dylan on Tue, 2011-05-24 16:37.

Quick post on terrain classification, based on some trouble folks were having with a previous example on Windows. With the spgrass6 package, raster stacks are created by loading several GRASS files at once: x <- readRAST6(vname=c('beam_sum_mj','ned10m_ccurv','ned10m_pcurv','ned10m_slope')). This works well on UNIX-like operating systems and in cases where the entire collection of raster maps can fit within the system memory. A different strategy is needed when working with massive raster stacks or on other (less useful) operating systems. This post outlines one possible strategy that should work on massive data sets, and across operating systems.


  1. export terrain surfaces from GRASS to intermediate files
  2. import into R with raster package
  3. perform unsupervised classification on a sample of the cells using PAM
  4. apply clustering to unsampled cells with randomForest
  5. save results to intermediate file
  6. import results into GRASS for post-processing

Terrain Classification ExampleTerrain Classification Example

First-Cut Approach to Synchronizing Field Notes with GPS Data

Submitted by dylan on Mon, 2011-05-09 22:41.

After a week's worth of work in the field, I typically have several pages of hand-written field notes that are associated with GPS waypoints-- badly in need of some kind of transcription/organization. I have yet to find a simple approach for bringing together these non-spatial (field note transcriptions) and spatial (waypoints) data, linked only by a simple ID system. Also, I am somewhat limited by the software available on USDA computer systems. Here is a first-cut approach, implemented in R, that relies on CSV files. In a perfect world, I would implement something using python and PostGIS...

Visualizing Terrain Surface Indicies with Scaled Arrows

Submitted by dylan on Sat, 2011-04-30 22:30.

Hamish Bowman recently posted a new GRASS module (d.barb) that can be used to depict the direction and magnitude components of some vector (e.g. wind) field along a raster surface or at points in space. An example (c/o Hamish):

Tips for Using our Google Earth Interface to SoilWeb

Submitted by dylan on Mon, 2011-02-28 23:40.

GE SoilWeb DemoGE SoilWeb Demo

The folks in Lincoln recently changed the links to the OSDs from http:// to https:// style URLs... which seems to cause problems for new versions of Google Earth. I have noticed that when clicking on soil profile sketches an error is returned (something about an SSL handshake problem). This can be fixed by asking Google Earth to use an external browser for displaying data. Here are some instructions:

Customizing Maps in R: spplot() and latticeExtra functions

Submitted by dylan on Wed, 2010-12-15 17:00.

I recently noticed the new latticeExtra page on R-forge, which contains many very interesting demos of new lattice-related functionality. There are strong opinions about the "best" graphics system in R (base graphics, grid graphics, lattice, ggplot, etc.)-- I tend to use base graphics for simple figures and lattice for depicting multivariate or structured data. The sp package defines classes for storing spatial data in R, and contains several useful plotting methods such as the lattice-based spplot(). This function, and back-end helper functions, provide a generalized framework for plotting many kinds of spatial data. However, sometimes with great abstraction comes great ambiguity-- many of the arguments that would otherwise allow fine tuning of the figure are buried in documentation for lattice functions. Examples are more fun than links to documentation, so I put together a couple of them below. They describe several strategies for placing and adjusting map legends-- either automatically, or manually added with the update() function. The last example demonstrates an approach for over-plotting 2 rasters. All of the examples are based on the meuse data set, from the gstat package.

Extended spplot() examplesExtended spplot() examples

Three New Soils-Related KMZ Demos

Submitted by dylan on Tue, 2010-12-07 18:25.

Soil Texture KMZSoil Texture KMZ

Updated versions of three soils-related KMZ files: 1-km scale, aggregate LCC, CA Storie Index, and soil texture data, derived from SSURGO. These are part of a series of KMZ / raster datasets that will be published soon. See attached files at the bottom of the page. Enjoy!