Submitted by dylan on Thu, 2009-01-29 18:23.
When the relationship between two variable is (potentially) dependent on a third, categorical variable ANCOVA (analysis of covariance), or some variant, is commonly used. There are several approaches to testing for differences in slope/intercepts (in the case of a simple linear model) between levels of the stratifying variable. In R the following formula notation is usually used to test for interaction between levels of a factor (f) and the relationship between two continuous variables x and y: y ~ x * f. A simple graphical exploration of this type of model can be done through examination of confidence intervals computed for slope and intercept terms, for each level of our grouping factor (f). An example of a fictitious dataset is presented below. Note that this a rough approximation for testing differences in slope/intercept within a multi-level model. A more robust approach would take into account that we are trying to make several pair-wise comparisons, i.e. something akin to Tukey's HSD. Something like this can be done with the multcomp package. For any real data set you should always consult a real statistician.
Example Multi-Level Model: each panel represents a model fit to y ~ x, for group f
Example Multi-Level Data
# need this for xyplot()
# make some fake data:
x <- rnorm(100, mean=3, sd=6)
y <- x * runif(100, min=1, max=7) + runif(100, min=1.8, max=5)
d <- data.frame(x, y, f=rep(letters[1:10], each=10))
# check it out
xyplot(y ~ x | f, data=d, type=c('p','r'))
Submitted by dylan on Thu, 2009-01-15 04:36.
Wanted to make something akin to an interpolated surface for some spatially auto-correlated categorical data (presence/absence). I quickly generated some fake spatially auto-correlated data to work with using r.surf.fractal in GRASS. These data were converted into a binary map using an arbitrary threshold that looked about right-- splitting the data into something that looked 'spatially clumpy'.
Fig. 1: Simulated auto-correlated, categorical variable, with sampling points and derived voronoi polygons.
I had used voronoi polygons in the past to display connectivity of categorical data recorded at points, even though sparsely sampled areas tend to be over emphasized. Figure 1 shows the fake spatially auto-correlated data (grey = presence /white = not present), sample points (yellow boxes), and voronoi polygons. The polygons with thicker, red boundaries represent the "voronoi interpolation" of the categorical feature.
Submitted by dylan on Fri, 2008-12-26 21:26.
Thought it would be fun to compute how much ink a given poster requires, per unit area of paper, when sending to the department large-format printer. The Python Imaging Library provides several modules suitable for low-level operation on image data. A simple (and probably very inefficient) script was developed to compute the white/black percentage of an image. A script like this could be used to adjust a per-poster "ink cost", which would hopefully prevent people from wasting ink. Obviously, this computation is scale-dependent, so standardized rasterization parameters would have to be set in order for the "ink cost" calculation to be fair. More generalized or efficient approaches are always welcomed.
Submitted by dylan on Tue, 2008-12-23 01:46.
Updated version of a previous attempt to use a traveling-salesman approach to visiting data-loggers in steep terrain. This time we use a vector/raster approach (v.rast.stats) to compute the total "cost" incurred along each segment of a network connecting all of our data-loggers. Although this approach requires more computation time (v.rast.stats is slow), we can generate a more detailed cost surface to model movement. In this case we are using a cumulative-traversed-slope derived cost, with a 90% reduction along a local road.
Submitted by dylan on Sat, 2008-12-20 06:44.
Optimal Route in 3D
Several data-loggers in steep terrain; how can we visit all of them in the most efficient manner such that we traverse the least cumulative slope gradient? In a sense this is a traveling salesman style problem, with travel costs defined by slope gradient. The v.net.salesman module in GRASS can be used to compute the optimal route between data-loggers, given: 1) a network of lines connecting all points to be visited and 2) transferal of slope cost information to the network. An example approach is listed below. Note that due to a bug in v.net.salesman, we need to use one extra step-- details below.
Submitted by dylan on Mon, 2008-10-27 16:08.
Decagon Sensors: EC-5 (moisture) and ECT (temperature)
Simple demonstration of working with time-series data collected from Decagon Devices soil moisture and temperature sensors. These sensors were installed in a potted plant, that was semi-regularly watered, and data were collected for about 80 days on an hourly basis. Several basic operations in R are demonstrated:
- reading raw data in CSV format
- converting date-time values to R's date-time format
- applying a calibration curve to raw sensor values
- initialization of R time series objects
- seasonal decomposition of additive time series (trend extraction)
- plotting of decomposed time series, ACF, and cross-ACF
Submitted by dylan on Fri, 2008-09-26 17:19.
I have always had a hard time thinking about various parameters associated with random fields and empirical semi-variograms. The gstat package for R has an interesting interface for simulating random fields, based on a semi-variogram model. It is possible to quickly visualize the effect of altering semi-variogram parameters, by "seeding" the random number generator with the same value at each iteration. Of primary interest were visualization of principal axis of anisotropy, semi-variogram sill, and semi-variogram range. The code used to produce the images is included below. For more information on the R implementation of gstat, see the R-sig-GEO mailing list.
Submitted by dylan on Fri, 2008-08-15 04:56.
Setting up sampling designs is a non-trivial aspect to any field experiment that includes a spatial component. The sp package for R provides a simple framework for generating point sampling schemes based on region-defining features (lines or polygons) and a sampling type (regular spacing, non-aligned, random, random-stratified, hexagonal grid, etc.). The rgdal package provides a set of functions for importing/exporting common vector data formats. This example demonstrates simple import/export, iterating over sp objects, and reconstituting new objects from lists of objects. A more complex sampling scheme is demonstrated here.
Submitted by dylan on Wed, 2008-07-30 01:51.
It was necessary (for the purposes of this exercise) to generate some grouped data worthy of a creative panel function. An experiment was designed to test the coordination of 4 individuals (each a panel in the figure below), as a function of "clarity of mind" (symbol color in the figure below). The actual details of the experiment can be deduced from the attached data file, and careful inspection of the resulting plot. A similar experiment was conducted some time ago to demonstrate the Spatstat package in R.
A Customized Panel Function for Lattice Graphics -- "panel.bulls_eye()"
Lattice graphics are one of several possible visualization methods in available in R that are most useful when working with grouped data. Plots are generated via a formula interface, often in the format of y ~ x | f -- where y is the dependent variable, x is the independent variable, and f is a grouping factor. Have a look at the attached file (bottom of page) for an example of data in this format. Each panel in the plot is generated by a panel function, using a subset of the original data as defined by the grouping variable. In most situations the standard panel functions, such as panel.xyplot, are sufficient. However, when working with more "interesting" data, a customized panel function is the way to go.
In order to try the sample code out, you will need to:
- install the required packages
- copy and paste the panel.bulls_eye function source into an R session
- download the sample data file
- run the code listed in the sample R session
Since panel functions are made to be generic, any data source that is similar in nature to the sample can be directly plotted using this code-- i.e. if the experiment were to be repeated using 8 subjects instead of 4. Enjoy.
Submitted by dylan on Thu, 2008-04-17 00:55.
STATSGO KML thumbnail
SSURGO KML Thumbnail
A short update to a previous post on the visualization of NCSS/USDA soil survey data in Google Earth. The use of the NetworkLink construct, combined with the spatial indexing present in PostGIS, allows for very rapid lookup and presentation of this massive database. Scale-dependant switching between the detailed (SSURGO) and generalized (STATSGO) databases is done through simple area calculation in PostGIS.
Here is the link to the KMZ file. Here is a link to our conventional viewer application, based on Ka-Map / Mapserver, using the same PostGIS back-end (previous post on this). This PLSS KML file is very useful along-side soil survey information.
Feedback is always welcome!