Raster aggregation ideas: GRASS and StarSpan [updated]

Submitted by dylan on Wed, 2006-11-08 18:56.

 
Background
Current correspondence in regards to the r.resample and r.bilinear modules of GRASS has prompted a simple experiment demonstrating the need for raster aggregation. Until recently (please correct me if I am wrong) raster resampling could only be done via r.resample or r.bilinear, and with a recent innovation by Glynn Clements, it looks like all three interpolation algorithms (nearest neighbor, bilinear, and bicubic) has been incorporated into a new module called r.resamp.interp. While this approach and naming convention makes sense for the change in raster support (grid-cell spacing) from large to small (i.e. 100 meters to 10 meters), it does not make sense for moving the other way. Moving the other way (i.e. 10 meters to 100 meters) would best be accomplished by an aggregation technique. For categorical data the mode within the new grid cell could be used.


The following simple experiment with a raster file based on categorical values illustrates how the current approach of changing raster support (small cell size to large) compares with an aggrigation technique based on the mode value within the new, larger cell size. The starting image is from an IKONOS composite of the channels green, red, and near-infrared created at a resolution of 4 meters. Here I will demonstrate two possible methods for aggregating this image to a cell size of 40 meters. Note that the methods involved are somewhat of a hack, and tighter integration of some components may be possible.

UPDATE: November, 2006. Glynn C. has kindly implemented raster aggregation in the new module r.resamp.stats, and raster re-sampling in the new module r.resamp.interp.


UPDATE November 7, 2006. Update: Roger Bivand has fixed a small error in the spgrass6 R package, so that raster cells with a value of '0' are not interpreted as NA (no data).


 
New Experiment:
Simple evaluation of the differences between IKONOS imagery aggregated from a grid cell spacing of 4 meters to 40 meters, by three methods:

  1. standard nearest-neighbor resampling algorithm in GRASS (this data is refered to as either: "ikonos.rgb", "i", "default NN")
  2. the new r.resamp.stats module in GRASS-CVS (this data is refered to as: "b_40")
  3. a raster-vector-raster approach using Starspan (this data is refered to as: "a_40")

The general approach to aggregating the IKONOS imagery can be summarized as: compute the statistical mode of all cells (at a 4 meter grid spacing) falling within a superimposed 40 meter grid (Figure 1). For reference, the region was adjusted to a resolution of 40 meters, and a copy of the original image was made. This copy (ikonos.rgb) represents the default NN algorithm applied to GRASS data when the region resolution does not match the native resolution of a raster.

 
Initial Conclusions

  • The GRASS-Starspan approach and r.resamp.stats approach produce very similar results. It is not yet clear to me why the results differ, further chats with the GRASS development crowd will no doubt provide answers. Various methods for assessing the similarity of the two outputs are described below.
  • The mode aggriated (via r.resamp.stats) data differed considerably from the default NN resampling. While this was expected, the difference map (figure 8) and various comparisons yeild a quantification of difference. Quantification of the difference shown in Figures 9 through 13.
  • As a side note, the mode aggrigated image better preserves the continuity of the original features as compared to the image created with the default NN algorithm (Figures 5 and 6).

Raster aggregation 1: the original 4 meter resolution IKONOS image with new 40 meter grid overlaid.Raster aggregation 1: the original 4 meter resolution IKONOS image with new 40 meter grid overlaid.

Raster aggregation 2: IKONOS image resampled to 40m resolution via GRASS default nearest neighbor algorithm.Raster aggregation 2: IKONOS image resampled to 40m resolution via GRASS default nearest neighbor algorithm.

Raster aggregation 3: original image resampled via selection of mode value within 40 meter grid cell.Raster aggregation 3: original image resampled via selection of mode value within 40 meter grid cell.

Raster aggregation 5: 40m NN resampled image overlayed on IKONOS red channel.Raster aggregation 5: 40m NN resampled image overlayed on IKONOS red channel.

Raster aggregation 6: 40m mode aggrigated image over original IKONOS red channel.Raster aggregation 6: 40m mode aggrigated image over original IKONOS red channel.

Difference between r.resamp.stats and starspanRaster aggregation 7: r.resamp.stats output vs. starspan output

Difference between r.resamp.stats and default NN algorithmRaster aggregation 8: r.resamp.stats output vs. default NN algorithm

Coomparison of methods via regression analysisRaster aggregation 9: Comparison of methods via regression analysis

Coomparison of methods via 2D histogramRaster aggregation 10: Comparison of methods via 2D histogram

Comparison of methods via histogram of categoriesRaster aggregation 11: Comparison of methods via histogram of categories

Comparison of methods via mosiac plot of categories: r.resamp.stats vs. starspanRaster aggregation 12: Comparison of methods via mosiac plot of categories (r.resamp.stats vs. starspan)

Comparison of methods via mosiac plot of categories 2: starspan vs. default NN algorithmRaster aggregation 13: Comparison of methods via mosiac plot of categories 2 (starspan vs. default NN algorithm)

 
Raster aggregation via r.resamp.stats approach

 ## perform aggregation with new GRASS module r.resamp.stats
r.resamp.stats --o -n in=ikonos.rgb out=ikonos.40m.mode method=mode
# convert to CELL
r.mapcalc "b_40 = int(ikonos.40m.mode)"
# fix colors
r.colors map=b_40 rast=ikonos.rgb

 
Raster aggregation via GRASS-Stsrspan appraoch

 #shift to new grid spacing
g.region res=40 -ap

projection: 1 (UTM)
zone:       10
datum:      nad83
ellipsoid:  grs80
north:      4038840
south:      4038320
west:       664880
east:       665440
nsres:      40
ewres:      40
rows:       13
cols:       14

#create a NN based ikonos image:
r.resample in=ikonos.rgb out=ikonos.40m

#create vector grid from raster
#note that this will not quite work, as adjacent cells with the same value will be merged
#r.to.vect in=ikonos.rgb out=a_vect feature=area

#make a vector based on the region data: 40m grid
#works well
v.mkgrid map=a_vect grid=13,14

#note that direct reading of our vector file does not work!
#must export to shapefile first...

v.out.ogr in=a_vect dsn=. type=area olayer=a

starspan --raster grass/pinn/PERMANENT/cellhd/ikonos.rgb --vector a.shp --stats a.stats mode

#save centroid x,y, and cat values to text
v.out.ascii in=a_vect > a.xy

#filter this, levaing only the mode cell values
awk -F "," '$8 !~ "mode" {split($8,a,"."); print a[1]}' a.stats > a.mode

#re-combine the x,y,cat, and mode cell values
#note that this works as both input files are in the same order
paste -d "|" a.xy a.mode  | v.in.ascii  out=a_vect_pt

#add a new column to the vector 40m grid
v.db.addcol map=a_vect column="mode int"

#add the mode cell value back to the vector grid:
#doesn't work...
#v.what.vect vect=a_vect qvector=a_vect_pt column=mode qcolumn=int_3

#use this instead
#simple join in mysql
echo "update a_vect, a_vect_pt set a_vect.mode = a_vect_pt.int_4 where a_vect.cat = a_vect_pt.cat ; " | db.execute

#convert the vector 40m grid to raster
v.to.rast in=a_vect out=a_40 use=attr column=mode

#set the color table to the same color table as our starting raster
r.colors map=a_40 rast=ikonos.rgb


# optional: export rasters to ascii for sharing with others:
# suppress the printing of header, encode nodata as NA
r.out.ascii -h in=a_40 out=a_40.txt null=NA
r.out.ascii -h in=b_40 out=b_40.txt null=NA
r.out.ascii -h in=ikonos.rgb out=ikonos.rgb.txt null=NA

 
Simple Analysis in GRASS

 # compute difference map between r.resamp.stats and starspan
r.mapcalc "d = abs(b_40 - a_40)"

# compute difference map between r.resamp.stats and default NN
r.mapcalc "d1 = abs(b_40 - ikonos.rgb)"

# look at difference maps, and save them
d.rast d cat=1-23340
d.rast.num d

d.rast d1 cat=1-24395
d.rast.num d1

# display both outputs over red channel of the original 4 meter image:
# note vector stream layer
d.his i=ikonos.red h=ikonos.40m
d.his i=ikonos.red h=a_40

 
Summary of each raster computed in R

#load required packages
require(spgrass6)
require(e1071)
require(gplots)
 
#load raster data from GRASS:
x <- readRAST6(c("a_40", "b_40", "ikonos.rgb"))
 
#summary statistics:
summary(x)
 

a_40 b_40 ikonos.rgb
Min. : 0 Min. : 0 Min. : 0
1st Qu.:15806 1st Qu.:15851 1st Qu.:13717
Median :17998 Median :17998 Median :17996
Mean :18191 Mean :18367 Mean :17320
3rd Qu.:20109 3rd Qu.:19088 3rd Qu.:20072
Max. :32767 Max. :32767 Max. :32767

 
## optional: for those who do not have GRASS installed
# load the raster data in, as exported from GRASS to ASCII: (see bottom of page)
# note syntax: get the ncol/nrow from g.region
# note that the results should be the same as if we read the data in from GRASS
# compare with:
# all.equal(as.vector(t(a.asc)), x@data$a_40) ----> TRUE
a.asc <- matrix(unlist( read.table('a_40.txt'), use.names=FALSE), ncol=14)
b.asc <- matrix(unlist( read.table('b_40.txt'), use.names=FALSE), ncol=14)
i.asc <- matrix(unlist( read.table('ikonos.rgb_.txt'), use.names=FALSE), ncol=14)

 
Comparison via continuous methods

# note new syntax
# simple xy plot: r.resamp.stats vs starspan
plot(x@data$b_40 ~ x@data$a_40, col='blue', cex=0.5, pch=16, main='Continuous Comparison of Methods', xlab="Starspan", ylab="r.resamp.stats and GRASS default NN")
 
# add the comparison to the NN resampled data:
points(x@data$ikonos.rgb ~ x@data$a_40, col='red', cex=0.5, pch=15 )
 
# add simple linear model lines
abline(lm(x@data$b_40 ~ x@data$a_40), col='blue')
abline(lm(x@data$ikonos.rgb ~ x@data$a_40), col='red')
 
# add a legend
legend(22936, 4779, legend=c("r.resamp.stats", "Default NN"), col=c('blue','red'), pch=c(16,15), cex=0.8)
 
#simple linear model summary:
# note that this is categorical data -- so this is an approximation
# compare r.resamp.stats to starspan
summary(lm(x@data$b_40 ~ x@data$a_40))
 
Residual standard error: 2738 on 180 degrees of freedom
Multiple R-Squared: 0.8829, Adjusted R-squared: 0.8823
F-statistic: 1358 on 1 and 180 DF, p-value: < 2.2e-16
 
# compare r.resamp.stats to default NN
summary(lm(x@data$b_40 ~ x@data$ikonos.rgb))
 
Residual standard error: 6333 on 180 degrees of freedom
Multiple R-Squared: 0.3736, Adjusted R-squared: 0.3701
F-statistic: 107.3 on 1 and 180 DF, p-value: < 2.2e-16

 
Graphical evaluation of correspondance via 2D histogram

par(mfcol=c(1,3))
#compare r.resamp.stats to default NN
hist2d(x@data$b_40, x@data$ikonos.rgb, same.scale=TRUE, col=c('white', bluered(256)), nbins=50, xlab="GRASS r.resamp.stats", ylab="GRASS NN", main="2D Histogram, 50bins")
 
#compare default NN to starspan
hist2d(x@data$a_40, x@data$ikonos.rgb, same.scale=TRUE, col=c('white', bluered(256)), nbins=50, ylab="GRASS NN", xlab="Starspan", main="2D Histogram, 50bins")
 
#compare r.resamp.stats to starspan
hist2d(x@data$b_40, x@data$a_40, same.scale=TRUE, col=c('white', bluered(256)), nbins=50, xlab="GRASS r.resamp.stats", ylab="Starspan", main="2D Histogram, 50bins")

 
Graphic comparison via categorical methods histogram and mosiacplot

# convert to factors
# starspan data
a <- as.factor(x@data$a_40)
# r.resamp.stats data
b <- as.factor(x@data$b_40)
# default NN data
i <- as.factor(x@data$ikonos.rgb)
 
# difference map between r.resamp.stats and starspan
d <- as.factor(x@data$d)
# difference map between r.resamp.stats and default NN
d1 <- as.factor(x@data$d1)
 
# histogram:
par(mfrow=c(3,1))
plot(a, main="Starspan Aggregation", xlab="Cell Code", ylab="Counts")
plot(b, main="GRASS r.resamp.stats Aggregation", xlab="Cell Code", ylab="Counts")
plot(i, main="GRASS Default \n Nearest Neighbor Resampling", xlab="Cell Code", ylab="Counts")
 
# mosaic plots:
# starspan vs. r.resamp.stats
mosaicplot(table(a,b), shade=T, cex.axis=.5, las=2, off=50, main="Categorical Comparison", ylab="r.resamp.stats", xlab="Starspan")
 
# starspan vs. default NN
mosaicplot(table(a,i), shade=T, cex.axis=.5, las=2, off=50, main="Categorical Comparison", ylab="Default NN", xlab="Starspan")

 
Compute Cohen's Kappa for each comparison:

# cohen's kappa: note that this function needs correspondance counts
# not to be confused with the cohen.kappa() function from package concord
#
# compare starspan output to r.resamp.stats
# note that we are comparing output from table()
# i.e. 'counts' of the occurance of same class data
classAgreement(table(a,b))
 

 
$diag
[1] 0.03296703
 
$kappa
[1] 0.02180419
 
$rand
[1] 0.9874325
 
$crand
[1] 0.7648124

 
# compare starspan output to Default NN behavior
# note that we are comparing output from table()
# i.e. 'counts' of the occurance of same class data
classAgreement(table(a,i))
 

 
$diag
[1] 0.005494505
 
$kappa
[1] -0.0003947888
 
$rand
[1] 0.9721936
 
$crand
[1] 0.0853148

AttachmentSize
a_40.txt1.04 KB
b_40.txt1.05 KB
ikonos.rgb_.txt1.05 KB

It is all smoothing

All the things you are talking about doing are called smoothing, going from a smaller basis of support to a larger basis. Depending on how much you want to smooth and what kind of smoothing you want you can pick different types of smoothers; everything from simple median or mode smoothing to kernel based and many other techniqes. In remote sensing you are dealing with data where you have completely sampled the surface through the scanner on the satellite or the film in the camera.

Interpolation is for the case when you have incomplete data and want to "estimate" values for places you have not sampled. So if his method is strictly for resampling grid coverages than the name is not right - it should not be interpolate.

Thanks for bringing some quantitative analysis to the GIS blog space.

After a bit of thought, it

After a bit of thought, it looks like a lot of the confusion can be attributed to the terms involved. Resampling, upscaling, downscaling, smoothing, etc. In the case of image rescaling you are indeed interpolating, as you do not actually have a value associated with the new grid cell arrangement [1]. In any case the approach taken in any such revisions to r.resample, or as recently suggested r.spatial.rescale, will need some further thought. In addition it looks like it would be best to keep any aggregation approach in a module of its own, or as a sequence of commands as presented above.

 
Refs:
1. Paul Bourke's explanation here.

Smoothing

Hi Steve.

Indeed. Sampling would best be applied to the concept of moving from grid cell support to point (grid cell center) support. Whereas going in the opposite direction would be interpolation. Perhaps it would be ideal then to move toward terminology better suited for the task [1]:

  • small grid cell to larger grid cell support: upscaling / aggregation r.aggregate
  • large grid cell to smaller grid cell support: downscaling / simulation, etc. r.???

In this case keeping the r.resample convention might be the best approach.

 
Refs:
1. Bierkens, M.F.P.; Finkle, P.A. & de Willigen, P. (ed.) Upscaling and Downscaling Methods for Environmental Research Kluwer Academic Publishers, 2000.

Updates

See the re-posted material above for the current naming conventions.