NavigationUser login |
Raster aggregation ideas: GRASS and StarSpan [updated]Submitted by dylan on Wed, 2006-11-08 18:56.
UPDATE: November, 2006. Glynn C. has kindly implemented raster aggregation in the new module r.resamp.stats, and raster re-sampling in the new module r.resamp.interp.
UPDATE November 7, 2006. Update: Roger Bivand has fixed a small error in the spgrass6 R package, so that raster cells with a value of '0' are not interpreted as NA (no data).
The general approach to aggregating the IKONOS imagery can be summarized as: compute the statistical mode of all cells (at a 4 meter grid spacing) falling within a superimposed 40 meter grid (Figure 1). For reference, the region was adjusted to a resolution of 40 meters, and a copy of the original image was made. This copy (ikonos.rgb) represents the default NN algorithm applied to GRASS data when the region resolution does not match the native resolution of a raster.
## perform aggregation with new GRASS module r.resamp.stats r.resamp.stats --o -n in=ikonos.rgb out=ikonos.40m.mode method=mode # convert to CELL r.mapcalc "b_40 = int(ikonos.40m.mode)" # fix colors r.colors map=b_40 rast=ikonos.rgb #shift to new grid spacing g.region res=40 -ap projection: 1 (UTM) zone: 10 datum: nad83 ellipsoid: grs80 north: 4038840 south: 4038320 west: 664880 east: 665440 nsres: 40 ewres: 40 rows: 13 cols: 14 #create a NN based ikonos image: r.resample in=ikonos.rgb out=ikonos.40m #create vector grid from raster #note that this will not quite work, as adjacent cells with the same value will be merged #r.to.vect in=ikonos.rgb out=a_vect feature=area #make a vector based on the region data: 40m grid #works well v.mkgrid map=a_vect grid=13,14 #note that direct reading of our vector file does not work! #must export to shapefile first... v.out.ogr in=a_vect dsn=. type=area olayer=a starspan --raster grass/pinn/PERMANENT/cellhd/ikonos.rgb --vector a.shp --stats a.stats mode #save centroid x,y, and cat values to text v.out.ascii in=a_vect > a.xy #filter this, levaing only the mode cell values awk -F "," '$8 !~ "mode" {split($8,a,"."); print a[1]}' a.stats > a.mode #re-combine the x,y,cat, and mode cell values #note that this works as both input files are in the same order paste -d "|" a.xy a.mode | v.in.ascii out=a_vect_pt #add a new column to the vector 40m grid v.db.addcol map=a_vect column="mode int" #add the mode cell value back to the vector grid: #doesn't work... #v.what.vect vect=a_vect qvector=a_vect_pt column=mode qcolumn=int_3 #use this instead #simple join in mysql echo "update a_vect, a_vect_pt set a_vect.mode = a_vect_pt.int_4 where a_vect.cat = a_vect_pt.cat ; " | db.execute #convert the vector 40m grid to raster v.to.rast in=a_vect out=a_40 use=attr column=mode #set the color table to the same color table as our starting raster r.colors map=a_40 rast=ikonos.rgb # optional: export rasters to ascii for sharing with others: # suppress the printing of header, encode nodata as NA r.out.ascii -h in=a_40 out=a_40.txt null=NA r.out.ascii -h in=b_40 out=b_40.txt null=NA r.out.ascii -h in=ikonos.rgb out=ikonos.rgb.txt null=NA # compute difference map between r.resamp.stats and starspan r.mapcalc "d = abs(b_40 - a_40)" # compute difference map between r.resamp.stats and default NN r.mapcalc "d1 = abs(b_40 - ikonos.rgb)" # look at difference maps, and save them d.rast d cat=1-23340 d.rast.num d d.rast d1 cat=1-24395 d.rast.num d1 # display both outputs over red channel of the original 4 meter image: # note vector stream layer d.his i=ikonos.red h=ikonos.40m d.his i=ikonos.red h=a_40
#load required packages
require(spgrass6) require(e1071) require(gplots) #load raster data from GRASS: x <- readRAST6(c("a_40", "b_40", "ikonos.rgb")) #summary statistics: summary(x)
# note new syntax
# simple xy plot: r.resamp.stats vs starspan plot(x@data$b_40 ~ x@data$a_40, col='blue', cex=0.5, pch=16, main='Continuous Comparison of Methods', xlab="Starspan", ylab="r.resamp.stats and GRASS default NN") # add the comparison to the NN resampled data: points(x@data$ikonos.rgb ~ x@data$a_40, col='red', cex=0.5, pch=15 ) # add simple linear model lines abline(lm(x@data$b_40 ~ x@data$a_40), col='blue') abline(lm(x@data$ikonos.rgb ~ x@data$a_40), col='red') # add a legend legend(22936, 4779, legend=c("r.resamp.stats", "Default NN"), col=c('blue','red'), pch=c(16,15), cex=0.8) #simple linear model summary: # note that this is categorical data -- so this is an approximation # compare r.resamp.stats to starspan summary(lm(x@data$b_40 ~ x@data$a_40)) Residual standard error: 2738 on 180 degrees of freedom Multiple R-Squared: 0.8829, Adjusted R-squared: 0.8823 F-statistic: 1358 on 1 and 180 DF, p-value: < 2.2e-16 # compare r.resamp.stats to default NN summary(lm(x@data$b_40 ~ x@data$ikonos.rgb)) Residual standard error: 6333 on 180 degrees of freedom Multiple R-Squared: 0.3736, Adjusted R-squared: 0.3701 F-statistic: 107.3 on 1 and 180 DF, p-value: < 2.2e-16
par(mfcol=c(1,3))
#compare r.resamp.stats to default NN hist2d(x@data$b_40, x@data$ikonos.rgb, same.scale=TRUE, col=c('white', bluered(256)), nbins=50, xlab="GRASS r.resamp.stats", ylab="GRASS NN", main="2D Histogram, 50bins") #compare default NN to starspan hist2d(x@data$a_40, x@data$ikonos.rgb, same.scale=TRUE, col=c('white', bluered(256)), nbins=50, ylab="GRASS NN", xlab="Starspan", main="2D Histogram, 50bins") #compare r.resamp.stats to starspan hist2d(x@data$b_40, x@data$a_40, same.scale=TRUE, col=c('white', bluered(256)), nbins=50, xlab="GRASS r.resamp.stats", ylab="Starspan", main="2D Histogram, 50bins")
# convert to factors
# starspan data a <- as.factor(x@data$a_40) # r.resamp.stats data b <- as.factor(x@data$b_40) # default NN data i <- as.factor(x@data$ikonos.rgb) # difference map between r.resamp.stats and starspan d <- as.factor(x@data$d) # difference map between r.resamp.stats and default NN d1 <- as.factor(x@data$d1) # histogram: par(mfrow=c(3,1)) plot(a, main="Starspan Aggregation", xlab="Cell Code", ylab="Counts") plot(b, main="GRASS r.resamp.stats Aggregation", xlab="Cell Code", ylab="Counts") plot(i, main="GRASS Default \n Nearest Neighbor Resampling", xlab="Cell Code", ylab="Counts") # mosaic plots: # starspan vs. r.resamp.stats mosaicplot(table(a,b), shade=T, cex.axis=.5, las=2, off=50, main="Categorical Comparison", ylab="r.resamp.stats", xlab="Starspan") # starspan vs. default NN mosaicplot(table(a,i), shade=T, cex.axis=.5, las=2, off=50, main="Categorical Comparison", ylab="Default NN", xlab="Starspan")
# cohen's kappa: note that this function needs correspondance counts
# not to be confused with the cohen.kappa() function from package concord # # compare starspan output to r.resamp.stats # note that we are comparing output from table() # i.e. 'counts' of the occurance of same class data classAgreement(table(a,b)) $diag [1] 0.03296703 $kappa [1] 0.02180419 $rand [1] 0.9874325 $crand [1] 0.7648124
( categories: )
|
It is all smoothing
All the things you are talking about doing are called smoothing, going from a smaller basis of support to a larger basis. Depending on how much you want to smooth and what kind of smoothing you want you can pick different types of smoothers; everything from simple median or mode smoothing to kernel based and many other techniqes. In remote sensing you are dealing with data where you have completely sampled the surface through the scanner on the satellite or the film in the camera.
Interpolation is for the case when you have incomplete data and want to "estimate" values for places you have not sampled. So if his method is strictly for resampling grid coverages than the name is not right - it should not be interpolate.
Thanks for bringing some quantitative analysis to the GIS blog space.
After a bit of thought, it
After a bit of thought, it looks like a lot of the confusion can be attributed to the terms involved. Resampling, upscaling, downscaling, smoothing, etc. In the case of image rescaling you are indeed interpolating, as you do not actually have a value associated with the new grid cell arrangement [1]. In any case the approach taken in any such revisions to r.resample, or as recently suggested r.spatial.rescale, will need some further thought. In addition it looks like it would be best to keep any aggregation approach in a module of its own, or as a sequence of commands as presented above.
Refs:
1. Paul Bourke's explanation here.
Smoothing
Hi Steve.
Indeed. Sampling would best be applied to the concept of moving from grid cell support to point (grid cell center) support. Whereas going in the opposite direction would be interpolation. Perhaps it would be ideal then to move toward terminology better suited for the task [1]:
In this case keeping the r.resample convention might be the best approach.
Refs:
1. Bierkens, M.F.P.; Finkle, P.A. & de Willigen, P. (ed.) Upscaling and Downscaling Methods for Environmental Research Kluwer Academic Publishers, 2000.
Updates
See the re-posted material above for the current naming conventions.