Research SitesNavigationUser loginWho's onlineThere are currently 0 users and 7 guests online.
|
Case Study: Fixing Bad TIGER Line data with R and PostGISSubmitted by dylan on Sun, 2007-06-03 00:22.
We first need a set of control points, good and bad coordinates. This can be accomplished in several ways, we used the d.where command in GRASS: # load road subset: v.in.ogr dsn=PG:'dbname=tiger2005se user=xxx password=xxx host=xxx' layer=stan_roads out=s # # pick up some control points: note that they will be interleaved with this approach # odd line numbers are "bad" coordinates # even line numbers are "good" coordinates d.where > stan.c_points # # we can separate the "bad" and "good" points in R... Computing the transformation matrix can be done with a simple regression between 'good' and 'bad' coordinates in R. Note that this approach was suggested by Prof. Brian Ripley on the R-help mailing list. ## read in control points ## the good and bad points are interleaved ## odd numbers are 'bad' points ## s <- read.table('stan.c_points') ## s.bad <- s[seq(from=1, to=nrow(s), by=2), ] ## s.good <- s[seq(from=2, to=nrow(s), by=2), ] ## ## make a composite dataframe ## x,y = bad data ## nx,ny = good data ## c <- data.frame(x=s.bad$V1, y=s.bad$V2, nx=s.good$V1, ny=s.good$V2) ## ## save a local copy in the format that v.transform uses ## write.table(c, file='grass_control_pts.txt', row.names=F, col.names=F) ## ## from now on just use the simplified format: c <- read.table('grass_control_pts.txt') names(c) <- c('x','y','nx','ny') ## compute transformation matrix, and check model fit: l <- lm(cbind(nx,ny) ~ x + y, data=c) ## ## see at bottom of page --> summary(l) ## convert to affine transform matrix to the form needed by PostGIS: ## transpose the regression coefficient matrix: t(coef(l)) (Intercept) x y nx 5017.082 1.00231639 0.00918962 ny -28138.395 -0.01352029 0.99740077 ## check graphically: x <- seq(-2080000, -2040000, by=1000) y <- seq(-20000, 5000, by=2000) d <- expand.grid(x=x, y=y) p <- predict(l, d) par(mfcol=c(1,2), pty='s') plot(y ~ x, data=c, main='Control Points', cex=.4, pch=16) points(ny ~ nx, data=c, col='red', cex=.4, pch=16) ## arrows(c$x, c$y, c$nx, c$ny, len=0.05, col='gray') plot(d, cex=0.2, main='Shifted Grid') ## arrows(d$x, d$y, p[,1], p[,2], len=0.05, col='gray') points(p, col='red', cex=.2) ## sample transformation print(predict(l, data.frame(x=-2078417.35893968, y=-14810.57043808)), digits=10) nx ny -2078350.804 -14809.67334
-- matrix format: -- R: -- | xoff a b | -- | yoff d e | -- postgis wants: -- ST_Affine(geom, a, b, d, e, xoff, yoff) -- as a query: SELECT asText( ST_Affine( 'POINT(-2078417.35893968 -14810.57043808)', 1.00231639, 0.00918962, -0.01352029, 0.99740077, 5017.082, -28138.395 ) ) ; astext -------------------------------------------- POINT(-2078350.80564006 -14809.6639251817) -- This is the same as what R says! -- -- Step 1. Alter the geometry of the bad data in-place -- note that we have a backup in 'stan_roads' -- UPDATE roads SET wkb_geometry = ST_Affine(wkb_geometry, 1.00231639, 0.00918962, -0.01352029, 0.99740077, 5017.082, -28138.395 ) WHERE module = 'TGR06099' ;
Response nx :
Call:
lm(formula = nx ~ x + y, data = d)
Residuals:
Min 1Q Median 3Q Max
-207.088 -23.856 8.614 21.245 161.610
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.017e+03 1.369e+03 3.666 0.00079 ***
x 1.002e+00 6.654e-04 1506.386 < 2e-16 ***
y 9.190e-03 9.419e-04 9.756 1.20e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 52.25 on 36 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 1
F-statistic: 1.322e+06 on 2 and 36 DF, p-value: < 2.2e-16
Response ny :
Call:
lm(formula = ny ~ x + y, data = d)
Residuals:
Min 1Q Median 3Q Max
-39.835 -18.459 -4.556 15.311 94.226
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.814e+04 7.409e+02 -37.98 <2e-16 ***
x -1.352e-02 3.602e-04 -37.54 <2e-16 ***
y 9.974e-01 5.099e-04 1956.22 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 28.28 on 36 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 1
F-statistic: 2.187e+06 on 2 and 36 DF, p-value: < 2.2e-16
( categories: )
Reply |