Submitted by dylan on Sun, 2007-06-03 00:22.
Example of bad Tiger data in Stanislaus County: Red lines are the original road network, green lines are the corrected road network.
The Problem
The US Census does a nice job of collecting all sorts of geographic and demographic information every 10 years. This data is available free of charge in the rather complex and soon to be replaced TIGER/LINE format. While this data covers the entire US down to the local road level, there are numerous errors and even extreme cases of coordinate-shift. Here is an example from Stanislaus County, California. The original TIGER data (red lines) are offset several hundred meters from the imagery. While it is not clear what may have caused the problem, it can be fixed without much effort using an affine transformation. We do not have the transformation matrix, however it can be 'fit' to a set of control points by several methods. The general form of the affine transform can be conveniently represented in homogeneous coordinates as:
Affine Matrix Form in homogeneous coordinates:New coordinates on the left-hand side, old coordinates on the right-hand side. The transformation matrix is the 3x3 matrix in the center.
The Solution
We first need a set of control points, good and bad coordinates. This can be accomplished in several ways, we used the d.where command in GRASS:
# load road subset:
v.in.ogr dsn=PG:'dbname=tiger2005se user=xxx password=xxx host=xxx' layer=stan_roads out=s
#
# pick up some control points: note that they will be interleaved with this approach
# odd line numbers are "bad" coordinates
# even line numbers are "good" coordinates
d.where > stan.c_points
#
# we can separate the "bad" and "good" points in R...
Computing the transformation matrix can be done with a simple regression between 'good' and 'bad' coordinates in R. Note that this approach was suggested by Prof. Brian Ripley on the R-help mailing list.
Compute the Affine Transformation Matrix in R
## read in control points
## the good and bad points are interleaved
## odd numbers are 'bad' points
## s <- read.table('stan.c_points')
## s.bad <- s[seq(from=1, to=nrow(s), by=2), ]
## s.good <- s[seq(from=2, to=nrow(s), by=2), ]
##
## make a composite dataframe
## x,y = bad data
## nx,ny = good data
## c <- data.frame(x=s.bad$V1, y=s.bad$V2, nx=s.good$V1, ny=s.good$V2)
##
## save a local copy in the format that v.transform uses
## write.table(c, file='grass_control_pts.txt', row.names=F, col.names=F)
##
## from now on just use the simplified format:
c <- read.table('grass_control_pts.txt')
names(c) <- c('x','y','nx','ny')
## compute transformation matrix, and check model fit:
l <- lm(cbind(nx,ny) ~ x + y, data=c)
##
## see at bottom of page -->
summary(l)
## convert to affine transform matrix to the form needed by PostGIS:
## transpose the regression coefficient matrix:
t(coef(l))
(Intercept) x y
nx 5017.082 1.00231639 0.00918962
ny -28138.395 -0.01352029 0.99740077
## check graphically:
x <- seq(-2080000, -2040000, by=1000)
y <- seq(-20000, 5000, by=2000)
d <- expand.grid(x=x, y=y)
p <- predict(l, d)
par(mfcol=c(1,2), pty='s')
plot(y ~ x, data=c, main='Control Points', cex=.4, pch=16)
points(ny ~ nx, data=c, col='red', cex=.4, pch=16)
## arrows(c$x, c$y, c$nx, c$ny, len=0.05, col='gray')
plot(d, cex=0.2, main='Shifted Grid')
## arrows(d$x, d$y, p[,1], p[,2], len=0.05, col='gray')
points(p, col='red', cex=.2)
## sample transformation
print(predict(l, data.frame(x=-2078417.35893968, y=-14810.57043808)), digits=10)
nx ny
-2078350.804 -14809.67334
Establishing the transformation based on control points: Red points represent where the coordinates should be. Black points are the original and incorect coordinates.
Check Affine Transform Results in PostGIS
-- matrix format:
-- R:
-- | xoff a b |
-- | yoff d e |
-- postgis wants:
-- ST_Affine(geom, a, b, d, e, xoff, yoff)
-- as a query:
SELECT asText(
ST_Affine(
'POINT(-2078417.35893968 -14810.57043808)',
1.00231639, 0.00918962, -0.01352029, 0.99740077, 5017.082, -28138.395
)
) ;
astext
--------------------------------------------
POINT(-2078350.80564006 -14809.6639251817)
-- This is the same as what R says!
Perform Affine Transformation in PostGIS
--
-- Step 1. Alter the geometry of the bad data in-place
-- note that we have a backup in 'stan_roads'
--
UPDATE roads SET wkb_geometry = ST_Affine(wkb_geometry, 1.00231639, 0.00918962, -0.01352029, 0.99740077, 5017.082, -28138.395 ) WHERE module = 'TGR06099' ;
Regression Diagnostics
Response nx :
Call:
lm(formula = nx ~ x + y, data = d)
Residuals:
Min 1Q Median 3Q Max
-207.088 -23.856 8.614 21.245 161.610
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.017e+03 1.369e+03 3.666 0.00079 ***
x 1.002e+00 6.654e-04 1506.386 < 2e-16 ***
y 9.190e-03 9.419e-04 9.756 1.20e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 52.25 on 36 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 1
F-statistic: 1.322e+06 on 2 and 36 DF, p-value: < 2.2e-16
Response ny :
Call:
lm(formula = ny ~ x + y, data = d)
Residuals:
Min 1Q Median 3Q Max
-39.835 -18.459 -4.556 15.311 94.226
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.814e+04 7.409e+02 -37.98 <2e-16 ***
x -1.352e-02 3.602e-04 -37.54 <2e-16 ***
y 9.974e-01 5.099e-04 1956.22 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 28.28 on 36 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 1
F-statistic: 2.187e+06 on 2 and 36 DF, p-value: < 2.2e-16
Before computing transformation matrix
... it might be good to compute a trend surface of the square differences between the two sets of coordinates, at the sites of the good coordinates.
## compute square difference:
c$sqdiff <- with(c, sqrt((nx-x)^2 + (ny-y)^2))
##
library(gstat)
##
coordinates(c) <- c('nx', 'ny')
coordinates(d) <- c('x', 'y')
gridded(d) <- TRUE
x <- gstat(sqdiff ~ 1, data=c, degree=2)
x.trend <- predict(x, newdata=d)
spplot(x.trend, "var1.pred")
Related conversation
There has been some talk about this stuff on the postgis mailing list:
thread 1
thread 2
it is possible that an affine transformation fitting function may be implemented in postgis one of these days...
Update: See this thread!
Dylan Some progress to
Dylan
Some progress to report. I have a query that takes a table of from/to
x,y values and computes the coefficients. It then computes the
determinant of the transformation matrix. Next step is to compute the
inverse and then the affine coefficients. It is a bit hard coded at
this point but when it is working I will try to make it as generic as possible.