Overview
The ST_Affine() function from PostGIS is useful for manipulating geometries, but requires the elements of a transformation matrix. This page documents progress on automating the computation of the transformation matrix by least-squares (Bruce Rindahl) via SQL. This would allow a pure PostGIS solution to computing and applying affine transformations to geometry data.
An open-source algorithm for computing the transformation matrix
Example code from GRASS (v.transform) was used as a template.
Approach
Compute transformation matrix based on a table of control points, stored as numbers.
Evaluation of results
Comparable to output from a similar analysis done in R, and the original algorithm as implemented in v.transform (GRASS).
Example of bad TIGER data in Stanislaus County: Red lines are the original road network, green lines are the corrected road network.
The Problem
The US Census does a nice job of collecting all sorts of geographic and demographic information every 10 years. This data is available free of charge in the rather complex and soon-to-be-replaced TIGER/LINE format. While this data covers the entire US down to the local road level, there are numerous errors and even extreme cases of coordinate shift. Here is an example from Stanislaus County, California. The original TIGER data (red lines) are offset several hundred meters from the imagery. While it is not clear what may have caused the problem, it can be fixed without much effort using an affine transformation. We do not have the transformation matrix; however, it can be 'fit' to a set of control points by several methods. The general form of the affine transform can be conveniently represented in homogeneous coordinates as:
Affine Matrix Form in homogeneous coordinates: New coordinates on the left-hand side, old coordinates on the right-hand side. The transformation matrix is the 3x3 matrix in the center.
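The figure itself is not reproduced here. As a reconstruction from the caption, and assuming the coefficient names used in the v.transform output (a, b, d, e, xoff, yoff), the matrix form can be written as:

```latex
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix}
a & b & x_{\mathrm{off}} \\
d & e & y_{\mathrm{off}} \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
```

Written out, this is x' = a x + b y + xoff and y' = d x + e y + yoff, where (x, y) are the old coordinates and (x', y') the new ones.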
The Solution
We first need a set of control points, each with good and bad coordinates. This can be accomplished in several ways; we used the d.where command in GRASS:
Computing the transformation matrix can be done with a simple regression between 'good' and 'bad' coordinates in R. Note that this approach was suggested by Prof. Brian Ripley on the R-help mailing list.
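For readers without R at hand, the same kind of regression can be sketched in plain Python by solving the least-squares normal equations directly. This is only an illustration under stated assumptions: the function names `solve3` and `fit_affine` are hypothetical, the control points below are synthetic rather than the Stanislaus County data, and no check is made for collinear control points.

```python
def solve3(m, v):
    """Solve the 3x3 system m * c = v by Gaussian elimination with partial pivoting."""
    aug = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        # Move the row with the largest pivot into place
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        known = sum(aug[r][k] * coef[k] for k in range(r + 1, 3))
        coef[r] = (aug[r][3] - known) / aug[r][r]
    return coef


def fit_affine(points):
    """Fit nx ~ x + y and ny ~ x + y by least squares.

    points: list of (x, y, nx, ny), bad coordinates first, good ones second.
    Returns ([xoff, a, b], [yoff, d, e]), i.e. intercept, x and y coefficients.
    """
    n = len(points)
    sx = sum(x for x, y, nx, ny in points)
    sy = sum(y for x, y, nx, ny in points)
    sxx = sum(x * x for x, y, nx, ny in points)
    sxy = sum(x * y for x, y, nx, ny in points)
    syy = sum(y * y for x, y, nx, ny in points)
    m = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs_x = [sum(nx for x, y, nx, ny in points),
             sum(x * nx for x, y, nx, ny in points),
             sum(y * nx for x, y, nx, ny in points)]
    rhs_y = [sum(ny for x, y, nx, ny in points),
             sum(x * ny for x, y, nx, ny in points),
             sum(y * ny for x, y, nx, ny in points)]
    return solve3(m, rhs_x), solve3(m, rhs_y)


# Synthetic control points generated from nx = 10 + 1.5x - 0.5y, ny = -3 + 0.25x + 2y
demo = [(0, 0, 10.0, -3.0), (1, 0, 11.5, -2.75), (0, 1, 9.5, -1.0), (2, 3, 11.5, 3.5)]
print(fit_affine(demo))
```

With coordinates as large as those in this example (around -2,000,000), subtracting the mean from each coordinate before forming the sums would likely improve numerical stability; that step is left out to keep the sketch short.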
Compute the Affine Transformation Matrix in R
Establishing the transformation based on control points: Red points represent where the coordinates should be. Black points are the original and incorrect coordinates.
Check Affine Transform Results in PostGIS
Perform Affine Transformation in PostGIS
Regression Diagnostics
Response nx :
Call:
lm(formula = nx ~ x + y, data = d)
Residuals:
Min 1Q Median 3Q Max
-207.088 -23.856 8.614 21.245 161.610
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.017e+03 1.369e+03 3.666 0.00079 ***
x 1.002e+00 6.654e-04 1506.386 < 2e-16 ***
y 9.190e-03 9.419e-04 9.756 1.20e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 52.25 on 36 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 1
F-statistic: 1.322e+06 on 2 and 36 DF, p-value: < 2.2e-16
Response ny :
Call:
lm(formula = ny ~ x + y, data = d)
Residuals:
Min 1Q Median 3Q Max
-39.835 -18.459 -4.556 15.311 94.226
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.814e+04 7.409e+02 -37.98 <2e-16 ***
x -1.352e-02 3.602e-04 -37.54 <2e-16 ***
y 9.974e-01 5.099e-04 1956.22 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 28.28 on 36 degrees of freedom
Multiple R-Squared: 1, Adjusted R-squared: 1
F-statistic: 2.187e+06 on 2 and 36 DF, p-value: < 2.2e-16
First the output from R:
Looking at the residuals from the regression model used to map bad coordinates (x,y) to good coordinates (nx,ny):
x y nx ny resid
1 -2078417 -14810.570 -2078314 -14838.378 46.617600
2 -2078743 -16057.955 -2078636 -16081.790 62.041274
3 -2077261 -16435.348 -2077170 -16463.156 40.905132
4 -2076709 -14405.369 -2076606 -14433.177 29.406399
5 -2074179 -15830.248 -2074084 -15901.558 33.111981
6 -2073850 -15707.435 -2073763 -15798.554 37.362736
7 -2073450 -13873.171 -2073359 -13920.712 21.623235
8 -2072359 -15204.613 -2072276 -15323.138 38.997678
9 -2072545 -14402.596 -2072450 -14513.219 32.918889
10 -2072189 -16022.434 -2072098 -16129.106 33.834074
11 -2071991 -16856.058 -2071928 -16942.976 6.277554
12 -2068407 -12999.396 -2068296 -13133.170 6.579285
13 -2069870 -12613.813 -2069764 -12731.848 2.357631
14 -2067635 -13188.253 -2067517 -13337.765 11.604519
15 -2066931 -13377.110 -2066809 -13518.753 22.719625
16 -2067411 -15084.692 -2067313 -15190.924 41.907273
17 -2066795 -18714.093 -2066741 -18846.019 14.541358
18 -2066384 -17080.538 -2066299 -17212.464 26.717495
19 -2068634 -19742.339 -2068580 -19835.464 27.483654
20 -2053326 -16930.710 -2053276 -17226.351 65.746074
21 -2051797 -17321.500 -2051899 -17579.762 227.516944
22 -2068307 2826.921 -2068066 2638.276 12.587853
23 -2067543 2648.205 -2067328 2449.631 37.729747
24 -2067126 4276.510 -2066904 4081.246 46.774630
25 -2066748 4170.604 -2066527 4001.816 59.509843
26 -2066068 2292.295 -2065860 2094.699 46.681553
27 -2065337 2107.872 -2065126 1900.397 43.386956
28 -2064606 1913.570 -2064378 1692.922 26.460389
29 -2064199 3558.561 -2063961 3356.401 47.742696
30 -2037464 6512.455 -2037076 5864.398 50.762994
31 -2036722 6825.682 -2036338 6199.227 22.699467
32 -2036876 6366.642 -2036498 5742.888 22.120176
33 -2040225 7150.180 -2039706 6575.029 161.631199
34 -2041064 7144.779 -2040732 6569.629 26.657582
35 -2044702 -15548.033 -2044564 -16024.903 23.844817
36 -2043992 -15723.521 -2043824 -16223.282 48.063840
37 -2043790 -14907.119 -2043611 -15383.990 34.844851
38 -2040616 -14820.445 -2040453 -15349.974 21.196233
39 -2039595 -15081.427 -2039485 -15588.263 47.263287
The Root-Mean-Square-Error (RMSE) for the fitted transform (in meters) is:
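As a minimal sketch of how such a figure can be computed, assuming the usual definition (square root of the mean squared residual), the helper below takes a list of per-point residuals like those in the `resid` column above. The function name `rmse` is hypothetical, and the sample uses only the first five residuals for illustration, so its output is not the RMSE of the full set of 39 points.

```python
import math


def rmse(residuals):
    """Root-mean-square error of a list of per-point residual distances."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# First five residuals from the table above, for illustration only
sample = [46.617600, 62.041274, 40.905132, 29.406399, 33.111981]
print(rmse(sample))
```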
The output from v.transform on the same set of control points:
Transformation Matrix
| xoff a b |
| yoff d e |
-------------------------------------------
5301.399323 1.002469 0.009172
-28155.882288 -0.013530 0.997547
-------------------------------------------
full output including the residuals:
CHECK MAP RESIDUALS
Current Map New Map
POINT X coord Y coord | X coord Y coord | residuals
1. -2078417.36 -14810.57 | -2078314.07 -14838.38 | 46.81
2. -2078743.11 -16057.95 | -2078635.85 -16081.79 | 62.22
3. -2077261.34 -16435.35 | -2077169.97 -16463.16 | 41.05
4. -2076709.16 -14405.37 | -2076605.87 -14433.18 | 29.59
5. -2074178.76 -15830.25 | -2074083.67 -15901.56 | 33.21
6. -2073849.93 -15707.44 | -2073762.78 -15798.55 | 37.42
7. -2073449.80 -13873.17 | -2073358.68 -13920.71 | 21.62
8. -2072358.86 -15204.61 | -2072275.89 -15323.14 | 39.02
9. -2072544.55 -14402.60 | -2072449.73 -14513.22 | 32.97
10. -2072188.97 -16022.43 | -2072098.11 -16129.11 | 33.87
11. -2071991.43 -16856.06 | -2071928.22 -16942.98 | 6.27
12. -2068406.55 -12999.40 | -2068296.38 -13133.17 | 6.60
13. -2069870.19 -12613.81 | -2069763.96 -12731.85 | 2.33
14. -2067635.38 -13188.25 | -2067517.35 -13337.76 | 11.63
15. -2066931.10 -13377.11 | -2066809.13 -13518.75 | 22.74
16. -2067411.11 -15084.69 | -2067312.75 -15190.92 | 41.93
17. -2066795.16 -18714.09 | -2066740.84 -18846.02 | 14.64
18. -2066383.87 -17080.54 | -2066298.50 -17212.46 | 26.74
19. -2068634.37 -19742.34 | -2068580.05 -19835.46 | 27.53
20. -2053326.48 -16930.71 | -2053275.51 -17226.35 | 66.09
21. -2051797.30 -17321.50 | -2051899.25 -17579.76 | 227.91
22. -2068307.24 2826.92 | -2068065.64 2638.28 | 12.41
23. -2067542.73 2648.21 | -2067327.61 2449.63 | 37.44
24. -2067125.72 4276.51 | -2066903.98 4081.25 | 46.40
25. -2066748.43 4170.60 | -2066526.69 4001.82 | 59.12
26. -2066067.79 2292.29 | -2065860.31 2094.70 | 46.35
27. -2065336.69 2107.87 | -2065125.92 1900.40 | 43.07
28. -2064605.58 1913.57 | -2064378.35 1692.92 | 26.16
29. -2064199.15 3558.56 | -2063961.13 3356.40 | 47.43
30. -2037464.39 6512.45 | -2037075.56 5864.40 | 50.66
31. -2036721.82 6825.68 | -2036338.39 6199.23 | 22.54
32. -2036875.74 6366.64 | -2036497.71 5742.89 | 21.95
33. -2040224.67 7150.18 | -2039706.23 6575.03 | 161.54
34. -2041064.45 7144.78 | -2040732.32 6569.63 | 26.74
35. -2044701.68 -15548.03 | -2044564.34 -16024.90 | 23.64
36. -2043992.10 -15723.52 | -2043824.24 -16223.28 | 47.60
37. -2043789.90 -14907.12 | -2043610.60 -15383.99 | 34.35
38. -2040615.94 -14820.44 | -2040453.30 -15349.97 | 20.77
39. -2039594.70 -15081.43 | -2039485.02 -15588.26 | 47.85
Number of points: 39
Residual mean average: 57.082951
... Continuing from the initial example session in R ...
An affine transformation is based on a linear mapping between two coordinate-spaces. Testing for non-linearity (i.e. higher order model terms) can be a useful diagnostic in choosing the simpler affine transform.
Compute the difference between the good and bad coordinates
Generate two linear models:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 302.38 3.97 76.174 < 2e-16 ***
poly(nx, 3)1 1091.17 35.82 30.466 < 2e-16 ***
poly(nx, 3)2 165.59 32.78 5.051 1.71e-05 ***
poly(nx, 3)3 -51.21 28.74 -1.782 0.0843 .
poly(ny, 3)1 417.35 31.97 13.056 2.30e-14 ***
poly(ny, 3)2 -18.50 30.04 -0.616 0.5425
poly(ny, 3)3 19.41 35.49 0.547 0.5882
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Is one model significantly better than the other?
Analysis of Variance Table
Model 1: sqdiff ~ nx + ny
Model 2: sqdiff ~ poly(nx, 3) + poly(ny, 3)
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1     36 52093
2     32 19665  4     32428 13.192 1.865e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
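The F statistic in this table can be checked by hand from the residual sums of squares and degrees of freedom. The sketch below assumes the standard nested-model F test; the function name `nested_f` is hypothetical, and the inputs are the rounded values printed in the table.

```python
def nested_f(rss_simple, df_simple, rss_complex, df_complex):
    """F statistic for comparing two nested linear models."""
    # Reduction in RSS per extra parameter in the complex model
    numerator = (rss_simple - rss_complex) / (df_simple - df_complex)
    # Residual variance estimate of the complex model
    denominator = rss_complex / df_complex
    return numerator / denominator


# Values from the analysis of variance table: Model 1 (linear), Model 2 (cubic)
f_stat = nested_f(52093, 36, 19665, 32)
print(round(f_stat, 3))
```

The result should agree with the reported F value to within the rounding of the table entries.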
Check visually:
Testing for linearity: Two visualizations of the deviance between coordinate positions, at the control point locations.
Conclusions:
It seems that a second order term was only warranted along the x-direction. The more complex model based on 3rd-order polynomials results in a significantly lower RMSE (about 10 meters lower), and is shown to be a better descriptor of variance in the test of nested models.
At the map scale at which the corrected data will be presented, the extra accuracy suggested by the more complex model (coordinate transformation function) is negligible. This allows for the simpler model, which can be directly used by the convenient ST_Affine() function in PostGIS for the heavy lifting.
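To make concrete what the simpler model does to each vertex, here is a small sketch of the 2D affine mapping in Python. The function name `apply_affine` is hypothetical; its argument order mirrors the 2D form ST_Affine(geom, a, b, d, e, xoff, yoff). The coefficients in the example are the R regression estimates at the precision reported by the SQL proof of concept; reading those six values as yoff, d, e, xoff, a, b is an assumption based on their agreement with the regression output.

```python
def apply_affine(coords, a, b, d, e, xoff, yoff):
    """Apply x' = a*x + b*y + xoff and y' = d*x + e*y + yoff to each (x, y) pair."""
    return [(a * x + b * y + xoff, d * x + e * y + yoff) for x, y in coords]


# Coefficients fitted by regression (assumed mapping, see the note above)
a, b, xoff = 1.00231638907948, 0.00918961946271679, 5017.08164289594
d, e, yoff = -0.0135202854235867, 0.997400773420259, -28138.394850347

# Bad coordinates of control point 1
print(apply_affine([(-2078417.0, -14810.570)], a, b, d, e, xoff, yoff))
```

The transformed point will not land exactly on the good coordinates of control point 1; the remaining distance is that point's residual.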
The following is a proof of concept query showing how a PostgreSQL query could give the transformation parameters required for an affine transformation. The source of the procedure is from GRASS at transform.c.
Note that the v.transform code expects the input file (control points) to be in the form ax ay bx by, where 'a' is the starting coordinate system (the bad coordinates in the previous example) and 'b' is the target system (the good coordinates). The SQL version is as follows:
Sample Session
This query requires a table called link with the following fields: gid (primary key), a_x, a_y, b_x, b_y. The 'a' values are the 'from' coordinates and the 'b' values are the 'to' coordinates. Using the attached control points, the result of this query is:
b0 | b1 | b2 | b3 | b4 | b5
------------------+---------------------+-------------------+------------------+------------------+---------------------
-28138.394850347 | -0.0135202854235867 | 0.997400773420259 | 5017.08164289594 | 1.00231638907948 | 0.00918961946271679
These match the results from R exactly.
Based on the results of the proof of concept example developed previously, a single function was developed in the procedural language for the PostgreSQL database system called PL/pgSQL. The only input parameter to this procedure is a text string that results in a table in the following format:
Note the table must have the above fields but they can be in any order and can have additional fields. The gid field must be unique for each record.
Passing a SQL query was found to be the simplest way to avoid the difficulties in programming the procedure, and it removes the need for temporary tables. An added benefit is that the control point data can be in almost any format, as long as it can be arranged in the format specified above. For example, if the "from" points are in a geometry column (the_geom) in a table called from_pts and the corresponding "to" points are in a similar table called to_pts with a common attribute called "link_id", then the query would be:
Other table layouts and queries are possible depending on the manner in which the control points are collected.
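Whatever the layout, the goal is one row per control point with both coordinate pairs side by side. The sketch below shows that pairing logic in Python rather than SQL, assuming the two point sets have already been read into dictionaries keyed by link_id; the function name `pair_control_points` and the dictionary inputs are hypothetical. It also rejects unpaired points, since the fit needs an exact one-to-one relationship.

```python
def pair_control_points(from_pts, to_pts):
    """Join two {link_id: (x, y)} mappings into (gid, a_x, a_y, b_x, b_y) rows."""
    # Keys present in only one of the two sets break the one-to-one pairing
    unpaired = set(from_pts) ^ set(to_pts)
    if unpaired:
        raise ValueError(f"unpaired link_id values: {sorted(unpaired)}")
    return [(key, *from_pts[key], *to_pts[key]) for key in sorted(from_pts)]


# Two made-up control points, keyed by link_id
from_pts = {1: (100.0, 200.0), 2: (150.0, 260.0)}
to_pts = {2: (152.0, 258.0), 1: (101.0, 199.0)}
for row in pair_control_points(from_pts, to_pts):
    print(row)
```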
The following is the SQL code to add a new procedure called trans_param() into a PostGIS database:
Currently there is no error checking in the code if the determinant is zero.
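One way such a check could look is sketched below in Python, not PL/pgSQL, and it is not part of trans_param(). It assumes the determinant in question is that of the 3x3 normal-equations matrix built from the "from" coordinates, which becomes zero when there are fewer than three points or when all points lie on one line. The function names and the tolerance are hypothetical; a sensible tolerance depends on the coordinate magnitudes.

```python
def normal_matrix_det(points):
    """Determinant of [[n, Sx, Sy], [Sx, Sxx, Sxy], [Sy, Sxy, Syy]] for (x, y) points."""
    n = len(points)
    sx = sum(x for x, y in points)
    sy = sum(y for x, y in points)
    sxx = sum(x * x for x, y in points)
    sxy = sum(x * y for x, y in points)
    syy = sum(y * y for x, y in points)
    # Cofactor expansion along the first row
    return (n * (sxx * syy - sxy * sxy)
            - sx * (sx * syy - sxy * sy)
            + sy * (sx * sxy - sxx * sy))


def check_control_points(points, tol=1e-12):
    """Raise ValueError if the control points cannot determine an affine fit."""
    if len(points) < 3:
        raise ValueError("at least three control points are required")
    if abs(normal_matrix_det(points)) <= tol:
        raise ValueError("control points are collinear (determinant is zero)")


check_control_points([(0, 0), (1, 0), (0, 1)])  # passes silently
```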
To use the procedure simply use: SELECT trans_param('my SQL text')
The data from this series of articles is stored in a table called link. The id of the points is gid and the "from" values are b_x and b_y. The "to" values are a_x and a_y. Thus the query is:
The result of the procedure is:
0.997546509279282;-0.00917177514909895;0.0135300872142122;1.00246938174737;-5301.39933295548;28155.8822879205
This matches the results returned from GRASS and R. Additional queries will be developed to give a table of residuals, the RMS error and the actual transformation of geometry.
This is a start for discussions to create a series of functions to perform an affine transformation of a PostGIS data set using a table of control points.
The first question is: what is the format or layout of the control points?
I don't think point geometries are necessarily a good idea. The points must have an exact 1 to 1 relationship. To assure this you either have to maintain absolute integrity on the keys between two tables or have two geometries in one table. Both would be a hassle. An interleaved table format also would give me problems because the queries get really difficult, and what if one version used "good" points in the odd rows and the next one put them in the even ones? In addition, if you use a sequential id and add one "good" row, then add a "bad" one, then realize the bad one is really bad and delete it, then the gids will not be odd/even anymore in relation to "good" vs "bad". I think ESRI has the right idea here (God did I actually say that?). Look at the link table interface when you are georeferencing an image. It gives the id, XSource, YSource, XMap, YMap, and the residual error. This could easily be done with the code - just one more query.
Given a table in this format a query could be done to give the RMS error. Also the table could be returned with the residual errors. When the user is happy then the table and the geometry could be input to transform the geometry.
Bruce