Pages are scanned at 300 DPI and saved as TIFF images
Chemical data images are rotated and manually cropped to a tight bounding box
Soil Veg Example Scan: An example scanned image of the chemical data associated with a Soil Veg record.
'Strips' within the chemical data images are located interactively on screen: the operator makes 6 clicks and keys in the number of horizons
Interactive Identification of Data Regions: A simple GUI was constructed to interactively identify the data regions on the Soil Veg chemical sheet.
Chemical data images are then 'chunked' by an automated routine, converted into text, and inserted into a database
Automated Chunking of Soil Veg Chemical Data
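The chunking and database steps above could be sketched roughly as follows. This is a minimal illustration, not the routine actually used. It assumes that each strip is a rectangle in pixel coordinates and that it divides evenly into one row per horizon. The names `chunk_strip` and `store_chunks` and the `chunk` table layout are hypothetical. The OCR step is left out because the source does not name the engine, so the recognized text is simply passed in by the caller.

```python
import sqlite3
from typing import List, Sequence, Tuple

# (left, top, right, bottom) in pixels; a hypothetical representation of a strip
Box = Tuple[int, int, int, int]


def chunk_strip(strip: Box, n_horizons: int) -> List[Box]:
    """Split one strip into n_horizons boxes of (nearly) equal height.

    Assumption: horizons are stacked top to bottom and evenly spaced.
    """
    if n_horizons < 1:
        raise ValueError("n_horizons must be at least 1")
    left, top, right, bottom = strip
    height = bottom - top
    boxes = []
    for i in range(n_horizons):
        # Integer arithmetic keeps adjacent boxes contiguous with no gaps.
        y0 = top + (height * i) // n_horizons
        y1 = top + (height * (i + 1)) // n_horizons
        boxes.append((left, y0, right, y1))
    return boxes


def store_chunks(conn: sqlite3.Connection, record_id: str, strip_name: str,
                 boxes: Sequence[Box], texts: Sequence[str]) -> None:
    """Insert one row per chunk; texts come from whatever OCR tool is used."""
    if len(boxes) != len(texts):
        raise ValueError("need exactly one text fragment per box")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunk ("
        "record_id TEXT, strip TEXT, horizon INTEGER, "
        "x0 INTEGER, y0 INTEGER, x1 INTEGER, y1 INTEGER, ocr_text TEXT)"
    )
    rows = [
        (record_id, strip_name, i + 1, *box, text)
        for i, (box, text) in enumerate(zip(boxes, texts))
    ]
    conn.executemany("INSERT INTO chunk VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```

Storing the pixel coordinates alongside the OCR-ed text means the original image chunk can be recovered later, for example to display it next to the text during proof-reading.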
Proof-reading stage
Soil Veg Proofing Application: Web-based application for proof-reading the Soil Veg chemical data. Original image chunks and OCR-ed text fragments are vertically aligned for ease of use.
Chemical data is proof-read via an interactive web-based application
This application checks for common errors and alerts the proof reader with visual cues
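The source does not list which errors the application checks for. As a purely illustrative sketch, one plausible check for OCR-ed numeric fields is to flag values that only become valid numbers once look-alike characters are swapped (letter O for zero, lowercase l for one). The function name `flag_numeric_field`, the look-alike table, and the flag labels are all assumptions.

```python
import re
from typing import List

# Characters that OCR commonly confuses with digits (an assumed, partial list).
_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})
_NUMERIC = re.compile(r"^-?\d+(\.\d+)?$")


def flag_numeric_field(text: str) -> List[str]:
    """Return a list of warning flags for one OCR-ed field expected to be numeric."""
    cleaned = text.strip()
    if not cleaned:
        return ["empty"]
    if _NUMERIC.match(cleaned):
        return []  # already a well-formed number, nothing to flag
    if _NUMERIC.match(cleaned.translate(_LOOKALIKES)):
        return ["lookalike"]  # numeric only after swapping look-alike characters
    return ["not_numeric"]
```

A proofing page could map each flag to a visual cue, such as a colored border around the affected text fragment, so that the proof reader's attention goes to the suspicious fields first.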
Taxonomy, horizon designation and unit of measure update stage