An XML Representation of the Keys to Soil Taxonomy?

Submitted by dylan on Sat, 2010-05-29 04:45.

Western Fresno Soil Hierarchy: partial view of the hierarchy within the US Soil Taxonomic system

Maybe this is just craziness, but wouldn't it be neat to have an XML-formatted version of the Keys to Soil Taxonomy? The format might look something like the following code snippet, although there may be more efficient uses of XML. The only problem I can see is that it would take a hell of a long time to type in the entire 300+ page document. A complete document of this nature would support all kinds of new and creative uses for the Keys: electronic look-up, automated generation of a PDA-ready version, an awesome teaching tool, or just something that could be used to generate cool figures. Does anyone know of a quick way to get this put together, or of any similar document that has already been published? Anyone want to help type in the data?

Example Document

<order>
  <gelisols/>
  <histosols/>
  <spodosols/>
  <andisols/>
  <oxisols/>
  <vertisols/>
  <aridisols/>
  <ultisols/>
  <mollisols/>
  <alfisols>
    <suborder>
      ...
      <xeralfs>
        <greatgroup>
          ...
          <natrixeralfs>
            <criteria code="JDB">
              <criterion text="Other Xeralfs that have a natric horizon."/>
            </criteria>
            <subgroup>
              <vertic_natrixeralfs>
                <criteria code="JDBA" which="Natrixeralfs that have one or both of the following">
                  <criterion text="Cracks within 125 cm of the mineral soil surface that
                  are 5 mm or more wide through a thickness of 30 cm or more for some time
                  in normal years and slickensides or wedge-shaped peds in a layer 15 cm or
                  more thick that has its upper boundary within 125 cm of the mineral soil
                  surface; or"/>
                  <criterion text="A linear extensibility of 6.0 cm or more between the
                  mineral soil surface and either a depth of 100 cm or a densic, lithic, or
                  paralithic contact, whichever is shallower."/>
                </criteria>
              </vertic_natrixeralfs>
              <aquic_natrixeralfs>
                <criteria code="JDBB">
                  <criterion text="Other Natrixeralfs that have, in one or more horizons
                  within 75 cm of the mineral soil surface, redox depletions with chroma
                  of 2 or less and also aquic conditions for some time in normal years (or
                  artificial drainage)."/>
                </criteria>
              </aquic_natrixeralfs>
              <typic_natrixeralfs>
                <criteria code="JDBC">
                  <criterion text="Other Natrixeralfs."/>
                </criteria>
              </typic_natrixeralfs>
            </subgroup>
          </natrixeralfs>
          ...
          <haploxeralfs>
            <subgroup>
              ...
            </subgroup>
          </haploxeralfs>
        </greatgroup>
      </xeralfs>
      ...
    </suborder>
  </alfisols>
  ...
</order>
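
As a rough illustration of the "electronic look-up" idea, here is a minimal Python sketch that finds a taxon by its key code. It assumes a well-formed document in the style of the example above; the trimmed-down excerpt and the `lookup` helper are made up for this illustration, and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# A trimmed-down, well-formed excerpt in the same style as the example document.
KEYS_XML = """
<order>
  <alfisols>
    <suborder>
      <xeralfs>
        <greatgroup>
          <natrixeralfs>
            <criteria code="JDB">
              <criterion text="Other Xeralfs that have a natric horizon."/>
            </criteria>
            <subgroup>
              <typic_natrixeralfs>
                <criteria code="JDBC">
                  <criterion text="Other Natrixeralfs."/>
                </criteria>
              </typic_natrixeralfs>
            </subgroup>
          </natrixeralfs>
        </greatgroup>
      </xeralfs>
    </suborder>
  </alfisols>
</order>
"""


def lookup(root, code):
    """Return (taxon tag, list of criterion texts) for a key code, or None."""
    # The taxon is the element whose direct child is the matching <criteria>.
    for taxon in root.iter():
        criteria = taxon.find(f"criteria[@code='{code}']")
        if criteria is not None:
            texts = [c.get("text") for c in criteria.findall("criterion")]
            return taxon.tag, texts
    return None


root = ET.fromstring(KEYS_XML)
print(lookup(root, "JDBC"))
```

Because the taxon name is stored as the tag itself, the look-up has to walk every element; that is one practical argument for moving names into attributes.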

Dylan, It would be better to

Dylan,

It would be better to use a generic structure for the XML tags (e.g. <ggroup name="haploxeralfs">). If you already have the information in a database it is quite trivial to generate the XML. You could populate the XSD in a similar fashion. I'd help you out, but I use the Canadian system.

Cheers, Peter.
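
A minimal sketch of what the generic-tag suggestion might look like when the taxa come out of a database. The rows, the `taxonomy` root element, and the rank names other than `ggroup` are assumptions made for illustration; a real query would replace the hard-coded list.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, as they might come from a database query:
# (rank, name, parent name or None). Parents must be listed before children.
ROWS = [
    ("order", "alfisols", None),
    ("suborder", "xeralfs", "alfisols"),
    ("ggroup", "natrixeralfs", "xeralfs"),
    ("ggroup", "haploxeralfs", "xeralfs"),
    ("subgroup", "typic natrixeralfs", "natrixeralfs"),
]


def build_tree(rows):
    """Build a generic XML tree: the rank is the tag, the name is an attribute."""
    root = ET.Element("taxonomy")  # hypothetical root element
    nodes = {None: root}
    for rank, name, parent in rows:
        nodes[name] = ET.SubElement(nodes[parent], rank, name=name)
    return root


root = build_tree(ROWS)
print(ET.tostring(root, encoding="unicode"))
```

With this layout a single query such as `root.iter("ggroup")` returns every great group, whatever its name.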

SKOS could be a good fit.

SKOS could be a good fit. It's designed for representing taxonomies and is widely used. See http://www.w3.org/2004/02/skos/references
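
A rough sketch of how a few taxa might be written as SKOS concepts in Turtle, generated with plain Python strings. The `st:` namespace (`http://example.org/soiltaxonomy/`) is a placeholder and the `concept` helper is hypothetical. Labels are assumed to contain no double quotes, since no escaping is done.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
# Placeholder namespace for the taxa; not an official vocabulary.
BASE_PREFIX = "@prefix st: <http://example.org/soiltaxonomy/> ."


def concept(local_name, label, broader=None, definition=None):
    """Return one SKOS concept as a Turtle statement."""
    lines = [
        f"st:{local_name} a skos:Concept",
        f'    skos:prefLabel "{label}"@en',
    ]
    if broader:
        # skos:broader links a taxon to its parent in the hierarchy.
        lines.append(f"    skos:broader st:{broader}")
    if definition:
        lines.append(f'    skos:definition "{definition}"@en')
    return " ;\n".join(lines) + " ."


turtle = "\n\n".join([
    SKOS_PREFIX,
    BASE_PREFIX,
    concept("Xeralfs", "Xeralfs", broader="Alfisols"),
    concept("Natrixeralfs", "Natrixeralfs", broader="Xeralfs",
            definition="Other Xeralfs that have a natric horizon."),
])
print(turtle)
```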

Re: XML Representation

You could use xpdf to convert the PDF to an ASCII file, then use Python to parse the text file, extract the data, and format it as XML. Of course, it might be quicker just to type it :-).
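
A minimal sketch of the parsing step, assuming the converted text puts each key entry on a line that starts with its letter code followed by a period, with wrapped lines continuing below. The sample text, the `keys` root element, and both helper functions are made up for illustration; real pdftotext output is likely messier than this.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, tidy sample of converted text.
SAMPLE = """\
JDB. Other Xeralfs that have a natric
horizon.
JDBC. Other Natrixeralfs.
"""

# A key entry is assumed to start with 1-6 capital letters and a period.
KEY_LINE = re.compile(r"^([A-Z]{1,6})\.\s+(.*)$")


def parse_keys(text):
    """Return a list of (code, criterion text) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = KEY_LINE.match(line)
        if match:
            entries.append([match.group(1), match.group(2)])
        elif entries:
            # Continuation of a wrapped line: append to the current entry.
            entries[-1][1] += " " + line
    return [tuple(e) for e in entries]


def to_xml(entries):
    """Wrap parsed entries in <criteria>/<criterion> elements."""
    root = ET.Element("keys")  # hypothetical root element
    for code, text in entries:
        criteria = ET.SubElement(root, "criteria", code=code)
        ET.SubElement(criteria, "criterion", text=text)
    return root


print(ET.tostring(to_xml(parse_keys(SAMPLE)), encoding="unicode"))
```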

pdftotext conversion...

Hi Doug. Good idea, except that the resulting text is not well-formatted enough to parse with something like Python. I have a feeling that you are right about it taking longer to code such a solution than to type it all in. The text version would, however, make it simpler to copy and paste blocks of text into the appropriate <criterion> tags. Want to help?

Consider using XML Schema...

Hi Dylan,
What you describe is what I have been busy with for the last couple of months. Since you are in fact *describing a standard* (rather than *recording actual data*), you might want to use the corresponding XML standard for this usage scenario: (annotated) XML Schema (XSD). You will end up with a document definition you can use for "guided data collection" by feeding it to some generic software tool, possibly one running on a PDA.
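
A small sketch of the "guided data collection" idea: a generic tool reads the annotations in an XSD and turns them into data-entry prompts. The schema fragment, its element names, and the `prompts` helper are all hypothetical. The standard library is only used to read the schema here, since it cannot validate documents against an XSD.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical annotated schema fragment; element names are illustrative only.
XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="natricHorizon" type="xs:boolean">
    <xs:annotation>
      <xs:documentation>Does the soil have a natric horizon?</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="crackDepthCm">
    <xs:annotation>
      <xs:documentation>Depth to cracks below the mineral soil surface (cm)</xs:documentation>
    </xs:annotation>
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""


def prompts(xsd_text):
    """Return (field name, prompt text) pairs for top-level elements."""
    ns = {"xs": XS}
    root = ET.fromstring(xsd_text)
    result = []
    for element in root.findall("xs:element", ns):
        doc = element.find("xs:annotation/xs:documentation", ns)
        text = (doc.text or "").strip() if doc is not None else ""
        result.append((element.get("name"), text))
    return result


for name, prompt in prompts(XSD):
    print(f"{name}: {prompt}")
```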

XSD vs. raw XML

Thanks for the input. Are you actually working towards an XML version of the Keys to Soil Taxonomy? Let me know if you are, and if you plan on releasing the data. I'll take a look at the XSD material; it looks promising.

RE: XSD vs. raw XML

Sorry; it took me a while to get back to your blog and the reply you posted.

I am working on an annotated XSD representation of the Dutch standard on this same subject. Apart from semantics, this also covers which combinations of those semantics are valid, as well as the value ranges of numeric fields. I will use the model as input for a generic (thus model-driven) data entry application. I am writing it myself, since there doesn't seem to be anything readily available that takes this approach. It will produce XML instance documents describing actual borehole samples (or whatever other application comes up in the future).