GSP 118 (318): GIS Programming

The XML Standard

In Review

Introduction

Extensible Markup Language (XML) is a general format on which a wide variety of languages for transferring hierarchically organized data are based. Languages that use this tag-based format include HyperText Markup Language (HTML), Ecological Metadata Language (EML), and a host of others. XML is also used to store data in files, including the metadata associated with GIS data.

We rarely have to process XML directly. Libraries are available to handle XML in general as well as many specific XML formats. We usually need to access XML directly only when there is a problem and the libraries fail. I have also had to write XML readers when the available ones were simply too slow.

The Structure of XML

XML is a very simple language based on "tags". A tag has a name and, in its simplest form, is simply:

<TagName/>

The "less than symbol" ("<") indicates the start of the tag. There is always a TagName right after the less-than symbol. The tag then ends with a "forward slash" ("/") character and a "greater than symbol" (">").

Tags can also include "attributes". Each attribute is a value identified by a "key"; together the two are known as a "key-value" pair. Attributes appear after the tag name, and each value must be enclosed in single or double quotes:

<TagName Key1='Value1' Key2='Value2' />
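As a rough illustration of how a library presents these pieces, the sketch below parses the tag above with Python's standard xml.etree.ElementTree module. The choice of Python and of this module is an assumption made for the example; any XML library exposes the same ideas.

```python
import xml.etree.ElementTree as ET

# Parse the self-closing tag shown above from a string.
element = ET.fromstring("<TagName Key1='Value1' Key2='Value2' />")

# The tag name and the attributes are exposed separately;
# the attributes arrive as a dictionary of key-value pairs.
print(element.tag)
print(element.attrib)

# A single attribute can be looked up by its key.
print(element.get("Key1"))
```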

Tags can contain other content by ending without the "/" character and then terminating the tag as follows:

<TagName Key1='Value1'>
	Content within the tag
</TagName>

Notice that to terminate the tag we put the "/" in front of the TagName. The content within a tag can be just plain text but more often it is another tag as below:

<ParentTagName Key1='Value1'>
	<ChildTagName/>
</ParentTagName>

Because these tags form a "hierarchical" structure, the outer tag is often referred to as the "Parent" tag while the inner one is referred to as the "Child" tag. You can then embed another tag within the embedded tag and so on. This allows very complex structures to be built on a very simple definition. Many XML formats today use few or no attributes. Instead, they place each value in a child tag within the parent tag.
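The parent and child relationship can be walked with a short recursive function. The sketch below is only an illustration in Python using the standard xml.etree.ElementTree module; the GrandchildTagName tag and the outline function are made up for the example.

```python
import xml.etree.ElementTree as ET

# GrandchildTagName is a hypothetical third level added for illustration.
xml_text = """<ParentTagName Key1='Value1'>
	<ChildTagName>
		<GrandchildTagName>Some text</GrandchildTagName>
	</ChildTagName>
</ParentTagName>"""


def outline(element, depth=0):
    """Return one line per tag, indented by its depth in the hierarchy."""
    lines = ["\t" * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(xml_text)
print("\n".join(outline(root)))
```

Each level of nesting becomes one more level of recursion, which is why the same few lines handle a hierarchy of any depth.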

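The only other thing you need to know about XML is that a "header" line, called the XML declaration, belongs at the start of an XML file or transmission, for example <?xml version="1.0" encoding="UTF-8"?>. Parsers for XML 1.0 will accept a file without it, but including it is strongly recommended because it states the XML version and the character encoding.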
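When a library writes the XML for you, it can usually add the header line as well. The sketch below assumes Python 3.8 or later and the standard xml.etree.ElementTree module, and reuses the tag names from the examples above.

```python
import xml.etree.ElementTree as ET

# Build a small parent/child structure in memory.
root = ET.Element("ParentTagName", Key1="Value1")
ET.SubElement(root, "ChildTagName")

# xml_declaration=True asks ElementTree to write the header line first
# (this keyword requires Python 3.8 or later). With a named encoding,
# tostring() returns bytes rather than a string.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
print(data.decode("UTF-8"))
```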

An Example

One of the most common forms of XML in the GIS industry is the metadata standard published by the Federal Geographic Data Committee (FGDC). This standard defines a format for metadata for GIS layers. Below are the contents of an XML metadata file from ArcGIS that contains just the basic information ArcGIS writes when it creates the file. I recommend you open such a file in Notepad (or a similar editor); you'll see that it is not very nicely formatted. Just add some tabs and line breaks and you'll see the file has the structure below.

<metadata xml:lang="en">
	<Esri>
		<CreaDate>
			20111016
		</CreaDate>
		<CreaTime>
			10280500
		</CreaTime>
		<ArcGISFormat>
			1.0
		</ArcGISFormat>
		<SyncOnce>
			TRUE
		</SyncOnce>
		<ArcGISProfile>
			ItemDescription
		</ArcGISProfile>
	</Esri>
	<dataIdInfo>
		<idCitation>
			<resTitle>
				(empty)
			</resTitle>
		</idCitation>
	</dataIdInfo>
</metadata>
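To show how values might be pulled out of a file like this, here is a minimal sketch in Python using the standard xml.etree.ElementTree module on a shortened copy of the metadata above. It assumes that CreaDate holds a date in year-month-day order, which is how the sample value reads but is not stated in the file itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# A shortened copy of the metadata shown above.
metadata_text = """<metadata xml:lang="en">
	<Esri>
		<CreaDate>
			20111016
		</CreaDate>
	</Esri>
	<dataIdInfo>
		<idCitation>
			<resTitle>
				(empty)
			</resTitle>
		</idCitation>
	</dataIdInfo>
</metadata>"""

root = ET.fromstring(metadata_text)

# A path of tag names separated by "/" walks down the hierarchy.
# strip() removes the tabs and line breaks that were added for readability.
created_text = root.findtext("Esri/CreaDate").strip()
title = root.findtext("dataIdInfo/idCitation/resTitle").strip()

# Assumption: the date is stored as YYYYMMDD.
created = datetime.strptime(created_text, "%Y%m%d").date()

print(created)
print(title)
```

Note that the text inside a tag includes any surrounding whitespace, so stripping it is usually needed when the file has been indented by hand.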

HTML

It is tempting to say that HTML is based on XML, but HTML actually predates XML. Both descend from an older standard, the Standard Generalized Markup Language (SGML), and XML was designed with the experience of HTML in hand. Because of this, HTML does not require the standard XML header line. A very simple HTML page that just shows the text "Hello World" could appear as follows:

<html>
<body>
Hello World
</body>
</html>
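Because HTML is not strictly XML, it is usually read with an HTML-specific parser. The sketch below uses Python's standard html.parser module to list the tags and text in the page above; the TagCollector class is a made-up name for the example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record each opening tag and each non-blank piece of text."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag, such as <html> or <body>.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text between tags; skip whitespace-only pieces.
        if data.strip():
            self.text.append(data.strip())


page = "<html>\n<body>\nHello World\n</body>\n</html>"

collector = TagCollector()
collector.feed(page)
collector.close()

print(collector.tags)
print(collector.text)
```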

Keyhole Markup Language (KML)

Below is a sample of a KML file:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.1">
	<Placemark>
		<name>Simple placemark</name>
		<description>Fort Collins</description>
		<Point>
			<coordinates>-105,40,0</coordinates>
		</Point>
	</Placemark>
</kml>

This is the file format used by Google Earth. If you save the text above into a text file with a "kml" extension, you can then "drag" the file into the "Places" window in Google Earth and the point near Fort Collins, Colorado, will appear. Note that if you want to add more than one "Placemark" to the file, you need to place them inside a "Document" parent tag.
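As a sketch of the multiple-placemark case, the Python code below builds a KML file with a "Document" parent tag around two placemarks, using the standard xml.etree.ElementTree module (Python 3.8 or later). The build_kml and qualified functions and the second sample point are made up for the example; only the Fort Collins point comes from the file above.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.1"

# Write the KML namespace as the default xmlns attribute, as in the sample.
ET.register_namespace("", KML_NS)


def qualified(tag):
    """Return the tag name in ElementTree's {namespace}name form."""
    return "{" + KML_NS + "}" + tag


def build_kml(points):
    """Build KML bytes from (name, longitude, latitude) tuples."""
    kml = ET.Element(qualified("kml"))
    # More than one Placemark must sit inside a Document parent tag.
    document = ET.SubElement(kml, qualified("Document"))
    for name, longitude, latitude in points:
        placemark = ET.SubElement(document, qualified("Placemark"))
        ET.SubElement(placemark, qualified("name")).text = name
        point = ET.SubElement(placemark, qualified("Point"))
        coordinates = ET.SubElement(point, qualified("coordinates"))
        coordinates.text = f"{longitude},{latitude},0"
    return ET.tostring(kml, encoding="UTF-8", xml_declaration=True)


# The second point is an invented sample location.
sample_points = [("Fort Collins", -105, 40), ("Second point", -105.5, 40.5)]
kml_bytes = build_kml(sample_points)
print(kml_bytes.decode("UTF-8"))
```

The returned bytes could be written to a file with a "kml" extension and opened the same way as the single-point example.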

There is considerable information on KML files at:

http://code.google.com/apis/kml/documentation/index.html

Formatting

If you do write out XML files directly, please follow the formatting above, tabbing in the contents of each tag. This will make your XML much easier for others to read.
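If the XML is produced by a library, the tabbing can often be applied automatically. The sketch below assumes Python 3.9 or later, where the standard xml.etree.ElementTree module provides an indent function.

```python
import xml.etree.ElementTree as ET

# XML written on a single line, as many programs produce it.
root = ET.fromstring(
    "<ParentTagName Key1='Value1'><ChildTagName/></ParentTagName>"
)

# indent() inserts line breaks and one tab per level of nesting
# (available in Python 3.9 and later).
ET.indent(root, space="\t")

formatted = ET.tostring(root, encoding="unicode")
print(formatted)
```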