GSP 118 (318): GIS Programming

Parsing Strings

In Review

1. Introduction

We've seen how to create or "format" strings before. "Parsing" is when we take strings apart. This is crucial for accessing data that is available in files and web services. Strings are parsed by calling "string functions". We've already used one string function, "format". There are a large number of string functions built into Python to help with formatting, parsing, and searching strings.

You can go back to section "3.3 Formatting Strings" when you need a refresher on how to use indexes to create subsets of strings.

2. Parsing

When we read data from a text file or a web service, the information we want is usually buried within other text. When you created your text files you took values and put them into a tab-delimited or comma-separated file. If you want to get to the data again, you need to parse the text to find what you are interested in.

There are a wide variety of formats for text and a large number of functions for parsing them. However, one of the most common formats, and one of the easiest to work with, is the delimited text file we just created. The "split()" function will break up a string into individual elements based on a "delimiter" like a tab or comma character. Let's try this first in Python with a string you define:

TheString="Rock,Sand,Shale"
TheElements=TheString.split(",")
print(TheElements) # print the elements in the string
print(TheElements[1]) # print the second element in the string
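
The same approach works for a tab-delimited line by passing the tab character "\t" as the delimiter. The sketch below assumes a made-up line of text (the values are hypothetical, not from any real file). Note that "split()" always returns strings, so a number needs to be converted before you can do math with it.

```python
# A hypothetical line as it might appear in a tab-delimited text file
TheLine="Rock\t12.5\tGray"
TheElements=TheLine.split("\t") # break the line apart at each tab character
print(TheElements) # print the list of elements
TheValue=float(TheElements[1]) # convert the second element from a string to a number
print(TheValue*2) # now we can do math with the value
```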

3. Finding and Sub-setting Strings

One of the most common tasks will be finding strings in other strings. Let's say we have a string that contains a coordinate value in degrees, minutes, and seconds, but we want to pull out just the degrees. We can use the "find()" function to get the index of the degree symbol and then use the same indexing (slicing) approach we used with lists to pull out just the degree portion. We'll need to use the special character "\xf8" for the degree symbol. If you check an extended ASCII chart (such as code page 437) you'll see that the degree symbol is at hexadecimal "f8". Be aware that in Python 3 strings are Unicode, so "\xf8" prints as "ø" and the degree symbol is "\xb0" instead; the "find()" technique works the same way with either character.

Try the code below and then add another "find()" to get the single quote that is after the minute in the coordinate and see if you can pull out the minute value.

TheCoordinate="40\xf8 21' 32\" E, 105\xf8 30' 40\""  
print("Coordinate="+TheCoordinate) # print the coordinate so we can see it before the conversion
EndOfDegree=TheCoordinate.find("\xf8")
TheDegree=TheCoordinate[:EndOfDegree]
print(TheDegree)

After you pull out the elements of a string you may find that there is "white space" on one side or the other of the string. White space includes spaces, tabs, carriage returns, and new line characters. It's a good idea to use Python's string function "strip()" to remove any unwanted white space. Add the code below to the code you entered above.

TheDegree=TheDegree.strip()
print(TheDegree) 
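
To see what "strip()" actually removes, it can help to print a string with "repr()", which shows white space characters such as "\t" and "\n" instead of hiding them. This is a small sketch with a made-up string; "repr()" is a built-in Python function that was not used earlier in this section.

```python
# A hypothetical string with spaces, a tab, and a new line around the text we want
TheText="  \t 21' \n"
print(repr(TheText)) # repr() makes the white space characters visible
TheClean=TheText.strip() # remove white space from both ends of the string
print(repr(TheClean))
```

Python also provides "lstrip()" and "rstrip()" if you only want to remove white space from the left or the right side of the string.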

4. Using the Source Assistant

The last thing you'll need to know about parsing strings is that the "find()" function can take additional parameters including a starting point within the string. This way, when you want to find the next occurrence of the string you can use:

EndOfNextDegree=TheCoordinate.find("\xf8",EndOfDegree+1)
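
As a sketch of how the starting point can be used, the code below pulls out the second degree value from the same coordinate string. Using the comma to mark where the second value begins is an assumption about how this particular string is laid out, and "StartOfNextDegree" and "TheNextDegree" are names made up for this example.

```python
TheCoordinate="40\xf8 21' 32\" E, 105\xf8 30' 40\""
EndOfDegree=TheCoordinate.find("\xf8") # index of the first degree symbol
EndOfNextDegree=TheCoordinate.find("\xf8",EndOfDegree+1) # start searching just past the first one
StartOfNextDegree=TheCoordinate.find(",")+1 # the second value begins after the comma
TheNextDegree=TheCoordinate[StartOfNextDegree:EndOfNextDegree]
TheNextDegree=TheNextDegree.strip() # remove the space that followed the comma
print(TheNextDegree)
```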

This is also a good time to introduce you to the "Source Assistant" in the Wing IDE. Enter the code below but stop after entering the word "find".

TheCoordinate="40\xf8 21' 32\" E, 105\xf8 30' 40\"" 
EndOfDegree=TheCoordinate.find

You should see a tab labeled "Source Assistant" at the right of the Wing IDE. You may need to adjust the size of the panel to see the contents of this tab. You should see something like the documentation below:

Symbol: x.find
 Likely type: builtin method str.find
 def str.find(self, sub, start=None, end=None)
 http://docs.python.org/library/stdtypes.html#str.find
 S.find(sub [,start [,end]]) -> int
 
Return the lowest index in S where substring sub is found, such that sub is contained within s[start:end].  
Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

This is a full and exact definition for the function find(). Take a look at the last line of the first section of the definition. This shows you the parameters that can be passed into the find() function. The parameters inside brackets ("[...]") are optional. For find(), you have to specify a "sub", or substring, to search for. Then, you can add a "start" as an index into the string to start searching. Finally, you can add an "end" index to stop the function from searching too far into the string. The "-> int" indicates that the function returns an integer value.

If you keep reading, you'll see some text that describes what the function does and finally that it returns "-1" on failure (i.e. if it does not find the specified string).

This is an incredibly valuable tool to learn exactly what you can do with functions. You can even click on the "http" link provided to jump to the Python.org documentation for the "find()" function.
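
The "-1" return value mentioned in the documentation is worth checking for in your own code. If you use "-1" as an index without checking, Python will not give an error; it will quietly drop the last character of the string instead. The sketch below uses a made-up coordinate with no degree symbol to show one way to handle this.

```python
# A hypothetical coordinate string that does not contain a degree symbol
TheCoordinate="40 21 32"
EndOfDegree=TheCoordinate.find("\xf8")
if EndOfDegree==-1: # find() did not locate the degree symbol
    TheDegree=None
    print("No degree symbol found in: "+TheCoordinate)
else:
    TheDegree=TheCoordinate[:EndOfDegree]
    print(TheDegree)
```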

Additional Resources

Python Documentation: String functions