
Strings are Made up of Characters

1. Strings

The strings you have been creating are made up of characters, and the most basic set of characters comes from the American Standard Code for Information Interchange (ASCII). ASCII is a 7-bit code that contains 128 characters: the alphabet used in the United States of America, numbers, common punctuation characters, and special "control characters" that you'll be introduced to below. Because characters are usually stored in 8-bit (1-byte) units, extended character sets later added another 128 characters for the rest of the characters used in European languages and additional punctuation such as the degree symbol, giving 256 characters in all. Try searching the web for "ASCII" and you'll find a variety of references.

If you wish to create strings with characters for languages such as Chinese, you'll need to use the next standard, Unicode, which uses character codes larger than one byte and can represent every written language commonly used on Earth.
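
As a small sketch of the difference, assuming Python 3 (where every string is a Unicode string), the example below stores a Chinese place name and compares its length in characters with its size in bytes once it is encoded as UTF-8, one common way of storing Unicode. The variable names are made up for illustration.

```python
# Assumes Python 3, where strings hold Unicode characters.
CityName = "北京"  # "Beijing" written with two Chinese characters

# len() counts characters, not bytes
NumChars = len(CityName)

# Encoding to UTF-8 shows how many bytes are needed to store the text
Utf8Bytes = CityName.encode("utf-8")
NumBytes = len(Utf8Bytes)

print("Characters=" + format(NumChars))
print("Bytes=" + format(NumBytes))
```

Each of these two characters needs more than one byte, which is why a 1-byte character set cannot hold them.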

2. Characters

Every string is made up of characters. Look down at the keyboard on your computer and almost every symbol you see (except for the function keys) can be represented by a character inside the computer. When you type, each key sends a "character code" into the computer which is added to the end of a string. You can get to these characters and manage each one individually when needed.

Strings are actually arrays of characters (we'll learn more about arrays a little later) and you can access each character by just specifying its position. Try the following:

 LastName="Doe"
 OneChar=LastName[0]
 print("OneChar="+format(OneChar))

You should see the first character "D" appear. Notice that the first character is indexed by a "0" in brackets. When you access an array, the first position will be 0 rather than 1. This makes it easier to work out positions in an array with numeric calculations.
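
To illustrate why counting from 0 is convenient, here is a short sketch (the variable names are only examples) that works out positions with arithmetic: the last character sits at the length minus 1, and a loop can visit every position in turn.

```python
LastName = "Doe"

# The last valid index is always the length minus 1
LastIndex = len(LastName) - 1
LastChar = LastName[LastIndex]
print("LastChar=" + format(LastChar))

# Visit each position from 0 up to, but not including, the length
Index = 0
while Index < len(LastName):
    print(format(Index) + ": " + LastName[Index])
    Index = Index + 1
```

Asking for a position equal to the length, such as LastName[3] here, is an error because the positions stop at 2.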

The characters we use are primarily based on the American Standard Code for Information Interchange or ASCII. The original ASCII standard contained only 128 characters (see Appendix A) and did not cover all of the characters used in the world. ASCII was first expanded to include characters used in European languages and additional symbols such as the degree symbol. This required 256 characters. It was then expanded into the Unicode standard, which covers the characters of the world's writing systems (well over 100,000) and keeps ASCII as its first 128 characters.
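
One way to see these character codes for yourself, assuming Python 3, is with the built-in functions ord() and chr(). The sketch below converts a character to its numeric code and back, and builds the degree symbol from its code (176). The variable names are illustrative only.

```python
# ord() gives the numeric code of a character
CodeOfD = ord("D")
print("Code of D=" + format(CodeOfD))

# chr() goes the other way, from a numeric code to a character
CharFromCode = chr(68)
print("Character 68=" + CharFromCode)

# Code 176 is the degree symbol, which is outside the original 128 ASCII characters
DegreeSymbol = chr(176)
print("Temperature: 20" + DegreeSymbol + "C")
```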

3. Special characters

There are a large number of special or "control" characters in ASCII. Fortunately, there are only a few that are commonly used today.

Note: Windows and UNIX-based operating systems use different characters to mark the end of a line in a text file! Windows uses a carriage return and a line feed character (CR + LF) while UNIX uses just a line feed character (LF). The Mac OS before version X used just a carriage return (CR). You'll see this when you load a file into a text editor and the lines do not break where they should. The way to get around this is to always write out the correct sequence of characters for the operating system you are working on but then treat any combination of CR and LF as a line ending when reading files.

End of line characters for different operating systems:

    MS-Windows: "\r\n"

    UNIX and Mac from OS X on: "\n"
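
The advice above about treating any combination of CR and LF as a line ending can be sketched with the string method splitlines(), which accepts all three styles. The text and variable names below are invented for the example; os.linesep is the standard-library value holding the line ending for the operating system Python is running on.

```python
import os

# Text that mixes Windows (CR + LF), UNIX (LF), and old Mac (CR) line endings
MixedText = "first\r\nsecond\nthird\rfourth"

# splitlines() treats \r\n, \n, and \r each as a single line ending
Lines = MixedText.splitlines()
print(Lines)

# Rejoin the lines with the line ending of the current operating system
Rejoined = os.linesep.join(Lines)
print(repr(Rejoined))
```

Note that when Python writes a file opened in text mode it normally converts "\n" to the local line ending for you, so os.linesep is mostly needed when you build the text yourself.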

Your keyboard does not contain all the characters you will want to use. You can specify any of the first 256 characters in Python by typing "\xhh" where "hh" is the two-digit hexadecimal value you want to use for the character. For Unicode characters beyond that, Python 3 also accepts "\uhhhh" with four hexadecimal digits. Below are some characters you may want to use in GIS programming:

Other special cases arise when you want to include a double quote in a double-quoted string, or a backslash in any string. Put a backslash in front of a double quote to include it in a double-quoted string, and type two backslashes to include a single backslash in a string.
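
The escape sequences described in this section can be tried out with a sketch like the one below, assuming Python 3. The file path and variable names are hypothetical and only serve to show the escapes.

```python
# \xhh inserts the character whose code is the hexadecimal value hh
Degree = "\xb0"  # hex B0 (decimal 176) is the degree symbol
print("45" + Degree + " N")

# \" places a double quote inside a double-quoted string
Quoted = "She said \"hello\""
print(Quoted)

# \\ places a single backslash in the string (hypothetical Windows path)
FilePath = "C:\\Temp\\Roads.shp"
print(FilePath)

# Each escape sequence counts as just one character in the string
print("Length=" + format(len("\\")))
```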

© Copyright 2018 HSU - All rights reserved.