Writing Scripts
You're going to get tired of writing the same commands over and over again in the console pretty quickly. Fortunately, you can write scripts in RStudio and then execute all or part of a script very easily.
Below, you'll learn to use the basic R commands that will allow you to work with individual values (or "scalar" values) and data that is stored in vectors and data tables. Data tables are key to spatial analysis as this is how we work with point datasets in R.
The Script Panel
In RStudio, go to the "File" menu and select "New File" and then "R Script" (the menu labels vary slightly between RStudio versions). You'll see a new, blank script appear.
To start, enter some of the code we have been working with into the script like the following:
x=12
y=34
z=x*y
As you type the code, it is not executed right away. Select the lines and then press "Run" at the top of the panel. All three lines of the script should run and the output will appear in the console. In this way, you can select just the lines of code you are working on and execute them. If there is a problem, you can change the line with the problem and run it again. From now on, you'll want to work almost entirely with scripts.
Take a look at the "Workspace" panel (named "Environment" in newer versions) in the upper right of RStudio. This panel shows the objects that are currently defined in your R session. You can select "Clear All" to remove all the objects.
In the lower right of RStudio is the panel that will show your plots. You can see the different plots you've generated by selecting the back and forward arrows. You can also clear all your plots by clicking "Clear All".
You can also clear the Console panel by pressing Ctrl+L while the panel is selected.
Note: When you close RStudio, it will save your "workspace" including all your open files, even if you have not saved them before.
You've already seen that you can just send R the name of an object and it will print it to the console. You can also print something to the console with the "print()" command. This will come in handy later. Try writing some code like the following and execute it.
x=100
print(x)
Comments
It's just as important to add comments to R code as to any other software. If you add a pound sign (#) to a line, R ignores everything after the # on that line, so you can use the rest of the line for a comment.
x=12 # this is a comment
You can also use # to add header blocks to your code. Each script should start with a header block that gives the author (you), the date it was written, and a description of what the code does.
###############################################
# Script to do some really neat stuff
# Author: Jim Graham
# Date: 4/16/2013
###############################################
Vectors
As you've seen before, a vector in R is a linear sequence of values. These values can be integers, floating-point numbers, strings, or other vectors. A matrix adds dimensions to a vector so that it can be used as a multidimensional vector. A list is a vector that contains other vectors.
Note: In R, the basic data type is called a "Vector". This can be a number of different things and in most languages would be called an "Array". You will see vectors called arrays from time to time.
TheVector=1:12 # create a vector with entries 1,2,3,4...12
TheVector=c(12,4,3,1.23) # create a vector with the entries shown
You can access the values in a vector using brackets ("[]"). Note that the first element in a vector is indexed with "1", not zero.
x=TheVector[1] # get the first value in the vector
SubVector=TheVector[1:2] # get two entries from the vector, starting at 1
NewVector=TheVector[-1] # get a new vector with all the entries from "TheVector" except the first one
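To make the indexing rules concrete, here is a minimal sketch you can run as a script. The vector name "Heights" and its values are made up for this illustration only.

```r
# Hypothetical vector used only for this illustration
Heights = c(12, 4, 3, 1.23)

First = Heights[1]               # first entry (indexes start at 1, not 0)
FirstTwo = Heights[1:2]          # entries 1 and 2
AllButFirst = Heights[-1]        # a negative index drops that entry
Last = Heights[length(Heights)]  # the last entry
Picked = Heights[c(1, 3)]        # entries 1 and 3, chosen with a vector of indexes

print(First)        # 12
print(FirstTwo)     # 12 4
print(AllButFirst)  # 4 3 1.23
print(Last)         # 1.23
print(Picked)       # 12 3
```

Notice that the index inside the brackets can itself be a vector, which is what makes "1:2" and "c(1, 3)" work.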
You can "concatenate" multiple vectors and scalar values together to make new vectors.
TheVector3 = c(TheVector1,0,TheVector2) # Concatenates the vectors together into a new vector
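The line above assumes that "TheVector1" and "TheVector2" already exist. As a small runnable sketch, with values invented for the example, you could define them first:

```r
# Two small vectors with made-up values for this illustration
TheVector1 = c(1, 2, 3)
TheVector2 = c(10, 20)

# c() joins its arguments end to end into one flat vector
TheVector3 = c(TheVector1, 0, TheVector2)

print(TheVector3)          # 1 2 3 0 10 20
print(length(TheVector3))  # 6
```

The result is a single flat vector of six values; c() does not nest the original vectors inside the new one.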
There are a number of functions that operate on vectors.
NumEntries=length(TheVector) # return the number of entries in TheVector
print(summary(TheVector)) # provide a summary, including min, max, and mean, for a vector
The "summary(...)" function is very handy and works on most data types in R to provide detailed information on an object.
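Here is a short sketch of a few vector functions from base R working on the same vector. The variable names other than "TheVector" and "NumEntries" are chosen just for this example.

```r
TheVector = 1:12                 # the values 1 through 12

NumEntries = length(TheVector)   # number of entries
Total = sum(TheVector)           # sum of all entries
Average = mean(TheVector)        # arithmetic mean
Smallest = min(TheVector)        # minimum value
Largest = max(TheVector)         # maximum value

print(NumEntries)  # 12
print(Total)       # 78
print(Average)     # 6.5
print(Smallest)    # 1
print(Largest)     # 12

# summary() reports several of these statistics at once
print(summary(TheVector))
```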
Subset vectors based on conditions
You can subset a vector by first creating a vector with Boolean values (TRUE or FALSE) and then using that vector to sample the original vector. The code below creates a Boolean vector from the condition "Xs>40" and then uses that vector to sample the Xs vector and pull out only the values that are above 40.
BooleanVector=Xs>40 # creates a vector with TRUE and FALSE based on the condition
Test=Xs[BooleanVector] # keeps only the entries where BooleanVector is TRUE (no comma is needed for a vector)
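As a runnable sketch of conditional subsetting, the code below defines a small "Xs" vector with made-up values and then filters it. The names "Above40", "Same", "NumAbove", and "Positions" are just for this illustration.

```r
# Made-up values for this illustration
Xs = c(10, 55, 40, 72, 38, 41)

BooleanVector = Xs > 40          # FALSE TRUE FALSE TRUE FALSE TRUE
Above40 = Xs[BooleanVector]      # keep entries where the condition is TRUE
Same = Xs[Xs > 40]               # the same thing written in one step
NumAbove = sum(BooleanVector)    # TRUE counts as 1, so this counts the matches
Positions = which(BooleanVector) # indexes of the TRUE entries

print(Above40)    # 55 72 41
print(NumAbove)   # 3
print(Positions)  # 2 4 6
```

Note that 40 itself is not included because the condition is "greater than", not "greater than or equal to".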
Matrices
A matrix in R is a vector that has a dimension added to it.
TheDimension=c(3,4) # create a dimension for the matrix
TheMatrix=array(TheVector,TheDimension) # create a new matrix variable with the vector and the dimensions
You can also add a dimension to an existing vector as follows:
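TheVector=1:24 # the vector needs 3*4*2 = 24 entries to fill these dimensions
dim(TheVector) = c(3,4,2) # add three dimensions to the existing vector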
Matrices can use the same arithmetic operators as scalar values and vectors. Matrices also have special functions (see the R reference for more information).
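The following sketch builds a small matrix and shows indexing and arithmetic on it. It assumes a 12-entry vector so that it exactly fills 3 rows and 4 columns; the names "Entry", "FirstColumn", and "Doubled" are made up for the example.

```r
TheVector = 1:12
TheMatrix = array(TheVector, c(3, 4))  # 3 rows, 4 columns, filled column by column

print(dim(TheMatrix))          # 3 4

Entry = TheMatrix[2, 3]        # row 2, column 3
FirstColumn = TheMatrix[, 1]   # leave the row index empty to get a whole column
Doubled = TheMatrix * 2        # arithmetic is applied to every element

print(Entry)          # 8
print(FirstColumn)    # 1 2 3
print(Doubled[2, 3])  # 16
```

Because R fills matrices one column at a time, the third column holds 7, 8, 9, which is why row 2 of that column is 8.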
Lists
Lists in R contain a sequence of elements, but those elements can be of different types, including vectors. The following code shows how to create a list that has vectors as its elements.
Vector1=c(2,3,6,3)
Vector2=c("Dorothy","Scarecrow","Lion","Toto")
TheList=list(Vector1,Vector2)
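Once you have a list, you will usually want to get its elements back out. Here is a minimal sketch using double brackets ("[[ ]]") and, for a list with named elements, the dollar sign. The names "Counts", "Characters", and "NamedList" are made up for this illustration.

```r
Counts = c(2, 3, 6, 3)
Characters = c("Dorothy", "Scarecrow", "Lion", "Toto")
TheList = list(Counts, Characters)

Second = TheList[[2]]          # double brackets return the element itself
ThirdName = TheList[[2]][3]    # then index into that vector as usual
NumElements = length(TheList)  # the list has 2 elements, not 8

# Elements can also be given names and accessed with $
NamedList = list(Counts = Counts, Characters = Characters)
LastName = NamedList$Characters[4]

print(ThirdName)    # "Lion"
print(NumElements)  # 2
print(LastName)     # "Toto"
```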
Data Frames
Data frames in R are tables with columns, rows, and names for each column. R implements data frames as a list where each entry in the list is a column in the table. You can create data frames directly from text files and from R data files.
HousePrice = read.table("houses.data") # tabular data without a header row
Test=read.table("C:/Temp/Table.txt", header=TRUE) # tabular data with a header
TheData = read.csv("C:/ProjectsR/Clustering/TwoClusters.csv") # comma-separated values with a header
We use data frames extensively for spatial analysis in R as this is how we load tables of points, typically from CSV files, into R.
Once you load a data frame into R, you can access the columns of the data frame using the dollar sign ("$") symbol.
MaxValue=max(TheDataTable$Elev) # return the maximum value from the column "Elev"
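If you don't have a file handy, you can try the same ideas on a data frame built directly in code. This is only a sketch: the name "ThePoints", the column names, and the values are invented for the example and stand in for a table of points loaded from a CSV file.

```r
# A small made-up table of points (stand-in for data read from a file)
ThePoints = data.frame(
  X = c(1.5, 2.0, 3.25),
  Y = c(10, 20, 30),
  Elev = c(120, 340, 95)
)

MaxValue = max(ThePoints$Elev)                  # largest value in the Elev column
NumRows = nrow(ThePoints)                       # number of rows (points)
ColumnNames = names(ThePoints)                  # names of the columns
HighPoints = ThePoints[ThePoints$Elev > 100, ]  # rows where Elev is above 100

print(MaxValue)     # 340
print(NumRows)      # 3
print(ColumnNames)  # "X" "Y" "Elev"
print(HighPoints)
```

When subsetting a data frame, the comma matters: the condition before the comma selects rows, and leaving the part after the comma empty keeps all the columns.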
Definitions
As a reminder, the following definitions are provided to show the relationship between vectors, matrices, lists, and data frames.
Vector: n elements in a linear set
Matrix: a Vector with dimensions added
List: a Vector that contains vectors as its elements
Data Frame: a List where all the vectors (columns) must have the same length and the columns and rows must have names.
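You can check these relationships yourself with the "is" functions in base R. The sketch below uses short made-up objects (V, M, L, and D) just for this purpose.

```r
V = c(1, 2, 3)                              # a vector
M = array(1:6, c(2, 3))                     # a vector with dimensions added
L = list(V, c("a", "b"))                    # a list holding two vectors
D = data.frame(A = V, B = c("x", "y", "z")) # a data frame with two columns

print(is.vector(V))      # TRUE
print(is.matrix(M))      # TRUE
print(is.list(L))        # TRUE
print(is.data.frame(D))  # TRUE
print(is.list(D))        # TRUE, a data frame is stored as a list of columns
print(length(D))         # 2, the number of columns
```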
Factors
Categorical data
Note: This section is incomplete.
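In the meantime, here is a minimal sketch of what a factor looks like, using the base R functions factor(), levels(), and table(). The land cover values and variable names are invented for this illustration.

```r
# Made-up categorical values for this illustration
LandCover = factor(c("forest", "water", "forest", "urban"))

TheLevels = levels(LandCover)  # the distinct categories, sorted alphabetically
Counts = table(LandCover)      # how many times each category occurs
Codes = as.integer(LandCover)  # the integer code stored for each entry

print(TheLevels)  # "forest" "urban" "water"
print(Counts)
print(Codes)      # 1 3 1 2
```

A factor stores each category once as a "level" and keeps an integer code for each entry, which is how R represents categorical data.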
Special Numbers
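There are special numbers available in R. "NA" plays the role that "NULL" plays in most other languages and means a value is "not available", for example a missing measurement in a dataset. When a mathematical function has no defined result, such as 0/0, R returns the related value "NaN" (not a number), which is.na() also reports as missing.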
NA # not available
is.na(x) # tests if a value is NA
Numeric values can also be "infinite".
Inf # infinite
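The sketch below shows how these special values behave in practice. The "Values" vector is made up; the functions used (is.na, is.infinite, is.nan, and the na.rm argument of mean) are all part of base R.

```r
# A made-up vector with one missing value
Values = c(4, NA, 10)

Missing = is.na(Values)                # FALSE TRUE FALSE
PlainMean = mean(Values)               # NA, because one value is missing
CleanMean = mean(Values, na.rm = TRUE) # ignore the NA and average the rest

Big = 1 / 0        # dividing by zero gives Inf
Undefined = 0 / 0  # an undefined result gives NaN ("not a number")

print(Missing)
print(PlainMean)           # NA
print(CleanMean)           # 7
print(is.infinite(Big))    # TRUE
print(is.nan(Undefined))   # TRUE
```

Many R functions return NA when any input is NA, so the "na.rm = TRUE" argument is worth remembering when you work with real datasets.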
Mode (Type) Conversion
There are a large number of object types available in the R base package. The "as" functions create new objects and convert between object types. Below are a few examples:
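x = as.character(123) # convert a number to a string
x = as.integer("123") # convert a string to an integer
x = as.double(123) # convert a value to a floating-point number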
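Here is a short sketch that runs a few conversions and checks the results with class(). The variable names are made up for the example.

```r
Text = as.character(123)    # number to string
Whole = as.integer("123")   # string to integer
Real = as.double("1.5")     # string to floating-point number
Chopped = as.integer(3.9)   # the fractional part is dropped, not rounded

# Text that is not a number cannot be converted; R gives NA and a warning
# (the warning is suppressed here to keep the output clean)
Bad = suppressWarnings(as.integer("abc"))

print(Text)          # "123"
print(class(Text))   # "character"
print(Whole)         # 123
print(class(Whole))  # "integer"
print(Real)          # 1.5
print(Chopped)       # 3
print(Bad)           # NA
```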
Additional Resources
R Base Package - all the functions in the base package
Debugging With RStudio - RStudio has a rather unique debugger but this is a good tutorial for it.