Introduction to R

> 2+3  # R can be used as a simple calculator
> exp(1)  # All the usual calculator functions are available
> pnorm(1.645)  # the normal probability function
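To confirm these results in a script rather than by eye, the sketch below stores each value and checks it with stopifnot(), which stops with an error if any condition is FALSE. The tolerances are arbitrary choices for illustration. Note that pnorm() gives the lower-tail probability P(Z <= z); the argument lower.tail = FALSE gives the upper tail instead.

```r
# Store the calculator results so they can be checked programmatically.
sum_result <- 2 + 3                         # ordinary arithmetic
e_value <- exp(1)                           # Euler's number, about 2.718
lower <- pnorm(1.645)                       # P(Z <= 1.645) for a standard normal Z
upper <- pnorm(1.645, lower.tail = FALSE)   # P(Z > 1.645), the upper tail

print(sum_result)
print(e_value)
print(lower)
print(upper)

# The two tails of the distribution add up to 1 (tolerances are illustrative).
stopifnot(sum_result == 5,
          abs(e_value - 2.718282) < 1e-6,
          abs(lower - 0.95) < 1e-3,
          abs(lower + upper - 1) < 1e-12)
```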

The assignment operator is "<-":

> x <- 2  # assign the value 2 to x
> y <- 3  # y is 3
> z <- x+y  # assign the sum to z
> z  # look at the result
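Assignment stores a value, not a formula. A minimal sketch of what that means: once z has been computed from x and y, changing x afterwards does not change z.

```r
x <- 2
y <- 3
z <- x + y   # z now holds the value 5, not the expression "x + y"
x <- 10      # re-assigning x afterwards ...
print(z)     # ... leaves z unchanged: this prints [1] 5
stopifnot(z == 5)
```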

Vector and matrix valued variables are also possible. The c(), cbind() and rbind() functions are useful for building these up.

> x <- c(1,9,7,3)  # x is a length 4 vector
> x  # take a look
[1] 1 9 7 3

> y <- 1:4  # ":" is useful for generating a sequence
> y  # take a look
[1] 1 2 3 4

> z <- x+y # vector arithmetic works element by element
> z # take a look
[1]  2 11 10  7
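Arithmetic on two vectors of the same length is done element by element, and a single number is reused ("recycled") for every element. A short sketch using the same x and y:

```r
x <- c(1, 9, 7, 3)
y <- 1:4
prod_xy <- x * y     # element by element: 1*1, 9*2, 7*3, 3*4
plus_one <- x + 1    # the single number 1 is added to every element
print(prod_xy)       # [1]  1 18 21 12
print(plus_one)      # [1]  2 10  8  4
print(length(x))     # [1] 4
stopifnot(all(prod_xy == c(1, 18, 21, 12)),
          all(plus_one == c(2, 10, 8, 4)),
          length(x) == 4)
```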
> xm <- cbind(x,y,z) # make a matrix by binding x, y, z together column-wise
> xm # take a look
     x y  z
[1,] 1 1  2
[2,] 9 2 11
[3,] 7 3 10
[4,] 3 4  7

> t(xm) # the transpose of xm
  [,1] [,2] [,3] [,4]
x    1    9    7    3
y    1    2    3    4
z    2   11   10    7

> rbind(x, y, z) # make a matrix by binding x, y, z together row-wise
  [,1] [,2] [,3] [,4]
x    1    9    7    3
y    1    2    3    4
z    2   11   10    7
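A few checks on the matrix built above, as a sketch: dim() reports the number of rows and columns, a column can be selected by its name, and rbind() of the vectors gives the same numbers as the transpose of the cbind() result.

```r
x <- c(1, 9, 7, 3)
y <- 1:4
z <- x + y
xm <- cbind(x, y, z)      # 4 rows, 3 columns
print(dim(xm))            # [1] 4 3
print(xm[, "z"])          # a column picked out by its name
print(xm[2, ])            # the second row: x = 9, y = 2, z = 11
# Binding row-wise gives the same numbers as transposing the column-wise matrix.
stopifnot(all(dim(xm) == c(4, 3)),
          all(t(xm) == rbind(x, y, z)),
          all(xm[, "z"] == c(2, 11, 10, 7)))
```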

Now let's compute some simple statistics. Because rnorm() draws random numbers, your results will differ from those shown here:

> x <- rnorm(50) # generate 50 standard normal random numbers
> mean(x) # the mean
> var(x) # the variance
> summary(x) # five-number summary plus the mean
    Min.   1st Qu.   Median     Mean  3rd Qu.     Max.
-1.95200  -0.57150  0.19750  0.06371  0.76080  2.35200

> stem(x) # a stemplot of x

 The decimal point is at the |
 -2 | 0
 -1 | 96
 -1 | 42110
 -0 | 9877665
 -0 | 443221
  0 | 0022223344444
  0 | 5578888899
  1 | 334
  1 | 5
  2 | 04

> y <- rnorm(50) # more random numbers
> cor(x,y) # the correlation of x and y
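The summary functions compute the usual textbook formulas. The sketch below recomputes the mean, the variance (with the n - 1 denominator), and the correlation by hand and compares them with mean(), var(), and cor(). The call set.seed(1) is an addition that makes the random numbers reproducible; the seed value itself is arbitrary.

```r
set.seed(1)            # fix the random number generator so results are reproducible
x <- rnorm(50)
y <- rnorm(50)
n <- length(x)

mean_by_hand <- sum(x) / n
var_by_hand  <- sum((x - mean(x))^2) / (n - 1)    # note the n - 1 denominator
cor_by_hand  <- sum((x - mean(x)) * (y - mean(y))) /
  ((n - 1) * sd(x) * sd(y))                       # covariance divided by both SDs

stopifnot(isTRUE(all.equal(mean(x), mean_by_hand)),
          isTRUE(all.equal(var(x), var_by_hand)),
          isTRUE(all.equal(cor(x, y), cor_by_hand)))
```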

Now, let's do some nice graphics:

> hist(x) # a histogram of x

> plot(x,y) # a simple scatter plot

> plot(x,y,pch="+",xlab="The X axis",main="My plot") # pch sets the plotting symbol, xlab the x-axis label, main the title
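The plots above open in a graphics window. When running R non-interactively, one option is to send them to a file instead. This sketch assumes a PDF is acceptable; the file name my_plots.pdf is hypothetical, and the ylab option (the y-axis label) is added for illustration.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(50)
y <- rnorm(50)

out_file <- file.path(tempdir(), "my_plots.pdf")   # hypothetical output location
pdf(out_file)                    # open a PDF graphics device
hist(x)                          # page 1: histogram
plot(x, y, pch = "+", xlab = "The X axis", ylab = "The Y axis",
     main = "My plot")           # page 2: scatter plot with labels
invisible(dev.off())             # close the device so the file is written

stopifnot(file.exists(out_file))
```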

To get more information from R's built-in documentation, use the "help()" command:

> help(cbind) # If you know the command you want help for

Let's see how to read data into R. Here is a data set to test with. You can save it as a text file in a directory, say "c:/mydata/", with a file name, say "gala.data", and read it by giving the full path:

> gala <- read.table("c:/mydata/gala.data") # read the data into R

Or, you can change the working directory to "c:/mydata" (choose "File" -> "Change dir..." from the menu and enter the directory name, or run setwd("c:/mydata")). Then you can read the data using just the file name:

> gala <- read.table("gala.data") # read the data into R
> gala # take a look
           Species  Endemics   Area  Elevation  Nearest  Scruz  Adjacent
Baltra          58        23  25.09        346      0.6    0.6      1.84
Bartolome       31        21   1.24        109      0.6   26.3    572.33
Caldwell         3         3   0.21        114      2.8   58.7      0.78
                                ... stuff deleted ...
Wolf            21        12   2.85        253     34.1  254.7      2.33

R stores whole datasets in data frames, a convenient way of keeping all the variables together.

> dim(gala) # check the dimension of the data
[1] 30 7
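
If you do not have the file at hand, the sketch below writes the four islands shown in the printout to a temporary file and reads them back. It assumes gala.data has the layout suggested by the printout: a header line with the seven variable names, followed by one line per island that starts with the island name. The temporary file is a stand-in, not the real data set.

```r
# Only the four islands printed above; the full data set has 30 rows.
gala_lines <- c(
  "Species Endemics Area Elevation Nearest Scruz Adjacent",
  "Baltra 58 23 25.09 346 0.6 0.6 1.84",
  "Bartolome 31 21 1.24 109 0.6 26.3 572.33",
  "Caldwell 3 3 0.21 114 2.8 58.7 0.78",
  "Wolf 21 12 2.85 253 34.1 254.7 2.33"
)
data_file <- tempfile(fileext = ".data")   # temporary stand-in for "gala.data"
writeLines(gala_lines, data_file)

# The header line has one field fewer than the data lines, so read.table()
# treats it as column names and uses the first column as row names.
gala <- read.table(data_file)
print(class(gala))    # [1] "data.frame"
print(dim(gala))      # [1] 4 7
print(rownames(gala))

stopifnot(is.data.frame(gala),
          all(dim(gala) == c(4, 7)),
          identical(rownames(gala), c("Baltra", "Bartolome", "Caldwell", "Wolf")))
```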

You can select a variable from the data by:

> gala$Species # pick out a particular variable
 [1]  58  31   3  25   2  18  24  10   8   2  97  93  58   5  40 347  51   2 104
[20] 108  12  70 280 237 444  62 285  44  16  21

> gala$Sp # unique abbreviations also work (partial matching), but the full name is safer in scripts
 [1]  58  31   3  25   2  18  24  10   8   2  97  93  58   5  40 347  51   2 104
[20] 108  12  70 280 237 444  62 285  44  16  21

> attach(gala) # makes the gala$ prefix unnecessary, as the next line shows
> Species
 [1]  58  31   3  25   2  18  24  10   8   2  97  93  58   5  40 347  51   2 104
[20] 108  12  70 280 237 444  62 285  44  16  21
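
Besides $, a column can be picked out with [[ ]] or by name inside [ , ], and with() evaluates an expression using a data frame's columns without attaching it. If you do use attach(), detach() undoes it. The sketch uses a small stand-in data frame holding only the Species and Endemics values of the four islands printed earlier, not the full data set.

```r
# Stand-in data frame: four islands from the printout above.
gala <- data.frame(
  Species  = c(58, 31, 3, 21),
  Endemics = c(23, 21, 3, 12),
  row.names = c("Baltra", "Bartolome", "Caldwell", "Wolf")
)

sp1 <- gala$Species                  # full column name
sp2 <- gala[["Species"]]             # [[ ]] with a quoted name
sp3 <- gala[, "Species"]             # matrix-style indexing by column name
total <- with(gala, sum(Species))    # use gala's columns without attaching

attach(gala)                         # put gala's columns on the search path
mean_species <- mean(Species)
detach(gala)                         # remove it again when finished

stopifnot(identical(sp1, sp2), identical(sp1, sp3),
          total == 113, mean_species == 28.25)
```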

You can select a subset of the data by:

> gala[2,] # the second row
> gala[,3] # the third column
> gala[2,3] # the (2,3) element
> gala[c(1,2,4), ] # the first, second, and fourth rows
> gala[3:6,] # the third through sixth rows
> gala[,-c(1,2)] # "-" means "everything but": keep all columns except the first two
           Area  Elevation  Nearest  Scruz  Adjacent
Baltra    25.09        346      0.6    0.6      1.84
Bartolome  1.24        109      0.6   26.3    572.33
... stuff deleted ...
Wolf       2.85        253     34.1  254.7      2.33
> gala[gala$Sp > 300,] # select the islands with more than 300 species
           Species  Endemics     Area  Elevation  Nearest  Scruz  Adjacent
Isabela        347        89  4669.32       1707      0.7   28.1    634.49
SantaCruz      444        95   903.82        864      0.6    0.0      0.52
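The same indexing rules can be tried on a small stand-in data frame (four islands and four columns taken from the printouts above). The sketch also shows subset(), a base R function that performs the same row selection as the logical-condition example. Note that selecting a single column with gala[, 3] returns a plain vector, while selecting rows keeps a data frame.

```r
# Stand-in data frame: four islands and four columns from the printouts above.
gala <- data.frame(
  Species   = c(58, 31, 3, 21),
  Endemics  = c(23, 21, 3, 12),
  Area      = c(25.09, 1.24, 0.21, 2.85),
  Elevation = c(346, 109, 114, 253),
  row.names = c("Baltra", "Bartolome", "Caldwell", "Wolf")
)

second_row <- gala[2, ]                  # a one-row data frame (Bartolome)
third_col  <- gala[, 3]                  # a plain numeric vector (Area)
without_12 <- gala[, -c(1, 2)]           # drop the first two columns
big        <- gala[gala$Species > 30, ]  # a logical condition selects rows
big2       <- subset(gala, Species > 30) # the same selection with subset()

stopifnot(rownames(second_row) == "Bartolome",
          is.numeric(third_col),
          identical(names(without_12), c("Area", "Elevation")),
          identical(rownames(big), c("Baltra", "Bartolome")),
          isTRUE(all.equal(big, big2)))
```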

More details on R can be found here, here, or here.

RStudio provides a better integrated development environment (IDE) than the interface that comes with R. You can download the free version of RStudio from here (note that you need to install R before using RStudio).