A FIRST TUTORIAL ON R

text version of a document by, very likely, Chris Raphael
with unauthorized changes by Don Byrd, early Feb. 2008

To get R:

1. Download R (it's free) from the website http://cran.r-project.org . There are versions for Macintosh, Linux, and Windows.
2. From the same website, click on "Manuals" (at the left, near the bottom) to find documentation.

R can be used either to run pre-written programs or purely interactively, as a calculator. Try typing each of the following expressions -- except for the initial ">", R's command prompt -- to R (followed by the return key). Note: anything on a line following '#' is a comment and is ignored by R.

    > 5+3
    > (12*7)+4             # No. of keys on a piano (7 octaves + 4 at the bottom)
    > log(81/80)
    > exp(20)
    > help("exp")          # Just what is "exp", anyway?
    > exp(exp(exp(20)))    # R is only human! :-)

R has almost any mathematical function you can think of: sqrt(), sin() ... mostly with easily guessable names.

Expressions using the comparison operators == != < > give _Boolean_ values, namely TRUE or FALSE (which R lets you abbreviate as T and F):

    > 4>3           # this evaluates to TRUE
    > sqrt(4)==2    # so does this
    > sqrt(5)==2    # this evaluates to FALSE

It is possible to have variables even when you use R as a calculator. Most names beginning with an alphabetic character will be treated as variables. Try typing some of the following lines in succession; no need to include the comments, of course. Or, you can copy and paste them -- not a bad habit to get into.

    > x <- 3        # set x to 3
    > y <- x*x+x
    > y             # print the value of y
    > freq <- 440 * 2^((m-69)/12)    # What does this do? Not much unless you set m first
    > m <- 60
    > freq <- 440 * 2^((m-69)/12)    # It works better this time
    > freq

Vectors

One of the nicest aspects of R is the way it handles _vectors_ (sometimes called one-dimensional arrays). However, it can be tricky, e.g., using vectors of different lengths together -- intentionally or otherwise!
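
To illustrate the "different lengths" pitfall: when two vectors of unequal length are combined arithmetically, R silently reuses ("recycles") the shorter one, and warns only when the longer length is not a multiple of the shorter. The following is just a sketch; the names shortV, longV and oddV are made up for this illustration and are not used elsewhere in the tutorial.

```r
shortV <- c(10, 20)          # hypothetical 2-element vector
longV  <- 1:6                # hypothetical 6-element vector

recycledV <- longV + shortV  # shortV is reused as (10,20,10,20,10,20); no warning
print(recycledV)             # 11 22 13 24 15 26

oddV <- 1:5                  # length 5 is not a multiple of 2
warnedV <- oddV + shortV     # still computed, but R warns that the longer length
                             # is not a multiple of the shorter
print(warnedV)               # 11 22 13 24 15
```

The first case is the dangerous one: nothing tells you that the lengths did not match.
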
Here are several ways to create and use vectors:

    > xV <- seq(1,50)                # xV is now the vector (1, 2, ..., 50)
    > xV <- 1:50                     # same thing
    > yV <- seq(-pi,pi,length=50)    # yV consists of 50 evenly spaced values from -pi to pi
    > sqV <- c(1,4,9,16)             # c means "combine"; sqV is now the vector (1,4,9,16)
    > sumV <- xV+yV                  # vectors of same length can be added, multiplied, etc.
    > combV <- c(sqV, xV)            # anything, even vectors of any length, can be combined
    > xVMag <- 4*xV                  # this is interpreted correctly too

Random Number Generation

Random numbers are useful for many things, especially in probability theory and statistics (R's original raison d'etre), as well as many areas of music informatics. There are many different _distributions_ of random numbers; the most important for us are the _uniform_ (all possible values are equally likely) and the _normal_ (which produces the bell-shaped curve you've probably encountered before). R has lots of built-in functions for generating random numbers and doing things with them. For instance:

    > xV <- runif(100)    # creates a vector of 100 (uniformly distributed) random numbers between 0 and 1
    > v <- .3             # any value will do
    > punif(v)            # the probability that a Unif(0,1) random number is less than v

There are similar functions for a variety of other distributions, including the normal(0,1) (rnorm, pnorm, qnorm), Exponential, Binomial, Poisson, Cauchy (rcauchy, pcauchy, qcauchy), and others.

Subsets

    > xV <- runif(100)      # creates a vector of 100 Unif(0,1) random numbers
    > xV[1]                 # the first element of xV
    > xV[c(1,3,5)]          # a vector containing 1st, 3rd and 5th elements of xV
    > yV <- xV>.5           # a 100-long vector of Boolean values; yV[i] is TRUE iff xV[i] > .5
    > bigxV <- xV[xV>.8]    # the "xV's" that are greater than .8
    > bigxV

Simple Graphs.

Try the following. NB: each call to "plot" replaces the previous plot, so if you do more than one at a time, you'll see only the last one!

    > xV <- seq(0,1,length=100)
    > yV <- xV^2             # yV = xV squared
    > plot(xV,yV)            # plot the points (xV[1],yV[1]) ... (xV[100],yV[100])
    > plot(xV,yV,"l")        # plot has lots of options: say help("plot") to find out
    > plot(yV,xV,"l")
    > plot(yV,xV,"s")
    > plot(yV)               # same as plot(1:length(yV),yV)

Source Files.

You will need to write simple programs in R, and getting programs working almost always requires some trial-and-error iteration. It's not practical to do that just by typing statements to R; you have to write your programs in a text editor and save them in files.

On a Mac, the easiest solution is probably to use R's built-in text editor. In the File menu, use New Document (or Cmd-N). As an added bonus, it knows the syntax of R, which helps in several ways (it automatically inserts matching braces and parentheses, highlights one when its "partner" is selected, etc.). However, you can also use a program like BBEdit or the free TextWrangler. On Windows, you can use Notepad, Notepad2 or Textpad.

(CAUTION! Do _not_ use a word processor like MS Word or Wordpad, or OS X's TextEdit, which -- despite its name -- is really a word processor. Word processors have several problems for writing programs; the worst is that unless you're very careful, you'll have a file with invisible formatting information that will cause syntax errors in R!)

Suppose you create a file containing these lines:

    nDays <- 90
    xV <- runif(nDays,-.5,.4)
    yV <- cumsum(xV)    # yV[1] = xV[1], yV[2] = xV[1]+xV[2], etc.
    priceV <- 100*exp(yV)
    plot(priceV, type="l", xlab="day", ylab="sale price ($1000)")
    title("Average Home Value")
    print("Price history ($1000): ")
    print(priceV)

Save it with the name "PriceCrash.r". To run it, type this in the R Console window:

    > source("PriceCrash.r")

Important: that assumes that PriceCrash.r is in R's current "working directory"!
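
One way to check that before running the program is sketched below. It uses only the standard functions getwd() and file.exists(); the variable names progName and inWorkingDir are invented for this illustration, and "PriceCrash.r" is simply the file name used above.

```r
# Sketch: check whether the program file is in the working directory
progName <- "PriceCrash.r"               # the file name used in this tutorial
cat("working directory:", getwd(), "\n")

inWorkingDir <- file.exists(progName)    # TRUE if R can see the file from here
if (inWorkingDir) {
  source(progName)                       # safe to run it by name alone
} else {
  cat(progName, "is not in the working directory\n")
}
```
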
If it's not, you must give the correct path to the file, for example:

    > source("/Users/donbyrd/Documents/WebSiteDon/Teach/RTools+Docs/PriceCrash.r")

R understands the common abbreviation "~" for the user's home directory, so this works too:

    > source("~/Documents/WebSiteDon/Teach/RTools+Docs/PriceCrash.r")

This technique allows you to write a program in the usual incremental way.

If you want to get a hard copy of the printout and the plot (for example, to submit as your homework), do the following:

    > postscript("myplot.ps")    # write plot in the postscript file "myplot.ps"
    > sink("myout.txt")          # write text output to "myout.txt"
    > source("PriceCrash.r")     # run the program you created
    > dev.off()                  # redirect plots back to screen. Don't forget this!
    > sink()                     # redirect text output back to screen. ditto.

A Fun Example.

Suppose two decks of cards are shuffled; then the cards are lined up side by side, and you count the number of places where the two decks have the same card. What is the probability that there are no matches? This is a hard calculation to do, but you could estimate the probability by doing the experiment many times and observing the proportion of times that no match occurs. Here's how. The 52 cards are represented simply by integers from 1 to 52. Each time through the loop, the program "shuffles" both decks, then counts the matches; if there aren't any, it adds one to the "no match" count.

    nTrials <- 1000    # number of trials: try nos. like 10, 100, 1000, 10,000
    nZeros <- 0        # to count the number of times the "no match" event happens
    for (i in 1:nTrials) {
        xV <- 1:52
        deck1V <- sample(xV, 52, replace=F)    # a random permutation of the "cards"
        deck2V <- sample(xV, 52, replace=F)    # another random permutation
        nMatches <- sum(deck1V==deck2V)        # number of matching cards this time
        if (nMatches==0) nZeros <- nZeros+1
        #cat("nMatches=", nMatches, "nZeros=", nZeros, "\n")
    }
    cat("estimated probability of no matches=", nZeros/nTrials, "\n")    # the result

This is also one of my R Example Programs, namely CardMatchingFunExample.r.

One tricky thing is the statement "nMatches <- sum(deck1V==deck2V)"; what does that actually do? To find out, after running the program, you might tell R the following:

    > deck1V
    > deck2V
    > deck1V==deck2V
    > sum(deck1V==deck2V)

This will give you an idea of what happened the last time through the loop. (The comparison gives a vector of Boolean values, and sum counts each TRUE as 1 and each FALSE as 0.) You can also un-comment the '#cat("nMatches="...' line to see what's happening _every_ time through, but you might want to reduce nTrials to a small number first!

Working Directories

Anytime you refer to a file -- either with the "source" command, to run a program, or to read data from or write data to a file -- R needs to know the directory or folder to use for the file. If you don't give a complete path, it uses the current _working directory_. Here's how to set the working directory or find out what it's set to.

    > setwd("~/Documents/WebSiteDon/Teach/RTools+Docs")
    > getwd()

The source command has a nifty "chdir" option that temporarily changes to the directory containing the file being run:

    > source("~/Documents/WebSiteDon/Teach/RTools+Docs/FancyProgram.r", chdir=TRUE)

Demos, Help, & Quitting

    > demo()           # get info about built-in demos
    > help("rnorm")    # gives information about the function rnorm. Of course this works
    >                  # for other functions too, and even for operators.
    > q()              # quit R. If it asks if you want to save the workspace,
                       # just say no.
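
A few more ways of getting help, as a rough sketch. All the functions used (help, args, formals, example) are standard R; the variable names helpTopic, opTopic and rnormArgNames are made up here. The help results are stored in variables rather than displayed, so the lines can be pasted into a program file without opening any help windows.

```r
# Sketch: other routes to R's documentation
helpTopic <- help("rnorm")    # same lookup as above; print(helpTopic) would display it
opTopic   <- help("+")        # help for an operator: the quotes are required

args(rnorm)                   # show the arguments of rnorm without opening its help page
rnormArgNames <- names(formals(rnorm))
print(rnormArgNames)          # "n" "mean" "sd"

example("cumsum")             # run the examples from the help page for cumsum
```
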