R and Java


In this practical we'll run through the same sequence as last session, but this time in R: first stand-alone, and then programmatically. We'll comment any code that we give you, and at the end we'll also give you some links to resources where you can start learning about R more fully.


First, we'll make a simple scatterplot in R, before we add a regression line to it.

The best thing to do is to make an R script, rather than typing the commands directly into the RGui. R has a bit of a tendency to hang, so having the code in a separate script will save a great deal of tears. Open Notepad++ and save a blank file as script.r (our lab machines have an R-Script editor installed, but just use Notepad++ for the moment). We'll assume you do this in the directory m:\r-projects\. In addition, open a second blank file and type into it:

source("M:\\r-projects\\script.r")

Adjust the path if you used a different directory. This code isn't part of the script - it'll just save you having to type it again and again if R crashes. You can copy and paste this line, which runs your script, into the R prompt. Note that you can use either forward-slashes in filepaths, or escaped backslashes (as in the line above).
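
If you want to convince yourself that the two ways of writing a path are equivalent, here is a minimal sketch. It only builds and compares the strings, so it doesn't need the file to exist; the variable names are ours, just for illustration.

```r
# Two ways of writing the same Windows path in an R string.
escaped <- "M:\\r-projects\\script.r"   # each \\ is stored as a single backslash
forward <- "M:/r-projects/script.r"     # forward-slashes need no escaping

# Swap every backslash for a forward-slash and compare.
converted <- gsub("\\", "/", escaped, fixed = TRUE)
print(identical(converted, forward))

# Both strings hold the same number of characters.
print(nchar(escaped) == nchar(forward))
```

What you can't do is write a single unescaped backslash: R treats it as the start of an escape sequence such as \n, so the path would be misread or rejected.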

Once you've done this, save this file, data.tab, to the same directory as script.r. This is the data from the last practical, in a Comma-Separated Values (CSV) file. You can look at it if you like - just don't re-save it from Excel. For reference, it was generated in Notepad++, and saved with ANSI encoding (R is very fussy about this kind of thing; you can set the encoding of Notepad++ files using the Notepad++ Encoding menu).
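
To see what R expects from a file like this, here is a small stand-alone sketch. It writes a tiny comma-separated file to a temporary location and reads it back. The column names match the ones used in this practical, but the numbers are invented for illustration and are not the contents of data.tab.

```r
# Write a tiny comma-separated file to a temporary location.
# The values are made up; only the column names come from the practical.
tmp <- tempfile(fileext = ".tab")
writeLines(c("Age,Desperation",
             "20,12",
             "30,18",
             "40,31"), tmp)

# header = TRUE tells R that the first row holds the column names.
demo <- read.csv(tmp, header = TRUE)

# str() shows the column names and types, a quick check that the file read properly.
str(demo)

# Remove the temporary file.
unlink(tmp)
```

If str() reports a single column with an odd name, the separator or the encoding of the file is usually the thing to check.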

Now go to your blank script.r and copy and paste in the following:

	# Simple linear regression script

	# Read in the data, and make it available as a loaded object
	# Note escaped back-slashes.
	data1 <- read.csv("M:\\r-projects\\data.tab", header = TRUE)
	attach(data1)
	# Next line doesn't really do anything here, but is useful later in Java.
	data1

	# Start a plot and plot the data
	plot(Age, Desperation, main="Age vs. Desperation")
	
	# Regression code will go here
	
	# Cleanup
	detach(data1)

Adjust the file path if necessary and save this file, then go to the R prompt. On our machines, R is installed under the "Statistics" group of programmes. You want the 32-bit "i386" version. Once it is open, paste in the source line from your other text file. Hit Enter/Return to run the line, and with it the script.

You should see the data as a scatterplot. Close down the plot and we'll add the regression code.


Add the following code to script.r where the code above says # Regression code will go here:

	# Linear regression
	lineeq <- lm(Desperation ~ Age, data=data1)

	# Generate a sequence of numbers across the range of the data.
	# We could just use "Age", but this copies what we did last practical.
	x <- seq(min(Age), max(Age), by=10.0)

	# Make a new data frame containing the new data in a column called "Age" like before.
	# The name must match the predictor in the formula so that predict() can find it.
	newData <- data.frame(Age = x)

	# Predict new y-axis datapoints using the new x-axis data and the regression formula.
	predictions <- predict(lineeq, newdata = newData)

	# Add the new data to the plot as a line.
	# Plot against x (not Age): the predictions were made for the values in x.
	lines(x, predictions)

Also, add the following at the script end, just to clean up a bit.

	# Clear used objects from memory.
	rm(data1, lineeq, newData, predictions, x)

Provided R hasn't crashed on you, you should just be able to use the "Up" key to get back your source command so you can run it again.
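
One thing worth knowing about the seq() call in the regression code: with by=10.0 the sequence goes up in steps of ten from the minimum and stops at the last value that doesn't pass the maximum, so it may finish short of the oldest age in the data. The sketch below shows the effect, along with the length.out alternative. The ages are invented for illustration and are not the practical's data.

```r
# Invented ages, for illustration only (not the practical's data).
ages <- c(18, 25, 33, 47, 61)

# Fixed step of 10: stops at the last value that does not pass max(ages).
by_ten <- seq(min(ages), max(ages), by = 10)
print(by_ten)

# Fixed number of points: starts at min(ages) and ends at max(ages).
even <- seq(min(ages), max(ages), length.out = 5)
print(even)
```

For a straight line this makes little visual difference, but it explains why a drawn line can stop slightly before the right-most point.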

Have a good look at the script; it does exactly what we did last practical. In fact, we don't normally need to go to the lengths of constructing a new series with R: abline(lineeq) would take the object holding the line equation and display it on the current plot, so we don't really need all the code after the linear regression. However, as it stands the code not only mirrors last practical, but also makes things a bit easier when we get to the Java bit of the practical.
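
As a rough sketch of the abline() shortcut, the following stand-alone example fits a regression to a few invented points and draws the line both ways. The data frame and its values are made up for illustration; only the column names follow the practical.

```r
# Invented illustrative values, not the practical's data.
demo <- data.frame(Age = c(20, 30, 40, 50, 60),
                   Desperation = c(12, 18, 31, 39, 52))

# Fit the straight line.
fit <- lm(Desperation ~ Age, data = demo)

# Plot the points, then let abline() draw the fitted line directly.
plot(demo$Age, demo$Desperation, main = "Age vs. Desperation")
abline(fit)

# The long way round: predict y at the two ends of the x range and join them.
ends <- data.frame(Age = range(demo$Age))
end_predictions <- predict(fit, newdata = ends)
lines(ends$Age, end_predictions, lty = 2)
```

The dashed line from predict() should sit on top of the solid one from abline() within the range of the data; abline() simply carries on to the edges of the plot.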


At the end we'll come back to some resources that will help you understand the code above. For now, just look at the comments. When you think you understand roughly how it works, go on to Part Two, where we'll look at running this code from Java.