Data storage
[Part 2 of 11]


So, let's get to the nitty-gritty! How do we fill in that gap in "main" to get jobs done? Most code is made up of three things: variables (which hold data and other stuff), operators (which use those variables to work stuff out), and flow control statements, to control the order things are done in (and here we might include things like mouse-clicks). There is also code for dealing with the outside world through hardware, but we won't talk about that so much in this course.
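To make those three ingredients concrete, here is a minimal sketch of a filled-in "main". The class name Ingredients and the helper sumTo are made up for this illustration, and the helper is only there to keep the example tidy.

```java
class Ingredients {

    // Adds up the whole numbers from 1 to limit.
    static int sumTo(int limit) {
        int total = 0;                        // variable: holds data
        for (int i = 1; i <= limit; i++) {    // flow control: repeat the body
            total = total + i;                // operators: + and =
        }
        return total;
    }

    public static void main(String[] args) {
        int answer = sumTo(4);                // 1 + 2 + 3 + 4
        if (answer > 5) {                     // flow control: decide what to do
            System.out.println("Sum is " + answer);   // prints Sum is 10
        }
    }
}
```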

The first half of the course will look at these core language elements, while the second half will look at getting specific jobs done using them, together with good coding practice and libraries of pre-written code.

We'll start by looking at variables: storing a few basic elements in these slides, then storing more complex information, and finally storing larger amounts of data.


Further info:

You can find a full list of operator precedences in this tutorial. You can find a full list of primitive data types in this tutorial.

Memory is where computers store information. When we talk about memory and Java, we are usually talking about Random Access Memory (RAM), which is quickly available for programs to use and is built into chips inside the computer.

Note that the "+" in things printed can be confusing. So, for example:

int x = 100;
System.out.println("x = " + x);

prints x = 100, but:

System.out.println("x = " + x + 2);

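prints x = 1002, not x = 102. Java works through the "+" signs from left to right: "x = " + x joins a number onto some text, giving the text "x = 100", and joining 2 onto that gives "x = 1002". In other words, "+" joins things as soon as either side is text, and only adds when both sides are numbers. To get the arithmetic done first, put it in brackets: System.out.println("x = " + (x + 2)); prints x = 102.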
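If you want to check how "+" behaves for yourself, this small sketch shows the three cases side by side. The class name ConcatDemo and its helper methods are invented for the illustration.

```java
class ConcatDemo {

    // Text comes first, so everything after it is joined on as text.
    static String joined(int x) {
        return "x = " + x + 2;
    }

    // Brackets force the arithmetic to happen before the joining.
    static String added(int x) {
        return "x = " + (x + 2);
    }

    // Numbers come first, so they are added before any text is met.
    static String addedFirst(int x) {
        return x + 2 + " = x";
    }

    public static void main(String[] args) {
        int x = 100;
        System.out.println(joined(x));      // prints x = 1002
        System.out.println(added(x));       // prints x = 102
        System.out.println(addedFirst(x));  // prints 102 = x
    }
}
```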

Have a quick go at changing your HelloWorld.java so it prints out the result of some mathematics.

 


Quiz:

   int i = 51;
   double j = i / 2;
   System.out.println("j = " + j);

prints out ________________________________________

  1. j = 1.0
  2. j = 25
  3. j = 25.0
  4. j = 25.5

Correct! The values on the right of the assignment are all int values, so the mathematics is done as integer mathematics, losing the fractions. This gives an answer of "25", which is then implicitly converted (widened) to 25.0 because it's being stored under a double label "j".
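If you actually want 25.5, at least one side of the division needs to be a double before the division happens. Here is a small sketch comparing the options; the class name DivisionDemo and its helper methods are made up for the example.

```java
class DivisionDemo {

    // int / int: the fraction is thrown away, then the result is widened.
    static double intDivide(int i) {
        return i / 2;
    }

    // int / double: the int is widened first, so the fraction survives.
    static double doubleDivide(int i) {
        return i / 2.0;
    }

    // An explicit cast on one side has the same effect.
    static double castDivide(int i) {
        return (double) i / 2;
    }

    public static void main(String[] args) {
        int i = 51;
        System.out.println("j = " + intDivide(i));     // prints j = 25.0
        System.out.println("j = " + doubleDivide(i));  // prints j = 25.5
        System.out.println("j = " + castDivide(i));    // prints j = 25.5
    }
}
```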


Further ideas / info:

Open up your text editor, make a new file, and copy in this code:

	
public class Point {
	double x = 100.0;
	double y = 200.0;
}

Save it with the right name (Point.java) in the same directory as HelloWorld.java, then try:

  1. Compiling both classes using javac *.java
  2. Making a Point object called p1 in HelloWorld, and recompiling.
  3. Doing some mathematics in HelloWorld using p1.x and p1.y

 

Note that usually we try to call variables something useful that indicates their type and purpose. Here we've called the Point variable p1 in case we have multiple Points, but if we thought we were just going to have one, this kind of naming would be usual:

Point point = new Point();
point.x = 100.0;

You might imagine calling the variable the same as the class would be confusing, but actually, as long as you keep classes starting with upper case letters, and variables with lower case letters, it isn't confusing at all.
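If you get stuck on the exercise above, here is one possible sketch of where you might end up. Both classes are shown in a single listing for brevity (so neither is marked public here), whereas in the exercise they live in separate files. The helper methods sum and average are invented for this example.

```java
class Point {
    double x = 100.0;
    double y = 200.0;
}

class HelloWorld {

    // Adds the two coordinates of a Point together.
    static double sum(Point p) {
        return p.x + p.y;
    }

    // Averages the two coordinates of a Point.
    static double average(Point p) {
        return (p.x + p.y) / 2;
    }

    public static void main(String[] args) {
        Point p1 = new Point();                       // make a Point object called p1
        System.out.println("Sum: " + sum(p1));          // prints Sum: 300.0
        System.out.println("Average: " + average(p1));  // prints Average: 150.0
    }
}
```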


Quiz:
Given these three classes, complete the main GIS class:

public class Point {
   double x = 100.0;
}

public class PointHolder {
   Point p1 = new Point();
}

public class GIS {
   public static void main (String args[]) {

      PointHolder ph = new PointHolder();
      ________________________________________

   }
}

  1. System.out.println(p1.x);
  2. System.out.println(x);
  3. System.out.println(ph.p1.x);
  4. System.out.println(gis.ph.p1.x);

Correct! The GIS class, where we're writing the code, contains a PointHolder object, which we need to look inside to find its Point object, which we can then look inside for the x variable. Given this, and the fact that we're already inside the GIS class, the sequence is: ph.p1.x.
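Here is the completed quiz as a runnable sketch, with one extra step added for illustration: the same dotted chain can be used to change the value as well as read it. The classes are shown in one listing for brevity, so they are not marked public here.

```java
class Point {
    double x = 100.0;
}

class PointHolder {
    Point p1 = new Point();
}

class GIS {
    public static void main(String[] args) {
        PointHolder ph = new PointHolder();

        // Look inside ph for p1, then inside p1 for x.
        System.out.println(ph.p1.x);   // prints 100.0

        // The same chain works on the left of an assignment.
        ph.p1.x = 250.0;
        System.out.println(ph.p1.x);   // prints 250.0
    }
}
```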


Further options and info:

It is always worth thinking carefully about the size of arrays with more than two dimensions. To see why, let's think about the memory used by arrays with sides of 100.

If we store a single int in a variable, it takes up 4 "bytes" of data space. Each byte is 8 "bits" - binary ones or zeros, so an int takes up 4*8 = 32 bits of space.

int [] intArray = new int [100];
Takes up 3,200 bits, or 400 bytes.

int [][] intArray = new int [100][100];
Takes up 320,000 bits, or 40,000 bytes.
There are 1024 bytes in a RAM memory kilobyte (strictly a kibibyte), so 40000 is ~39 KB.

int [][][] intArray = new int [100][100][100];
Takes up 32,000,000 bits, or 3906.25 KB.
There are 1048576 bytes in a RAM memory megabyte (strictly a mebibyte), so 3906.25 KB is ~3.8 MB.

int [][][][] intArray = new int [100][100][100][100];
Takes up 3,200,000,000 bits, or ~381.5 MB.
There are 1073741824 bytes in a RAM memory gigabyte (strictly a gibibyte), so 381.5 MB is ~0.37 GB.

int [][][][][] intArray =
      new int [100][100][100][100][100];

Takes up 320,000,000,000 bits, or ~37 GB.

A 32-bit Windows 7 machine can only use 4 GB of memory, so you can see that it would be impossible to hold a 5D array with a 100 int side on such a machine, and an array of 100 int sides would be pretty small. In actual fact, because the operating system, JVM, and other programs use memory, typically you might have 1.5 GB available if you're very lucky. A typical small satellite image of 1000 int side would take 3906.25 KB per image. If you have three bands of data stored separately this is ~0.011 GB per image, meaning you can work with ~134 images in a third dimension.

As you can see, working in more than ~3 dimensions requires considerable thought about the space you might use, especially when your arrays contain complicated objects, which would take much more memory space than an int. There are solutions to having limited memory space (such as temporarily writing to files or re-writing your code to use multiple computers) but none of them are ideal.
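The arithmetic above can be reproduced with a few lines of code. This is only a rough sketch: it counts 4 bytes per int and nothing else, so it ignores the extra bookkeeping memory that Java arrays need in practice. The class name ArrayMemory and the method bytesForIntArray are made up for the example.

```java
class ArrayMemory {

    // Bytes needed for the int values alone in an array with the given
    // number of dimensions, where every side has the same length.
    static long bytesForIntArray(int side, int dimensions) {
        long elements = 1;                  // long, because the totals overflow an int
        for (int d = 0; d < dimensions; d++) {
            elements = elements * side;
        }
        return elements * Integer.BYTES;    // Integer.BYTES is 4
    }

    public static void main(String[] args) {
        for (int d = 1; d <= 5; d++) {
            long bytes = bytesForIntArray(100, d);
            double megabytes = bytes / 1048576.0;
            System.out.println(d + "D: " + bytes + " bytes (" + megabytes + " MB)");
        }
        // The most memory this JVM will try to use, for comparison.
        System.out.println("JVM limit: " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

Running it on your own machine lets you compare the array sizes with the memory your JVM is actually allowed to use.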

 


Quiz:

Fill in the correct missing line:

   double[][] arr = new double[3][2];
   arr[0][0] = 30.0;
   arr[0][1] = 20.0;
   ______________________________
   arr[1][1] = 50.0;


  1. arr[2][2] = 30.0;
  2. arr[1][2] = 30.0;
  3. arr[2][1] = 30.0;

Correct! The other two answers break the program because the second dimension is size 2, i.e. its index can only be 0 or 1. Both other answers try to place something in position 2, i.e. the third position in a dimension with only two spaces.
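To see what "break the program" looks like in practice, here is a small sketch that tries a valid position and an invalid one. The class name ArrayBounds and the helper isValidIndex are invented for the illustration, and the try/catch is just there so the example keeps running after the error.

```java
class ArrayBounds {

    // True if [row][col] is a position that exists in the array.
    static boolean isValidIndex(double[][] arr, int row, int col) {
        return row >= 0 && row < arr.length
            && col >= 0 && col < arr[row].length;
    }

    public static void main(String[] args) {
        double[][] arr = new double[3][2];   // rows 0 to 2, columns 0 to 1

        arr[2][1] = 30.0;                    // fine: last row, last column
        System.out.println("arr[2][1] = " + arr[2][1]);   // prints arr[2][1] = 30.0

        try {
            arr[1][2] = 30.0;                // column 2 does not exist
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("arr[1][2] is outside the array");
        }
    }
}
```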

 

