Input/Output: Text Data

Dr Andy Evans

Input and Output (I/O)

  • So, how do we deal with files (and other types of I/O)?
  • In Java we use objects that encapsulate an address (such as a File), together with input and output "Streams".
  • Streams are objects representing external resources that we can read from or write to. We don't need to worry about how the transfer happens.
  • Input streams bring data into the program; output streams send data out of it.

Streams

  • Streams are based on four abstract classes:
  • java.io.Reader and Writer : Work on character streams - that is, treat everything like it's going to be a character.
  • java.io.InputStream and OutputStream : Work on byte streams - that is, treat everything like it's binary data.
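
As a rough illustration of the difference, the sketch below reads the same data once through an InputStream and once through a Reader. The names StreamKinds, countBytes and countChars are invented for this example, the text is assumed to be encoded as UTF-8, and an in-memory ByteArrayInputStream stands in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamKinds {

    // Byte stream: each read() returns one raw byte (or -1 at the end).
    static int countBytes(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(raw)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    // Character stream: each read() returns one decoded char (or -1 at the end).
    static int countChars(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_8)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        // "caf" followed by an accented e, which needs two bytes in UTF-8.
        String text = "caf\u00e9";
        System.out.println(countBytes(text) + " bytes, " + countChars(text) + " chars");
    }
}
```

The byte stream sees five values and the character stream four, because the accented letter is stored as two bytes but decoded as one character.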

Character streams

  • Two abstract superclasses - Reader and Writer.
  • These are used for a variety of character streams.
  • Most important are:

    FileReader : for reading files.
    FileWriter : for writing files.

	
FileReader fr = null;
File f = new File("myFile.txt");

try {

    fr = new FileReader (f);

} catch (FileNotFoundException fnfe) {
    fnfe.printStackTrace();
}

	
try {

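    int next = fr.read();
    // Read a char at a time. read() returns an int:
    // the character code, or -1 at the end of the file.
    char char1 = (char) next;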
	
    fr.close(); 
    // Close the connection to 
    // the file so others can use it.

} catch (IOException ioe) {
    ioe.printStackTrace();
}

	
FileWriter fw = null;
File f = new File("myFile.txt");

try {
    fw = new FileWriter (f, true);
    //Note optional boolean sets whether 
    // to append to the file (true) or 
    // overwrite it (false). Default is overwrite. 
		
} catch (IOException ioe) {
    ioe.printStackTrace();
}

	
try {

    fw.write("A");
    fw.flush(); 
    // Make sure everything in 
    //the stream is written out.
	
    fw.close();

} catch (IOException ioe) {
    ioe.printStackTrace();
}
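
The reading and writing fragments above can be combined into one runnable sketch. This is only one way to arrange them: it uses try-with-resources (Java 7 and later) in place of the explicit close() calls, writes to a temporary file rather than myFile.txt, and the names FileRoundTrip and roundTrip are invented for the example.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

class FileRoundTrip {

    // Writes text to a temporary file, then reads it back a char at a time.
    static String roundTrip(String text) {
        try {
            File f = File.createTempFile("roundTrip", ".txt");
            f.deleteOnExit();

            // try-with-resources closes (and so flushes) the writer for us.
            try (FileWriter fw = new FileWriter(f)) {
                fw.write(text);
            }

            StringBuilder result = new StringBuilder();
            try (FileReader fr = new FileReader(f)) {
                int next;
                // read() returns an int: a character code, or -1 at the end.
                while ((next = fr.read()) != -1) {
                    result.append((char) next);
                }
            }
            return result.toString();
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("A"));
    }
}
```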

Buffers

  • Plainly it is a pain to read a character at a time.
  • It is also possible that the filesystem may be slow or intermittent, which causes issues.
  • It is common to wrap streams in buffer streams to cope with these two issues.
    	
    BufferedReader br = new BufferedReader(fr);
    BufferedWriter bw = new BufferedWriter(fw);
    
    
	
BufferedReader br = new BufferedReader(fr);
// Remember fr is a FileReader not a File.

// Run through the file once to count the lines 
// and make a String array the right size.
int lines = -1;
String textIn = " ";
String[] file = null;

try {
    while (textIn != null) {
		textIn = br.readLine();
		lines++;
    }
    file = new String[lines];

	

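    // Close and remake the file reader / buffer 
    // here to set it back to the file start
    // (f is the File the FileReader was made from).
    br.close();
    br = new BufferedReader(new FileReader(f));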

	

    //Go back to the start of the file 
    // and read it into the array.
    for (int i = 0; i < lines; i++) {
		file[i] = br.readLine();
    }
    br.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}

	
BufferedWriter bw = new BufferedWriter (fw);
// Remember fw is a FileWriter not a File.

String[][] strData = getStringArray();

try{
    for (int i = 0; i < strData.length; i++) {
		for (int j = 0; j < strData[i].length; j++) {

		    bw.write(strData[i][j] + ", ");

		}
		bw.newLine();	
    }
    bw.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
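
The loop above writes ", " after every value, including the last one on each line. If that trailing separator is unwanted, one possible variation is to build each line with String.join (Java 8 and later). In this sketch the names RowWriter and toCsvText are invented, and a StringWriter stands in for the FileWriter so the result can be inspected without touching the disk.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class RowWriter {

    // Writes each row as one line, with ", " between values only.
    static String toCsvText(String[][] rows) {
        StringWriter target = new StringWriter(); // stands in for a FileWriter
        try (BufferedWriter bw = new BufferedWriter(target)) {
            for (String[] row : rows) {
                bw.write(String.join(", ", row));
                bw.newLine();
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        // The buffer was flushed when the try block closed the writer.
        return target.toString();
    }

    public static void main(String[] args) {
        String[][] demo = { {"a", "b"}, {"c", "d", "e"} };
        System.out.print(toCsvText(demo));
    }
}
```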

Processing data

  • This is fine for text, but what if we want values and we have text representations of the values?
  • There is a difference between 0.5 and "0.5".
  • The computer understands the first as a number, but not the second.
  • First, parse (split and process) the file to get each individual String representing the numbers.
  • Second, turn the text in the file into real numbers.

java.util.StringTokenizer

  • 	
    String line = "Call me Dave";
    StringTokenizer st = new StringTokenizer(line);
    while (st.hasMoreTokens()) { 	
        System.out.println(st.nextToken()); 
    } 
    
    
  • prints the following output:
    
    Call
    me 
    Dave 
    
    
    Default separators: space, tab, newline, carriage-return character, and form-feed.

Processing data

  • The wrapper classes for the primitives have static methods that parse a String into the primitive value (a conversion rather than a true cast):
    	
    double d = Double.parseDouble("0.5");
    int i = Integer.parseInt("1");
    boolean b = Boolean.parseBoolean("true");
    
    
  • On the other hand, for writing, String can convert most things to itself:
    	
    String str1 = String.valueOf(0.5);
    String str2 = String.valueOf(data[i][j]);
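
Double.parseDouble and Integer.parseInt throw a NumberFormatException when the text is not a valid number, which is worth guarding against when the text comes from a file. A minimal sketch of one way to do that follows; the names SafeParse and parseOrDefault, and the idea of a fallback value, are assumptions made for this example.

```java
class SafeParse {

    // Returns the parsed value, or fallback if text is not a valid number.
    static double parseOrDefault(String text, double fallback) {
        try {
            return Double.parseDouble(text);
        } catch (NumberFormatException nfe) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("0.5", -1.0));   // a valid number
        System.out.println(parseOrDefault("half", -1.0));  // not a number, so the fallback
        System.out.println(String.valueOf(0.5));           // and back to text again
    }
}
```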
    
    
	
// Valid indices run from 0 to lines - 1.
for (int i = 0; i < lines; i++) {
    file[i] = br.readLine();
}
br.close();

	
double[][] data = new double [lines][];
for (int i = 0; i < lines; i++) {
    StringTokenizer st = 
				new StringTokenizer(file[i],", ");
                // Comma and space separated data

    data[i] = new double[st.countTokens()];
    int j = 0;
    while (st.hasMoreTokens()) { 		
		data[i][j] = 
		    Double.parseDouble(st.nextToken()); 
		j++;
    }
} 

	
BufferedWriter bw = new BufferedWriter (fw);
double[][] dataIn = getdata();
String tempStr = "";
try {
    for (int i = 0; i < dataIn.length; i++) {
		for (int j = 0; j < dataIn[i].length; j++) {

		    tempStr = String.valueOf(dataIn[i][j]); 
				    //Converts the double to a String. 
				
		    bw.write(tempStr + ", ");
		}
		bw.newLine();	
    }
    bw.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}

java.util.Scanner

Wraps around all this to make reading easy.
	
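Scanner s = null; 
try { 
    s = new Scanner(
        new BufferedReader(
            new FileReader("myText.txt"))); 
    while (s.hasNext()) { 
        System.out.println(s.next()); 
    } 
} catch (FileNotFoundException fnfe) {
    fnfe.printStackTrace();
} finally {
    // Close the Scanner whether or not the reading worked.
    if (s != null) { 
        s.close(); 
    }
}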

However, a Scanner has no token counter, so it is less convenient for reading into fixed-size arrays.
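
One common way around the missing counter is to collect tokens in a growable list and convert it to an array at the end. The sketch below assumes that approach; TokenList and readTokens are invented names, and the Scanner reads from a String rather than a file so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenList {

    // Collects every whitespace-separated token from the text.
    static List<String> readTokens(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner s = new Scanner(text)) {
            while (s.hasNext()) {
                tokens.add(s.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = readTokens("Call me Dave");
        // Convert to an array once the final size is known.
        String[] asArray = tokens.toArray(new String[0]);
        System.out.println(asArray.length + " tokens: " + tokens);
    }
}
```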

Scanners

  • By default a Scanner splits tokens on whitespace.
  • You can set a regular expression to use as the delimiter instead.
  • For example, a comma followed by optional whitespace:
  • 	
    s.useDelimiter(",\\s*"); 
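
A short sketch of that delimiter in use, reading comma-separated numbers from one line. DelimiterDemo and readDoubles are invented names. The call to useLocale is an added assumption: Scanner parses numbers according to a locale, so fixing it to Locale.US makes "." the decimal point whatever the machine's settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class DelimiterDemo {

    // Reads doubles separated by a comma and optional whitespace.
    static List<Double> readDoubles(String line) {
        List<Double> values = new ArrayList<>();
        try (Scanner s = new Scanner(line)) {
            s.useDelimiter(",\\s*");  // comma, then any amount of whitespace
            s.useLocale(Locale.US);   // assumption: '.' is the decimal point
            while (s.hasNextDouble()) {
                values.add(s.nextDouble());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(readDoubles("1.5, 2, 3.25"));
    }
}
```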
    
    

Data conversion

	
Read / test methods                  Type read
next() / hasNext()                   String
nextBoolean() / hasNextBoolean()     boolean
nextDouble() / hasNextDouble()       double
nextInt() / hasNextInt()             int
nextLine() / hasNextLine()           String

  • If the next token doesn't match the requested type, the next...() method throws an InputMismatchException.
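
The matching hasNext...() method can be used to check a token before reading it, which avoids the exception. A minimal sketch, assuming we want to add up the integers in some text and ignore everything else (SumInts and sumInts are invented names):

```java
import java.util.Scanner;

class SumInts {

    // Adds up the integer tokens and skips anything else.
    static int sumInts(String text) {
        int total = 0;
        try (Scanner s = new Scanner(text)) {
            while (s.hasNext()) {
                if (s.hasNextInt()) {
                    total += s.nextInt(); // safe: we checked the type first
                } else {
                    s.next();             // discard the non-integer token
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("3 apples 4 pears"));
    }
}
```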

Reading from keyboard

	
Scanner s = new Scanner(System.in);
int i = s.nextInt();
String str = s.nextLine();
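
One thing to watch in the fragment above: nextInt() stops right after the number, so the nextLine() that follows returns only the remainder of that same line, which is usually empty. The sketch below shows one way to handle this with an extra nextLine() call. KeyboardDemo and readNumberThenLine are invented names, and main feeds in simulated typing so that it runs without waiting for input; passing System.in instead reads from the real keyboard.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Scanner;

class KeyboardDemo {

    // Reads a whole number, then the following full line of text.
    static String readNumberThenLine(InputStream in) {
        // Not closed here, because closing would also close System.in.
        Scanner s = new Scanner(in);
        int number = s.nextInt();
        s.nextLine();               // consume the rest of the number's line
        String line = s.nextLine(); // now read the next full line
        return number + ":" + line;
    }

    public static void main(String[] args) {
        // Simulated typing: "42", Enter, "Ada Lovelace", Enter.
        InputStream typed = new ByteArrayInputStream("42\nAda Lovelace\n".getBytes());
        System.out.println(readNumberThenLine(typed));
    }
}
```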

Parsing Strings

  • Usually with text we want to extract useful information.
  • Search and replace.

String searches

  • startsWith(String prefix), endsWith(String suffix)
    Returns a boolean.
  • indexOf(int ch), indexOf(int ch, int fromIndex)
    Returns an int giving the position of the first instance of the given character (passed as its Unicode value), or -1 if it isn't found.
  • indexOf(String str), indexOf(String str, int fromIndex)
    Returns an int giving the position of the first instance of the given String, or -1 if it isn't found.
  • lastIndexOf
    Same as indexOf, but last rather than first.
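
A small sketch of the search methods on one String. Positions count from zero. SearchDemo, HAIKU and secondIndexOf are names invented for the example; secondIndexOf shows how the fromIndex form can continue a search past an earlier match.

```java
class SearchDemo {

    static final String HAIKU = "old pond; frog leaping; splash";

    // Finds the second occurrence of ch, or -1 if there are fewer than two.
    static int secondIndexOf(String text, char ch) {
        int first = text.indexOf(ch);
        if (first == -1) {
            return -1;
        }
        return text.indexOf(ch, first + 1); // search again, just past the first
    }

    public static void main(String[] args) {
        System.out.println(HAIKU.startsWith("old"));    // true
        System.out.println(HAIKU.endsWith("splash"));   // true
        System.out.println(HAIKU.indexOf('f'));         // 10
        System.out.println(HAIKU.indexOf("leaping"));   // 15
        System.out.println(HAIKU.lastIndexOf(';'));     // 22
        System.out.println(HAIKU.indexOf("toad"));      // -1, not found
        System.out.println(secondIndexOf(HAIKU, ';'));  // 22
    }
}
```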

String manipulation

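  • replace(char oldChar, char newChar)
    Returns a copy with every occurrence of one character replaced by another.
  • substring(int beginIndex, int endIndex), substring(int beginIndex)
    Pulls out part of the String and returns it; endIndex is exclusive.
  • toLowerCase(), toUpperCase()
    Returns a copy of the String with the case changed.
  • trim()
    Returns a copy with white space cut off the front and back.
  • Strings are immutable, so each of these methods returns a new String and leaves the original unchanged.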
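
A brief sketch of these methods in use. ManipulationDemo and tidy are invented names. Note that the result of each call has to be kept, because the String the method is called on is not altered.

```java
class ManipulationDemo {

    // Trims the text and converts it to upper case.
    static String tidy(String raw) {
        return raw.trim().toUpperCase();
    }

    public static void main(String[] args) {
        String raw = "  Hello World  ";
        String trimmed = raw.trim();

        System.out.println("[" + trimmed + "]");          // [Hello World]
        System.out.println(trimmed.replace('l', 'L'));    // HeLLo WorLd
        System.out.println(trimmed.substring(6));         // World
        System.out.println(trimmed.substring(0, 5));      // Hello
        System.out.println(trimmed.toLowerCase());        // hello world
        System.out.println(tidy(raw));                    // HELLO WORLD
        System.out.println("[" + raw + "]");              // raw still has its spaces
    }
}
```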

Example

	
String str = "old pond; frog leaping; splash";
int start = str.indexOf("leaping");
int end = str.indexOf(";", start);
String startStr = str.substring(0, start);
String endStr = str.substring(end);
str = startStr + "jumping" + endStr;

  • str is now "old pond; frog jumping; splash"
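
The same steps can be wrapped in a reusable method. This is a sketch rather than the only way to do it: ReplaceDemo and replaceFirstOccurrence are invented names, the end position is computed from the length of the target instead of searching for ";", and a check is added for the case where indexOf returns -1.

```java
class ReplaceDemo {

    // Replaces the first occurrence of target, or returns text unchanged if absent.
    static String replaceFirstOccurrence(String text, String target, String replacement) {
        int start = text.indexOf(target);
        if (start == -1) {
            return text; // nothing to replace
        }
        int end = start + target.length();
        return text.substring(0, start) + replacement + text.substring(end);
    }

    public static void main(String[] args) {
        String str = "old pond; frog leaping; splash";
        System.out.println(replaceFirstOccurrence(str, "leaping", "jumping"));
        // The built-in replace(CharSequence, CharSequence) replaces every occurrence.
        System.out.println(str.replace("leaping", "jumping"));
    }
}
```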

Review

  • Use a java.util.Scanner where possible.
  • Otherwise use a FileWriter/Reader.
  • But remember to buffer both.