Core Packages:
File Input / Output
[Part 8]


For the rest of the course we'll be looking at the packages that come with the JDK, and how we can use them to do specific jobs. We'll start with input and output, that is, getting data into our programs and writing data out. In particular, we'll concentrate on reading and writing files.


Powerpoint and audio

Further info:

Note that you can actually set up FileDialog to be either an 'Open' or 'Save' dialog, using final static variables defined inside the class:
new FileDialog(new Frame(), "Save", FileDialog.SAVE);
new FileDialog(new Frame(), "Open", FileDialog.OPEN);
Have a look at the Docs for FileDialog and see how this is described in them, as this kind of thing is quite common.

Details of Unicode and ASCII, along with a list of escape codes.

It is worth noting that binary files cover far more than just text data in a compressed format; they span a wide range of file types, from mapping formats like Shapefiles through JPEG-compressed images, older Word files, and MPEG-compressed music and video. One of the chief uses for byte streams in Java is reading image and movie files, which still need heavy compression, in part because of the speed at which they have to be processed.

Finally, it is also worth noting that if you run the code to find the application directory from inside an IDE, you can sometimes get odd results because of the way IDEs deal with loading classes. If it is giving you odd results, check it at the command line, and make sure you package up any files based on that rather than the IDE results. In general, it should return the location of the calling class' ".class" file - even if this is inside a "jar" file.
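As a minimal sketch of the approach described above -- the class name WhereAmI is purely illustrative -- a class can ask its loader where its ".class" file (or enclosing jar) came from:

```java
import java.net.URL;

public class WhereAmI {
    public static void main(String[] args) {
        // getProtectionDomain().getCodeSource() reports the location the
        // class loader read this class from -- a directory or a jar file.
        URL location = WhereAmI.class
                .getProtectionDomain()
                .getCodeSource()
                .getLocation();
        System.out.println(location);
    }
}
```

Run this from the command line rather than an IDE if the results look odd, for the reasons above.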


Quiz:

You can find an excellent list of all of the Unicode characters currently assigned at Fileformat.info. This lists all the major Unicode character blocks, from Basic Latin (the first 128 characters) through Runic to Alchemical Symbols, including each character (click on blocks -> List with images), and, if you have a font set that supports them, the escape code you can use to represent them in Java.

Given this, the correct Java escape code for the Unicode character for the Arabic letter "Alef" is _______________.


  1. \u05d0
  2. \u0904
  3. \u0627
  4. \u2df6

Correct! It's \u0627; I'm sure you instantly recognised the others:
\u05d0 is the Hebrew "Alef"
\u0904 is the Devanagari letter "Short A"
\u2df6 is the Combining Cyrillic letter "A".

The key point to remember is that even though we can embed these within text using Java, the user still has to have a font installed that supports them. Language packs are available for a wide variety of languages on most operating systems. However, the more obscure symbols, like the Phaistos Disc symbol "bee" or the symbol for a "cheery snowman", are sadly unlikely to be supported.
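As a small sketch of how these escape codes behave in practice (the class name Alefs is just for illustration), note that each escape becomes a single 16-bit char inside the String:

```java
public class Alefs {
    public static void main(String[] args) {
        String arabicAlef = "\u0627";   // Arabic letter Alef
        String hebrewAlef = "\u05d0";   // Hebrew letter Alef
        // Whether these display correctly depends on the fonts installed.
        System.out.println(arabicAlef + " " + hebrewAlef);
        // Each escape is one char in the String, not six:
        System.out.println(arabicAlef.length());         // 1
        System.out.println((int) arabicAlef.charAt(0));  // 1575, i.e. 0x0627
    }
}
```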


Further info:

Note that we've played fast and loose in these slides with the try-catch constructions to save space, and, occasionally, we've missed them out altogether. In reality, given that most file operations throw exceptions, you should put try-catch blocks around the specific commands that throw them, and, moreover, you should use finally blocks to close streams, which more strongly guarantees that this happens. See the "Key Ideas", below, for an example.
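The shape of the fuller pattern described above is roughly as follows; this is a sketch, and the file name "data.txt" is just a placeholder:

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class SafeReader {
    public static void main(String[] args) {
        BufferedReader br = null;   // declared outside try so finally can see it
        try {
            br = new BufferedReader(new FileReader("data.txt")); // placeholder file
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        } catch (FileNotFoundException fnfe) {
            System.err.println("File missing: " + fnfe.getMessage());
        } catch (IOException ioe) {
            System.err.println("Problem reading: " + ioe.getMessage());
        } finally {
            // Runs whether or not an exception was thrown.
            if (br != null) {
                try {
                    br.close();
                } catch (IOException ioe) {
                    // Nothing sensible to do if even closing fails.
                }
            }
        }
    }
}
```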

The other thing to note is that in text files the ends of lines are marked using invisible characters, originally built into ASCII. However, different operating systems mark line endings in files differently. For example, Windows marks ends of lines with a carriage return ('CR': ASCII=13 Escape=\r) followed by a line feed ('LF': ASCII=10 Escape=\n), that is, \r\n. This is because old printers used to need to be told to return the print head to the start of the line, and then move the paper on a line. UNIX-like systems, including Macs, just use a line feed. This can result in strange issues when files are moved from Windows to UNIX systems, though both operating systems and file transfer software are getting a lot better at converting between them, especially since people started regularising around Unicode. Java can read any combination, but when writing files it is best to use built-in Java methods rather than escape characters (these are ok for displaying on the screen though), as these methods will write whatever is most appropriate for the system. Methods that do this include BufferedWriter's newLine method. You can also get the system character sequence as a String using System.getProperty("line.separator"). More info on the wild varieties of different 'EOL' characters can currently be found in this Wikipedia article.
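A minimal sketch of the two approaches just mentioned, using a StringWriter in place of a file so the result can be inspected without touching the disk:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class LineEndings {
    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        BufferedWriter bw = new BufferedWriter(sw);
        bw.write("first line");
        bw.newLine();           // writes "\r\n" on Windows, "\n" on UNIX-like systems
        bw.write("second line");
        bw.close();             // close() also flushes the buffer

        // The same platform-specific sequence is available as a String:
        String eol = System.getProperty("line.separator");
        System.out.println(sw.toString().equals("first line" + eol + "second line")); // true
    }
}
```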

Powerpoint and audio


Quiz:

The compiler will report one error for the following code which will stop it both compiling and running:

try {
   BufferedReader br = new BufferedReader(new FileReader(new File("a:/data.txt")));
} catch (FileNotFoundExceplion fnfe) {}

The problem is _______________________________________.

  1. that it needs more try-catch constructions
  2. that the File and FileReader need labels
  3. that the stream needs closing
  4. that FileNotFoundExceplion is the wrong exception
  5. that the scope of the BufferedReader is pointlessly limited
  6. that there needs to be something in the catch block

Correct! Though you'd have to be a pretty close reader to spot it. There's a typo - FileNotFoundExceplion should be FileNotFoundException. The compiler returns the error message 'cannot find symbol', with a caret pointing at the start of the word. This is usually indicative that either the name is misspelled or you've forgotten an import statement for the package (or both). Otherwise, none of the other issues listed will stop the code compiling or running. You don't need any more try-catch blocks here, as the only thing that explicitly throws anything is the constructor for FileReader. Neither the FileReader nor the File needs a label, as we don't use them again -- closing the BufferedReader, for example, would close the stream as far as the FileReader is concerned. In fact, we might as well not have labels, as they take time to create. The stream does need closing, the scope is very poor, and there should be something in the catch block, but none of these will stop the code compiling or running -- they're just poor coding.


Powerpoint and audio

Further info:

Binary files are actually pretty painful, which is why most people are glad we're moving to more and more text formats. The biggest problem is finding out what the internal format of the files is. In a lot of cases this is proprietary information, and in some cases it is protected from reverse engineering by EULAs and national anti-piracy / copyright-protection legislation. Even where it is legal, it is often hard to work out the sequence of bits and what they mean. When the files aren't produced by the simple Java methods, this can be made even harder by the fact that different formats and programming languages have different binary setups for the major data types and/or they use their own data types. Just for reference, and to show something of the complexities, Java has a 32 bit integer data type which is big-endian and uses the Two's Complement mechanism of storing negative numbers.
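As a small illustration of this, DataOutputStream writes Java's data types in exactly this layout; this sketch dumps the big-endian, two's-complement bytes for the ints 1 and -1:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class IntBytes {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        out.writeInt(1);    // big-endian: most significant byte written first
        out.writeInt(-1);   // two's complement: all 32 bits set
        out.close();

        // 1  -> 00 00 00 01
        // -1 -> ff ff ff ff
        for (byte b : baos.toByteArray()) {
            System.out.printf("%02x ", b);
        }
    }
}
```

Another language, or a little-endian format, would lay these same numbers out quite differently, which is exactly the complexity described above.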

 

It's this kind of thing that makes text formats like XML so pleasant. We won't go into HTML (the language of webpages and Java docs) or XML (a more generic version) on this course. However, if you are interested, here are two brief tutorials:

HTML
XML


Quiz:
Look at the docs for java.io.BufferedReader.

When it hits the end of a file, the class' readLine method returns ____________________________.

 

  1. null
  2. -1
  3. 11111111111111111111111111111111
  4. "You've won"

Correct! It's null. -1 is returned by byte readers' read methods at the end of a file, and all-the-ones is the binary representation of -1 in Java (see the Two's Complement link above) -- if you see this, you definitely haven't won.
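A quick sketch showing both end-of-data behaviours, using a StringReader in place of a file so it is self-contained:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class EndOfFile {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new StringReader("one\ntwo"));
        System.out.println(br.readLine());  // one
        System.out.println(br.readLine());  // two
        System.out.println(br.readLine());  // null -- readLine at the end of the data
        System.out.println(br.read());      // -1 -- what the lower-level read returns
        br.close();
    }
}
```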


[Key ideas from this part]
[Homepage]