Input/Output

Dr Andy Evans

Builtins

  • Input: Gets user input until the ENTER key is pressed; returns it as a string (without any newline). If there's a prompt string, this is printed to the current prompt line.
    input(prompt)
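
A minimal sketch of input in use. As an assumption for the example, the keyboard is faked with an in-memory stream so the code runs unattended; normally you would just call input(). The names answer and number are illustrative.

```python
import io
import sys

# Assumption for this sketch: replace the keyboard with an in-memory stream.
real_stdin = sys.stdin
sys.stdin = io.StringIO("42\n")
try:
    answer = input("Enter a number: ")  # Always a str, with the newline removed
finally:
    sys.stdin = real_stdin              # Put the real stdin back

number = int(answer)                    # Convert explicitly if you need a number
print(repr(answer), number + 1)
```

Note that input never converts for you: arithmetic on the raw string would fail or concatenate.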

Standard input/output

  • Input reads stdin (usually keyboard), in the same way print writes to stdout (usually the screen).
  • Generally, when we move information between programs, or between programs and hardware, we talk about streams: tubes down which we can send data.
  • Stdin and stdout can be regarded as streams from the keyboard to the program, and from the program to the screen.
  • There's also a stderr where error messages from programs are sent: again, usually the screen by default.
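
The three streams are ordinary file-like objects in the sys module. The sketch below uses a hypothetical function, shout, that takes its streams as arguments; in-memory streams stand in for the keyboard and screen so it runs unattended.

```python
import io
import sys


def shout(source, sink, errors):
    """Copy lines from source to sink in upper case; report blank lines on errors."""
    for number, line in enumerate(source, start=1):
        if line.strip() == "":
            errors.write(f"line {number} is blank\n")
        else:
            sink.write(line.upper())


# In-memory stand-ins for stdin, stdout and stderr.
fake_in = io.StringIO("hello\n\nworld\n")
fake_out = io.StringIO()
fake_err = io.StringIO()
shout(fake_in, fake_out, fake_err)

# With the real streams the call would be: shout(sys.stdin, sys.stdout, sys.stderr)
print(fake_out.getvalue(), end="")
print(fake_err.getvalue(), end="", file=sys.stderr)
```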

Standard input/output

  • You can redirect these, for example, at the command prompt:
  • Stdin from file:
    python a.py < stdin.txt
  • Stdout to overwritten file:
    python a.py > stdout.txt
  • Stdout to appended file:
    python a.py >> stdout.txt
  • Both:
    python a.py < stdin.txt > stdout.txt
  • You can also pipe the stdout of one program to the stdin of another program using the pipe symbol "|" (SHIFT-backslash on most Windows keyboards)
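
The same kind of redirection can be set up from inside Python. This is a sketch using the standard subprocess module, which these slides don't otherwise cover; child_code is a made-up one-line program for illustration.

```python
import subprocess
import sys

# Hypothetical child program: read one line from stdin, print it in upper case.
child_code = "print(input().upper())"

# input= feeds the child's stdin; capture_output=True collects its stdout and stderr.
result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="hello\n",
    capture_output=True,
    text=True,
    check=True,
)
piped_output = result.stdout.strip()
print(piped_output)
```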

Open

  • Reading and writing files is a real joy in Python, which makes a complicated job trivial.
  • The builtin open function is the main method:
    f = open("filename.txt")
    for line in f:
        print(line)
    f.close()

    f = open("filename.txt")
    data = f.read()  # Whole file as string
    f.close()
  • Note the close method (it belongs to the file object, so it isn't listed as a builtin). This is polite - it releases the file.
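
A sketch comparing the common ways of reading, using a throwaway file in a temporary folder (the folder and file contents are assumptions for the example). Note that lines keep their line ending, so print(line) shows a blank line between each one unless you strip it.

```python
import os
import shutil
import tempfile

# Create a small throwaway file so the sketch is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "filename.txt")
f = open(path, "w")
f.write("first\nsecond\n")
f.close()

f = open(path)
whole = f.read()         # Whole file as one string
f.close()

f = open(path)
lines = f.readlines()    # List of lines, each still ending in "\n"
f.close()

f = open(path)
stripped = [line.rstrip("\n") for line in f]  # Iterate, dropping the line endings
f.close()

shutil.rmtree(folder)    # Tidy up the temporary folder
print(repr(whole), lines, stripped)
```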

Open

  • To write:
    a = []
    for i in range(100):
        a.append("All work and no play makes Jack a dull boy ")
    f = open("anotherfile.txt", 'w')
    for line in a:
        f.write(line)
    f.close()

Line endings

  • With "write" you may need to write line endings.
  • The line endings in files vary depending on the operating system.
  • POSIX systems (Linux, macOS, etc.) use the ASCII newline character, written with the escape sequence \n.
  • Windows uses two characters: ASCII carriage return (\r) (which was used by typewriters to return the typing head to the start of the line), followed by newline.
  • You can find the OS default using the os library: os.linesep
  • But generally just use \n: in text mode Python translates it to the OS default as it writes, so Windows copes with it fine, and the docs advise against writing os.linesep directly.
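
A sketch of the translation at work. To make the result the same on every platform, the example forces Windows-style endings with the newline argument of open; the file name is made up.

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "endings.txt")

# newline="\r\n" forces Windows-style endings on any platform, for demonstration.
f = open(path, "w", newline="\r\n")
f.write("one\ntwo\n")    # We only ever write "\n" ourselves
f.close()

# Binary mode shows the bytes actually stored on disk.
f = open(path, "rb")
raw = f.read()
f.close()

# Text mode with the default newline=None translates "\r\n" back to "\n".
f = open(path)
text = f.read()
f.close()

shutil.rmtree(folder)
print(raw, repr(text), repr(os.linesep))
```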

Seek
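
Only the title of this slide survives here. Assuming it refers to the file object's seek and tell methods, which move and report the current position in a file, a minimal sketch would be as follows. An in-memory stream stands in for a file opened with mode "rb".

```python
import io

# Stand-in for: f = open(path, "rb")
f = io.BytesIO(b"0123456789")

first = f.read(3)         # Reads b"012"; the position is now 3
position = f.tell()       # tell() reports the current position
f.seek(0)                 # Jump back to the start
again = f.read(3)
f.seek(-2, io.SEEK_END)   # Two bytes before the end (needs binary mode)
tail = f.read()
f.close()

print(first, position, again, tail)
```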

Binary vs text files

  • The type of the file has a big effect on how we handle it.
  • There are broadly two types of files: text and binary.
  • They are all basically ones and zeros; what is different is how a computer displays them to us, and how much space they take up.

Binary vs. Text files

  • All files are really just binary 0 and 1 bits.
  • In 'binary' files, data is stored in binary representations of the basic types. For example, here are four-byte representations of int data:
    00000000 00000000 00000000 00000000 = int 0
    00000000 00000000 00000000 00000001 = int 1
    00000000 00000000 00000000 00000010 = int 2
    00000000 00000000 00000000 00000100 = int 4
    00000000 00000000 00000000 00110001 = int 49
    00000000 00000000 00000000 01000001 = int 65
    00000000 00000000 00000000 11111111 = int 255
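
The table above can be reproduced with the standard struct module. This is a sketch that assumes big-endian byte order (most significant byte first) to match the layout shown; many machines store ints the other way round internally.

```python
import struct

values = [0, 1, 2, 4, 49, 65, 255]
for value in values:
    packed = struct.pack(">i", value)   # ">i" = big-endian 4-byte signed int
    bits = " ".join(f"{byte:08b}" for byte in packed)
    print(bits, "= int", value)

packed_65 = struct.pack(">i", 65)
bits_65 = " ".join(f"{byte:08b}" for byte in packed_65)
unpacked = struct.unpack(">i", packed_65)[0]   # And back again
```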

Binary vs. Text files

  • In text files, which can be read in Notepad++ etc., characters are often stored in smaller areas by code number, for example 2 bytes each:
    00000000 01000001 = code 65 = char "A"
    00000000 01100001 = code 97 = char "a"
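
A sketch of the same mapping in Python. It assumes the 2-byte big-endian encoding "utf-16-be" to match the layout above; the more common UTF-8 needs only one byte for these characters.

```python
for char in "Aa":
    code = ord(char)                    # Character -> code number
    stored = char.encode("utf-16-be")   # Code number -> 2 bytes
    bits = " ".join(f"{byte:08b}" for byte in stored)
    print(bits, "= code", code, "= char", repr(char))

code_a = ord("A")
char_97 = chr(97)                       # Code number -> character
two_bytes = "A".encode("utf-16-be")
one_byte = "A".encode("utf-8")
```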

Characters

  • All chars are part of a set of 16 bit+ international characters called Unicode.
  • These extend the American Standard Code for Information Interchange (ASCII), whose characters are represented by the ints 0 to 127, and its superset, the 8 bit ISO-Latin 1 character set (0 to 255).
  • There are some invisible characters used for things like the end of lines.
    char = chr(8) # Try 7, as well!
    print("hello" + char + "world")
  • The easiest way to use stuff like newline characters is to use escape characters.
    print("hello\nworld")
  • Note that for a system using 2 byte characters, and 4 byte integers:
    00000000 00110001 = code 49 = char "1"
  • This seems much smaller - it only uses 2 bytes to store the character "1", whereas storing the int 1 takes 4 bytes.
  • However, each character takes 2 bytes, so:
    00000000 00110001 = code 49 = char "1"
    00000000 00110001 00000000 00110010 = code 49, 50 = char "1" "2"
    00000000 00110001 00000000 00110010 00000000 00110111 = code 49, 50, 55 = char "1" "2" "7"
  • Whereas:
    00000000 00000000 00000000 01111111 = int 127
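
The size comparison can be checked directly. This sketch keeps the same assumptions as the slide: 2-byte characters ("utf-16-be") and 4-byte big-endian ints.

```python
import struct

as_text = "127".encode("utf-16-be")   # Three characters, 2 bytes each
as_int = struct.pack(">i", 127)       # One 4-byte integer

text_size = len(as_text)
int_size = len(as_int)
print(text_size, "bytes as text;", int_size, "bytes as a binary int")

# A longer number widens the gap: 7 characters against the same 4 bytes.
big_text_size = len("1000000".encode("utf-16-be"))
big_int_size = len(struct.pack(">i", 1000000))
print(big_text_size, "bytes as text;", big_int_size, "bytes as a binary int")
```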

Binary vs. Text files

  • In short, it is much more efficient to store anything with a lot of numbers as binary (not text).
  • However, as disk space is cheap, networks are fast, and it is useful to be able to read data in Notepad etc., people are increasingly using text formats like XML.
  • As we'll see, the filetype determines how we deal with files.
  • Options for Open:
    f = open("anotherfile.txt", xxxx)
    Where xxxx is (from the docs):

    Character   Meaning
    "r"         open for reading (default)
    "w"         open for writing, truncating the file first
    "x"         open for exclusive creation, failing if the file already exists
    "a"         open for writing, appending to the end of the file if it exists
    "b"         binary mode
    "t"         text mode (default)
    "+"         open a disk file for updating (reading and writing)
    "U"         universal newlines mode (deprecated)

    The default mode is "r" (open for reading text, synonym of "rt"). For binary read-write access, the mode "w+b" opens and truncates the file to 0 bytes. "r+b" opens the file without truncation.
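
A sketch exercising several of these modes on a throwaway file in a temporary folder (both are assumptions for the example).

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "anotherfile.txt")

f = open(path, "w")       # Create (or truncate) and write text
f.write("first\n")
f.close()

f = open(path, "a")       # Append to the end
f.write("second\n")
f.close()

try:
    f = open(path, "x")   # Exclusive creation fails: the file already exists
    f.close()
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

f = open(path, "r+b")     # Binary read/write without truncation
raw = f.read()            # bytes, not str
f.close()

f = open(path)            # Default mode "r", the same as "rt"
text = f.read()
f.close()

shutil.rmtree(folder)
print(exclusive_failed, type(raw).__name__, repr(text))
```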

Reading data

  • The following is the most flexible and detailed way of reading text files:
    f = open("in.txt")
    data = []
    for line in f:
        parsed_line = line.split(",")      # Split the line at each comma
        data_line = []
        for word in parsed_line:
            data_line.append(float(word))  # float() ignores surrounding whitespace
        data.append(data_line)
    print(data)
    f.close()
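
For comma-separated data, the standard csv module does the splitting for you. This is an alternative sketch, not the method above; the file contents are invented for the example.

```python
import csv
import os
import shutil
import tempfile

# Write a small comma-separated file to read back.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "in.txt")
f = open(path, "w", newline="")
f.write("1,2.5,3\n4,5,6e2\n")
f.close()

f = open(path, newline="")    # The csv docs recommend newline="" for csv files
data = []
for row in csv.reader(f):     # Each row arrives as a list of strings
    if row:                   # Skip completely blank lines
        data.append([float(word) for word in row])
f.close()

shutil.rmtree(folder)
print(data)
```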

Open

  • Full options:
    open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
  • buffering: sets the buffering policy: data is collected in memory and moved in chunks, which smooths out the stream. Generally the default works fine, but you can set the buffer size in bytes for very large files.
  • encoding: the text encoding to use; the default depends on the platform (often UTF-8, but not always, notably on Windows), so it is safest to pass encoding="utf-8" explicitly.
  • errors: how to handle encoding issues, for example the lack of available representations for non-ASCII characters.
  • newline: controls how line endings are translated on reading, and the invisible characters written to mark the end of lines.
  • closefd: only matters when file is given as a file descriptor rather than a name; if False, the underlying descriptor stays open when the file object is closed.
  • opener: a custom callable used to obtain the file descriptor, for more complicated directory and file opening.
  • For more info, see: https://docs.python.org/3/library/functions.html#open
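
A sketch of encoding and errors working together. It writes a word with an accented character as UTF-8, then reads it back as ASCII to provoke a decoding problem; the file name and text are made up.

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "accents.txt")

f = open(path, "w", encoding="utf-8")
f.write("caf\u00e9\n")             # "café": the é is stored as two bytes in UTF-8
f.close()

f = open(path, encoding="utf-8")
good = f.read()
f.close()

# Reading as ASCII fails on the two bytes that encode the accented letter ...
f = open(path, encoding="ascii")
try:
    f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
finally:
    f.close()

# ... unless errors= says what to do instead; "replace" substitutes U+FFFD.
f = open(path, encoding="ascii", errors="replace")
lossy = f.read()
f.close()

shutil.rmtree(folder)
print(repr(good), strict_failed, repr(lossy))
```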

With

  • The problem with manually closing the file is that exceptions can skip the close statement.
  • Better then, to use the following form:
    with open("data.txt") as f:
        for line in f:
            print(line)
  • The with keyword sets up a Context Manager, which temporarily deals with how the code runs. This closes the file automatically when the clause is left, even if an exception is raised.
  • You can nest withs, or place more than one on a line, which is equivalent to nesting.
    with A() as a, B() as b:

Context managers

  • Context Managers essentially allow pre- or post- execution code to run in the background (like file.close()).
  • The associated library can also be used to redirect stdout:
    with contextlib.redirect_stdout(new_target):
  • For more information, see:
    https://docs.python.org/3/library/contextlib.html
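
A sketch of both ideas: redirect_stdout capturing print output in an in-memory stream, and a small hand-made context manager built with contextlib.contextmanager. The names logged and events are hypothetical.

```python
import contextlib
import io

new_target = io.StringIO()     # Any object with a write() method will do

with contextlib.redirect_stdout(new_target):
    print("captured")          # Goes into new_target, not the screen

print("back on the screen")    # stdout is restored once the block is left
captured = new_target.getvalue()

events = []


@contextlib.contextmanager
def logged(name):
    events.append("enter " + name)      # Pre-execution code
    try:
        yield
    finally:
        events.append("exit " + name)   # Post-execution code, always runs


with logged("demo"):
    events.append("body")

print(events)
```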

Reading multiple files

  • Use fileinput library:
    import fileinput
    a = ["file1.txt", "file2.txt", "file3.txt", "file4.txt"]
    b = fileinput.input(a)
    for line in b:
        print(b.filename())
        print(line)
    b.close()
  • https://docs.python.org/3/library/fileinput.html
  • Writing multiple files
    import fileinput
    a = ["file1.txt", "file2.txt", "file3.txt", "file4.txt"]
    b = fileinput.input(a, inplace=True, backup='.bak')
    for line in b:
        print("new text")
    b.close()
    '''
    inplace=True # Redirects stdout (i.e. print) to the file.
    backup='.bak' # Backs up each file to file1.txt.bak etc. before writing.
    '''
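
A runnable sketch of in-place editing, using two throwaway files in a temporary folder (assumptions for the example). Rather than replacing every line, it edits each line and prints it back.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    names = [os.path.join(folder, n) for n in ("file1.txt", "file2.txt")]
    for name in names:
        with open(name, "w") as f:
            f.write("old text\nkeep me\n")

    # While iterating in place, print() writes to the file currently being read.
    with fileinput.input(names, inplace=True, backup=".bak") as b:
        for line in b:
            print(line.replace("old", "new"), end="")  # line already ends in "\n"

    with open(names[0]) as f:
        rewritten = f.read()
    with open(names[0] + ".bak") as f:
        backup = f.read()

print(repr(rewritten), repr(backup))
```

Anything you don't print is dropped from the file, so remember to print the lines you want to keep.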

Easy print to file

  • Print includes an option to redirect stdout to a file:
    print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
  • Prints objects to a file (or stdout), separated by sep and followed by end. Other than objects, everything must be passed as a keyword argument, as any positional argument is treated as another object to print.
  • Rather than a filename, file must be a proper file object (or anything with a write(string) function).
  • Flushing is the forcible writing of data out of a stream. Occasionally data can sit in a buffer longer than you might like (for example, if another program is reading data as you're writing it, data might get missed if it stays a while in memory); flush forces the data to be written.
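
A sketch of these options. An in-memory stream stands in for an open file, since anything with a write method is accepted.

```python
import io
import sys

buffer = io.StringIO()   # Stand-in for a file opened with open(..., "w")

print("a", "b", "c", file=buffer)                       # Default sep=" ", end="\n"
print("a", "b", "c", sep=",", end=";\n", file=buffer)   # Custom separator and ending
print("warning", file=sys.stderr, flush=True)           # Force the text out at once

written = buffer.getvalue()
print(repr(written))
```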