Dealing with filesystems

Dr Andy Evans

[Fullscreen]

OS

  • The os module allows interaction with the Operating System, either generically or specific to a particular OS.
    https://docs.python.org/3/library/os.html
  • Including:
    • Environment variable manipulation.
    • File system navigation.

Environment Variables

  • These are variables at the OS level, for the whole system and specific users.
  • For example, include the PATH to look for programs.
    os.environ
  • A mapping object containing environment information.
    import os
    print(os.environ["PATH"])
    print(os.environ["HOME"])
  • For more info on setting Env Variables, see: https://docs.python.org/3/library/os.html#os.environ

OS Functions

  • A variety of methods for dealing with the filesystem:
    os.getcwd() # Current working directory.
    os.chdir('/temp/') # Change cwd.
    os.listdir(path='.') # List of everything in the present directory.
    os.system('mkdir test') # Run the command mkdir in the system shell

OS Walk

  • A useful method for getting a whole directory structure and files is os.walk.
  • Here we use this to delete files:
    for root, dirs, files in os.walk(deletePath, topdown=False):
        for name in dirs:
            os.rmdir(os.path.join(root, name))
        for name in files:
            os.remove(os.path.join(root, name))

pathlib

  • A library for dealing with file paths:
    https://docs.python.org/3/library/pathlib.html
  • Path classes are either "Pure": abstract paths not attached to a real filesystem (they talk of path "flavours"); or "Concrete" (usually without "Pure" in the name): attached to a real filesystem. In most cases the distinction is not especially important as the main functions are found in both.

Constructing paths

  • For example:
    p = pathlib.Path('c:/Program Files') / 'Notepad++'
  • Uses forward slash operators outside of strings to combine paths "/".
    p = pathlib.Path('c:/Program Files') / 'Notepad++'
  • Though more platform independent is:
    a = os.path.join(pathlib.Path.cwd().anchor, 'Program Files', 'Notepad++')
    #See next slides for detail.
    p = pathlib.Path(a)

    >>> str(p)
    c:\Program Files\Notepad++
    >>> repr(p)
    WindowsPath('C:/Program Files/Notepad++')
  • For other ways of constructing paths, see:
    https://docs.python.org/3/library/pathlib.html#pure-paths

Path values

  • Contains a wide variety of things for interrogating paths functions might return:
    p.name     # final path component.
    p.stem     # final path component without suffix.
    p.suffix     # suffix.
    p.as_posix()     # string representation with forward slashes (/):
    p.resolve()     # resolves symbolic links and ".."
    p.as_uri()     # path as a file URI: file://a/b/c.txt
    p.parts     # a tuple of path components.
    p.drive     # Windows drive from path.
    p.root     # root of directory structure.
    pathlib.Path.cwd()     # current working directory.
    pathlib.Path.home()    # User home directory
    p.anchor     # drive + root.
    p.parents     # immutable sequence of parent directories:

            p = PureWindowsPath('c:/a/b/c.txt')
            p.parents[0] # PureWindowsPath('c:/a/b')
            p.parents[1] # PureWindowsPath('c:/a')
            p.parents[2] # PureWindowsPath('c:/')

Path properties

  • Functions for checking the properties of actual files:
    p.is_absolute()     # Checks whether the path is not relative.
    p.exists()     # Does a file or directory exist.
    os.path.abspath(path)     # Absolute version of a relative path.
    os.path.commonpath(paths)     # Longest common sub-path.
    p.stat()     # Info about path (.st_size; .st_mtime)
    https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat
    p.is_dir()     # True if directory.
    p.is_file()     # True if file.
    p.read()     # A variety of methods for reading files as an entire object, rather than parsing it.

    Listing subdirectories:
    import pathlib

    p = pathlib.Path('.')
    for x in p.iterdir():
        if x.is_dir():
            print(x)

Path manipulation

  • Functions for checking the properties of actual files:
    p.rename(target) Rename top file or directory to target.
    p.with_name(name) Returns new path with changed filename.
    p.with_suffix(suffix) Returns new path with the file extension changed.
    p.rmdir() Remove directory; must be empty.
    p.touch(mode=0o666, exist_ok=True) "Touch" file; i.e. make empty file.
    p.mkdir(mode=0o666, parents=False, exist_ok=False) Make directory.
    If parents=True any missing parent directories will be created.
    exist_ok controls error raising.
  • To set file permissions and ownership, see:
    https://docs.python.org/3/library/os.html#os.chmod
    https://docs.python.org/3/library/os.html#os.chown
  • The numbers in 0o666, the mode above, are left-to-right the owner, group, and public permissions, which are fixed by base 8 (octal) numbers (as shown by "0o"). Each is a sum of the following numbers:
    4 = read
    2 = write
    1 = execute
    So here all three are set to read and write permissions, but not execute. You'll see you can only derive a number from a unique set of combinations. This is the classic POSIX file permission system. The Windows one is more sophisticated, which means Python interacts with it poorly, largely only to set read permission.

Glob

  • Glob is a library for pattern hunting in files and directories:
    https://docs.python.org/3/library/glob.html
  • Also built into other libraries. For example, to build a list all the files in a pathlib.Path that have a specific file extension:
    a = list(path.glob('**/*.txt'))
    The "**" pattern makes it recursively check the path directory and all subdirectories. With large directory structures this can take a while.
  • Or:
    a = sorted(Path('.').glob('*.csv'))
    A sorted list of csv files in the current directory.

Some other I/O libraries