Other Containers

Dr Andy Evans

Strings

  • As well as tuples and ranges, there are two additional important immutable sequences:
    • Bytes: immutable sequences of bytes. Each byte is 8 ones and zeros, usually represented as an int between 0 and 255 inclusive (11111111 is 255 as an int). Byte Arrays (bytearray) are the mutable version.
    • Strings (text).
  • Many languages have a primitive type for an individual character. Python doesn't: a str (the string type) is just a sequence of other str, each one character long.
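  • As a minimal sketch of these three types (the variable names are only for illustration):

```python
# bytes: an immutable sequence of ints in the range 0-255.
data = bytes([72, 105])
print(data)        # b'Hi'
print(data[0])     # 72

# bytearray: the mutable version.
buffer = bytearray(data)
buffer[0] = 104    # Replace 'H' (72) with 'h' (104).
print(buffer)      # bytearray(b'hi')

# str: indexing returns another str of length one, not a separate char type.
text = "hello"
first = text[0]
print(type(first), len(first))   # <class 'str'> 1
```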

Strings

  • It may seem odd that strings are immutable, but this helps with memory management. If you "change" a str, a new one is created and the label is attached to it; the old one is destroyed once nothing refers to it.
    >>> a = "hello world"
    >>> a = "hello globe" # New string (same label).
    >>> b = str(2) # String "2" as text.
    >>> a[0] # Subscription.
    'h'
    >>> a[0] = "m" # Attempted assignment.
    TypeError: 'str' object does not support item assignment

String Literals

  • String literals are formed 'content' or "content" (inline) or '''content''' or """content""" (multiline).
  • In multiline quotes, line ends are preserved unless the line ends "\" :
    print('''This is \
    all one line.
    This is a second.''')
  • For inline quotes, you need to end the quote and start again on the next line, inside brackets ("+" is optional between string literals, but needed for variables):
    print("This is all " +
    "one line.")
    print("This is a second")
    # Note the two print statements.

Concatenation

  • Strings can be concatenated (joined) though:
    >>> a = "hello" + "world"
    >>> a = "hello" "world"
    # "+" optional if just string literals.
    >>> a
    'helloworld' # Note no spaces.
  • To add spaces, do them inside the strings or between them:
    >>> a = "hello " + "world"
    >>> a = "hello" + " " + "world"
  • For string variables, you need "+":
    >>> h = "hello"
    >>> a = h + "world"

Immutable concatenation

  • But remember that each time you change an immutable type, you make a new one. This is hugely inefficient, as everything built so far is copied each time, so continually adding to an immutable takes a long time.
  • There are alternatives:
    • With tuples, use a list instead, and extend this (a new list isn't created each time).
    • With bytes, use a mutable bytearray instead.
    • With a string, build a list of strings and then use the str.join() function built into all strings once complete.
      >>> a = ["x","y","z"]
      >>> b = " ".join(a)
      >>> b
      'x y z'
      >>> c = " and ".join(a)
      >>> c
      'x and y and z'
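
  • A minimal sketch of the list-then-join and bytearray alternatives when building something up in a loop (the loop contents are made up for illustration):

```python
# Build the pieces in a list (mutable), then join once at the end.
parts = []
for i in range(5):
    parts.append(str(i))
joined = ",".join(parts)
print(joined)   # 0,1,2,3,4

# The same result by repeated concatenation; each "+=" makes a new str.
slow = ""
for i in range(5):
    if slow:
        slow += ","
    slow += str(i)
print(slow == joined)   # True

# A bytearray can be extended in place, then converted to bytes if needed.
buf = bytearray()
for i in range(3):
    buf.append(65 + i)   # 65, 66, 67 are "A", "B", "C".
print(bytes(buf))   # b'ABC'
```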

Parsing

  • Often we'll need to split strings up based on some delimiter.
  • This is known as parsing.
  • For example, it is usual to read data files a line at a time and then parse them into numbers.
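  • As a rough sketch, assuming a hypothetical comma-separated line of numbers has already been read from a file:

```python
line = "3.5,2.0,10.25"                 # Hypothetical line from a data file.
fields = line.split(",")               # Split into strings on the delimiter.
print(fields)                          # ['3.5', '2.0', '10.25']

numbers = [float(f) for f in fields]   # Convert each piece of text to a number.
print(numbers)                         # [3.5, 2.0, 10.25]
print(sum(numbers))                    # 15.75
```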

Split

  • Strings can be split by:
    a = str.split(string, delimiter)
    a = some_string.split(delimiter)

    (There's no great difference)
  • For example:
    a = "Daisy, Daisy/Give me your answer, do."
    b = str.split(a," ")
  • As it happens, whitespace is the default.
  • Search and replace is a common string operation:
    str_var.startswith(strA, 0, len(str_var))
    # Checks whether a string starts with strA.
    # Other params are optional start and end search locations.
    str_var.endswith(suffix, 0, len(str_var))
    str_var.find(strA, 0, len(str_var))
    # Gives index position or -1 if not found.
    str_var.index(strA, 0, len(str_var))
    # Raises a ValueError if not found. rfind and rindex do the same from right-to-left.
  • Once an index is found, you can use slices to extract substrings:
    strB = strA[index1:index2]
  • There are various functions to remove or replace substrings:
    str_var.lstrip([chars]) / str_var.rstrip([chars]) / str_var.strip([chars])
    # Removes whitespace (or any of the given chars) from the left/right/both ends.
    str_var.replace(substringA, substringB, count)
    # Replace all occurrences of A with B. The optional final int arg will control the
    # max number of replacements.
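
  • Pulling the functions above together in a short sketch, using the "Daisy" string from earlier (the "padded" string is made up for illustration):

```python
a = "Daisy, Daisy/Give me your answer, do."
words = a.split(" ")
print(words)
# ['Daisy,', 'Daisy/Give', 'me', 'your', 'answer,', 'do.']

print(a.startswith("Daisy"))   # True
print(a.endswith("do."))       # True

i = a.find("Give")             # Index of the first match, or -1.
print(i)                       # 13
print(a[i:i + 4])              # Give
print(a.find("Rose"))          # -1

padded = "  some text \n"
stripped = padded.strip()      # Whitespace removed from both ends.
print(repr(stripped))          # 'some text'

replaced = a.replace("Daisy", "Rose", 1)   # Replace the first match only.
print(replaced)
# Rose, Daisy/Give me your answer, do.
```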

Escape characters

  • What if we want quotes in our strings?
  • Use double inside single, or vice versa:
    a = "It's called 'Daisy'."
    a = 'You invented "Space Paranoids"?'
  • If you need to mix them, though, you have problems as Python can't tell where the string ends:
    a = 'It's called "Daisy".'
  • Instead, you have to use an escape character: a special character sequence that is interpreted differently from how it looks. All escape characters start with a backslash. For a single quote it is simply:
    a = 'It\'s called "Daisy".'
  • Escape characters
    \newline    Backslash and newline ignored
    \\    Backslash (\)
    \'    Single quote (')
    \"    Double quote (")
    \b    ASCII Backspace (BS)
    \f    ASCII Formfeed (FF)
    \n    ASCII Linefeed (LF)
    \r    ASCII Carriage Return (CR)
    \t    ASCII Horizontal Tab (TAB)
    \ooo    Character with octal value ooo
    \xhh    Character with hex value hh
    \N{name}    Character named name in the Unicode database
    \uxxxx    Character with 16-bit hex value xxxx
    \Uxxxxxxxx    Character with 32-bit hex value xxxxxxxx
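
  • A short sketch of a few of these escapes in use (the strings are made up for illustration):

```python
a = 'It\'s called "Daisy".'
print(a)                 # It's called "Daisy".

print(len("\n"))         # 1: an escape sequence is a single character.
print("col1\tcol2")      # The two words separated by a tab.
print("C:\\temp")        # C:\temp

# Different ways of writing the same character.
print("\x41" == "A")                         # True (hex value 41)
print("\101" == "A")                         # True (octal value 101)
print("\u0041" == "A")                       # True (16-bit hex value)
print("\N{LATIN CAPITAL LETTER A}" == "A")   # True (Unicode name)
```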

String Literals

  • Going back to our two line example:
    print("This is all " +
    "one line.")
    print("This is a second")
    # Note the two print statements.
  • Note that we can now rewrite this as:
    print("This is all " +
    "one line. \n" +
    "This is a second")
  • There are some cases where we want to display the escape characters as characters rather than escaped characters when we print or otherwise use the text. To do this, prefix the literal with "r":
    >>> a = r"This contains a \\ backslash escape"
  • From then on, the backslashes are interpreted as two literal backslashes. Note that if we then display this at the prompt, we get:
    >>> a
    'This contains a \\\\ backslash escape'

    Note that each backslash is shown escaped. print(a) would show just the two backslashes, as typed.
  • String literal markups:
    R or r is a "raw" string, escaping escapes to preserve their appearance.
    F or f is a formatted string (we'll come to these).
    U or u is a Python 2 legacy marking a Unicode string; it makes no difference in Python 3, where all str are Unicode.
    B or b is a sequence of bytes; br or rb (or any capitalised variation) is a raw sequence of bytes.
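
  • As a small sketch of how the prefixes change what is stored (the strings are made up for illustration):

```python
normal = "one\ntwo"      # \n is a single newline character.
raw = r"one\ntwo"        # Backslash and n are kept as two characters.
print(len(normal), len(raw))   # 7 8
print(raw)                     # one\ntwo
print(repr(raw))               # 'one\\ntwo'

data = b"abc"                  # bytes literal.
print(data[0])                 # 97
print(u"text" == "text")       # True: u is accepted but redundant in Python 3.
```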

Formatting strings

  • There are a wide variety of ways of formatting strings.
    print( "{0} has: {1:10.2f} pounds".format(a,b) )
    print('%(a)s has: %(b)10.2f pounds'%{'a':'Bob','b':2.23333})
  • See website for examples.
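  • As a sketch, the two styles above can be compared with a formatted string literal (the f prefix, available from Python 3.6); the name and amount are taken from the second example:

```python
name = "Bob"
money = 2.23333

s1 = "{0} has: {1:10.2f} pounds".format(name, money)
s2 = "%(a)s has: %(b)10.2f pounds" % {"a": name, "b": money}
s3 = f"{name} has: {money:10.2f} pounds"   # Formatted string literal.

# 10.2f means: at least 10 characters wide, 2 decimal places.
print(s1)
print(s1 == s2 == s3)   # True
```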

Sets

  • Unordered collections of unique objects.
  • Main type is mutable, but there is a FrozenSet: https://docs.python.org/3/library/stdtypes.html#frozenset

    a = {"red", "green", "blue"}
    a = set(some_other_container)
  • Can have mixed types and contain other containers, as long as every item is hashable (e.g. tuples or frozensets, but not lists).
  • Note you can't use a = {} to make an empty set (as this is an empty dictionary); you have to use:
    a = set()
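
  • A minimal sketch of these points (the values are made up for illustration):

```python
a = set([1, 2, 2, 3, 3, 3])    # Duplicates are dropped.
print(len(a))                  # 3

mixed = {1, "two", (3, 4), frozenset({5})}   # Mixed types, all hashable.
print((3, 4) in mixed)         # True

print(type({}), type(set()))   # <class 'dict'> <class 'set'>

try:
    bad = {[1, 2]}             # A list is mutable, so it can't go in a set.
except TypeError as err:
    print("TypeError:", err)
```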

Add/Remove

  • Useful functions:
    a.add("black")
    a.remove("blue")
    # Raises a KeyError if item doesn't exist.
    a.discard("pink")
    # Silent if item doesn't exist.
    a.clear()
    # Discard everything.
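
  • The difference between remove and discard, as a short sketch:

```python
a = {"red", "green", "blue"}
a.add("black")
a.add("black")          # Adding an existing item changes nothing.
a.remove("blue")
a.discard("pink")       # Not present: silently ignored.

try:
    a.remove("pink")    # Not present: raises KeyError.
except KeyError:
    print("pink is not in the set")

remaining = sorted(a)
print(remaining)        # ['black', 'green', 'red']

a.clear()
print(len(a))           # 0
```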

Operators

  • Standard set maths:
    a | b   or a.union(b)                 Union of sets a and b.
    a & b   or a.intersection(b)          Intersection.
    a - b   or a.difference(b)            Difference (elements of a not in b).
    a ^ b   or a.symmetric_difference(b)  Elements in a or b, but not both.
    x in a                                Checks if item x is in set a.
    x not in a                            Checks if item x is not in set a.
    a <= b  or a.issubset(b)              If a is contained in b.
    a < b                                 If a is a proper subset of b (i.e. not equal to it).
    a >= b  or a.issuperset(b)            If b is contained in a.
    a > b                                 If a is a proper superset of b.
  • Operators only work on sets; the equivalent functions will also take (some) other containers, such as lists.
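  • A short sketch of the operators on two small example sets (results are sorted into lists so the printed order is predictable):

```python
a = {1, 2, 3}
b = {3, 4}

print(sorted(a | b))    # [1, 2, 3, 4]
print(sorted(a & b))    # [3]
print(sorted(a - b))    # [1, 2]
print(sorted(a ^ b))    # [1, 2, 4]
print({1, 2} <= a, {1, 2} < a, a < a)   # True True False

# The function form accepts a list; the operator form does not.
from_list = a.union([3, 4])
print(sorted(from_list))   # [1, 2, 3, 4]
try:
    a | [3, 4]
except TypeError:
    print("operators need sets on both sides")
```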

Other functions

Mappings

  • Mappings link (map) one set of data to another, so requests for the first get the second.
  • The main mapping class is dict (dictionary; in other languages these are sometimes called associative arrays or, roughly, hashtables).
  • They're composed of a table of keys and values. If you ask for the key you get the value.
  • An example would be people's names and their addresses.
  • Keys have to be unique.
  • Keys have to be immutable objects (we don't want them changing after they're used).
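  • Dictionaries can't be indexed by position. Since Python 3.7 they do keep their keys in insertion order; before that, the order wasn't guaranteed.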

Dict

  • To make a dict:
    a = {1:"Person One", 2:"Person Two", 3:"Person 3"}
  • If the keys are strings that are valid names, you can also do:
    a = dict(one="Person One", two="Person Two")
    a = {} # Empty dictionary.

    keys = (1,2,3)
    values = ("Person One", "Person Two", "Person 3")
    a = dict(zip(keys, values))

    a[key] = value
    # Set a new key and value.
    print(a[key])
    # Gets a value given a key.
  • Useful functions:
    del a[key]
    # Remove a key and value.
    a.clear()
    # Clear all keys and values.
    a.get(key, default)
    # Get the value or, if the key isn't there, return default
    # (normally access would give a KeyError).
    a.keys() a.values() a.items()
    # Return a "view" of keys, values, or pairs.
    # These are live windows onto the dictionary, which you can loop over directly.
    # To index or copy them, turn them into a list:
    list(a.items()) list(a.keys())
  • Again, there are update methods. See:
    https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
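
  • A minimal sketch of these functions on a small dictionary (the keys and values are made up for illustration, and the key order shown assumes Python 3.7 or later):

```python
a = {"one": "Person One", "two": "Person Two"}
a["three"] = "Person 3"              # Add a new key and value.
print(a["one"])                      # Person One
print(a.get("four", "nobody"))       # nobody
print("four" in a)                   # False

for key, value in a.items():         # Views can be looped over directly.
    print(key, value)

keys = a.keys()                      # A view reflects later changes.
del a["two"]
print(list(keys))                    # ['one', 'three']

a.update({"one": "Person 1"})        # update() merges in another mapping.
print(a["one"])                      # Person 1
```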

Dictionaries

  • Dictionaries are hugely important as, not that you'd know it, objects are stored as dictionaries of attributes and methods.
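  • As a rough sketch of this, using a hypothetical Person class: for ordinary classes like this one, the instance attributes can be seen with vars() or __dict__ (some objects, such as many built-in types, don't expose one).

```python
class Person:                 # Hypothetical class for illustration.
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "Hello, " + self.name


p = Person("Daisy")
print(vars(p))                    # {'name': 'Daisy'}

p.__dict__["age"] = 30            # Same effect as p.age = 30.
print(p.age)                      # 30

# Methods live in the class's own dictionary, shared by all instances.
print("greet" in vars(Person))    # True
print(p.greet())                  # Hello, Daisy
```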