Key ideas in depth: Binary data


Hopefully you'll never have to deal with binary data; it's pretty horrible. Its one advantage is that it is significantly more efficient than most other data, as you're right at the lowest level of computing. Just in case you do find yourself facing binary reading and manipulation, here's some starting info on it.

The simplest binary data is just numbers stored such that each number takes up a fixed number of ones and zeros. For example, we might use four bytes (each made of eight "bits", or individual ones or zeros) to represent integers. Even this can be reasonably complicated. We have to decide, for example, what order the bytes are held in. When storing multi-byte numbers, there are two broad formats. Big-endian, where the most significant byte comes first:
00000000
00000000
00000000
00000001 = 1
and little-endian, where the least significant byte comes first:
00000001
00000000
00000000
00000000 = 1
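As a rough illustration of the two byte orders, here is a minimal sketch using the built-in int.to_bytes and int.from_bytes methods. The helper name as_bits is made up for this example.

```python
# Encode the integer 1 as four bytes in each byte order.
value = 1
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")


def as_bits(data):
    """Render each byte as eight binary digits (hypothetical helper)."""
    return " ".join(format(byte, "08b") for byte in data)


print(as_bits(big))     # 00000000 00000000 00000000 00000001
print(as_bits(little))  # 00000001 00000000 00000000 00000000

# Reading the bytes back only works if you use the order they were written in.
right = int.from_bytes(big, byteorder="big")     # 1
wrong = int.from_bytes(big, byteorder="little")  # 16777216
print(right, wrong)
```

The last two lines show why the byte order matters: the same four bytes read the wrong way round give a completely different number.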

There are also different ways of representing negative numbers; for example, the popular "two's complement". Where data is to be transmitted, it may also include parity bits for transmission error checking.
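To give a feel for both ideas, the sketch below shows eight-bit two's complement patterns and a simple even-parity bit. The function names are made up for this example, and even parity is only one of several schemes a real format might use.

```python
def twos_complement_bits(value, width=8):
    """Return the two's complement bit pattern of value in `width` bits."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the requested width")
    # Masking keeps only the lowest `width` bits of the (possibly negative) int.
    return format(value & ((1 << width) - 1), f"0{width}b")


def even_parity_bit(byte):
    """Return the extra bit that makes the total number of ones even."""
    return bin(byte).count("1") % 2


print(twos_complement_bits(1))    # 00000001
print(twos_complement_bits(-1))   # 11111111
print(twos_complement_bits(-2))   # 11111110

# The standard library can do the signed conversion directly.
raw = (-2).to_bytes(1, byteorder="big", signed=True)
back = int.from_bytes(raw, byteorder="big", signed=True)
print(raw, back)                  # b'\xfe' -2

print(even_parity_bit(0b00000111))  # 1 (three ones, so one more is needed)
print(even_parity_bit(0b00000011))  # 0 (already an even number of ones)
```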

Numerical binary data formats can be extremely complicated. In Python this isn't helped by the fact that Python doesn't have its own internal standard for storing numeric data, but uses a combination of whatever the local operating system prefers and an extensible storage format (for example, there's no limit on the size of ints).
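As a small sketch of what this means in practice, sys.byteorder reports the byte order the local machine uses, and an int will happily grow beyond any fixed width, so you have to work out how many bytes it needs before writing it out. The variable names are just for illustration.

```python
import sys

# 'little' or 'big', depending on the machine running the code.
print(sys.byteorder)

# Python ints are not limited to 32 or 64 bits.
big_number = 2 ** 100
bits_needed = big_number.bit_length()   # 101
bytes_needed = (bits_needed + 7) // 8   # round up to whole bytes: 13

data = big_number.to_bytes(bytes_needed, byteorder="big")
print(bits_needed, bytes_needed, len(data))
```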

However, there is essentially nothing to stop any given piece of software from having a proprietary binary storage system, and determining how a given file format works without appropriate documentation can be a serious undertaking, even before taking legal issues such as copyright into consideration.


In terms of storing and manipulating data, Python has its own data structures for holding binary data, bytes and bytearray, but also works with C data structures (as much of the standard Python implementation is written in C), namely structs, through the struct module.
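A minimal sketch of the three in use follows. The values and the ">i" and "<ih" layouts are chosen purely for illustration; a real file format would dictate its own.

```python
import struct

# bytes is immutable; bytearray is a mutable equivalent.
frozen = bytes([0, 0, 0, 1])
buffer = bytearray(frozen)
buffer[3] = 2                      # fine for a bytearray, an error for bytes

# struct converts between bytes and C-style values.
# ">i" means: big-endian, one signed four-byte int.
number = struct.unpack(">i", frozen)[0]
changed = struct.unpack(">i", buffer)[0]
print(number, changed)             # 1 2

# "<ih" means: little-endian, a four-byte int followed by a two-byte short.
packed = struct.pack("<ih", 1, -2)
print(packed)                      # b'\x01\x00\x00\x00\xfe\xff'
print(struct.unpack("<ih", packed))  # (1, -2)
```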

If you are dealing with binary data representing numbers, it is relatively easy to cast binary data to numbers, for example by passing a string of bits and the base (2) to int:

a = int('00000001', 2)

and it is, actually, fairly usual to deal with bits as ints even if that's not what the values represent, so you see a fair amount of transferring between the two in binary manipulation, whether the actual data is meant to be numbers or not.
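As a short sketch of moving between the two representations, the built-ins int, bin and format cover most cases. Note that iterating over a bytes object gives ints, even when the data is really text.

```python
bits = "00000101"
number = int(bits, 2)          # parse the string as base 2
print(number)                  # 5

# Going back the other way.
print(bin(number))             # 0b101
print(format(number, "08b"))   # 00000101 (padded to eight digits)

# Bytes are handled as ints even when they represent characters.
data = b"Hi"
values = list(data)
patterns = [format(byte, "08b") for byte in data]
print(values)                  # [72, 105]
print(patterns)                # ['01001000', '01101001']
```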

If you need to manipulate the bits as bits, there is a set of binary operators to do this:

>> <<    Bitshifts               1 >> 1 == 0 (00000001 >> 1 == 00000000)
                                 1 << 1 == 2 (00000001 << 1 == 00000010)

& ^ |    Bitwise AND, XOR, OR    00000001 & 00000011 == 00000001
                                 00000001 | 00000010 == 00000011
                                 00000001 ^ 00000011 == 00000010

~        Bitwise inversion       ~00000001 == 11111110 (as an eight-bit pattern; Python itself reports ~1 == -2)
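The operators above can be tried out directly on ints written as binary literals. This is only a sketch; the flags variable and the idea of treating each bit as an on/off switch are just an illustrative use.

```python
flags = 0b00000101

# Shifts move every bit right or left, dropping or adding zeros.
print(format(flags >> 1, "08b"))   # 00000010
print(format(flags << 1, "08b"))   # 00001010

# AND tests a bit, OR switches one on, XOR flips one.
bit_is_set = (flags & 0b00000100) != 0
switched_on = flags | 0b00000010
flipped = flags ^ 0b00000001
print(bit_is_set)                  # True
print(format(switched_on, "08b"))  # 00000111
print(format(flipped, "08b"))      # 00000100

# Python ints have no fixed width, so ~1 is -2 rather than 254.
# Masking with 0xFF recovers the eight-bit pattern.
inverted = ~0b00000001
print(inverted)                         # -2
print(format(inverted & 0xFF, "08b"))   # 11111110
```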

Overall, binary data is a significant pain. Good starting points for learning about binary data and Python include:

DevDungeon on working with binary in Python.

WikiPython on bit manipulation.

Ashish R Vidyarthi on binary and Python, parts one; two; and three.

The open function documentation and the binary data services documentation.