Intro to Programming and Data Analytics with Python


We're now in a position to do some data analysis.

What we're going to do is work through the Python course provided by the Software Carpentry group. This is an international group that trains researchers in software development, introducing both programming and good programming practice.


They run all sorts of courses, and their lessons are all online and open source, so you can also do the training yourself. There is an equivalent Data Carpentry group that teaches basic stats.

READ THROUGH THE TIPS BELOW, and then go to their Python course and work through as much or as little as you like.

It's at least three or four hours of material, so don't expect to finish it all in one sitting, though the later sections are shorter than the first few. You may want to open the course in a different browser window from iPython so you can see both at once. In Chrome/Firefox/IE, just grab the tab and drag it away from the browser.


Tips (read so you know what's here, then come back when needed)

  1. Note that the instructions are written for Linux/UNIX operating systems, so they should work fine on Macs. By and large they also work on Windows. Where they say "UNIX Shell", they basically mean the command prompt.

  2. THE PRACTICALS ASSUME YOUR COMMAND PROMPT IS OPEN IN THEIR "DATA" DIRECTORY THAT YOU DOWNLOAD. The best way to open a command prompt in a particular directory in Windows is to open the folder, then hold SHIFT and right-click, as we saw earlier. On a Mac (and, indeed, Windows) you can also change directories from within iPython if you want to use your current Notebook. The following commands will help (remember to press SHIFT + ENTER after you type each one):

    pwd -- this tells you your present working directory (i.e. which directory you are in).

    cd somename -- moves you to the directory called somename within the current directory.

    cd .. -- this moves you to the directory containing the directory you are in.

    cd ~ -- takes you to your home/user directory.

    So, for example, say pwd tells us we are in our home/user directory, and we want to get to the data directory for the practical. The following commands will get us there:

    cd Desktop
    cd python-novice-inflammation
    cd data

    i.e. we'll now be in the data directory within the python-novice-inflammation directory, on the desktop. If we then want to move to the python-novice-inflammation directory, we could type:

    cd ..

  3. They also don't tell you that you need to open iPython and make a new notebook so you can type the commands when you get to the directory!

  4. To make a new directory: in Windows, right-click the Desktop and select New --> Folder; once you have the folder, click to select it, and then click again to rename it. On a Mac, click the Desktop and choose File --> New Folder, then change the name and press Enter.

  5. To unzip a file: in Windows, right-click the zip file and select 7-Zip --> Extract Here (on machines without 7-Zip, use Extract All instead). On Macs, double-click the file.

  6. When you come to making the heatmap:

    Heat map

    Make sure you read the box after the instructions (it is called "Some IPython magic"). If you don't follow the instructions in that box *first*, iPython may hang, and you'll have to restart it from the command line. MAKE SURE YOU SAVE BEFORE MAKING THE HEATMAP. To turn iPython off, see tip 11 below.

  7. In the section using "glob", they assume you are in the directory one level up from the data directory, without explicitly saying so. The easiest fix is to remove the "data/" from the file paths in the Python code (or use cd .. to move up a level first).

  8. If you get on to the section on "Command-Line Programs", note that it uses a slightly different command line from the Windows one. Here the difference between UNIX/Linux/Macs and Windows becomes more critical. The solution is to download and install GitHub Desktop (no admin rights needed). This will give you two desktop icons, one of which is called "Git Shell". This is a version of the UNIX command-line "shell" called "bash" (for those familiar with it, it is all of bash except the man pages). If you download this, you may also like to look at the Software Carpentry courses on Git and data processing with bash -- but maybe that's something for another session!

  9. Ignore the "discussion" section -- this is for materials in development, and some of the libraries don't come with Anaconda.

  10. Remember to save your notebook as you go along (using the disk icon). As all this work is on the Desktop of the local machine, you may want to copy it somewhere else if you want to keep it after you are done.

  11. To turn iPython off, close the webpages and then the command line. To force iPython to stop (for example, if it hangs), close the webpages and, at the command line, press CTRL + c together (you may have to do it twice). CTRL + c generally stops whatever is running at the command line.
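As an aside on tip 2: IPython accepts pwd and cd because it treats them as built-in "magic" commands. In a plain Python script you would use the standard-library os module instead. A minimal sketch of the same moves; the temporary folder is only a stand-in for your Desktop so the example runs anywhere, and the folder layout mirrors the example in tip 2.

```python
import os
import tempfile

# Stand-in for your Desktop, with the same layout as the lesson download.
desktop = tempfile.mkdtemp()
os.makedirs(os.path.join(desktop, "python-novice-inflammation", "data"))

print(os.getcwd())                        # pwd: where am I now?

os.chdir(desktop)                         # cd Desktop
os.chdir("python-novice-inflammation")    # cd python-novice-inflammation
os.chdir("data")                          # cd data
print(os.getcwd())                        # ...python-novice-inflammation/data

os.chdir("..")                            # cd .. (up one level)
print(os.getcwd())                        # ...python-novice-inflammation

home = os.path.expanduser("~")            # cd ~ would take you here
print(home)                               # os.chdir(home) would make the move
```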
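If the right-click menus in tips 4 and 5 look different on your machine, Python's standard library can do the same two jobs: os.makedirs makes a folder and zipfile unpacks an archive. A sketch only: the zip file is built on the spot as a stand-in for the real download, and the names python-course and lesson-data.zip are made up for the example.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()  # stand-in for your Desktop

# Tip 4: make a new folder (exist_ok=True means no error if it already exists).
course_dir = os.path.join(workdir, "python-course")
os.makedirs(course_dir, exist_ok=True)

# Stand-in for the downloaded zip: a tiny archive with one file inside data/.
zip_path = os.path.join(workdir, "lesson-data.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("data/inflammation-01.csv", "0,0,1,3\n")

# Tip 5: unzip it into the new folder, much like "Extract Here".
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(course_dir)

print(os.listdir(os.path.join(course_dir, "data")))
```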
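Tip 7's path problem can also be handled in code. A small sketch, assuming the lesson's inflammation*.csv file names; find_inflammation_files is a hypothetical helper (not part of the lesson) that adds the "data/" prefix only when a data folder is visible from where you are. The demo builds a throwaway copy of the lesson layout in a temporary folder.

```python
import glob
import os
import tempfile


def find_inflammation_files(pattern="inflammation*.csv"):
    """Hypothetical helper: find the CSVs from inside data/ or the folder above it."""
    if os.path.isdir("data"):
        # One level above data/ (what the lesson assumes): keep the "data/" prefix.
        return sorted(glob.glob(os.path.join("data", pattern)))
    # Inside data/ itself: drop the prefix, as tip 7 suggests.
    return sorted(glob.glob(pattern))


# Demo on a throwaway copy of the lesson layout (stand-in for the real download).
lesson_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(lesson_dir, "data"))
for name in ("inflammation-01.csv", "inflammation-02.csv", "small-01.csv"):
    open(os.path.join(lesson_dir, "data", name), "w").close()

os.chdir(lesson_dir)                      # one level above data/
from_above = find_inflammation_files()    # on Mac/Linux: ['data/inflammation-01.csv', 'data/inflammation-02.csv']

os.chdir("data")                          # inside data/ itself
from_inside = find_inflammation_files()   # ['inflammation-01.csv', 'inflammation-02.csv']
print(from_above, from_inside)
```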


Lastly, it is worth saying something about how you use this stuff yourself, rather than working through someone else's cut and paste.

The key to learning programming is twofold. First, you need to learn the core language: its "syntax" (grammar), keywords ("if", "for", etc.), and built-in functions ("print", etc.). For Python, there is a good tutorial available on the Python website.
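To make those three parts of the core language concrete, here is a tiny illustrative snippet; the readings values are made up for the example.

```python
# Keywords (for, in, if, else) are the grammar; print() and len() are built-in functions.
readings = [0, 2, 5, 1]

total = 0
for value in readings:          # "for" repeats the indented block once per item
    total = total + value

if total > 5:                   # "if"/"else" choose which block runs
    print("high total:", total)
else:
    print("low total:", total)

print("number of readings:", len(readings))
```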

Second, you need to know where to look up the functions built into libraries. For this you need documentation with examples and/or what is called the "API" documentation ("API" stands for "Application Programming Interface"). The API docs let you see all the different parts of each library, all the functions and variables in each part, and what they are used for.
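You can get a rough, offline version of that API view from inside Python itself: dir() lists what a library contains and help() prints its documentation. A small sketch using the standard-library math module as a stand-in for whichever library you are exploring; in IPython, typing a name followed by ? (for example math.sqrt?) shows similar help.

```python
import math

# dir() lists the names (functions, variables) the module provides;
# names starting with "_" are internal, so filter them out.
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])

# help() prints the documentation for one of them, much like an API page.
help(math.sqrt)

# The same text is stored on the function itself as its docstring.
print(math.sqrt.__doc__)
```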

Here are the API docs and help files for some of the key libraries:

Have a browse round these to finish off, and get an idea of the kinds of stuff you can do.


Other resources

Core Python comes with an IDE ("IDLE"), but a better IDE for writing Python is Spyder, which comes with Anaconda. You can find an intro here on YouTube.

Books people have recommended include Automate the Boring Stuff with Python; Python for Data Analysis (though note that it uses Python 2, so needs some adapting where libraries have moved on); Python Machine Learning; and Web Scraping with Python.

You have to be relatively flexible about moving between Python 3 and Python 2 where need be. Remember you can use Anaconda to switch between them, and you can find a list of the differences here.
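When adapting Python 2 material, the differences you'll hit most often are print and integer division. A small illustration that runs under Python 3; the comments note what Python 2 did instead. This is only a taste, not a full list of the differences.

```python
# print is a function in Python 3; Python 2 also accepted: print "hello"
print("hello")

# "/" on two integers gives 3.5 in Python 3, but 3 (truncated) in Python 2.
true_div = 7 / 2

# "//" is floor division in both versions.
floor_div = 7 // 2

# range() is lazy in Python 3; in Python 2 it built a full list straight away.
numbers = list(range(3))

print(true_div, floor_div, numbers)  # 3.5 3 [0, 1, 2]
```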


That's it -- you're done!

  1. Start
  2. Get the software
  3. Writing our first program
  4. Debugging
  5. iPython Notebook
  6. Data Analysis