Debugging · Geography Programming Courses

We can test the Python script by running it as we normally would at the command line!

python HelloWorld.py

This gets rid of the massive complication of the shell script, and helpfully might also tell us that the Python interpreter is ok as well. Run the script like this now.

You'll see that there's something horribly wrong with the script. Open it up in Notepad++, TextEdit, or Leaf and look at it. It probably looks fine. Let's look at the message from the interpreter. Hopefully you see something like this:

  File "HelloWorld.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xff' in file HelloWorld.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Now, this is a pretty horrible error, because if you open the file, it looks fine. You see this error quite a lot when people cut and paste code together, or use odd editors. It means that the format of the HelloWorld.py file isn't the text format Python likes. Now, the file is plain text, which means it can't contain images or text formatting, or anything like that (unlike, for example, Word files, which aren't plain text). However, there are a surprisingly large number of plain text formats, and if they don't contain any unusual or non-US characters, they are quite hard to tell apart. Python likes a format called UTF-8. You can't easily see the difference between UTF-8 and other formats, as text editors present them all looking the same, but at the level of the ones and zeros stored in the computer, there is a difference.
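To make that difference visible, here is a minimal Python sketch that stores the same line of code in three plain text formats and prints the first few bytes of each. The helper name encoded_forms, and the choice of Latin-1 and UTF-16 as the formats to compare against, are assumptions made for illustration; they aren't part of the course files.

```python
def encoded_forms(text):
    """Return the raw bytes for `text` in a few common plain text formats."""
    # Hypothetical selection of formats, chosen only for comparison.
    return {name: text.encode(name) for name in ("utf-8", "latin-1", "utf-16-le")}


forms = encoded_forms('print("Hello World")')
for name, raw in forms.items():
    # Show the first eight stored bytes as hexadecimal numbers.
    first_bytes = " ".join(f"{byte:02x}" for byte in raw[:8])
    print(f"{name:10} {len(raw):3} bytes  {first_bytes}")
```

For this particular line, which contains only plain US characters, the UTF-8 and Latin-1 versions come out byte-for-byte identical, while the UTF-16 version is twice as long. All three would look the same in a text editor.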

If Python complains about formatting, there are a couple of things that might be wrong. Firstly, the whole file might be in a format Python doesn't like. This is unusual: UTF-8 is a very generic-looking format, which means Python can quite happily treat a number of other popular formats as if they were UTF-8 (which is one reason the Python developers picked it). The alternative is that the whole file is in some format compatible with UTF-8, but one or two characters in it aren't compatible. This is by far the most common issue, especially when people are cutting and pasting code. The traditional sequence is:

Someone writes a lecture or some code in PowerPoint or Word.
PowerPoint or Word helpfully replaces some UTF-8 straight quotes with curly 'smart' quotes, or hyphens with en-dashes.
The code gets posted to the web, where web browsers happily display the non-UTF-8 characters.
The code gets cut-and-pasted from the web into a Python file.
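Characters picked up this way can be hunted down with a short script. The following is a rough sketch, assuming the pasted code is already available as a Python string; the function name find_non_ascii and the sample line are made up for illustration. It flags every character outside the basic US set, which is only a pointer to where to look: such characters are perfectly legitimate inside some strings and comments.

```python
import unicodedata


def find_non_ascii(source):
    """Yield (line, column, character, Unicode name) for each non-ASCII character."""
    for line_number, line in enumerate(source.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                yield line_number, column, char, unicodedata.name(char, "UNKNOWN")


# Hypothetical pasted line in which the straight quotes have become curly quotes.
pasted = "print(\u201cHello World\u201d)"
for hit in find_non_ascii(pasted):
    print(hit)
```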

This generally results in one or two characters that aren't Python-compatible. Worse is if someone uses an editor that doesn't produce plain text. Word will produce complicated combinations of binary files and text in a format called XML. If you open one of its files in a text editor, it just looks like scrambled rubbish, as the text editor tries to interpret binary that isn't text as text.
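One way to tell "one or two bad characters" apart from "the whole file is wrong" is to ask Python to decode the raw bytes as UTF-8 and report where it first fails. This is a sketch only: the function name first_bad_utf8_byte and the sample bytes are assumptions for illustration, and it works on bytes you have already read from the file in binary mode.

```python
def first_bad_utf8_byte(raw):
    """Return (line number, offending byte) for the first byte that isn't valid
    UTF-8, or None if all of `raw` decodes cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        line_number = raw.count(b"\n", 0, err.start) + 1
        return line_number, raw[err.start:err.start + 1]
    return None


# Hypothetical sample: the second line holds curly quotes stored as single
# Windows-style bytes, which aren't valid UTF-8.
sample = b'print("Hello")\nprint(\x93World\x94)\n'
print(first_bad_utf8_byte(sample))

# For a real file you might read the bytes like this:
# with open("HelloWorld.py", "rb") as handle:
#     print(first_bad_utf8_byte(handle.read()))
```

A failure deep inside the file suggests a stray pasted character; a failure on the very first byte of line 1 suggests the problem is with the file as a whole.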

[Screenshot: a chunk of Hello World saved in Word, showing the encoding issue.]

If, on the other hand, you see this kind of thing:

{\rtf1\ansi\deff0\nouicompat{\fonttbl{\f0\fnil\fcharset0 Calibri;}} {\*\generator Riched20 10.0.15063}\viewkind4\uc1 \pard\sa200\sl276\slmult1\f0\fs22\lang9 print("Hello World")\par}

it means the coder has accidentally saved the text in the Microsoft Rich Text Format (RTF). Weirdly, this usually happens to Mac users, as TextEdit saves text as RTF by default (this default can be changed in TextEdit's preferences).
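If you'd rather not eyeball the file, a quick check for this case is to see whether its raw bytes start with the {\rtf marker shown in the example above. This is a minimal sketch; the function name looks_like_rtf is made up for illustration, and it tests only the opening marker rather than validating the whole file.

```python
def looks_like_rtf(raw):
    """Return True if the bytes begin with the RTF opening marker."""
    # Ignore any leading whitespace before testing for the marker.
    return raw.lstrip().startswith(b"{\\rtf")


print(looks_like_rtf(b'{\\rtf1\\ansi\\deff0 print("Hello World")\\par}'))
print(looks_like_rtf(b'print("Hello World")\n'))
```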

So, which do we have here? Let's work it out and fix the issue.

Issue: Getting it working
Key skill: problem decomposition