
XML and Python


In this practical, we'll explore some of the ways we can interact with XML through Python. This will include validation, transformation, and different types of reading.

For this practical, you'll need the code in the lecture slides on Python and XML (PPTX).


Let's start off by validating our XML against a schema. Here are two XML files, each with its associated schema:

map1.xml: map1.dtd
map2.xml: map2.xsd

Can you use the lecture notes to build a program that will validate the XML against the two schema types? You'll need to:
from lxml import etree
lxml is provided with Anaconda.
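
As a rough starting point, here is a minimal sketch of both kinds of validation with lxml. The contents of the practical's map files aren't shown on this page, so the DTD, XSD and XML strings below (and the helper names validate_with_dtd and validate_with_xsd) are hypothetical stand-ins; swap in the text of the real files once you've read them in.

```python
from io import StringIO
from lxml import etree

# Hypothetical stand-ins for map1.dtd, map2.xsd and the XML files;
# the real practical files will differ.
DTD_TEXT = """
<!ELEMENT map (polygon*)>
<!ELEMENT polygon (#PCDATA)>
<!ATTLIST polygon id CDATA #REQUIRED>
"""

XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="map">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="polygon" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="id" type="xs:string" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

GOOD_XML = '<map><polygon id="p1">0,0 1,1</polygon></map>'
BAD_XML = '<map><polygon>0,0 1,1</polygon></map>'  # required id attribute missing


def validate_with_dtd(xml_text, dtd_text):
    """Return True if xml_text is valid against the DTD in dtd_text."""
    dtd = etree.DTD(StringIO(dtd_text))  # DTD() takes a file-like object
    root = etree.XML(xml_text)
    return dtd.validate(root)


def validate_with_xsd(xml_text, xsd_text):
    """Return True if xml_text is valid against the XML Schema in xsd_text."""
    schema = etree.XMLSchema(etree.XML(xsd_text))  # the schema is itself XML
    root = etree.XML(xml_text)
    return schema.validate(root)


print(validate_with_dtd(GOOD_XML, DTD_TEXT))
print(validate_with_xsd(GOOD_XML, XSD_TEXT))
print(validate_with_dtd(BAD_XML, DTD_TEXT))
print(validate_with_xsd(BAD_XML, XSD_TEXT))
```

When a document fails validation, the validator's error_log attribute usually says which element or attribute was at fault.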

Note that if you follow the lecture notes in reading in the whole file at once:
xml1 = open("map1.xml").read()
you'll find that lxml rejects the encoding declaration in the prolog, because a string Python has already decoded cannot also declare its own encoding. The easiest way to sort this out is to remove the prolog completely:
xml1 = xml1.replace('<?xml version="1.0" encoding="UTF-8"?>',"")
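
To make the two workarounds concrete, here is a small sketch using an inline document rather than map1.xml (the map and polygon element names are made up for illustration). It shows the prolog-stripping approach from above alongside an alternative that isn't in the lecture notes: handing lxml bytes, which is what you get from open("map1.xml", "rb").read(), so that lxml can apply the declared encoding itself.

```python
from lxml import etree

PROLOG = '<?xml version="1.0" encoding="UTF-8"?>'

# Hypothetical document standing in for the contents of map1.xml.
xml_text = PROLOG + '<map><polygon id="p1"/></map>'

# Option 1: strip the prolog, then parse the remaining text.
root_a = etree.XML(xml_text.replace(PROLOG, ""))

# Option 2 (an alternative, not from the lecture notes): parse bytes
# instead of a str, leaving the prolog in place.
root_b = etree.XML(xml_text.encode("utf-8"))

print(root_a.tag, root_b.tag)
```

Stripping the prolog only works if the string matches the file's prolog character for character, so the bytes route may be more forgiving if a file uses different quoting or spacing.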

Once you've got that working, go on to the next part, where we'll parse the DOM and look at it in a couple of ways.


  1. This page
  2. Parsing and adding <-- next
  3. Transforming