This practical will walk you through the basics of getting a graph into a file, visualising it, and running a statistical analysis.
The practical will introduce you to a file format for structuring graphs called GraphML, and a visualisation and analysis package called "Guess".
First, let's look at GraphML. GraphML is a simple but powerful file format for representing graphs/networks. It is a type of XML, the eXtensible Markup Language. XML files, including GraphML files, are plain text, which means that, while they tend to be large, they are easy to edit.
If you aren't familiar with XML, it would be well worth your time to become so, as a great many data file formats are moving to XML. Ordnance Survey data, for example, is now held in XML files. We've written a brief tutorial on XML and Java, and specifically on GML, the Geographical Markup Language; you might like to work through the non-Java "basics" pages.
GraphML is a lot simpler than GML. You can find an excellent introduction on the GraphML Primer site. For now we're going to download the first sample on that site: simple.graphml. Download this now (right-click the link and "Save"), and open it in Notepad++. You'll see that the file is relatively self-explanatory. Everything is structured by "tags" in angle brackets, either on their own: <X/>, or in pairs enclosing data: <X>stuffInTags</X>. Tags can be nested inside each other. If you are familiar with HTML, the language that webpages are written in, you'll see that XML is very like HTML, only a little more rigorous (if you don't know HTML and want to learn some, you can find a tutorial here).
The top chunk:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
just declares the file as XML and links it to the GraphML definition (schema) files on the web. Next comes the declaration of the graph, which gives its id and states that its edges are undirected by default (flow can occur in either direction):
<graph id="G" edgedefault="undirected">
Then a series of node definitions:
<node id="n0"/>
<node id="n1"/>
<node id="n2"/>
and edges between them:
<edge source="n0" target="n2"/>
Finally, any remaining paired and nested tags are closed.
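If you would like to see how a program views this structure, the following is a minimal sketch in Python (our choice for illustration; nothing in this practical requires it), using only the standard library. The embedded text is an abridged stand-in built from the snippets above, not the full simple.graphml file, and the function name read_graph is made up for this example.

```python
import xml.etree.ElementTree as ET

# Abridged, illustrative GraphML in the same shape as the snippets above.
# This is NOT the full simple.graphml file from the Primer site.
GRAPHML = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <edge source="n0" target="n2"/>
  </graph>
</graphml>
"""

# GraphML tags live in this XML namespace, so searches must name it.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_graph(text):
    """Return (graph id, edge default, node ids, edge pairs) for the first graph."""
    root = ET.fromstring(text.encode("utf-8"))
    graph = root.find("g:graph", NS)
    nodes = [node.get("id") for node in graph.findall("g:node", NS)]
    edges = [(edge.get("source"), edge.get("target"))
             for edge in graph.findall("g:edge", NS)]
    return graph.get("id"), graph.get("edgedefault"), nodes, edges


graph_id, edge_default, nodes, edges = read_graph(GRAPHML)
print(graph_id, edge_default)
print(nodes)
print(edges)
```

To try it on the file you downloaded, you could read the file's text and pass it to read_graph instead of the embedded string.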
As you can see, the format, for XML, is relatively clear and easy to generate. You can alter it directly and add to it in Notepad++. However, the format is much more powerful than this simple example suggests. You can find out more in the GraphML Primer.
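As a rough illustration of how easy the format is to generate, here is a small Python sketch that writes out nodes and edges in the same shape as the sample. It is only one possible approach under stated assumptions: the function name build_graphml and its parameters are hypothetical, and the schema-location attributes from the top of the sample are left out for brevity.

```python
import xml.etree.ElementTree as ET

# The GraphML namespace, as declared at the top of the sample file.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(node_ids, edges, graph_id="G", edgedefault="undirected"):
    """Return GraphML text for the given node ids and (source, target) pairs.

    Hypothetical helper for illustration only.
    """
    # Write the namespace as the default one, so tags have no prefix.
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root,
        f"{{{GRAPHML_NS}}}graph",
        {"id": graph_id, "edgedefault": edgedefault},
    )
    for node_id in node_ids:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": node_id})
    for source, target in edges:
        ET.SubElement(
            graph,
            f"{{{GRAPHML_NS}}}edge",
            {"source": source, "target": target},
        )
    return ET.tostring(root, encoding="unicode")


# Three nodes and one edge, matching the snippets shown earlier.
print(build_graphml(["n0", "n1", "n2"], [("n0", "n2")]))
```

The output is a single line of XML; saved with a .graphml extension, it should open in Notepad++ just like the sample, although it lacks the sample's line breaks and schema links.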
This is all very well and good, but we'd much rather see the graph as a set of lines, and have some ability to edit and navigate it directly. This is what we'll look at next in Part 2.