GEOG5870/1M: Web-based GIS A course on web-based mapping

General: Uploading data II

So, hopefully you are starting to get an inkling as to what is wrong with our scripts. To confirm your fears, have a look at this page describing the basic SQL Injection Attack against our script.

As you can see, we have a serious issue with our script: it is, as they say, asking for little "Bobby Tables" to come visiting. At 'best' the SQL Injection Attack can allow the arbitrary rewriting of our returned webpage as an attack vector against our clients, at worse it allows for open access to our system and any data and systems connected to it. This is very bad. It is also very common: SQL Injection Attacks are the number one most important attack on the web. So, what can we do about it?

The short answer is nothing, well nothing generic. As soon as we allow users to upload into a database we open ourselves up to an SQL Injection Attack (see, for example, this presentation by Bill Karwin). However, there are specific things we can do to reduce the risk given a specific SQL script. In our case, for example, we know we are just going to be opening a connection and INSERTing text and numbers, this means we have options that wouldn't be available if we had to allow other kinds of data. We'll go through two processes to protect our server: sanitization and query parameterization. While we're at it we'll also look at data validation (check the right data has been added) on the client, more for their sake than ours.


First up, sanitization. In theory, one recommendation is just to use a PHP Filter. These have various options, and in some cases, these are enough. For example, if we know that we are just going to get numeric data, we can use a FILTER_SANITIZE_NUMBER_INT filter to remove everything else:

$day = filter_var($_POST['day'], FILTER_SANITIZE_NUMBER_INT);

With text we could (and probably should) use a FILTER_SANITIZE_SPECIAL_CHARS filter. This will encode some characters (for example, replacing "<" with "&lt;" and ">" with "&gt;", thereby protecting us from HTML injection). It will also strip out characters with ASCII codes below 32. These are "special characters" (list at bottom of Dan Short's page) which include such nefarious options as the backspace character, which allows an injector to delete back over strings (including, for example, query strings) and replace them with something else:

$text = filter_var($_POST['textBody'], FILTER_SANITIZE_SPECIAL_CHARS);

However, these filters are only really aimed at SQL injection attacks that aim to drop HTML into a page. They still leave plenty of characters that might cause damage. For example, they leave parentheses. Most people leave these in, as potentially important text, but I'm inclined to filter them out if we can. Karwin ridicules this as the "Fire Everything" method, and there's some truth more generally in this being a poor decision   applying filter after filter can end up with scrambled encodings and, in some cases, can undo previous filters. However, for an INSERT query such characters represent a serious opportunity to break out of the query, and we can do without them (if you like you can replace them with something else). To do this, we can use a regular expression ("regex"). To avoid later filters being interfered with, we'll do this first. The way to do this in PHP is:

$pattern = "/[\(\)\[\]\{\}]/";
$user = preg_replace($pattern," - ",$user);
$pattern = "/[^A-Za-z0-9\s\.\:\-\+\!\@\,\'\"]/";
$user = preg_replace($pattern,"",$user);

This first replaces all brackets with hyphens, and then removes everything but the characters shown. For more info, see this crib sheet. For semi-numerical data we can be more aggressive. The only unfortunate thing about this is that it also removes all non-ASCII characters (which the FILTER_SANITIZE_SPECIAL_CHARS doesn't). An alternative to this, and by far the more preferred by those wanting to preserve most characters, is to encode anything that might represent an SQL command component as an escape literal (so "\" becomes "\\" and therefore unusable in a query) using the built-in Postgres PHP function pg_escape_literal:

$user = pg_escape_literal($user);

Prior to both these filterings, we can also "trim" our data to remove any white spaces from either side. This helps with things like usernames and passwords, where a user might have hit the space bar after typing them. In PHP you can filter all the POST data with this little array_filter/callback function from googlybash24, thus:

array_filter($_POST, 'trim_value');
function trim_value(&$value){
   $value = trim($value);
}

The function trims one array element, and the array_filter passes each element to the function.

At the risk of "Fire Everything" code, we can roll the lot into the following functions which give us some flexibility for varying both the deletion and filter patterns (note we can also add some code to truncate array elements that are too long, *just* incase):

array_filter($_POST, 'trim_value');

function trim_value(&$value){
   $value = trim($value);
   $pattern = "/[\(\)\[\]\{\}]/";
   $value = preg_replace($pattern," - ",$value);
}

$pattern = "/[^A-Za-z0-9\s\.\:\-\+\!\@\°\,\'\"]/"
$lat = sanitize('lat',FILTER_SANITIZE_SPECIAL_CHARS,$pattern)

function sanitize($str,$filter,$pattern) {
   $sanStr = preg_replace($pattern,"",$sanStr);
   $sanStr = filter_var($_POST[$str], $filter);
   if (strlen($sanStr) > 255) $sanStr = substr($sanStr,0,255);
   return $sanStr;
}

This should santitize our code fine. However, given that we've now put quite a lot of restrictions on our users, we might like to validate their inputs using JavaScript, warning them if they've used a character that's unlikely to survive. We can do this for complicated requirements with the same kind of regex as we did in the PHP for replacing characters, or, for simpler requirements we can use some built in JavaScript functions to test the data types entered. It is, however, important that we don't think we can sanitize on the client side – an attacker could potentially send us an HTTP request they had knocked up themselves without going through our webpage (albeit this is less likely if we make the request using a AJAX call, using jQuery or XMLHttpRequest, rather than a POST because of the Same Origin Policy).

We're not going to go into details of validating inputs on the client side now, but to get you started, here's a zip file containing our form, but now with validation added.

Anyhow, now our inputs are sanitized to heck we should be ok. That said, there is one last extra burst of paranoia which is recommended, and that's to shift from a standard query to a parameterized query which is much safer. We'll look at doing this next.


[ Next: Parameterized queries ]