Measuring Confidentiality Risks in Aggregate Census

School of Geography, University of Leeds


Principal investigators

Stan Openshaw
Phil Rees

Other researchers

Oliver Duke-Williams
Andrew Evans

Dates

1 August 1997 - 31 January 1998

Grant

ESRC Award H507255144

Summary

Because census data must be kept confidential, it is released only as aggregate statistics for areas of various sizes.

The size of the areas for which data is released is broadly determined by the risk of recognising individual people within the aggregate statistics. For example, if you know that only one person in an area has a long-term illness, you might also be able to find out how many hours they work from a table that groups people by those two categories. This is a particular problem now that high-powered computers are readily available and can be used to search vast quantities of data. Unfortunately, the size of the areas released at present is decided subjectively, usually erring on the side of caution (large areas), and data that appears to be at risk is often altered, rendering it less useful.
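The long-term illness example can be sketched in a few lines of Python. This is only an illustration of the disclosure problem, not the measure tested by the project. The area labels, the hours bands, the table counts and the function names are all made up for the example.

```python
from collections import Counter

# Hypothetical published table: (area, has_long_term_illness, hours_band) -> count.
# All values are invented for illustration.
table = Counter({
    ("A", True, "0"): 1,
    ("A", False, "16-30"): 1,
    ("A", False, "31-40"): 2,
    ("B", True, "0"): 1,
    ("B", True, "16-30"): 1,
    ("B", False, "31-40"): 2,
})

def illness_margin(table):
    """Sum the table over hours bands to get people per (area, illness) group."""
    margin = Counter()
    for (area, ill, _hours), count in table.items():
        margin[(area, ill)] += count
    return margin

def disclosed_hours(table):
    """Return {area: hours_band} for areas where exactly one person has a
    long-term illness, so the table reveals that person's hours band."""
    margin = illness_margin(table)
    disclosed = {}
    for (area, ill, hours), count in table.items():
        if ill and count > 0 and margin[(area, True)] == 1:
            disclosed[area] = hours
    return disclosed

print(disclosed_hours(table))  # {'A': '0'}
```

Area A has a single person with a long-term illness, so the table gives away their hours band. Area B has two such people in different bands, so neither can be pinned down from this table alone. Merging A and B into one larger area would remove the problem, at the cost of less detailed data.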

The project centres on testing a new, quantitative statistical measure of how small areas can be before they pose a confidentiality problem, so that more accurate information can be released to the public without endangering anyone's privacy.

Results

Oliver has produced a PostScript paper giving more details. You could also check out the Statistical Disclosure Control webpage, or the following presentations:

