MoSeS UK Population Bases for Socio-economic Modelling

Andy Turner


MoSeS is the modelling and simulation node of the National Centre for e-Social Science. MoSeS aims to develop a suite of modelling and simulation tools that are Grid Enabled in that:

  1. They are configured and available for use on the UK National Grid Service (NGS).
  2. Users of the tools who are registered users of the source data can access the data outputs and simulations produced by other users.
  3. The results will be more easily replicable because provenance data is captured, recording which versions of data, which computing resources and which pseudo-random numbers were used in data processing.
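Provenance capture of this kind can be illustrated with a minimal sketch. It assumes that a run is fully described by a data version label, the host it ran on and a single generator seed; `ProvenanceRecord`, `run_with_provenance` and `replay` are hypothetical names, not part of the MoSeS tools. Storing the seed is what makes the pseudo-random numbers reproducible.

```python
import hashlib
import json
import platform
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance record: what a run used, so it can be replicated."""
    data_version: str  # identifier of the source data release (assumed format)
    host: str          # computing resource the run executed on
    seed: int          # seed of the pseudo-random number generator

    def fingerprint(self) -> str:
        # Stable SHA-256 hash of the record, usable as a run identifier.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def run_with_provenance(data_version: str, seed: int, n: int):
    """Draw n pseudo-random numbers and return them with their provenance."""
    record = ProvenanceRecord(data_version, platform.node(), seed)
    rng = random.Random(record.seed)  # private generator seeded from the record
    return record, [rng.random() for _ in range(n)]


def replay(record: ProvenanceRecord, n: int):
    """Regenerate the same pseudo-random draws from a stored record."""
    rng = random.Random(record.seed)
    return [rng.random() for _ in range(n)]


record, draws = run_with_provenance("ukpc-2001-example", seed=42, n=5)
assert replay(record, 5) == draws  # same seed, same numbers
```

Using a private `random.Random` instance rather than the module-level generator keeps the draws independent of any other code that happens to consume random numbers in the same process.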

It is hoped that the tools will be used by social scientists and practitioners for a wide range of applications.

MoSeS is producing a tool that generates individual-level population data sets for the UK based on the 2001 UK Population Census (UKPC) at Output Area (OA) level. Initial data sets of 58,789,293 records have been created for the year 2001 using a number of constraints and optimisations. The first data sets are based purely on the UKPC, so only UKPC variables are attributed to synthetic individuals. The plan is to allow linkage with other individual-level demographic data sets (e.g. lifestyle, health and crime) to enrich the synthetic population data with further variables, such as estimated financial income and expenditure, estimated disease likelihoods and estimated social welfare needs.

The initial data sets are fed into simulation models that change the values of each synthetic individual's variables over time. In aggregate, this changes their households, their neighbourhoods and larger regions. In developing the simulation models, the initial focus was on ageing individuals and on the dissolution and re-formation of households through birth, death and migration, concentrating on family-type households. A Toy Model is being developed for Leeds Local Authority. Its applied focus is the health-related aspects of ageing and the provision of care for the elderly. Applications related to age and ageing are being developed within the constraints of current funding. The generic nature of the population data and simulation modelling tools lends them to a wide range of further applications focused on results at local, regional or national levels, i.e. the spatial levels of policy, planning and operations in UK government.
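The idea of changing individual variables over time, with effects aggregating up to households, can be sketched as a single annual step. This is a minimal illustration, not the MoSeS simulation model: it covers only ageing and death (birth and migration are omitted), and `Person`, `step_year`, `household_sizes` and the caller-supplied `death_prob` function are hypothetical. No real mortality rates are implied.

```python
import random
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:
    pid: int        # synthetic individual ID
    age: int
    household: int  # household ID


def step_year(people, death_prob, rng):
    """Advance the population one year: some individuals die, survivors age by one.

    death_prob is a caller-supplied function age -> probability (assumed input;
    no real mortality rates are implied here).
    """
    survivors = []
    for p in people:
        if rng.random() < death_prob(p.age):
            continue  # death removes the individual from the population
        survivors.append(replace(p, age=p.age + 1))
    return survivors


def household_sizes(people):
    """Aggregate individuals to household sizes, the next level up."""
    return Counter(p.household for p in people)
```

Comparing `household_sizes` before and after a step shows how a change at the individual level (one death) alters a household aggregate; the same grouping by OA code would give neighbourhood-level changes.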

The full exposition based on this abstract focuses on the process of generating the initial population bases that are inputs to simulation models. The initial data sets comprise individual records selected from the 3% Individual Sample of Anonymised Records (ISARs). The 1,843,525 ISARs have both individual and household attributes and a unique identifier (ID). The spatial reference of each individual is given as one of the four countries, and that of their household as one of thirteen regions. This spatial reference has been ignored: the assumption is that for any individual represented in the ISARs there is at least one such person in each of the regions. This is probably only true for variables that are not region-specific. For instance, whether someone can understand Welsh is treated as irrelevant, although in reality it is not. In essence, the task is to select sets of ISARs (with replacement) to form the 223,060 OA populations for the UK. The products are sets of ISAR record IDs for each OA (possibly containing duplicates) that can be grouped into households and communal establishments. Assigning individuals to households and communal establishments is optional in the population data creation process, but it is usually important. In such cases, measures of how well individuals can be assigned to households are paramount.
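Selecting ISARs with replacement to populate each OA can be sketched as follows. This is an unconstrained baseline only, assuming uniform selection; the function name, the OA codes and the toy ID list are hypothetical, and the constraints and optimisations described below are not applied here.

```python
import random


def build_oa_populations(isar_ids, oa_sizes, seed=0):
    """Select ISAR record IDs with replacement to populate each Output Area.

    isar_ids: list of ISAR record IDs available for selection.
    oa_sizes: mapping of OA code -> number of individuals required.
    Returns OA code -> list of selected IDs; duplicates are allowed because
    the same ISAR record may stand in for several people.
    """
    rng = random.Random(seed)
    return {oa: rng.choices(isar_ids, k=size) for oa, size in oa_sizes.items()}


# Hypothetical toy input: 10 ISAR IDs and two OAs.
pops = build_oa_populations(list(range(1, 11)), {"OA-A": 5, "OA-B": 20}, seed=1)
```

Because "OA-B" needs 20 people but only 10 distinct IDs exist, its list necessarily contains duplicates, which mirrors the possibility of repeated ISAR IDs within an OA set.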

OA Census Area Statistics (CAS) can be used to constrain and optimise the initial population data creation process at a number of levels. Constraints are totals that the synthetic population must match exactly, whereas measures used in optimisation are fitted by minimising or maximising some function. A detailed investigation of which OA-level constraints are useful and can be imposed is given, and some analysis of these is reported. A number of optimisations within these constraints are then considered as bases for further optimisation.
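The distinction between a constraint and an optimised measure can be illustrated with a minimal hill-climbing sketch, a common approach in combinatorial spatial microsimulation and not necessarily the method used by MoSeS. Here the OA population total is the hard constraint (swaps are one-for-one, so it never changes), and the fitted measure is the total absolute error between the synthetic counts of one attribute and a CAS-style target table. The names `total_absolute_error` and `fit_oa`, and the toy age bands, are hypothetical.

```python
import random
from collections import Counter


def total_absolute_error(selection, attribute, target):
    """Sum over categories of |synthetic count - target count|."""
    counts = Counter(attribute[i] for i in selection)
    categories = set(target) | set(counts)
    return sum(abs(counts.get(c, 0) - target.get(c, 0)) for c in categories)


def fit_oa(isar_ids, attribute, target, iterations=2000, seed=0):
    """Hill-climb a selection of ISAR IDs towards target counts.

    Constraint: the number of selected records always equals the OA total
    implied by the target table. Optimisation: a swap is kept only if it
    does not increase the total absolute error.
    """
    rng = random.Random(seed)
    size = sum(target.values())                # hard constraint, never changes
    selection = rng.choices(isar_ids, k=size)  # initial draw with replacement
    error = total_absolute_error(selection, attribute, target)
    for _ in range(iterations):
        if error == 0:
            break
        pos = rng.randrange(size)
        old = selection[pos]
        selection[pos] = rng.choice(isar_ids)  # one-for-one swap keeps the size fixed
        new_error = total_absolute_error(selection, attribute, target)
        if new_error <= error:
            error = new_error                  # accept: no worse than before
        else:
            selection[pos] = old               # reject: restore previous record
    return selection, error
```

For clarity the error is recomputed from scratch after every swap; a practical implementation would update the category counts incrementally and fit several CAS tables at once.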

Version 0.4.0 of this page published on 2008-01-30.