
School of Geography

Sarah Gadd

Contact details

School of Geography
University of Leeds
University Road
Leeds LS2 9JT   UK



Project title

Development and application of statistical methods for addressing the heterogeneity of data collection intervals common in longitudinal datasets

Project overview

There is great interest across many sciences in collecting data over time and analysing patterns of change. Examples include the study of growth curves, psychological change, and changes in the prevalence of disease in an area. Researchers often examine how these patterns (longitudinal exposures) relate to later events (outcomes). This requires data analysis techniques that describe the pattern of the longitudinal exposure in each individual (e.g. the growth curve of each person).

Several techniques can do this. Multilevel models (MLMs) start by describing the average pattern of the longitudinal exposure and also quantify how much individual patterns differ from it; their results are easy to interpret, but they cannot readily describe very complicated patterns. Latent growth curve models (LGCMs), by contrast, describe individual patterns of the longitudinal exposure through additional latent variables (growth factors) that summarise each individual's curve, rather than starting from an average trajectory. To do this, LGCMs represent time in an unusual way: each measurement of the longitudinal exposure is linked to the latent variables describing the curve by a 'factor loading'. These loadings are often fixed to the times at which the measurements were taken, but they can also be estimated by the model, which allows LGCMs to represent very complex curves much more easily than MLMs. LGCMs can also be extended to growth mixture models (GMMs), which identify underlying subgroups in the data according to the shapes of individuals' patterns of the longitudinal exposure.

However, LGCMs require all individuals to be measured at exactly the same time points, a property called 'interval homogeneity'. This is rarely the case in practice, especially with observational data (e.g. children's growth measurements recorded in their medical records), so these most flexible modelling techniques cannot be widely used.
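As a hedged illustration in our own notation (not necessarily the project's), a linear LGCM links the measurement of individual i at occasion t to an intercept factor η_0i and a slope factor η_1i through loadings λ_t. Fixing λ_t to the measurement times gives a straight-line trajectory; fixing only λ_1 = 0 and λ_T = 1 and estimating the remaining loadings (one common identification choice, often called a latent basis model) lets the model estimate the shape of the curve. Because λ_t has no individual subscript, every individual must share the same occasions, which is the interval homogeneity requirement. The comparable MLM in the last line instead uses each individual's own measurement times a_ij.

```latex
% Linear LGCM: individual i, common occasions t = 1, \dots, T
y_{it} = \eta_{0i} + \lambda_t \, \eta_{1i} + \varepsilon_{it},
\qquad \eta_{0i} = \alpha_0 + \zeta_{0i}, \quad \eta_{1i} = \alpha_1 + \zeta_{1i}

% Fixed loadings: \lambda_t = a_t, the shared time of occasion t
% Latent basis: \lambda_1 = 0, \ \lambda_T = 1, with \lambda_2, \dots, \lambda_{T-1} estimated

% Comparable MLM with individual-specific measurement times a_{ij}
y_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i}) \, a_{ij} + e_{ij}
```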
Another method for describing longitudinal exposures is functional data analysis (FDA). FDA describes each individual's pattern of the longitudinal exposure by fitting smooth curves over shorter time segments. These segments are bounded by 'knots', whose number and position are chosen by the researcher. FDA is also flexible, but it can be inaccurate when there are wide gaps between measurements of the longitudinal exposure.
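A minimal sketch of segment-wise curve fitting for one individual, in Python with NumPy. It assumes a cubic regression spline built from a truncated power basis, chosen here only because it is short to write; FDA software more commonly uses B-spline bases, which span the same set of curves but are numerically better conditioned. The function names, knot positions and data are hypothetical.

```python
import numpy as np


def truncated_power_basis(t, knots, degree=3):
    """Design matrix for a regression spline of the given degree.

    Columns are 1, t, ..., t**degree plus one (t - k)_+**degree column per knot,
    so the fitted curve is a polynomial within each segment between knots and
    has degree - 1 continuous derivatives at each knot.
    """
    t = np.asarray(t, dtype=float)
    cols = [t ** p for p in range(degree + 1)]
    cols += [np.clip(t - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def fit_spline(times, values, knots, degree=3):
    """Least-squares spline coefficients for one individual's measurements."""
    X = truncated_power_basis(times, knots, degree)
    coef, *_ = np.linalg.lstsq(X, np.asarray(values, dtype=float), rcond=None)
    return coef


def eval_spline(coef, t, knots, degree=3):
    """Evaluate the fitted spline at arbitrary time points t."""
    return truncated_power_basis(t, knots, degree) @ coef


# Hypothetical example: one individual measured at irregular times, with a
# noise-free cubic trend so the fit can be checked exactly.
times = np.array([0.0, 1.0, 2.5, 4.0, 5.5, 7.0, 8.0, 10.0])
values = 2.0 + 0.5 * times - 0.1 * times**2 + 0.01 * times**3
knots = [3.0, 6.0]  # researcher-chosen knot positions (assumed)

coef = fit_spline(times, values, knots)
grid = np.linspace(0.0, 10.0, 11)
print(np.round(eval_spline(coef, grid, knots), 3))
```

If no measurement falls after the last knot, for example, the corresponding basis column is all zeros, the design matrix is rank-deficient, and `np.linalg.lstsq` silently returns a minimum-norm solution; this is one concrete way that wide gaps between measurements can cause trouble.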

Aims and objectives

This project aims to examine the utility of applying FDA to a longitudinal exposure that lacks interval homogeneity. The individual curves estimated by FDA will be used to interpolate each individual's measurements at a common set of time points, creating interval homogeneity so that latent growth curve modelling can be used to analyse patterns of the longitudinal exposure and relate them to a later outcome. These aims will be addressed using real and simulated data, together with the following questions: a) how can the optimum time points for interpolating measurements be found; and b) how should the optimum 'basis function' (i.e. the type of curve used to fit each segment of the longitudinal exposure in FDA) be chosen? The results will also be compared with those from LGCMs (which assume interval homogeneity) and from MLMs.
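A minimal sketch of the interpolation step described in the aim, under stated assumptions: each individual's curve is a cubic regression spline (truncated power basis) with the same researcher-chosen knots, and the common evaluation points are simply an evenly spaced grid. Choosing these points and the basis are exactly questions a) and b), so both choices here are placeholders. The function names `spline_basis` and `to_common_grid` and the simulated data are hypothetical, and fitting an LGCM to the result is not shown.

```python
import numpy as np


def spline_basis(t, knots, degree=3):
    """Truncated power basis for a cubic regression spline (assumed basis choice)."""
    t = np.asarray(t, dtype=float)
    cols = [t ** p for p in range(degree + 1)]
    cols += [np.clip(t - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def to_common_grid(times_list, values_list, grid, knots, degree=3):
    """Fit a spline to each individual and evaluate it at shared time points.

    Returns an (n_individuals, len(grid)) array: one row per individual and one
    column per common occasion, i.e. data with interval homogeneity.
    """
    grid_basis = spline_basis(grid, knots, degree)
    rows = []
    for times, values in zip(times_list, values_list):
        X = spline_basis(times, knots, degree)
        coef, *_ = np.linalg.lstsq(X, np.asarray(values, dtype=float), rcond=None)
        rows.append(grid_basis @ coef)
    return np.vstack(rows)


# Hypothetical simulated data: 4 individuals, 7 measurements each, at
# individually jittered (hence heterogeneous) times between 0 and 10.
rng = np.random.default_rng(2024)
base_times = np.array([0.5, 2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
times_list = [base_times + rng.uniform(-0.4, 0.4, base_times.size) for _ in range(4)]
values_list = [
    1.0 + 0.3 * i + (0.5 + 0.1 * i) * t + rng.normal(0.0, 0.1, t.size)
    for i, t in enumerate(times_list)
]

# Arbitrary common grid kept inside every individual's observed range,
# so no individual's curve is extrapolated.
grid = np.linspace(1.0, 9.0, 5)
wide = to_common_grid(times_list, values_list, grid, knots=[3.0, 7.0])
print(wide.shape)  # (4, 5)
```

Each row of `wide` can then be treated as one individual's measurements at the same occasions, the wide layout that LGCM software typically expects. Keeping the grid inside every individual's observed range avoids relying on spline extrapolation, which can behave poorly beyond the data.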


Supervisors

Professor Alison Heppenstall, Professor Mark Gilthorpe, Dr Peter Tennant

Cluster & research affiliations

Centre for Spatial Analysis and Policy, Leeds Institute for Data Analytics


Funding

White Rose DTP ESRC Advanced Quantitative Methods studentship

Brief CV

2016-17: Research assistant in Statistical Epidemiology, London School of Hygiene and Tropical Medicine

2016: MSc Epidemiology and Biostatistics, University of Leeds

2015: MRes Medical Sciences, Newcastle University


Publications

  • Gadd, S; Arnold, K; Ellison, G; Textor, J & Gilthorpe, M 2016. OP89 Quantifying bias due to regression to the mean in lifecourse analysis. Journal of Epidemiology and Community Health, 70, A49.
  • Arnold, K; Gadd, S; Ellison, G; Textor, J & Gilthorpe, M 2016. P19 Incorporating time-invariant confounders into residual increase models. Journal of Epidemiology and Community Health, 70, A61-A62.