Correcting for measurement error in categorical, longitudinal data using hidden Markov models

Resource Type
Dataset : administrative records data, survey data
  • Pankowska, Paulina (Vrije Universiteit Amsterdam)
Publication Date
Funding Reference
  • Statistics Netherlands (CBS)
Free Keywords
measurement error; data linkage; hidden Markov models; latent class modeling
  • Abstract

    This project focuses on the problem of measurement error and investigates the feasibility of using hidden Markov models (HMMs) to correct for such error in categorical, longitudinal data. In doing so, we first illustrate how measurement error poses a substantial threat to the validity and accuracy of estimates. We then demonstrate the need to use multiple-indicator HMM specifications, which can account for the nonignorable presence of systematic/dependent errors. Finally, we show that the use of such extended models is feasible. That is, even though such HMMs require record linkage, linkage error is largely not a problem. Furthermore, while their implementation process is complex and time-consuming, it can be simplified because error parameters can be re-used for a number of years.
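
    As a rough illustration of the modelling idea, the sketch below evaluates the likelihood of two observed categorical sequences (for example, one from a survey and one from a register) that are treated as error-prone indicators of a single latent state sequence. It is a minimal sketch only: the function name, the two-state setup and all parameter values are hypothetical, and it assumes the two indicators err independently given the latent state, which is simpler than the dependent-error specifications studied in the project.

```python
def forward_likelihood(initial, transition, emission_a, emission_b, obs_a, obs_b):
    """Likelihood of two observed sequences under a two-indicator HMM.

    initial[s]        : P(latent state at t=0 is s)
    transition[r][s]  : P(latent state s at t+1 | latent state r at t)
    emission_x[s][y]  : P(indicator x reports category y | latent state s)
    Assumption: the two indicators are independent given the latent state.
    """
    n_states = len(initial)
    # alpha[s] = P(observations up to the current time, latent state = s)
    alpha = [
        initial[s] * emission_a[s][obs_a[0]] * emission_b[s][obs_b[0]]
        for s in range(n_states)
    ]
    for ya, yb in zip(obs_a[1:], obs_b[1:]):
        alpha = [
            sum(alpha[r] * transition[r][s] for r in range(n_states))
            * emission_a[s][ya] * emission_b[s][yb]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical parameter values, chosen only for illustration.
initial = [0.7, 0.3]
transition = [[0.95, 0.05], [0.10, 0.90]]
survey_error = [[0.90, 0.10], [0.20, 0.80]]    # noisier indicator
register_error = [[0.98, 0.02], [0.05, 0.95]]  # more accurate indicator

# The two sources disagree at the second time point.
lik = forward_likelihood(initial, transition, survey_error, register_error,
                         [0, 1, 1], [0, 0, 1])
print(lik)
```

    In a sketch of this kind, the off-diagonal entries of the two emission matrices play the role of the error parameters; the project's finding that such parameters can be re-used across years would correspond to holding them fixed while re-estimating the latent part of the model.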
  • Weighting

    In the Labour Force Survey (LFS), the weighting of the observations is twofold. First, inclusion weights are assigned to the observations. These weights correct for biased inclusion probabilities that are caused by the sampling method. Second, the final weights are constructed by adjusting for sex, age, country of origin, official place of residence and some other regional classifications. These final weights are used to reduce non-response bias.

    In our analyses, including the weights did not significantly affect the results, so we decided to exclude them. This might not be the case in other applications, in particular when the weights vary substantially across respondents.
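
    Users who want to check whether weighting matters in their own application can compare a weighted and an unweighted estimate of the same quantity. The sketch below does this for a simple share; the function name, the indicator values and the weights are all hypothetical and are not taken from the LFS.

```python
def weighted_share(values, weights):
    """Weighted mean of a 0/1 indicator (share of respondents with value 1)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: 1 = respondent falls in the category of interest.
indicator = [1, 0, 1, 1, 0]
final_weights = [1.0, 2.0, 1.0, 0.5, 1.5]

unweighted = sum(indicator) / len(indicator)
weighted = weighted_share(indicator, final_weights)

# A large gap suggests the weights should not be dropped.
print(unweighted, weighted, abs(unweighted - weighted))
```

    When the weights are nearly constant across respondents the two estimates coincide; the more the weights vary, the further apart they can be.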
  • Methods

    Response Rates: According to colleagues at Statistics Netherlands, the response rate in the LFS was around 61% in 2009 and 53% in 2010. However, because our analysis used a sub-sample of the LFS data, restricted to (i) individuals between 25 and 55 years of age who (ii) could be linked to the Employment Register (ER) data, we do not know the response rates for our sample. Statistics Netherlands has also indicated that the LFS is subject to relatively high panel attrition, which also leads to selectivity, but the exact rates are unknown.

    Officially, the ER cannot be subject to drop-out because submission of reports is obligatory for all employers. Nevertheless, 2,619 of the 133,290 observations (about 2%) are missing.
Temporal Coverage
  • 2007-01-01 / 2010-12-31
    Time Period: 2007-01-01 to 2010-12-31
Geographic Coverage
  • The Netherlands
Sampled Universe
Individuals on the Dutch labour market aged 25 to 55.
The LFS is a sample survey. 

The ER covers all individuals who are employed in the Netherlands.

This study is freely available to the general public via web download.
  • Is version of
    DOI: 10.3886/E120363
  • Pankowska, Paulina, Bart Bakker, Daniel L. Oberski, and Dimitris Pavlopoulos. “Reconciliation of Inconsistent Data Sources by Correction for Measurement Error: The Feasibility of Parameter Re-Use.” Statistical Journal of the IAOS 34, no. 3 (August 9, 2018): 317–29.
    • ID: 10.3233/SJI-170368 (DOI)
  • Pankowska, Paulina, Bart F M Bakker, Daniel L Oberski, and Dimitris Pavlopoulos. “How Linkage Error Affects Hidden Markov Model Estimates: A Sensitivity Analysis.” Journal of Survey Statistics and Methodology 8, no. 3 (June 1, 2020): 483–512.
    • ID: 10.1093/jssam/smz011 (DOI)

Update Metadata: 2020-07-29 | Issue Number: 1 | Registration Date: 2020-07-29