metadata language: English

Automated Linking of Historical Data

Resource Type
Dataset : census/enumeration data
  • Abramitzky, Ran
  • Boustan, Leah
  • Eriksson, Katherine
  • Feigenbaum, James
  • Perez, Santiago
Publication Date
Free Keywords
census data; historical; linking; record linkage
  • Abstract

    Currently, the repository provides code for two automated record-linkage methods:
    1. The ABE fully automated approach: This method links historical datasets (e.g. complete-count Censuses) by first name, last name and age. It was first developed by Ferrie (1996) and then adapted and scaled for computer implementation by Abramitzky, Boustan and Eriksson (2012, 2014, 2017). Because names are often misspelled or mistranscribed, we suggest testing robustness to alternative name-matching schemes (raw names, NYSIIS standardization, and Jaro-Winkler distance). To reduce the chance of false positives, we also suggest testing robustness to requiring that names be unique within a five-year window and/or that the match on age be exact.
    2. A fully automated probabilistic approach (EM): This method (Abramitzky, Mill, and Perez 2019) combines the distances in reported names and ages between each pair of candidate records into a single score, roughly corresponding to the probability that both records belong to the same individual. We estimate these probabilities using the Expectation-Maximization (EM) algorithm, a standard technique in the statistical literature, and suggest a number of decision rules that use the estimated probabilities to determine which records to use in the analysis.
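
    The ABE approach in item 1 can be illustrated with a small Python sketch. This is not the repository's code. It assumes each record is a hypothetical tuple (record_id, first_name, last_name, birth_year), with birth year derived from census year minus reported age. It compares lowercased raw names only (no NYSIIS or Jaro-Winkler), and it widens the birth-year gap from 0 to 2 years. The function names link_one_way and abe_style_links are invented for illustration.

```python
from collections import defaultdict


def name_key(first, last):
    # Raw-name comparison only; a real run might standardize names first.
    return (first.strip().lower(), last.strip().lower())


def link_one_way(source, target, max_gap=2):
    """Link each source record to at most one target record."""
    # Count (name, birth year) combinations so non-unique source records are skipped.
    counts = defaultdict(int)
    for _, first, last, year in source:
        counts[(name_key(first, last), year)] += 1

    # Index target records by name for fast candidate lookup.
    index = defaultdict(list)
    for rid, first, last, year in target:
        index[name_key(first, last)].append((rid, year))

    links = {}
    for rid, first, last, year in source:
        key = name_key(first, last)
        if counts[(key, year)] != 1:
            continue
        candidates = index.get(key, [])
        # Try an exact birth-year match first, then widen the gap one year at a time.
        for gap in range(max_gap + 1):
            hits = [tid for tid, tyear in candidates if abs(tyear - year) == gap]
            if len(hits) == 1:
                links[rid] = hits[0]
            if hits:
                break  # one hit was linked above; several hits are ambiguous and dropped
    return links


def abe_style_links(records_a, records_b, max_gap=2):
    """Keep only links found in both directions (A to B and B to A)."""
    a_to_b = link_one_way(records_a, records_b, max_gap)
    b_to_a = link_one_way(records_b, records_a, max_gap)
    return {a: b for a, b in a_to_b.items() if b_to_a.get(b) == a}


census_a = [("a1", "John", "Smith", 1850), ("a2", "Mary", "Jones", 1860)]
census_b = [("b1", "john", "smith", 1851), ("b2", "Mary", "Jones", 1860),
            ("b3", "Mary", "Jones", 1860)]
print(abe_style_links(census_a, census_b))  # {'a1': 'b1'}
```

    Setting max_gap=0 corresponds to requiring an exact match on age, one of the robustness checks mentioned above. Mary Jones is left unlinked because two candidate records tie.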
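
    The probabilistic approach in item 2 can be sketched in the same spirit. The Python example below is a simplified two-class mixture fitted by EM, not the authors' implementation. It assumes that name and age distances have already been binned into small integer levels (0 = closest agreement) and that the features are independent within each class. The function name em_match_probabilities, the starting values, and the smoothing constant are all illustrative choices.

```python
def em_match_probabilities(pairs, n_levels, n_iter=50, p_match=0.1, eps=1e-6):
    """Estimate P(match) for each candidate pair.

    pairs: list of tuples of ints, pairs[i][k] in range(n_levels[k]).
    """
    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Starting guess: matches lean toward small distances, non-matches toward large ones.
    m = [normalize([n - v for v in range(n)]) for n in n_levels]
    u = [normalize([v + 1 for v in range(n)]) for n in n_levels]

    def posterior(x):
        # Probability that pair x is a match under the current parameters.
        pm, pu = p_match, 1.0 - p_match
        for k, v in enumerate(x):
            pm *= m[k][v]
            pu *= u[k][v]
        return pm / (pm + pu)

    for _ in range(n_iter):
        # E-step: posterior match probability for every pair.
        weights = [posterior(x) for x in pairs]
        # M-step: re-estimate the match share and the per-feature level distributions.
        p_match = sum(weights) / len(pairs)
        for k, n in enumerate(n_levels):
            m_counts = [eps] * n  # eps keeps every level probability strictly positive
            u_counts = [eps] * n
            for x, w in zip(pairs, weights):
                m_counts[x[k]] += w
                u_counts[x[k]] += 1.0 - w
            m[k] = normalize(m_counts)
            u[k] = normalize(u_counts)

    return [posterior(x) for x in pairs]


# Each pair is (name_distance_level, age_difference_level), both hypothetical bins.
candidate_pairs = [(0, 0)] * 20 + [(2, 2)] * 80
probabilities = em_match_probabilities(candidate_pairs, n_levels=(3, 3))
```

    A decision rule can then be applied to the returned scores, for example keeping a pair only if its probability exceeds a chosen threshold and no competing pair for the same record comes close.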
Temporal Coverage
  • 1850-01-01 / 1940-12-31
    Time Period: 1850 to 1940
Geographic Coverage
  • United States
This study is freely available to the general public via web download.
  • Is version of
    DOI: 10.3886/E120703

Update Metadata: 2020-08-25 | Issue Number: 1 | Registration Date: 2020-08-25

Abramitzky, Ran; Boustan, Leah; Eriksson, Katherine; Feigenbaum, James; Perez, Santiago (2020): Automated Linking of Historical Data. Version: 1. ICPSR - Interuniversity Consortium for Political and Social Research. Dataset.