Re-introducing the Cambridge Group Family Reconstitutions

  • Alter, George (University of Michigan)
historical demography; family reconstitution; family history; demography; English history
  • Abstract

    English Population History from Family Reconstitution 1580-1837 (1997) was important both for its scope and its methodology. The volume was based on data from 26 family reconstitution studies carefully selected to represent 250 years of English demographic history (Wrigley, Davies, Oeppen, & Schofield, 2018). These data remain relevant for new research questions, such as studying the intergenerational inheritance of fertility and mortality. To expand their availability the family reconstitutions have been translated into new formats: a relational database, the Intermediate Data Structure (IDS) and an episode file for fertility analysis. This paper describes that process and examines the impact of methodological decisions on analysis of the data. Wrigley, Davies, Oeppen, and Schofield were sensitive to changes in the quality of the parish registers and cautiously applied the principles of family reconstitution developed by Louis Henry. We examine how these choices affect the measurement of fertility and biases that are introduced when important principles are ignored.

  • 1580-01-01 / 1837-12-31
    Time Period: 1580-01-01 to 1837-12-31
  • England
  • Program code for "Re-introducing the Cambridge Group Family Reconstitutions"
       George Alter, University of Michigan
       August 4, 2020

    1. Creating the "Chronicle" file
    The Chronicle file was created using the Microsoft Access file CAMPOP_fert_groups_to_Chronicle.accdb.

    a. Due to the size of the CamPOP reconstitution files in IDS, chronicle and episode files were created for 5 sub-samples of parishes and ultimately concatenated in Stata.  Queries  00A to 00E read IDS tables from the full set of parishes and copy sub-samples  of parishes to work on.  Tables in the full dataset are connected to this database by linking to an external Access database, and they are renamed with "_all" (e.g. CONTEXT_all).  The sub-samples use the original IDS table names (e.g. CONTEXT, INDIVIDUAL, etc.). 

    b. Macro "run chronicle" invokes all the queries used to create the Chronicle file.  "run chronicle" begins by calling two other macros "Run bd_dates" and "run event first last".

    c. "Run bd_dates" creates the "bd_dates" table, which organizes dates of birth and death for each individual.  Dates reported as "birth_date" are preferred to dates of "baptism_date", and dates of "death_date" are preferred to "funeral_date".  

    When dates are incomplete, 15 is imputed for the day and 6 for the month.  Estimated dates are recorded in the "best" (birth estimation) and "dest" (death estimation) columns of table "bd_dates".  The "bd_dates" table also includes "sex".
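    The date rules in step c can be illustrated with a minimal Python sketch.  It is not the Access query used in the project; the function names and the tuple layout are hypothetical, and the returned flag only marks that a day or month was imputed, which may be narrower than what the "best" and "dest" columns record.

```python
from datetime import date
from typing import Optional, Tuple

# Hypothetical input layout: (year, month, day), where month and day may be None.
PartialDate = Tuple[int, Optional[int], Optional[int]]


def impute_date(year: int, month: Optional[int] = None,
                day: Optional[int] = None) -> Tuple[date, bool]:
    """Fill a missing month with 6 and a missing day with 15.

    Returns the completed date and a flag saying whether anything was imputed.
    """
    imputed = month is None or day is None
    full = date(year, 6 if month is None else month, 15 if day is None else day)
    return full, imputed


def preferred_date(primary: Optional[PartialDate],
                   fallback: Optional[PartialDate]) -> Optional[Tuple[date, bool]]:
    """Prefer the primary date (birth or death) over the fallback (baptism or funeral)."""
    chosen = primary if primary is not None else fallback
    if chosen is None:
        return None
    return impute_date(*chosen)
```

    For example, preferred_date(None, (1650, 2, None)) falls back to the baptism record and imputes day 15.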

    d.  "run event first last" identifies the first and last event in each family history.  Events are stored in table "all_fam_events".  When deaths of the husband and/or wife are observed, the last event is the earliest death or the wife's 50th birthday.  If death dates of the spouses are not available, the last event involving a child is assigned.  If there are no events involving children, the date of marriage is usually the last event.
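    A minimal Python sketch of the last-event rule in step d follows.  It assumes that "the earliest death or the wife's 50th birthday" means whichever of those dates comes first, and it ignores the exceptions implied by "usually"; the function and argument names are hypothetical.

```python
from datetime import date
from typing import List, Optional


def fiftieth_birthday(born: date) -> date:
    """Wife's 50th birthday; a 29 February birth maps to 28 February."""
    try:
        return born.replace(year=born.year + 50)
    except ValueError:
        return born.replace(year=born.year + 50, day=28)


def last_event(marriage: date,
               wife_birth: Optional[date],
               husband_death: Optional[date],
               wife_death: Optional[date],
               child_events: List[date]) -> date:
    """Pick the closing date of a family history (assumed reading of step d)."""
    deaths = [d for d in (husband_death, wife_death) if d is not None]
    if deaths:
        candidates = list(deaths)
        if wife_birth is not None:
            candidates.append(fiftieth_birthday(wife_birth))
        return min(candidates)      # earliest death or 50th birthday
    if child_events:
        return max(child_events)    # last event involving a child
    return marriage                 # nothing else observed
```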

    e. "run chronicle" adds rows to the Chronicle file for each event and variable used in fertility analysis.
    -- Marital "Unions" are identified with an ID_C found in the INDIV_CONTEXT table
    -- If a birth date is not available for any child in the family, all events for the family are discarded.
    -- For each child, we identify the dates of birth and death of the previous child. 
    -- Twins are recorded as separate records in the Chronicle file.
    -- An "end of lactation" event is created 365+280 days after each birth.  If the next birth occurs less than 365+280 days after that birth, lactation ends at the next birth.  The death of the preceding child (or death of last surviving twin) ends lactation with a lag of 280 days.
    -- Family histories are removed from the Chronicle file if there is no date for the start of the union or only one date in the history.
    -- Values for "DayFrac" (aka "offset") are added to rows in Chronicle by Type or by Value within some Types.  "DayFrac" is a value between 0 and .99 used to sequence events that occur on the same day. 
    -- Tables are produced showing duplicate events.  The Stata program used to create episodes does not allow an event type to occur twice on the same day.
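    The "end of lactation" rule in step e can be sketched in Python as the earliest of three dates.  This is one reading of the rule, not the Access query itself; the function name and constants are hypothetical, and twins are assumed to be handled by passing the death date of the last surviving twin.

```python
from datetime import date, timedelta
from typing import Optional

LACTATION_DAYS = 365   # assumed maximum duration of breastfeeding
GESTATION_DAYS = 280   # lag that aligns the event with the timing of births


def end_of_lactation(birth: date,
                     child_death: Optional[date] = None,
                     next_birth: Optional[date] = None) -> date:
    """Date of the "end of lactation" event following one birth."""
    end = birth + timedelta(days=LACTATION_DAYS + GESTATION_DAYS)
    if child_death is not None:
        # A child's death ends lactation, shifted by the 280-day lag.
        end = min(end, child_death + timedelta(days=GESTATION_DAYS))
    if next_birth is not None:
        # The event never falls after the next birth.
        end = min(end, next_birth)
    return end
```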

    2. Chronicle tables for subsamples in CAMPOP_fert_groups_to_Chronicle.accdb were converted to a Stata .dta file using the Stat/Transfer program.

    3. chron_fix&
    This Stata script converts Chronicle files to Episode files using a modified version of Quaranta's script (see below).  chron_fix& runs the episode creator five times, once for each of the subsamples exported from the CAMPOP_fert_groups_to_Chronicle.accdb database.  It also changes DayFrac values of .5 to .4, which resolves date collisions caused by twins.
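    Because the episode creator does not accept the same event type twice on one day, it can help to list such collisions before conversion.  The Python sketch below is only an illustration of that check and of ordering same-day events by DayFrac; the row layout (union ID, event type, date, DayFrac) and function names are hypothetical and do not reproduce the Chronicle table's actual columns.

```python
from collections import Counter
from datetime import date
from typing import List, Tuple

# Hypothetical row layout: (union_id, event_type, event_date, day_frac)
Row = Tuple[str, str, date, float]


def order_events(rows: List[Row]) -> List[Row]:
    """Sort events within each union by date, then by DayFrac for same-day events."""
    return sorted(rows, key=lambda r: (r[0], r[2], r[3]))


def duplicate_events(rows: List[Row]) -> List[Tuple[str, str, date]]:
    """Return (union_id, event_type, event_date) keys that occur more than once."""
    counts = Counter((union, etype, day) for union, etype, day, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)
```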

    This is a modified version of Quaranta's script that converts a chronicle file into an episodes file.
    This script reads variable descriptions from the VarSetup.dta file.  It was modified to make debugging the data easier.  The EpisodesFileCreator is sensitive to duplicate events and other problems in the chronicle file.  In several places temporary variables or files are saved to make it easier to find problems in the chronicle file.

    This .do file creates most of the tables and graphs used in the paper.  Each set of rules for selecting fertility histories is in its own .do file, including one for the "Reference sample".

    The selection .do files call other scripts to read the data and to split it into 25-year periods and 5-year age groups.  Fertility rates are then computed by age and period with the Stata strate command and saved in .dta files.  These .dta files of fertility rates are combined into a single dataset, which is used to graph fertility rates by age and period.  Each selection .do file creates a sampleType variable indicating which selection criteria it uses.
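    The rate computed for each age group and period is the number of births divided by the person-years of exposure.  The Python sketch below illustrates that arithmetic only; it is not the strate command, and the input layout (age group, period, exposure in days, births) and the 365.25-day year are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

DAYS_PER_YEAR = 365.25  # assumed conversion from days to person-years

# Hypothetical input layout: (age_group, period, exposure_days, births)
Episode = Tuple[int, int, float, int]


def fertility_rates(episodes: Iterable[Episode]) -> Dict[Tuple[int, int], float]:
    """Births per person-year for each (age_group, period) cell."""
    totals = defaultdict(lambda: [0.0, 0])   # cell -> [person_years, births]
    for age_group, period, days, births in episodes:
        cell = totals[(age_group, period)]
        cell[0] += days / DAYS_PER_YEAR
        cell[1] += births
    return {key: births / years
            for key, (years, births) in totals.items() if years > 0}
```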

    6. Fertility history selection scripts

     1  -- "Reference sample"
     2  -- Fertility history ends in a birth
     3  -- Mother's date of birth estimated
     4  -- Date of marriage estimated
     5  -- Date of death used to end history is estimated
     6  -- Fertility history ends with spouse death before wife reaches age 50
     7  -- Death dates for both husband and wife available
     8  -- Includes data from periods when recording of events is considered incomplete
     9  -- Family histories in which one spouse had been previously married
    10  -- Includes data from years not included by CamPOP selection criteria
    11  -- Family histories in which both spouses survive until the wife reaches age 50
    12  -- One or more birth dates of children are estimated
    13  -- One or more birth dates of children are given only as a year, without day and month
    14  -- Second marriage for wife
    15  -- Second marriage for husband
    16  -- Occupation of husband is known

    7. A Stata script reads the episode files and combines them into a single dataset.  Variable and value labels are added, and dates are converted to Stata's internal date format.

    8. A Stata script uses Stata's survival-time (st) functions to split the data into time periods and age groups.
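    Splitting an episode means cutting it wherever the mother crosses a 5-year age boundary or the calendar crosses a 25-year boundary, so that each piece belongs to exactly one age group and one period.  The Python sketch below shows the idea under those assumptions; it is not the Stata code, and the function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February maps to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def age_at(d: date, born: date) -> int:
    """Completed years of age on date d."""
    years = d.year - born.year
    if add_years(born, years) > d:
        years -= 1
    return years


def split_episode(start: date, end: date, mother_birth: date,
                  age_width: int = 5, period_width: int = 25
                  ) -> List[Tuple[date, date, int, int]]:
    """Cut [start, end) into pieces labeled (start, end, age_group, period)."""
    cuts = {start, end}
    age = 0
    while True:                      # birthdays at multiples of age_width
        cut = add_years(mother_birth, age)
        if cut >= end:
            break
        if cut > start:
            cuts.add(cut)
        age += age_width
    year = (start.year // period_width + 1) * period_width
    while date(year, 1, 1) < end:    # 1 January of each period boundary
        cuts.add(date(year, 1, 1))
        year += period_width
    points = sorted(cuts)
    return [(lo, hi,
             age_at(lo, mother_birth) // age_width * age_width,
             lo.year // period_width * period_width)
            for lo, hi in zip(points, points[1:])]
```

    Each piece can then contribute its length in days to the exposure of its age group and period.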

    9. A Stata script and occ_parish_yr25.R create Figure 12, "Proportion of Husbands with Occupations by Parish and Time Period".  The Stata script aggregates data on husband's occupation into 25-year periods.
    occ_parish_yr25.R is an R script that creates the "ridge" plot used in Figure 12.
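    The aggregation behind Figure 12 amounts to a proportion per parish and 25-year period.  The Python sketch below illustrates that step only; it is not the Stata or R code, the record layout is hypothetical, and the choice of which year assigns a family to a period is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical input layout: (parish, year, husband_has_occupation)
Record = Tuple[str, int, bool]


def occupation_share(records: Iterable[Record],
                     width: int = 25) -> Dict[Tuple[str, int], float]:
    """Proportion of husbands with a recorded occupation by parish and period."""
    counts = defaultdict(lambda: [0, 0])   # cell -> [families, with occupation]
    for parish, year, has_occupation in records:
        cell = counts[(parish, year // width * width)]
        cell[0] += 1
        cell[1] += 1 if has_occupation else 0
    return {key: known / total for key, (total, known) in counts.items()}
```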


  • Alter, George, Gill Newton, and Jim Oeppen. “Re-Introducing the Cambridge Group Family Reconstitutions.” Historical Life Course Studies, n.d.

