metadata language: English

Congressional Record for 104th-110th Congresses: Text and Phrase Counts

Resource Type
Dataset : administrative records data, aggregate data, text, program source code
Creator
  • Gentzkow, Matthew (University of Chicago, and National Bureau of Economic Research)
  • Shapiro, Jesse (University of Chicago, and National Bureau of Economic Research)
Other Title
  • Version 1 (Subtitle)
Publication Date
Funding Reference
  • National Science Foundation
Free Keywords
government; legislative bodies; political speeches; public officials; United States Congress
  • Abstract

    This qualitative data collection contains original and processed text from the United States Congressional Record for the 104th-110th Congresses. The Congressional Record includes text from both chambers, the United States House of Representatives and the United States Senate. For each Congress the archive includes the original tagged text files, parsed files that separate the text into individual speeches, speaker metadata that can be linked to the parsed files, and counts of two-word phrases (bigrams) by speaker, party, and date.
  • Abstract

    Please refer to the Original P.I. Documentation in the ICPSR User Guide.
  • Table of Contents


    • DS0: Study-Level Files
    • DS1: Original 1995
    • DS2: Original 1996
    • DS3: Original 1997
    • DS4: Original 1998
    • DS5: Original 1999
    • DS6: Original 2000
    • DS7: Original 2001
    • DS8: Original 2002
    • DS9: Original 2003
    • DS10: Original 2004
    • DS11: Original 2005
    • DS12: Original 2006
    • DS13: Original 2007
    • DS14: Original 2008
    • DS15: Speeches
    • DS16: Counts by Date
    • DS17: Counts by Party
    • DS18: Counts by Speaker
    • DS19: Metadata: Speaker
    • DS20: Metadata: Speech
Temporal Coverage
  • 1995 / 2008
    Time period: 1995--2008
  • 2007-06 / 2011-11
    Collection date: 2007-06--2011-11
Geographic Coverage
  • United States
Sampled Universe
Full text of the published Congressional Record for both chambers of the 104th-110th Congresses of the United States.
Smallest Geographic Unit: United States
The data are not a sample, as this collection is an aggregation of data on Congressional speech.
Collection Mode
  • This collection has not been processed by ICPSR and is being released in the original ASCII format for convenience of use; no value labels are present in the data.

    Please see the ICPSR User Guide for information about what each part of the data collection contains.

    Please note that the files for this data collection are extremely large. Users should exercise discretion when downloading files.

2015-12-01 This collection is being updated to comply with new ICPSR file-naming conventions. No other changes have been made to the collection.
2015-10-23 This collection is being updated to include data for the 110th Congress, spanning the years 2007 and 2008.
2013-07-08 The User Guide was updated.
Funding institution(s): National Science Foundation (SES-0617658 and SES-0922342).
This version of the study is no longer available on the web. If you need to acquire this version of the data, you have to contact ICPSR User Support (
Alternative Identifiers
  • 33501 (Type: ICPSR Study Number)
  • Is previous version of
    DOI: 10.3886/ICPSR33501.v2
  • Gentzkow, Matthew; Shapiro, Jesse M. (2010). What drives media slant? Evidence from U.S. daily newspapers. Econometrica, 78(1), 35-71.
    • ID: 10.3982/ECTA7195 (DOI)

Update Metadata: 2015-12-01 | Issue Number: 5 | Registration Date: 2015-07-01

Gentzkow, Matthew; Shapiro, Jesse (2012): Congressional Record for 104th-110th Congresses: Text and Phrase Counts. Version 1. Version: v1. ICPSR - Inter-university Consortium for Political and Social Research. Dataset.