Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: County-Level Detailed Arrest and Offense Data
- Kaplan, Jacob (University of Pennsylvania)
Abstract
Version 5 release notes:
- Changes release notes description, does not change data.
- I am retiring this dataset - please do not use it.
- The reason that I made this dataset is that I had seen a lot of recent articles using the NACJD version of the data and had received several requests to make a concatenated version myself. This data is heavily flawed, as noted in Maltz & Targonski's excellent (2002) paper (see the PDF available to download here and the important paragraph from that article below), and I was worried that people were using the data without considering these flaws. So the data available here had the warning below this section (originally at the top of these notes so it was the most prominent thing) and had the Maltz & Targonski PDF included in the zip file so people were aware of it.
- There are two reasons that I am retiring it.
- First, I see papers and other non-peer-reviewed reports still being published using this data without addressing the main flaws noted by Maltz and Targonski. I don't want to have my work contribute to research that I think is fundamentally flawed.
- Second, this data is actually more flawed than I originally understood. The imputation process to replace missing data is based on a bad design, and Maltz and Targonski talk about this in detail so I won't discuss it too much. The additional problem is that the variable that determines whether an agency has missing data is fatally flawed. That variable is the "number_of_months_reported" variable, which is actually just the last month reported. So if an agency only reports in December, it will be recorded as having 12 months reported instead of 1. So even a good imputation process will be based on such a flawed measure of missingness that it will be wrong. How big of an issue is this? At the moment I haven't looked into it in enough detail to be sure, but it's enough of a problem that I no longer want to release this kind of data. (Within the UCR data there are variables that you can use to try to determine the actual number of months reported, but that stopped being useful due to a change in the data in 2018 by the FBI. And even that measure is not always accurate for years before 2018.)
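To make the "number_of_months_reported" problem concrete, here is a minimal Python sketch of the difference between the last month reported and the actual count of months reported. The function names and the example month lists are hypothetical and made up for illustration; they are not taken from the UCR files or from any real agency.

```python
def last_month_reported(months):
    """Mimic the flawed variable: the calendar number (1-12) of the
    latest month in which the agency reported, or 0 if it never did."""
    return max(months) if months else 0


def actual_months_reported(months):
    """Count how many distinct months the agency really reported."""
    return len(set(months))


# Hypothetical agencies (illustrative values only, not real data).
december_only = [12]               # reported once, in December
full_year = list(range(1, 13))     # reported every month
first_quarter = [1, 2, 3]          # reported January through March

for name, months in [("december_only", december_only),
                     ("full_year", full_year),
                     ("first_quarter", first_quarter)]:
    print(name,
          "flawed measure:", last_month_reported(months),
          "actual count:", actual_months_reported(months))
```

In this sketch the December-only agency and the full-year agency get the same value from the flawed measure (12), even though one reported 1 month and the other 12. The two measures agree only when an agency reports every month up to its last reported month, as in the first-quarter case.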
Time Period: 1960-01-01 to 2017-12-31 (1960-2017 for crime data, 1974-2016 for arrest data)
Counties in the United States
Update Metadata: 2021-02-17 | Issue Number: 1 | Registration Date: 2021-02-17