CompERBench: A collection of 21 complete benchmark tasks for entity matching.

Resource Type
  • Primpeli, Anna
  • Bizer, Christian
Other Title
  • CompERBench: Complementing Entity Matching Benchmark Tasks (Alternative Title)
Publication Date
Free Keywords
entity matching, benchmarking, reproducibility
  • Abstract

    Entity matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching tasks have been developed for benchmarking and made publicly available for evaluating, comparing, and reproducing matching methods and for showing their strengths. However, the lack of fixed development and test sets, of correspondence sets that include both matching and non-matching record pairs, and of baseline results hinders reproducibility and comparability. To enhance the reproducibility and comparability of matching methods, we complement existing entity matching benchmark tasks with fixed development and test sets. We provide 21 complete benchmark tasks for entity matching for public download. The selected tasks are highly diverse: they include data sets of different sizes, numbers of attributes, densities, and attribute data types, as well as different numbers of sources from which the data originate.
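Fixed test sets that label both matching and non-matching record pairs let any two matchers be scored on exactly the same pairs. The following is a minimal sketch of such a scoring step, assuming precision, recall and F1 as the metrics. The record does not describe the layout of the files inside the archive, so the `(left_id, right_id, is_match)` tuple format, the function name `evaluate`, and the record IDs below are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical format: a labelled correspondence is (left_id, right_id, is_match),
# where is_match is True if both records describe the same real-world entity.
Correspondence = Tuple[str, str, bool]
Pair = Tuple[str, str]


def evaluate(gold: Iterable[Correspondence], predicted_matches: Set[Pair]) -> dict:
    """Score a matcher's predicted matches against a fixed, labelled test set.

    Only pairs that appear in the gold test set are counted, so every matcher
    is judged on exactly the same matching and non-matching pairs.
    """
    tp = fp = fn = 0
    for left_id, right_id, is_match in gold:
        predicted = (left_id, right_id) in predicted_matches
        if predicted and is_match:
            tp += 1  # correctly found match
        elif predicted:
            fp += 1  # non-matching pair wrongly predicted as a match
        elif is_match:
            fn += 1  # true match the matcher missed
        # not predicted and not a match: true negative, not needed for F1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    test_set = [
        ("a1", "b1", True),   # match
        ("a2", "b2", True),   # match
        ("a3", "b7", False),  # non-match
        ("a4", "b4", False),  # non-match
    ]
    predictions = {("a1", "b1"), ("a3", "b7")}
    print(evaluate(test_set, predictions))
    # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Scoring only the pairs listed in the gold set is what keeps results comparable: a matcher is neither rewarded nor penalized for pairs outside the fixed test set.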

Data and File Information
  • Unit Type: Other
    Number of Units: 1
    • File Name:
      File Format: application/zip
      File Size: 132649259 bytes (about 133 MB)
      Data Fingerprint: 9cd7374a3284017865299a8371061bd5
      Fingerprint Method: MD5
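The record publishes an MD5 fingerprint for the archive, so a download can be checked for corruption before use. The following is a minimal sketch using Python's standard `hashlib`. The archive's file name is not given in the record, so the script takes the local path as a command-line argument; the function names `md5_of_file` and `verify_archive` are hypothetical.

```python
import hashlib
import sys
from pathlib import Path

# Fingerprint published in this record (fingerprint method: MD5).
EXPECTED_MD5 = "9cd7374a3284017865299a8371061bd5"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so the whole archive never has to be held in memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest equals the expected fingerprint."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the local path of the downloaded zip file as the first argument.
    archive = Path(sys.argv[1])
    ok = verify_archive(archive)
    print("fingerprint OK" if ok else "fingerprint MISMATCH")
    sys.exit(0 if ok else 1)
```

Reading in fixed-size chunks keeps memory use constant regardless of archive size; MD5 here only detects accidental corruption, not deliberate tampering.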
You can download our datasets by navigating to:
  • References
    DOI: 10.1145/3340531.3412781
  • Primpeli, Anna and Bizer, Christian (2020), Profiling entity matching benchmark tasks

    • ID: (URL)

Update Metadata: 2020-11-24 | Issue Number: 1 | Registration Date: 2020-11-24