metadata language: English

Hyperlink Graph of the World Wide Web of 2012 (aggregated by pay-level-domain)

Version
1
Resource Type
Dataset
Creator
  • Meusel, Robert
  • Lehmberg, Oliver
  • Bizer, Christian
Publication Date
2013-11-12
Classification
  • UNSPECIFIED:
    • webgraph
    • web graph
    • hyperlink graph
    • web crawl
Free Keywords
web graph; webgraph; hyperlink graph
Description
  • Abstract

    Knowledge about the general structure of the hyperlink graph is important for designing ranking methods for search engines. To improve the ranking that search engines calculate for their clients' websites, search engine optimization agencies focus on the linkage structure of those sites. An extreme form of ranking manipulation appears in spam networks, where pages and websites publishing dubious content try to increase their ratings by setting a massive number of links to other pages and obtaining backlinks in return. The WDC Hyperlink Graph aggregated by pay-level-domain has been extracted from the Common Crawl 2012 web corpus. It covers 43 million pay-level-domains linked by 623 million connections, which have been derived from the hyperlinks between the pages contained in the pay-level-domains.

Data and File Information
  • Unit Type: Other
    Number of Units: 1
    • File Name: pld-index.gz
      File Format: application/x-gzip
      File Size: 311068910
      Data Fingerprint: dc5a00ad5eba2d52327f712e71e70d5f
      Method Fingerprint: MD5
    • File Name: pld-arc.gz
      File Format: application/x-gzip
      File Size: 2912232962
      Data Fingerprint: 01a4ca9b461c799303ea5abfc8107ea1
      Method Fingerprint: MD5
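
The MD5 fingerprints listed above can be used to check a download for corruption. The following is a minimal sketch, assuming the fingerprints were computed over the compressed .gz files as distributed (the record does not say so explicitly). The helper names md5_of_file and matches_fingerprint are hypothetical and not part of the dataset.

```python
import hashlib

# Fingerprints copied from the file list in this record.
EXPECTED_MD5 = {
    "pld-index.gz": "dc5a00ad5eba2d52327f712e71e70d5f",
    "pld-arc.gz": "01a4ca9b461c799303ea5abfc8107ea1",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fingerprint(path, name):
    """Compare a local file against the fingerprint recorded for `name`.

    Assumption: the recorded fingerprint covers the compressed file as downloaded.
    """
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, matches_fingerprint("pld-arc.gz", "pld-arc.gz") should return True for an intact download under the assumption stated above. Reading in chunks matters here because the arc file is several gigabytes in size.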
Availability
Download
The data is available for download as an index file and an arc file. Please see http://webdatacommons.org/hyperlinkgraph for more information.
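
As a rough illustration of how the two files might be read together, here is a sketch under stated assumptions. This record does not describe the file layout. The code assumes that each line of the index file holds a pay-level-domain and an integer node id separated by a tab, and that each line of the arc file holds a source id and a target id separated by a tab. Check the project page before relying on this. The function names read_index and iter_arcs are hypothetical.

```python
import gzip


def read_index(path):
    """Map node id -> pay-level-domain.

    Assumption: each line is 'domain<TAB>id'; not confirmed by this record.
    """
    names = {}
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            name, node_id = line.rstrip("\n").split("\t")
            names[int(node_id)] = name
    return names


def iter_arcs(path):
    """Yield (source_id, target_id) pairs lazily.

    Assumption: each line is 'source<TAB>target'; not confirmed by this record.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            source, target = line.rstrip("\n").split("\t")
            yield int(source), int(target)
```

Streaming the arcs with a generator avoids holding 623 million connections in memory at once. The index dictionary for 43 million domains is still large, so a disk-backed store may be a better fit on smaller machines.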

Update Metadata: 2018-03-02 | Issue Number: 7 | Registration Date: 2017-03-31

Meusel, Robert; Lehmberg, Oliver; Bizer, Christian (2013): Hyperlink Graph of the World Wide Web of 2012 (aggregated by pay-level-domain). Version: 1. Universitätsbibliothek Mannheim. Dataset. https://doi.org/10.7801/48