metadata language: English

Hyperlink Graph of the World Wide Web of 2012 (aggregated by first level subdomains)

Version
1
Resource Type
Dataset
Creator
  • Meusel, Robert
  • Lehmberg, Oliver
  • Bizer, Christian
Publication Date
2013-11-12
Classification
  • UNSPECIFIED:
    • webgraph
    • web graph
    • hyperlink graph
    • web crawl
Free Keywords
webgraph; web graph; hyperlink graph
Description
  • Abstract

    Knowledge of the overall structure of the hyperlink graph is important for designing search engine ranking methods. To improve how search engines rank their clients' websites, search engine optimization agencies concentrate on link structure. An extreme form of ranking manipulation appears in spam networks, in which pages and websites publishing dubious content try to raise their rankings by placing a massive number of links to other pages and acquiring backlinks. The WDC Hyperlink Graph at the first-level-subdomain level was extracted from the Common Crawl 2012 web corpus. It covers 95 million first-level subdomains linked by almost 2 billion connections, which are derived from the hyperlinks of the pages that the first-level subdomains contain.

Data and File Information
  • Unit Type: Other
    Number of Units: 1
    • File Name: sd1-arc.gz
      File Format: application/x-gzip
      File Size: 8696165938
      Data Fingerprint: 561ea0e040c756a9585c5959622c6667
      Method Fingerprint: MD5
    • File Name: sd1-index.gz
      File Format: application/x-gzip
      File Size: 793555680
      Data Fingerprint: caf458ade9a06d194310629752c0eeab
      Method Fingerprint: MD5
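
A downloaded copy can be checked against the MD5 fingerprints listed above. The following is a minimal Python sketch, not part of the original record: the function names `md5_of_file` and `verify` are hypothetical, and the local file paths are whatever the reader chose when downloading.

```python
import hashlib

# MD5 fingerprints copied from the file listing in this record.
EXPECTED_MD5 = {
    "sd1-arc.gz": "561ea0e040c756a9585c5959622c6667",
    "sd1-index.gz": "caf458ade9a06d194310629752c0eeab",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True if the file at `path` matches the recorded fingerprint for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("sd1-index.gz", "sd1-index.gz")` would return `True` only if the local file's digest equals the fingerprint recorded here. MD5 is suitable for detecting transfer corruption, not for guarding against deliberate tampering.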
Availability
Download
The data is available for download as an index file and arc files. See http://webdatacommons.org/hyperlinkgraph for more information.
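
This record does not describe the internal layout of the files. The Python sketch below assumes that each line of the gzip-compressed arc file holds two tab-separated integer node ids (source, then target); that layout is an assumption to confirm against the page linked above before use. The function names `iter_arcs` and `out_degrees` are hypothetical.

```python
import gzip
from collections import Counter


def iter_arcs(path):
    """Yield (source_id, target_id) pairs from a gzip-compressed arc file.

    ASSUMPTION: one arc per line, two tab-separated integer node ids.
    The file is streamed, so it is never fully loaded into memory.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            source, target = line.split("\t")
            yield int(source), int(target)


def out_degrees(path):
    """Count outgoing arcs per source node id."""
    counts = Counter()
    for source, _target in iter_arcs(path):
        counts[source] += 1
    return counts
```

Streaming matters here because the graph has almost 2 billion connections. Even the per-node counter in `out_degrees` may need several gigabytes of memory for tens of millions of nodes, so a smaller sample is a sensible first test.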

Update Metadata: 2018-03-02 | Issue Number: 9 | Registration Date: 2017-03-31

Meusel, Robert; Lehmberg, Oliver; Bizer, Christian (2013): Hyperlink Graph of the World Wide Web of 2012 (aggregated by first level subdomains). Version: 1. Universitätsbibliothek Mannheim. Dataset. https://doi.org/10.7801/50