3 datasets found

  • Web Tables

    This page provides a large corpus of HTML tables for public download. The corpus has been extracted from the 2012 version of the Common Crawl and contains 147 million relational...
  • Hyperlink Graph

    The latest graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our...
  • RDFa, Microdata, and Microformat Data Set

    More and more websites have started to embed structured data describing products, people, organizations, places, and events into their HTML pages using markup standards such as...