WebDataCommons

More and more websites have started to embed structured data describing products, people, organizations, places, and events into their HTML pages using markup standards such as RDFa, Microdata, and Microformats. The Web Data Commons project extracts this data from several billion web pages. The project provides the extracted data for download and publishes statistics about the deployment of the different formats. The latest extraction was done in August 2012. It contains over 7 billion triples, which are available for download (http://webdatacommons.org/2012-08/stats/how_to_get_the_data.html). The data is provided as 1,416 files with a total size of 101 GB.
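To illustrate what "embedded structured data" means in practice, here is a minimal sketch that pulls simple Microdata statements out of an HTML snippet using only the Python standard library. It is not the Web Data Commons extraction pipeline. The sample page, the class name `MicrodataTextExtractor`, and the helper `extract_statements` are assumptions made for illustration, and the parser handles only flat, non-nested items whose values appear as element text.

```python
from html.parser import HTMLParser

# Hypothetical sample page; not taken from the Web Data Commons corpus.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Espresso Machine</span>
  <span itemprop="brand">Acme</span>
</div>
"""


class MicrodataTextExtractor(HTMLParser):
    """Collects (itemtype, property, text) tuples for flat Microdata items."""

    def __init__(self):
        super().__init__()
        self.current_type = None   # itemtype of the most recent itemscope
        self.current_prop = None   # itemprop waiting for its text value
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemscope" in attributes:
            self.current_type = attributes.get("itemtype")
        if "itemprop" in attributes:
            self.current_prop = attributes["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self.current_prop and text:
            self.statements.append((self.current_type, self.current_prop, text))
            self.current_prop = None

    def handle_endtag(self, tag):
        # A property without text content is dropped when its element closes.
        self.current_prop = None


def extract_statements(html):
    """Return the Microdata statements found in an HTML string."""
    parser = MicrodataTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.statements


statements = extract_statements(SAMPLE_HTML)
for item_type, prop, value in statements:
    print(item_type, prop, value)
```

Each tuple loosely corresponds to one triple: the item is the subject, the property is the predicate, and the text is the object. A real extractor also has to handle nested items, attribute-valued properties such as `href` or `content`, and the RDFa and Microformats syntaxes.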

Data and Resources

Additional Info

Field           Value
Source          http://webdatacommons.org/
Last Updated    October 11, 2013, 00:08 (UTC)
Created         April 14, 2013, 15:35 (UTC)
datasource      Common Crawl