Detailed Statistics of November 2013 Extraction

URL: http://webdatacommons.org/structureddata/2013-11/stats/stats.html

This document provides statistics about the Web Data Commons RDFa, Microdata, and Microformats data sets, which were extracted from the November 2013 release of the Common Crawl.

In summary, we found structured data within 585 million HTML pages out of the 2.24 billion pages contained in the crawl (26%). These pages originate from 1.7 million different pay-level domains out of the 12.8 million pay-level domains covered by the crawl (13%). Altogether, the extracted data sets consist of 17.2 billion RDF quads. Instructions on how to download the RDFa, Microdata, and Microformats data sets are given on the "How to get the data" page.
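The percentages quoted in the summary can be checked against the rounded counts it gives. The following is a minimal sketch in Python; the variable names and the `share` helper are illustrative assumptions and are not part of the Web Data Commons tooling.

```python
# Counts as quoted in the summary (already rounded by the source).
pages_with_data = 585_000_000      # HTML pages containing structured data
pages_total = 2_240_000_000        # HTML pages in the crawl
plds_with_data = 1_700_000         # pay-level domains with structured data
plds_total = 12_800_000            # pay-level domains covered by the crawl


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


page_share = share(pages_with_data, pages_total)
pld_share = share(plds_with_data, plds_total)

print(f"pages with structured data: {page_share:.1f}%")
print(f"pay-level domains with structured data: {pld_share:.1f}%")
```

Because the inputs are rounded, the results only need to agree with the published 26% and 13% after rounding to whole percentages.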


Additional Information

Field Value
Last updated unknown
Created unknown
Format HTML
License License not specified
Created over 11 years ago
id 47164cfd-e44c-435e-8f9c-fd51db49859e
package id 08dae683-6acb-4481-ac83-21c8e3a5e71f
position 1
resource type file
revision id 7581eb08-8fb9-465d-a80f-16e648597e05
state active