Detailed Statistics of November 2013 Extraction
URL: http://webdatacommons.org/structureddata/2013-11/stats/stats.html
This document provides statistics about the Web Data Commons RDFa, Microdata and Microformats data sets which have been extracted from the November 2013 release of the Common Crawl.
In summary, we found structured data within 585 million HTML pages out of the 2.24 billion pages contained in the crawl (26%). These pages originate from 1.7 million different pay-level-domains out of the 12.8 million pay-level-domains covered by the crawl (13%). Altogether, the extracted data sets consist of 17.2 billion RDF quads. Instructions on how to download the RDFa, Microdata, and Microformats data sets are given on the "How to get the data" page.
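The percentages quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names and the `share` helper are hypothetical, the inputs are the rounded figures from the summary, and the quads-per-page average is derived here rather than reported in the source.

```python
def share(part: float, whole: float) -> int:
    """Return part/whole as a percentage, rounded to the nearest integer."""
    return round(100 * part / whole)

# Rounded figures taken from the summary paragraph.
pages_with_data = 585e6   # HTML pages containing structured data
pages_total = 2.24e9      # HTML pages in the crawl
plds_with_data = 1.7e6    # pay-level-domains with structured data
plds_total = 12.8e6       # pay-level-domains covered by the crawl
quads_total = 17.2e9      # RDF quads in the extracted data sets

page_share = share(pages_with_data, pages_total)
pld_share = share(plds_with_data, plds_total)

# Derived value (not stated in the source): mean quads per page with data.
quads_per_page = round(quads_total / pages_with_data, 1)

print(f"pages with structured data: {page_share}%")
print(f"pay-level-domains with structured data: {pld_share}%")
print(f"average quads per page with data: {quads_per_page}")
```

Because the inputs are already rounded, the results should be read as approximate.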
Additional Information
Field | Value |
---|---|
Last updated | unknown |
Created | unknown |
Format | HTML |
License | License not specified |
Created | over 11 years ago |
id | 47164cfd-e44c-435e-8f9c-fd51db49859e |
package id | 08dae683-6acb-4481-ac83-21c8e3a5e71f |
position | 1 |
resource type | file |
revision id | 7581eb08-8fb9-465d-a80f-16e648597e05 |
state | active |