A corpus of web crawl data composed of 5 billion web pages.
Data and Resources
- About the Common Crawl Corpus (application/download)
  A 1-pager describing the corpus, its format, link to terms of use, what you...
Additional Info
Field | Value |
---|---|
Source | http://aws.amazon.com/datasets/41740 |
Author | Common Crawl |
Last Updated | October 10, 2013, 20:20 (UTC) |
Created | May 9, 2012, 23:13 (UTC) |