A corpus of web crawl data composed of 5 billion web pages.
Data and Resources
- About the Common Crawl Corpus (application/download)
A 1-pager describing the corpus, its format, link to terms of use, what you...
Additional Info
| Field | Value |
|---|---|
| Source | http://aws.amazon.com/datasets/41740 |
| Author | Common Crawl |
| Last Updated | October 10, 2013, 20:20 (UTC) |
| Created | May 9, 2012, 23:13 (UTC) |
