Datasets
-
A corpus of web crawl data comprising 5 billion web pages. This dataset is freely available on Amazon S3 at s3://aws-publicdatasets/common-crawl/crawl-002/ and formatted in...
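As a rough illustration of how the S3 location above breaks down, the sketch below splits an s3:// URI into its bucket and key prefix and builds an HTTPS request URL for the standard S3 ListObjectsV2 operation (`list-type=2`). It makes no network calls. The helper names `parse_s3_uri` and `list_objects_url` are hypothetical. The sketch also assumes the bucket is reachable with virtual-hosted-style addressing and still permits anonymous listing, which is not confirmed here.

```python
from urllib.parse import urlencode, urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, key_prefix). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the "host" part; the key prefix is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


def list_objects_url(uri: str) -> str:
    """Build an S3 ListObjectsV2 request URL for the URI's prefix. Hypothetical helper."""
    bucket, prefix = parse_s3_uri(uri)
    # urlencode percent-encodes "/" in the prefix as %2F, which S3 accepts.
    query = urlencode({"list-type": "2", "prefix": prefix})
    # Virtual-hosted-style addressing; assumes the bucket allows anonymous listing.
    return f"https://{bucket}.s3.amazonaws.com/?{query}"
```

For example, `parse_s3_uri("s3://aws-publicdatasets/common-crawl/crawl-002/")` yields the bucket `aws-publicdatasets` and the prefix `common-crawl/crawl-002/`. An HTTP GET on the URL from `list_objects_url` would return an XML listing of keys under that prefix, if the bucket is still public.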
