1 dataset found
Licenses: Other (Open)
Tags: web
A corpus of web crawl data composed of 5 billion web pages. This data set is freely available on Amazon S3 at s3://aws-publicdatasets/common-crawl/crawl-002/ and formatted in...
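The S3 location above is made of two parts: a bucket name and a key prefix. Most S3 clients want these as separate arguments. The Python sketch below splits the URI using only the standard library. The helper name `split_s3_uri` is hypothetical, and the code does not contact S3.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, key_prefix).

    Hypothetical helper for illustration; it only parses the string.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The network-location part is the bucket name; the path, minus its
    # leading slash, is the key prefix under which the crawl files live.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://aws-publicdatasets/common-crawl/crawl-002/")
print(bucket, prefix)
```

You could then pass the two values to an S3 client, for example boto3's `list_objects_v2(Bucket=bucket, Prefix=prefix)`. This page does not confirm that the bucket is still publicly readable, so treat that call as untested.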
Format: application/download
You can also access this registry using the API (see API Docs).
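The page layout suggests the registry runs on CKAN, but that is an assumption. If it holds, you can reproduce the search shown here (tag "web", license "Other (Open)") through CKAN's Action API `package_search` endpoint. The sketch below only builds the request URL. The base URL is a placeholder. The license id `other-open` is assumed to be CKAN's default id for the "Other (Open)" license.

```python
from urllib.parse import urlencode


def package_search_url(base_url, query="", tags=(), license_id=None, rows=10):
    """Build a package_search URL, assuming a CKAN Action API backend."""
    params = {"q": query, "rows": rows}
    # Each facet becomes a Solr filter clause; the clauses are ANDed together.
    clauses = [f'tags:"{tag}"' for tag in tags]
    if license_id is not None:
        clauses.append(f'license_id:"{license_id}"')
    if clauses:
        params["fq"] = " AND ".join(clauses)
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


# "https://registry.example" is a placeholder, not the registry's real address.
url = package_search_url("https://registry.example", tags=["web"], license_id="other-open")
print(url)
```

To send the request, fetch the URL with `urllib.request.urlopen` and decode the body with `json.load`. CKAN wraps the matching datasets in `result["results"]` and returns the total in `result["count"]`. Check the registry's own API Docs before relying on these field names.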