-
Neurocommons text mining pilot
About: The complete dataset is composed of a set of smaller datasets. Each download is in one of two formats: (1) WARC or (2) tar.gz. You can read about the WARC format by... -
NeuroCommons
From the website: The NeuroCommons project seeks to make all scientific research materials - research articles, annotations, data, physical materials - as available and as... -
The Mondial Database
From home page: The MONDIAL database has been compiled from geographical Web data sources listed below: CIA World Factbook, a predecessor of Global Statistics which has been... -
MeSH titles
Data exposed: Extracted from 2007 Medline baseline distribution. Size of dump and data set: 670 MB. Notes: contact Medline for terms of use. -
MeSH pairs
Data exposed: NLM 2007 MeSH descriptor/qualifier pairs. Size of dump and data set: 13 MB. Openness: OPEN. See http://www.nlm.nih.gov/mesh/termscon.html (basically attribution with... -
MeSH headings
About: Data exposed: List of all associations of MeSH headings to papers indexed by Medline, extracted from 2007 Medline baseline distribution. Size of dump and data set: 758 MB... -
Linked ISO 3166-2 Data
About: Linked ISO 3166-2 Data. ISO 3166-2 gives codes for countries and their principal subdivisions. Openness: Published under CC0. (Where is this specified?) -
Homologene
Data exposed: what? Size of dump and data set: 626 KB. Notes: NCBI Copyright and Disclaimers. -
Historical Events Markup Language
Title: Historical Event Markup Language. Description: The Historical Event Markup and Linking Project (Heml) provides an XML schema for historical events and a Java Web app which... -
GO annotations from National Center for Biotechnology Information (NCBI) and ...
Data exposed: GO annotations from National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). Size of dump and data set: 73 MB. Openness... -
Freebase RDF Store
Duplicate of package:freebase. Data exposed: Freebase Views of Freebase Topics following the principles of Linked Data. The dataset extractions contain aggregated data from:... -
Fly-TED
Data exposed: derived from data published by www.fly-ted.org; provides metadata on images depicting in situ hybridisation in D. melanogaster testes. Size of dump and data... -
FlyAtlas
Data exposed: FlyAtlas and Affy D2 probe-to-gene. Size of dump and data set: size? Notes: also found among the SPARQL Endpoints. -
EU European Statistical Information Service
Description: "Eurostat’s mission is to provide the European Union with a high-quality statistical information service." Very large amount of data on a wide variety of European... -
Entrez Gene Extract
Data exposed: Entrez Gene Extract from [ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz]. Size of dump and data set: 5.6 MB. Notes: NCBI Copyright and Disclaimers. -
Entrez Gene
About: Data exposed: Select fields from Entrez Gene records. Size of dump and data set: 7.7 MB. Notes: NCBI Copyright and Disclaimers. Openness: Data appears to be in public domain.... -
DOAP Store
About: Data exposed: provides daily generated dumps of all its DOAP project descriptions. Size of dump and data set: size? Notes: 2009-05-24: Both files seem to be empty - hg... -
DOAPspace
Data exposed: All 55,000+ DOAP profiles, available as RDF/XML DOAP. This includes all DOAP created by doapspace and all DOAP spidered. Size of dump and data set: size? Notes:... -
DMOZ RDF Dump
Data exposed: DMOZ. Size of dump and data set: size? Openness: OPEN (?). Uses the Open Directory License, which is in essence open (there may be some wrinkles about updates). -
Open Directory Project (ODP)
From about page: The Open Directory Project is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community...