Search for a Dataset - the Datahub


DBpedia abstract Dutch corpus

This corpus contains a conversion of Wikipedia abstracts in six languages (Dutch, English, French, German, Italian and Spanish) into the NLP Interchange Format (NIF)...
DBpedia abstract Japanese corpus

This corpus contains a conversion of Wikipedia abstracts in six languages (Dutch, English, French, German, Italian and Spanish) into the NLP Interchange Format (NIF)...
DBpedia abstract Italian corpus

This corpus contains a conversion of Wikipedia abstracts in six languages (Dutch, English, French, German, Italian and Spanish) into the NLP Interchange Format (NIF)...
DBpedia abstract French corpus

This corpus contains a conversion of Wikipedia abstracts in six languages (Dutch, English, French, German, Italian and Spanish) into the NLP Interchange Format (NIF)...
DBpedia abstract Spanish corpus

This corpus contains a conversion of Wikipedia abstracts in six languages (Dutch, English, French, German, Italian and Spanish) into the NLP Interchange Format (NIF)...
DBpedia abstract English corpus

This corpus contains a conversion of Wikipedia abstracts in six languages (Dutch, English, French, German, Italian and Spanish) into the NLP Interchange Format (NIF)...
DBpedia abstract German corpus

This corpus contains a conversion of Wikipedia abstracts in six languages (Dutch, English, French, German, Italian and Spanish) into the NLP Interchange Format (NIF)...
Zhishi.me

Structured data extracted and integrated from three major web-based Chinese-language encyclopaedias: Chinese Wikipedia, Hudong Baike and Baidu Baike. Each page is available in an...
- api/sparql
- example/rdf+xml
CopyrightTermBank

Terminology on copyright and related concepts
- Linked Data
- text/turtle
- application/n-triples
- example/turtle
- PDF
USAGE review corpus

This corpus consists of sentiment annotations of Amazon reviews for different product categories in the languages German and English. The reviews themselves are not part of this...
- example
- text/ntriples
- RDF
- api/sparql
TDS

Typological Database System ontology
- RDF
- HTML
Linguistic Metadata (LIME) vocabulary

LIME (LInguistic MEtadata) is a vocabulary for expressing linguistic metadata about linguistic resources and linguistically grounded datasets. The metadata vocabulary has been...
- HTML
- RDF
MASC-BN-NIF

This dataset contains the MASC 3.0 corpus, a large English corpus covering a wide range of genres of written and spoken text, enhanced with semantic annotations, both word...
- tar.gz
General Ontology of Linguistic Description

GOLD is an ontology for descriptive linguistics.
- OWL
RDF UK_ES

RDF version of the bilingual dictionary UA-ES. The original dataset (in CSV) comes from...
- rdf/turtle
- RDF
- ZIP
RDF EN-GB_UK-UA

RDF version of the bilingual dictionary EN-UA. The original dataset (in CSV) comes from...
- rdf/turtle
- meta/rdf-schema
- ZIP
IWN

This is the dataset corresponding to ItalWordNet as created at the Institute of Computational Linguistics "A. Zampolli" in Pisa. The resource contains single instances such...
- RDF
- tar.gz
Open Multilingual Wordnet

Documentation of and links to data for wordnets in 20 languages (Albanian, Arabic, Danish, English, Persian, Finnish, French, Hebrew, Italian, Japanese, Basque, Catalan,...
- HTML
- ZIP
EuroSentiment

Gabriela Vulcu, Raul Lario Monje, Mario Munoz, Paul Buitelaar and Carlos A. Iglesias (2014), Linked-Data based Domain-Specific Sentiment Lexicons, In: Proceedings of the 3rd...
- api/sparql
- n-quads
LemonWiktionary

Lemon data extracted from Wiktionary
- example/rdf+xml
- xhtml, rdf/xml, turtle
- text/turtle
- HTML
OmegaWiki

From the website: A collaborative project to produce a free, multilingual resource in every language, with lexicological, terminological and thesaurus information.
- sql
KAIST silver standard corpus

Availability: Freely available. Usage: Named Entity Recognition. Status: Newly created-finished. Description: We propose a novel method to...
- HTML
- TXT
American National Corpus - Open Portion

This dataset has no description
- JAR
- ZIP
- GZ
SIMPLE

This dataset contains the conversion of the Italian SIMPLE lexicon in different formats including RDF, TTL and a Lemon version of lexical entries with their pointers to senses.
- RDF
- JSON
- TXT
- text/turtle
gemet-annotated

Details about how this dataset was built are described in the article "Are SKOS concept schemes ready for multilingual retrieval applications?" by Diana Tanase and Epaminondas...
- RDF

You can also access this registry using the API (see API Docs).
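
As a rough illustration of API access, the sketch below builds a search request and summarizes the response. It assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint under `/api/3/action/`; that layout, the `BASE_URL` placeholder, and the helper names `build_search_url`, `summarize` and `search` are assumptions for illustration, not details confirmed by this page. Check the API Docs for the actual endpoints.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder; replace with the registry's real base URL.
BASE_URL = "https://example.org"


def build_search_url(base_url, query, rows=20, start=0):
    """Build a CKAN-style package_search URL (assumed endpoint layout)."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize(response):
    """Return (total count, [(title, sorted formats), ...]) from a
    package_search-style response dictionary."""
    result = response["result"]
    rows = []
    for dataset in result["results"]:
        formats = sorted(
            {r["format"] for r in dataset.get("resources", []) if r.get("format")}
        )
        rows.append((dataset.get("title", dataset.get("name", "")), formats))
    return result["count"], rows


def search(base_url, query, **kwargs):
    """Fetch one page of search results over HTTP and summarize it."""
    with urlopen(build_search_url(base_url, query, **kwargs)) as resp:
        return summarize(json.load(resp))
```

Under these assumptions, `search(BASE_URL, "linguistics", rows=5)` would return the total number of matching datasets together with each dataset's title and resource formats, mirroring the listing above.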

121 datasets found