Search for a Dataset - the Datahub

DBpedia abstract corpus

This corpus contains a conversion of Wikipedia abstracts in six languages (Dutch, English, French, German, Italian and Spanish) into the NLP Interchange Format (NIF)....
- GZ
- text/turtle
Bibliography of Linguistic Literature (BLL) Thesaurus

The Thesaurus of the Bibliography of Linguistic Literature (BLL Thesaurus) represents a comprehensive bilingual vocabulary for indexing and documentation of linguistically...
- RDF
- HTML
FrameBase schema

FrameBase is a linked open knowledge base meant to uniformly represent a wide range of knowledge, tackling semantic heterogeneity among various sources of structured knowledge,...
- rdf/turtle
Universal Dependencies Treebank Finnish-FTB

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
- CoNLL-U
- RDF
Universal Dependencies Treebank Coptic

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
- CoNLL-U
- RDF
GeoWordNet

GeoWordNet is a semantic resource built from the full integration of WordNet, GeoNames and the Italian part of MultiWordNet. GeoWordNet Public Dataset contains 3,698,238...
- meta/void
- RDF
- meta/sitemap
- CSV
- example/rdf+xml
- HTML
- WordNet
Ontos News Portal

The Ontos News Portal extracts facts (objects such as persons or organizations, as well as relations between them, e.g. a person is working for an organization or living at a...
- text/turtle
- RDF
WordNet 3.0 (VU Amsterdam)

RDF conversion of Princeton's package:wordnet, version 3.0, with many links to package:w3c-wordnet, package:lexvo and the Dutch package:cornetto.
- HTML
- RDF
- XML
- meta/void
OLiA

The Ontologies of Linguistic Annotations (OLiA) provide an OWL/DL taxonomy of data categories as a reference for linguistic annotation (OLiA Reference Model), plus OWL/DL models...
- HTML
- rdf, owl
- application/x-zip-compressed
- example/rdf+xml
ISOcat

ISO 12620 provides a framework for defining data categories compliant with the ISO/IEC 11179 family of standards. According to this model, each data category is assigned a...
- html, rdf, dcif
- example/rdf+xml
- text/ttl
OLAC Metadata

Metadata of linguistic resources participating in Open Language Archives Community.
- GZ
- RDF
CopyrightTermBank

Terminology on copyright and related concepts
- Linked Data
- text/turtle
- application/n-triples
- example/turtle
- PDF
TDS

Typological Database System ontology
- RDF
- HTML
KORE 50 NIF NER Corpus

KORE 50[1] (AIDA) is a subset of the larger AIDA corpus, which is based on the dataset of the CoNLL 2003 NER task. The dataset aims to capture hard-to-disambiguate mentions of...
- text/turtle
- PDF
IATE RDF

The IATE Dataset in RDF, converted from TBX
- TXT
LemonWiktionary

Lemon data extracted from Wiktionary
- example/rdf+xml
- xhtml, rdf/xml, turtle
- text/turtle
- HTML
Wikilinks RDF/NIF

The Wikilinks corpus is a coreference resolution corpus of very large scale. It contains over 40 million mentions of over 3 million entities. Mentions are manually labeled links...
- example/turtle
- GZ
- CSV
Parole/Simple 'lexinfo' Ontology & lexicons

The Parole/Simple 'lemon' Ontology is the OWL version of the Parole/Simple model (defined during the PAROLE LE2-4017 and SIMPLE LE4-8346 projects) once mapped to the lemon...
- rdf/turtle
- RDF
- ttl
News-100 NIF NER Corpus

This corpus comprises 100 German news articles from the online news platform news.de. All of the articles were published in 2010 and contain the word Golf. This word...
- text/turtle
- PDF
RSS-500 NIF NER CORPUS

This corpus has been created using a dataset comprising a list of 1,457 RSS feeds as compiled in (Goldhahn et al. 2012). The list includes all major worldwide newspapers and a...
- text/turtle
- PDF
DBpedia Spotlight NIF NER Corpus

Based on P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia Spotlight: shedding light on the web of documents. In Proc. of the 7th Int. Conf. on Semantic Systems,...
- text/turtle
- PDF
Reuters-128 NIF NER Corpus

This English corpus is based on the well-known Reuters-21578 corpus, which contains economic news articles. In particular, we chose 128 articles containing at least one NE....
- text/turtle
- PDF
CLLD-PHOIBLE

PHOIBLE Online published by the CLLD project
- meta/void
- text/n3
CLLD-WALS

WALS Online published by the CLLD project
- meta/void
- text/n3
SALDO-RDF

SALDO, the Swedish Associative Thesaurus, a semantic lexicon in RDF.
- url
- GZ
- api/sparql

You can also access this registry using the API (see API Docs).
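
As a rough illustration of API access, the sketch below assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint; the page itself does not confirm this, so check the API Docs first. The base URL `https://example.org` is a placeholder, and the helper names `build_search_url`, `parse_search_response` and `search` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder base URL (assumption); replace with the registry's real address.
BASE_URL = "https://example.org"


def build_search_url(query, rows=10, base_url=BASE_URL):
    """Build a search URL, assuming a CKAN-style package_search endpoint."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def parse_search_response(payload):
    """Return (total count, [(title, [formats]), ...]) from a JSON reply."""
    data = json.loads(payload)
    if not data.get("success"):
        raise ValueError("search request was not successful")
    result = data["result"]
    datasets = []
    for pkg in result["results"]:
        # Collect the distinct resource formats listed for each dataset.
        formats = sorted({res.get("format", "") for res in pkg.get("resources", [])})
        datasets.append((pkg.get("title", pkg.get("name")), formats))
    return result["count"], datasets


def search(query, rows=10, base_url=BASE_URL):
    """Run the search over HTTP and parse the reply (needs network access)."""
    with urlopen(build_search_url(query, rows, base_url)) as response:
        return parse_search_response(response.read().decode("utf-8"))
```

Calling `search("linguistics")` against a real base URL would then return the total number of matching datasets together with each dataset's title and resource formats, mirroring the listing above.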

33 datasets found