-
KAIST silver standard corpus
Availability: Freely Available. Usage: Named Entity Recognition. Status: Newly created-finished. Description: We propose a novel method to... -
American National Corpus - Open Portion
This dataset has no description
-
gemet-annotated
Details about how this dataset was built are described in the article: Are SKOS concept schemes ready for multilingual retrieval applications? — Diana Tanase and Epaminondas... -
Meriterm Heart Failure Multilingual Terminology
Multilingual (English and French) Heart Failure Terminology linked with SNOMED-CT and ICD-10. Also contains mappings to UMLS and ICPC2. Each Term Entry has several lexical... -
The JMdict (Japanese-Multilingual Dictionary) project
Overview: The JMdict (Japanese-Multilingual Dictionary) project has as its aim the compilation of a multilingual lexical database with Japanese as the pivot language. The... -
Multext-East
From the web site: Version 4 of the MULTEXT-East resources, a multilingual dataset for language engineering research and development. This dataset contains, for Bulgarian,... -
WordNet-RDF
RDF version of WordNet from Princeton -
PanLex
A lexical database documenting translations among lexemes of language varieties. -
ConceptNet
WordNet-like concept network developed at MIT. ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually... -
xLiD-Lexica
Our xLiD-Lexica dataset in RDF (http://km.aifb.kit.edu/resources/xLiD-lexica.nt) contains about 300 million triples of cross-lingual groundings. It is extracted from Wikipedia... -
Wordnet
From the website: WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into... -
WikiWord Thesaurus Data
Overview: The WikiWord-Thesaurus is a multilingual thesaurus derived from Wikipedia by extracting lexical and semantic information. It was originally developed for a... -
Terminesp Linked Data
Lexicon: Terminesp LD. Languages: Spanish (spa), English (eng), German (deu), French (fra), Swedish (swe), Latin, Italian. Availability: Freely Available. Usage: Machine Translation,... -
TalkBank
About TalkBank: The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of... -
Syntactic Reference Corpus of Medieval French (SRCMF)
The SRCMF contains 15 Old French texts totalling about 280,000 words. It has high-quality manual annotation based on a linguistically adequate dependency grammar. Annotation... -
Sanskrit English Lexicon
A Lexicon of Sanskrit to English -
SALDO
SALDO (Swedish Associative Thesaurus version 2) is an extensive electronic lexical resource for modern written Swedish. It was created for the purpose of language... -
The Rosetta Project
From the about page: The Rosetta Project is a global collaboration of language specialists and native speakers working to build a publicly accessible digital library of... -
Pali English Lexicon
A lexicon from Pali to English. -
OPUS - an open source parallel corpus
OPUS is a growing collection of translated texts from the web. In the OPUS project we try to convert and align free online data, to add linguistic annotation, and to provide the... -
OLiA Discourse
OLiA Discourse Extensions -
The DGT Multilingual Translation Memory of the Acquis Communautaire
As of November 2007, the European Commission's Directorate-General for Translation (DGT) made publicly accessible its multilingual Translation Memory for the Acquis... -
MetaShare metadata model
Ontology: Metadata as LOD. Availability: Freely Available. Usage: Status: Newly created-in progress. Description: Preliminary LOD version of the MetaShare metadata model.... -
linked hypernyms
This Linked Hypernym dataset links entity articles in the English, German and Dutch Wikipedias to a DBpedia resource or a DBpedia ontology concept as their type. The types are... -
Leipzig Corpora Collection (LCC)
Deutscher Wortschatz contains data generated from newspapers and web resources that are publicly available. The data were collected per language and encompass statistics about...