-
Open Bantu isiXhosa Lexicon
The isiXhosa Lexicon is an RDF dataset that consists of lexical and morphological data and English translations that are linked to WordNet RDF. The data is based on tabular noun... -
Ontos News Portal
The Ontos News Portal extracts facts (objects such as persons or organizations, as well as relations between them, e.g. a person is working for an organization or living at a... -
OLiA
The Ontologies of Linguistic Annotations (OLiA) provide an OWL/DL taxonomy of data categories as a reference for linguistic annotation (OLiA Reference Model), plus OWL/DL models... -
ISOcat
ISO 12620 provides a framework for defining data categories compliant with the ISO/IEC 11179 family of standards. According to this model, each data category is assigned a... -
TDS
Typological Database System ontology -
General Ontology of Linguistic Description
GOLD is an ontology for descriptive linguistics. -
IATE RDF
The IATE Dataset in RDF, converted from TBX -
LemonWiktionary
Lemon data extracted from Wiktionary -
American National Corpus - Open Portion
This dataset has no description
-
Multext-East
From the web site: Version 4 of the MULTEXT-East resources, a multilingual dataset for language engineering research and development. This dataset contains, for Bulgarian,... -
WordNet-RDF
RDF version of WordNet from Princeton -
PanLex
A lexical database documenting translations among lexemes of language varieties. -
ConceptNet
WordNet-like concept network developed at MIT. ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually... -
WikiWord Thesaurus Data
The WikiWord-Thesaurus is a multilingual thesaurus derived from Wikipedia by extracting lexical and semantic information. It was originally developed for a... -
TalkBank
The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of... -
Syntactic Reference Corpus of Medieval French (SRCMF)
The SRCMF contains 15 Old French texts with about 280,000 words. It has high-quality manual annotation based on a linguistically adequate dependency grammar. Annotation... -
The Rosetta Project
From the about page: The Rosetta Project is a global collaboration of language specialists and native speakers working to build a publicly accessible digital library of... -
MetaShare metadata model
Ontology Metadata as LOD. Availability: Freely Available. Usage: Status: Newly created, in progress. Description: Preliminary LOD version of the MetaShare metadata model.... -
Linked Hypernyms
This Linked Hypernym dataset links entity articles in English, German and Dutch Wikipedia to a DBpedia resource or a DBpedia ontology concept as their type. The types are... -
Leipzig Corpora Collection (LCC)
Deutscher Wortschatz contains data generated from newspapers and web resources that are publicly available. The data were collected per language and encompass statistics about... -
French TimeBank
The French TimeBank consists of a set of 109 journalistic articles from 7 different sub-genres annotated according to the ISO-TimeML standard, adapted for the French language.... -
Automated Similarity Judgment Program lexical data
ASJP collects 40 words from 5500 languages in a simplified phonetic representation. More background can be found at http://email.eva.mpg.de/~wichmann/ASJPHomePage.htm -
Phonetics Information Base and Lexicon (PHOIBLE)
Phonetics Information Base and Lexicon (PHOIBLE) is a data set of phonological inventories with additional linguistic and non-linguistic information. -
Linked Old Germanic Dictionaries
Lexical resources (word lists, etymological dictionaries) for Germanic languages in different historical stages: pre 1100 (incl. Gothic, Old High German, Old English),... -
Glottolog
Glottolog provides information about descriptive literature for all the world's languages. It also provides a language classification as well as knowledge bases for names,...