Search for a Dataset - the Datahub

The IBL Corpus

The IBL Corpus was collected by the University of Plymouth and the University of Edinburgh as part of the EPSRC-funded project IBL, Instruction-based Learning for Mobile...
- GZ
WikiWord

Overview: WikiWord is a system for building a multilingual thesaurus by extracting lexical and semantic information from Wikipedia. It was originally developed for a...
The Speech Accent Archive

From website: The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read...
Spanish Verb Database

Fred Jehle, formerly a professor at Indiana University-Purdue University Fort Wayne, published approximately 600 verbs, fully conjugated in all moods and tenses, on his website...
POS Tagger for Romanian Language

We have developed a hidden Markov model-based part-of-speech tagger for the Romanian language. Our interactive web tool is located at...
Perseus Digital Library

Started in 1987 with a focus on classics, it has since expanded to other areas. Though it contains a large amount of material, the site itself is focused on being a digital...
MOCHA-TIMIT

Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. When created: November 1999. Purpose:...
Language Commons

This dataset has no description
ISO language, territory, currency codes and their translations

This is a set of ISO codes including those for country and currency collected together into a useful package by the Debian project. From the package page: This package provides...
- tar.bz2
ISO 639-3 - Codes for the Representation of Names of Languages

ISO 639-3 is a list of three-letter codes for languages: ISO 639-3 attempts to provide as complete an enumeration of languages as possible, including living, extinct,...
FSI Language Courses

From website: Welcome to fsi-language-courses.com, the home for language courses developed by the Foreign Service Institute. These courses were developed by the United...
Europarl Parallel Corpus

Overview from home page: The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages:...
CoGrOO - a Portuguese Grammar Checker to OpenOffice.org

This dataset has no description

You can also access this registry using the API (see API Docs).

13 datasets found