english-gigaword

This recipe trains word n-gram language models on the newswire text of the English Gigaword corpus (1200M words from NYT, APW, AFE, and XIE). It also prepares the dictionaries needed to use the language models with the HTK and Sphinx speech recognizers.
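
As a rough illustration of what training a word n-gram model involves, the sketch below estimates unsmoothed bigram probabilities from raw counts. It is not the recipe's actual pipeline: the function name `train_bigram_mle`, the `<s>`/`</s>` boundary tokens, and the two-sentence toy corpus are all assumptions made for this example, and a model trained on a corpus the size of Gigaword would normally add smoothing and vocabulary pruning.

```python
from collections import Counter


def train_bigram_mle(sentences):
    """Return a function prob(word, prev) giving maximum-likelihood
    bigram estimates: count(prev, word) / count(prev as a history)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Hypothetical preprocessing: lowercase, split on whitespace,
        # and add sentence-boundary tokens.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def prob(word, prev):
        # Unseen histories get probability 0.0 (no smoothing in this sketch).
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob


# Toy stand-in for newswire text; invented for illustration only.
toy_corpus = [
    "the markets rose on monday",
    "the markets fell on tuesday",
]
p = train_bigram_mle(toy_corpus)
print(p("markets", "the"))   # 1.0
print(p("rose", "markets"))  # 0.5
```

Here "markets" follows "the" in both toy sentences, so its estimate is 1.0, while "rose" follows "markets" in one of two cases, giving 0.5.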

Data and Resources

This dataset has no data

Additional Info

Field	Value
Source	http://www.keithv.com/software/giga/
Author	Keith Vertanen
Last Updated	October 10, 2013, 20:49 (UTC)
Created	January 24, 2011, 18:58 (UTC)