
Placing Structuring Elements In A Word Sequence For Generating New Statistical Language Models

Class-based n-gram language models have been applied successfully in speech technology. We present an automatic method for improving n-gram language models by redistributing structural elements within word sequences. Our algorithm operates on textual data consisting of two kinds of text elements: words and structural elements. The order of the words is never changed during the iterations; the algorithm may only insert or delete structural elements between any two items in the data. In this way, unseen n-grams are interpolated by n-grams that contain structural elements. We give a detailed description of the algorithm and present first results for a system trained on a small corpus.
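
Since only the abstract is available here, the following is a minimal sketch of the idea it describes, not the authors' algorithm: an iterative greedy pass that may insert or delete a structural token between any two words, scored with a smoothed bigram model, while the word order itself stays fixed. The token name STRUCT, the add-alpha smoothing, the insertion criterion, and the fixed number of rounds are all assumptions introduced for illustration.

```python
from collections import Counter
from math import log

# Hypothetical marker for a structural element; the paper does not name the token.
STRUCT = "<str>"

def train_bigrams(sequences):
    """Count unigrams and bigrams over a list of token sequences."""
    bigrams, unigrams = Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    return bigrams, unigrams

def logprob(tokens, bigrams, unigrams, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram log-likelihood of a short token span."""
    lp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = unigrams[prev] + alpha * vocab_size
        lp += log(num / den)
    return lp

def restructure(seq, bigrams, unigrams, vocab_size):
    """One greedy pass: strip existing STRUCT tokens (implicit deletion),
    then re-insert STRUCT between two words wherever
    P(STRUCT | prev) * P(next | STRUCT) beats P(next | prev).
    The order of the words themselves is never changed."""
    words = [t for t in seq if t != STRUCT]
    if not words:
        return []
    out = [words[0]]
    for nxt in words[1:]:
        with_struct = logprob([out[-1], STRUCT, nxt], bigrams, unigrams, vocab_size)
        without = logprob([out[-1], nxt], bigrams, unigrams, vocab_size)
        if with_struct > without:
            out.append(STRUCT)
        out.append(nxt)
    return out

def iterate(sequences, rounds=5):
    """Alternate between re-estimating the bigram model on the current data
    and re-placing structural elements, for a fixed number of rounds."""
    for _ in range(rounds):
        bigrams, unigrams = train_bigrams(sequences)
        sequences = [restructure(s, bigrams, unigrams, len(unigrams))
                     for s in sequences]
    return sequences
```

Running iterate on tokenised training text would yield sequences whose structural elements sit where the current model prefers them; the abstract's actual iteration scheme, scoring function, and stopping criterion may well differ.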