Statistical language models for large vocabulary spontaneous speech recognition in Dutch

Authors: Duchateau Jacques, Van hamme Hugo, Van Uytsel Nils Dong Hoon, Wambacq Patrick
Publication venue: Proceedings of the 9th European Conference on Speech Communication and Technology (Eurospeech 2005)
Publication date: 01/09/2005

State-of-the-art large vocabulary automatic speech recognition systems use a large statistical language model, typically an N-gram. However, estimating this model requires a large database of sentences or texts in the same style as the recognition task. For spontaneous speech, no such database is available, since it would have to consist of accurate, and therefore expensive, orthographic transcriptions of spoken audio. This paper investigates how readily available large newspaper corpora can be used to improve language models for spontaneous speech recognition, even though the two language styles differ considerably. A technique is proposed that automatically selects appropriate newspaper articles based on perplexity and subsequently uses these texts in the language model estimation. Recognition experiments on spontaneous broadcast speech in Dutch showed significant improvements using this technique.

Duchateau J., Van Uytsel D.H., Van hamme H., Wambacq P., "Statistical language models for large vocabulary spontaneous speech recognition in Dutch", Proceedings 9th European conference on speech communication and technology - Eurospeech 2005, pp. 1301-1304, September 4-8, 2005, Lisbon, Portugal.

Status: published
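
The abstract gives no implementation details of the perplexity-based article selection, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a small in-domain sample of spontaneous speech transcriptions, an add-one smoothed unigram model in place of the N-gram used in the paper, and a fixed perplexity threshold. The names train_unigram, perplexity, select_articles and the threshold parameter are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count word frequencies in the in-domain (spontaneous speech) text."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    return counts, sum(counts.values())


def perplexity(text, counts, total, vocab_size):
    """Perplexity of `text` under an add-one smoothed unigram model.

    Assumption: all unseen words share one extra "unknown" vocabulary slot,
    hence the `vocab_size + 1` in the denominator.
    """
    words = text.lower().split()
    if not words:
        return float("inf")  # an empty article carries no usable evidence
    log_prob = 0.0
    for w in words:
        p = (counts[w] + 1) / (total + vocab_size + 1)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(words))


def select_articles(articles, in_domain, threshold):
    """Keep the articles whose perplexity does not exceed `threshold`.

    Low perplexity means the article resembles the in-domain style.
    The threshold is a hypothetical tuning parameter.
    """
    counts, total = train_unigram(in_domain)
    vocab_size = len(counts)
    return [
        a for a in articles
        if perplexity(a, counts, total, vocab_size) <= threshold
    ]
```

In a setup like the one described, the selected articles would then be added to the training text for the language model. A real system would score articles with a higher-order N-gram and tune the threshold on held-out data.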

Lirias