
    Statistical language models for large vocabulary spontaneous speech recognition in Dutch

    State-of-the-art large-vocabulary automatic speech recognition systems use a large statistical language model, typically an N-gram. Estimating this model requires a large database of sentences or texts in the same style as the recognition task. For spontaneous speech no such database is available, since it would have to consist of accurate, and therefore expensive, orthographic transcriptions of spoken audio. This paper investigates how readily available large newspaper corpora can be used to improve language models for spontaneous speech recognition, even though the two language styles differ considerably. A technique is proposed that automatically selects appropriate newspaper articles based on perplexity and then uses these texts in the language model estimation. Recognition experiments on spontaneous broadcast speech in Dutch showed significant improvements using this technique.

    Duchateau J., Van Uytsel D.H., Van hamme H., Wambacq P., "Statistical language models for large vocabulary spontaneous speech recognition in Dutch", Proceedings 9th European conference on speech communication and technology - Eurospeech 2005, pp. 1301-1304, September 4-8, 2005, Lisbon, Portugal.

    Status: published
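
    The perplexity-based selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an add-one smoothed unigram model in place of a full N-gram, whitespace tokenisation, and a hand-picked perplexity threshold. The function names (`train_unigram`, `perplexity`, `select_articles`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Build an add-one smoothed unigram model from in-domain sentences.

    Assumption: a unigram stands in for the N-gram model used in practice.
    Returns a function mapping a token to its smoothed probability.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def prob(tok):
        return (counts.get(tok, 0) + 1) / (total + vocab)

    return prob


def perplexity(prob, text):
    """Per-token perplexity of `text` under the model `prob`."""
    toks = text.split()
    if not toks:
        return float("inf")
    log_sum = sum(math.log(prob(t)) for t in toks)
    return math.exp(-log_sum / len(toks))


def select_articles(prob, articles, threshold):
    """Keep articles whose perplexity is at or below `threshold`.

    Assumption: a fixed threshold; the selection criterion in the paper
    may differ.
    """
    return [a for a in articles if perplexity(prob, a) <= threshold]
```

    In this sketch, a small set of spontaneous-speech transcriptions trains the model, each newspaper article is scored against it, and only the low-perplexity articles (those closest in style to the target speech) are kept for language model estimation.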