

Introduction

Author: J. De Belder, J.F. Gemmeke, N. Oostdijk, V. Vandeghinste
Publication venue: Springer Berlin Heidelberg

Crossref

Springer - Publisher Connector

Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

Author: Peter Spyns, Jan Odijk
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2020
Field of study: Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

Directory of Open Access Books (DOAB)

Reducing speech recognition time and memory use by means of compound (de-)composition

Author: Jean-Pierre Martens, Bert Réveil
Publication venue: STW Technology Foundation
Publication date: 01/01/2008

This paper tackles the problem of out-of-vocabulary (OOV) words in automatic speech transcription (AST) applications for a compound language (Dutch). A seemingly attractive way to reduce the number of OOV words in compound languages is to extend the AST system with a compound (de-)composition module. However, successful implementations of this approach have so far been rather scarce. We developed a novel data-driven compound (de-)composition module and tested it in two different AST experiments. For equal lexicon sizes, our compound processor lowers the OOV rate. Moreover, we are able to translate this OOV-rate gain into a reduction of the word error rate (WER) of the transcription system. Using our approach, we built a system with an 84K lexicon that performs as accurately as a baseline system with a 168K lexicon, but our system is 5-6% faster and requires about 50% less storage for the lexical component, even though this component is encoded in an optimal way (prefix-suffix tree compression).
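To illustrate why decomposition lowers the OOV rate, here is a minimal sketch of a generic lexicon-based compound splitter. It is NOT the data-driven module described in the abstract. The toy lexicon, the linker set (Dutch linking -s- and -en-), the minimum part length, and the function names `decompose` and `oov_rate` are all hypothetical choices made for this example. A word not in the lexicon is treated as covered if it can be split entirely into lexicon entries.

```python
from functools import lru_cache

# Hypothetical toy lexicon (assumption, not taken from the paper).
LEXICON = {"de", "het", "bij", "voet", "bal", "veld", "spoor", "weg", "station"}

# Assumed Dutch linking morphemes; "" means no linker between parts.
LINKERS = ("", "s", "en")


def decompose(word, lexicon=LEXICON, min_len=3):
    """Split `word` into lexicon constituents, preferring the fewest parts.

    A linker is attached to the part before it (e.g. "stations" + "weg").
    Returns a tuple of parts, or None if no full segmentation exists.
    """

    @lru_cache(maxsize=None)
    def best(i):
        # Fewest-part segmentation of word[i:], or None if impossible.
        if i == len(word):
            return ()
        result = None
        for j in range(i + min_len, len(word) + 1):
            part = word[i:j]
            if part not in lexicon:
                continue
            for link in LINKERS:
                if not word.startswith(link, j):
                    continue
                k = j + len(link)
                if link and k == len(word):
                    continue  # a linker cannot end the word
                rest = best(k)
                if rest is None:
                    continue
                candidate = (part + link,) + rest
                if result is None or len(candidate) < len(result):
                    result = candidate
        return result

    return best(0)


def oov_rate(tokens, lexicon=LEXICON, split=False):
    """Fraction of tokens not covered by the lexicon (optionally via splitting)."""
    oov = 0
    for token in tokens:
        if token in lexicon:
            continue
        if split and decompose(token, lexicon) is not None:
            continue
        oov += 1
    return oov / len(tokens)


TOKENS = ["de", "voetbalveld", "bij", "het", "stationsweg"]
print(decompose("voetbalveld"))          # ('voet', 'bal', 'veld')
print(oov_rate(TOKENS))                  # 0.4
print(oov_rate(TOKENS, split=True))      # 0.0
```

Preferring the segmentation with the fewest parts is a common heuristic that avoids over-splitting when both a long constituent and its shorter pieces are in the lexicon. A real system would also need to recompose recognized parts into full compounds for the final transcript.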

Ghent University Academic Bibliography