Fuzzy Class Rescoring: A Part-Of-Speech Language Model
Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N-best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how to interpolate with a Part-of-Speech (POS) tag-based language model as an example of a class-based model in which a word can be a member of many different classes. Here the actual class membership of a word in the lattice becomes a hidden event of the A* algorithm used for rescoring. A forward-type algorithm is defined as an extension of the lattice rescorer to handle these hidden events in a mathematically sound fashion. Applying this mixture of Viterbi and forward rescoring to the German Spontaneous Scheduling Task (GSST) yields some improvement in word accuracy. Above all, the resc..
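The interpolation idea can be illustrated with a small sketch. The Python fragment below is not the JANUS lattice rescorer: it scores a plain word sequence rather than a lattice, and the model tables (p_word, p_tag_trans, p_emit), the tag set and the interpolation weight lam are hypothetical placeholders. It only shows how the POS tag of each word can be treated as a hidden event that a forward pass sums out before the class-based probability is interpolated with a word-based model.

```python
import math

# Hypothetical toy model tables (assumptions, not the JANUS models):
#   p_word[(w_prev, w)]      word-bigram probability P(w | w_prev)
#   p_tag_trans[(t_prev, t)] tag-transition probability P(t | t_prev)
#   p_emit[(t, w)]           class-membership probability P(w | t)

def interpolated_logprob(words, p_word, p_tag_trans, p_emit, tags, lam=0.5):
    """Score a word sequence with lam * word model + (1 - lam) * POS class model.

    The class model treats the POS tag of each word as a hidden event:
    a forward pass sums over all tag sequences instead of keeping only
    the single best (Viterbi) one.
    """
    # forward[t] ~ P(words[0..i], tag_i = t) under the class model
    forward = {t: p_emit.get((t, words[0]), 0.0) / len(tags) for t in tags}
    total_logprob = 0.0

    for prev, cur in zip(words, words[1:]):
        # class-model prediction of `cur`: sum over previous and current tags
        norm = sum(forward.values()) or 1e-12
        p_class = sum(
            (forward[tp] / norm) * p_tag_trans.get((tp, t), 0.0) * p_emit.get((t, cur), 0.0)
            for tp in tags for t in tags
        )
        p_w = p_word.get((prev, cur), 1e-12)
        total_logprob += math.log(lam * p_w + (1.0 - lam) * p_class + 1e-12)

        # advance the forward distribution over the hidden tags
        forward = {
            t: sum(forward[tp] * p_tag_trans.get((tp, t), 0.0) for tp in tags)
               * p_emit.get((t, cur), 0.0)
            for t in tags
        }
    return total_logprob
```

Replacing the sums over previous tags with a max would give the Viterbi flavour of the same computation; the abstract's "mixture" combines both styles inside the lattice rescorer.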
Using Morphology Towards Better Large-Vocabulary Speech Recognition Systems
To allow unrestricted natural language input, state-of-the-art speech recognition systems require huge dictionaries that enlarge the search space and degrade performance. This is especially true for languages with a large number of inflections and compound words, such as German and Spanish. One way to maintain decent recognition results with a growing vocabulary is to use base units other than plain words. In this paper, different decomposition methods, originally based on morphological decomposition for German, are compared. They not only counteract the immense vocabulary growth that comes with an increasing amount of training data, they also reduce the rate of out-of-vocabulary words, which significantly worsens recognition performance in German. A smaller dictionary also leads to a 30% speed improvement during recognition. Moreover, even if the amount of available training data is quite huge it is often not enough to guaran..
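As a rough illustration of the decomposition idea, the sketch below splits a German compound into smaller base units by greedy longest-match against a morpheme lexicon. It is not one of the decomposition methods compared in the paper; the morph_lexicon contents, the minimum morpheme length and the handling of the linking "s" are illustrative assumptions.

```python
def decompose(word, morph_lexicon, min_len=3):
    """Greedy longest-match split of a compound into known morphemes.

    A toy stand-in for morphological decomposition: `morph_lexicon` is a
    hypothetical set of base units. Returns the original word unchanged
    if no full decomposition is found.
    """
    parts, rest = [], word.lower()
    while rest:
        for cut in range(len(rest), min_len - 1, -1):
            if rest[:cut] in morph_lexicon:
                parts.append(rest[:cut])
                rest = rest[cut:].lstrip("s")  # crude handling of a German linking 's' (Fugen-s)
                break
        else:
            return [word]  # give up: keep the full word as a single unit
    return parts

# Example with a hypothetical morpheme lexicon:
morphs = {"arbeit", "zeit", "plan", "termin"}
print(decompose("Arbeitszeit", morphs))  # -> ['arbeit', 'zeit']
```

Recognizing such sub-word units instead of full compounds keeps the dictionary small while still letting the decoder reconstruct unseen compounds from known parts.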
Selection Criteria for Hypothesis Driven Lexical Adaptation
Adapting the vocabulary of a speech recognizer to the utterance to be recognized has proven successful in reducing both high out-of-vocabulary rates and word error rates. This applies especially to languages with rapid vocabulary growth due to a large number of inflections and compounds. This paper presents various adaptation methods within the Hypothesis Driven Lexical Adaptation (HDLA) framework which allow speech recognition over a virtually unlimited vocabulary. Selection criteria for the adaptation process are based either on morphological knowledge or on distance measures at the phoneme or grapheme level; a sketch of the grapheme-level variant follows this entry. Different methods are introduced for determining distances between phoneme pairs and for creating the large fallback lexicon from which the adapted vocabulary is chosen. HDLA reduces the out-of-vocabulary rate by 55% for Serbo-Croatian, 35% for German and 27% for Turkish. The reduced out-of-vocabulary rate also decreases the word error rate by an absolute 4.1%, to 25.4%, on Serbo-Croatian broadcast news data.

2. THE SPEECH RECOGNITION ENGINE
The speech recognition system used to perform all experiments for transcribing Serbo-Croatian broadcast news shows is trained on 12 hours of recorded speech from read newspaper articles and 18 hours of recorded broadcast news. It is based on 35 phones that are modeled by left-to-right HMMs. The preprocessing of the system consists of extracting an MFCC-based feature vector every 10 ms. The final feature vector is computed by a truncated LDA transformation of a concatenation of the MFCCs and their first- and second-order derivatives. Vocal tract length normalization and cepstral mean subtraction are used to attenuate speaker and channel differences. The language models are trained on the hand-transcribed acoustic training data and an additional 11.8 million words of text data collected from the internet. Performance of the baseline system, with an out-of-vocabulary rate of 8.7%, as well as results achieved by using HDLA, are shown in Table 1 of the paper.
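The grapheme-level selection criterion mentioned above can be sketched as follows. This fragment uses a generic string-similarity measure from Python's standard library in place of the paper's phoneme- and grapheme-level distance measures, and the parameters per_word and cutoff are illustrative, not values from the paper.

```python
import difflib

def adapt_vocabulary(hypothesis_words, fallback_lexicon, per_word=5, cutoff=0.75):
    """Hypothesis-driven lexical adaptation, sketched at the grapheme level.

    For every word in a first-pass hypothesis, pull graphemically similar
    candidates from a large fallback lexicon and add them to the active
    vocabulary for a second recognition pass.
    """
    adapted = set(hypothesis_words)
    lexicon = list(fallback_lexicon)
    for word in hypothesis_words:
        adapted.update(
            difflib.get_close_matches(word, lexicon, n=per_word, cutoff=cutoff)
        )
    return adapted

# Example with hypothetical data: first-pass hypothesis plus a tiny fallback lexicon.
vocab = adapt_vocabulary(["nachricht"], {"nachrichten", "nachrichtendienst", "bericht"})
print(sorted(vocab))
```

The same loop could be driven by phoneme-string distances or morphological variants instead of grapheme similarity, as described in the paper.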
Use of Multiple Speech Recognition Units in an In-car Assistance System
This chapter presents an advanced dialogue system based on in-car hands-free voice interaction, conceived for obtaining driving assistance and for accessing tourist information while driving. Part of the related activities aimed at developing this "Virtual Intelligent Codriver" are being conducted under the European VICO project. The architecture of the dialogue system is presented here, with a description of its main modules: Front-end Speech Processing, Recognition Engine, Natural Language Understanding, Dialogue Manager and Car Wide Web. The use of a set of HMM recognizers running in parallel is being investigated within this project in order to ensure low complexity, modularity and fast response, and to allow real-time reconfiguration of the language models and grammars according to the dialogue context. A corpus of spontaneous speech interactions was collected at ITC-irst using the Wizard-of-Oz method in a real driving situation. Multiple recognition units specialized on geographical subdomains, together with simpler language models, were evaluated using the resulting corpus. This investigation shows that, in the presence of large lists of names (e.g. cities, streets, hotels), choosing the output with maximum likelihood among the active units, although a simple approach, provides better results than the use of a single comprehensive language model.
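The selection rule among the parallel units amounts to a maximum-likelihood choice, sketched below. The UnitResult structure, the unit names and the scores are hypothetical; a real system would also need to make scores from different units comparable before selecting.

```python
from dataclasses import dataclass

@dataclass
class UnitResult:
    """Output of one domain-specific recognition unit (hypothetical structure)."""
    unit_name: str
    hypothesis: str
    log_likelihood: float

def select_best(results):
    """Pick the hypothesis with maximum likelihood among the active units."""
    return max(results, key=lambda r: r.log_likelihood)

# Example with three hypothetical subdomain units running in parallel:
outputs = [
    UnitResult("cities",  "drive to Trento",        -210.4),
    UnitResult("hotels",  "drive to Hotel Trento",  -198.7),
    UnitResult("streets", "drive to Trento street", -225.1),
]
print(select_best(outputs).hypothesis)  # -> "drive to Hotel Trento"
```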
Janus: Towards Multilingual Spoken Language Translation
In our effort to build spoken language translation systems we have extended our JANUS system to process spontaneous human-human dialogs in a new domain: two people trying to schedule a meeting. Trained on an initial database, JANUS-2 is able to translate English and German spoken input into English, German, Spanish, Japanese or Korean output. To tackle the difficulty of spontaneous human-human dialogs we improved the JANUS-2 recognizer along its three knowledge sources: acoustic models, dictionary and language models. We developed a robust translation system which performs semantic rather than syntactic analysis and is thus particularly suited to processing spontaneous speech. We also describe repair methods to recover from recognition errors.