
    A System for Simultaneous Translation of Lectures and Speeches

    This thesis presents the first automatic system for simultaneous speech-to-speech translation. The system focuses on the automatic translation of (technically oriented) lectures and speeches from English to Spanish, but the aspects described in this thesis will also be helpful for developing simultaneous translation systems for other domains or languages.
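
    As a rough illustration of how such a system can operate (incremental recognition feeding incremental translation, with speech synthesis omitted for brevity), here is a minimal Python sketch; the component functions are toy stand-ins, not the thesis's actual models:

        from typing import Iterator

        def recognize(chunk: str) -> str:
            """Toy incremental ASR stand-in: pretend each audio chunk
            has already been decoded into English text."""
            return chunk

        def translate(english: str) -> str:
            """Toy English-to-Spanish MT over a fixed phrase table."""
            table = {"welcome": "bienvenidos", "to the lecture": "a la conferencia"}
            return table.get(english, english)

        def simultaneous_translate(chunks: Iterator[str]) -> Iterator[str]:
            """Emit a translation for every chunk while the talk is still
            ongoing, instead of waiting for the speaker to finish."""
            for chunk in chunks:
                yield translate(recognize(chunk))

        print(list(simultaneous_translate(["welcome", "to the lecture"])))
        # ['bienvenidos', 'a la conferencia']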

    Simultaneous Multispeaker Segmentation for Automatic Meeting Recognition

    Vocal activity detection is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, and standard vocal activity detection algorithms for close-talk microphones have been shown to be ineffective. This is primarily due to the problem of crosstalk, in which a participant's speech appears on other participants' microphones, making it hard to attribute detected speech to the correct speaker. We describe an automatic multichannel segmentation system for meeting recognition, which accounts for both the observed acoustics and the inferred vocal activity states of all participants using joint multi-participant models. Our experiments show that this approach almost completely eliminates the crosstalk problem. Recent improvements to the baseline reduce the development-set word error rate, achieved by a state-of-the-art multi-pass speech recognition system, by 62% relative to manual segmentation. We also observe significant performance improvements on unseen data.
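
    The crosstalk issue can be made concrete with a toy example: per-channel thresholding fires on every microphone a loud speaker bleeds into, while comparing energy across channels attributes each frame to its loudest channel. This is a crude heuristic for illustration only, not the joint multi-participant model described above:

        import numpy as np

        def independent_vad(energies, threshold):
            """Per-channel threshold VAD (frames x channels). Crosstalk
            makes one speaker look active on everyone's microphone."""
            return energies > threshold

        def cross_channel_vad(energies, threshold):
            """Attribute each frame only to its loudest channel,
            provided that channel clears the threshold."""
            active = np.zeros(energies.shape, dtype=bool)
            rows = np.arange(energies.shape[0])
            loudest = energies.argmax(axis=1)
            active[rows, loudest] = energies[rows, loudest] > threshold
            return active

        # Speaker 0 talks in frames 0-1 and bleeds into channel 1.
        frames = np.array([[9.0, 3.0], [8.5, 2.8], [0.1, 0.2]])
        print(independent_vad(frames, 1.0))    # channel 1 wrongly fires too
        print(cross_channel_vad(frames, 1.0))  # speech assigned to channel 0 only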

    Maximum Entropy Language Modeling for Russian ASR

    Russian is a challenging language for automatic speech recognition systems due to its rich morphology, which stems from Russian's highly inflectional nature and its frequent use of prefixes and suffixes. Russian also has a very free word order, and changes in word order are used to convey connotations of a sentence. These phenomena are difficult for traditional n-gram models to handle. In this paper we therefore investigate a maximum entropy language model for Russian whose features are specifically designed to deal with Russian's inflections as well as its loose word order. We combine this with a subword-based language model in order to alleviate the large vocabulary sizes necessary for dealing with highly inflected languages. Applying the maximum entropy language model during re-scoring improves the word error rate of our recognition system by 1.2% absolute, while the subword-based language model reduces the vocabulary size from 120k to 40k and the OOV rate from 4.8% to 2.1%.
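
    A toy example shows why subword units shrink the vocabulary for a highly inflected language: inflected forms share a stem, so splitting stem and ending turns many word types into few subword types. The naive suffix list below is for illustration only; it is not the segmentation method used in the paper:

        SUFFIXES = ("ами", "ах", "ой", "а", "и", "ы", "у")

        def segment(word):
            """Naively split off a known ending; '+' marks a bound suffix."""
            for s in sorted(SUFFIXES, key=len, reverse=True):
                if word.endswith(s) and len(word) - len(s) >= 3:
                    return [word[:-len(s)], "+" + s]
            return [word]

        # Inflected forms of 'книга' (book) and 'школа' (school).
        forms = ["книга", "книги", "книгу", "книгой", "книгами", "книгах",
                 "школа", "школы", "школу", "школой", "школами", "школах"]
        word_vocab = set(forms)
        subword_vocab = {u for w in forms for u in segment(w)}
        print(len(word_vocab))     # 12 word types
        print(len(subword_vocab))  # 9 subword types: 2 stems + 7 endings

    With thousands of lemmas the saving compounds, which is the effect behind the reported 120k-to-40k vocabulary reduction.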

    LIBRI-LIGHT: a benchmark for asr with limited or no supervision

    We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID, and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero-resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER), and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) additionally uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state of the art. Index Terms: unsupervised and semi-supervised learning, distant supervision, dataset, zero- and low-resource ASR.
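
    For reference, the WER of setting (3) is the word-level Levenshtein distance between hypothesis and reference, normalized by reference length; the CER of setting (2) is the same computation over characters. A minimal implementation:

        def edit_distance(ref, hyp):
            """Levenshtein distance by dynamic programming, one row at a time."""
            d = list(range(len(hyp) + 1))
            for i, r in enumerate(ref, 1):
                prev, d[0] = d[0], i
                for j, h in enumerate(hyp, 1):
                    prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                           d[j - 1] + 1,      # insertion
                                           prev + (r != h))   # substitution
            return d[-1]

        def wer(reference, hypothesis):
            ref, hyp = reference.split(), hypothesis.split()
            return edit_distance(ref, hyp) / len(ref)

        print(wer("the cat sat on the mat", "the cat sat mat"))  # 2/6 ≈ 0.33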

    Tight Coupling of Speech Recognition and Dialog Management -- Dialog-Context Dependent Grammar ...

    In this paper we present our current work on tightly coupling a speech recognizer with a dialog manager, and our results from restricting the search space of our grammar-based speech recognizer using the information provided by the dialog manager. As a result ...
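
    The coupling idea can be sketched as follows: the dialog manager exposes its current state, and the recognizer activates only the grammar rules plausible in that state, shrinking its search space. The states and rules below are invented for illustration:

        GRAMMAR = {
            "ask_date": ["on <weekday>", "tomorrow", "next week"],
            "ask_city": ["to <city>", "from <city>"],
            "confirm":  ["yes", "no", "correct", "wrong"],
        }

        DIALOG_CONTEXT = {
            # dialog state -> rules plausible as the next user utterance
            "awaiting_departure_date": ["ask_date", "confirm"],
            "awaiting_destination":    ["ask_city", "confirm"],
        }

        def active_search_space(dialog_state):
            """Return only the phrases the recognizer should consider now."""
            rules = DIALOG_CONTEXT.get(dialog_state, list(GRAMMAR))
            return [p for rule in rules for p in GRAMMAR[rule]]

        print(active_search_space("awaiting_destination"))
        # ['to <city>', 'from <city>', 'yes', 'no', 'correct', 'wrong']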