6 research outputs found

A Particle Filter Compensation Approach to Robust Speech Recognition

Author: Mushtaq Aleem
Publication venue: 'IntechOpen'
Publication date: 28/11/2012

Inference in Switching Linear Dynamical Systems Applied to Noise Robust Speech Recognition of Isolated Digits

Author: Mesot Bertrand
Publication venue: IDIAP Research Institute
Publication date: 11/02/2010

Real-world applications such as hands-free dialling in cars may have to recognise spoken digits in potentially very noisy environments. Existing state-of-the-art solutions to this problem use feature-based Hidden Markov Models (HMMs), with a preprocessing stage to clean the noisy signal. However, the effect that the noise has on the induced HMM features is difficult to model exactly and limits the performance of the HMM system. An alternative to feature-based HMMs is to model the clean speech waveform directly, which has the potential advantage that including an explicit model of additive noise is straightforward. One of the simplest models of the clean speech waveform is the autoregressive (AR) process. Being too simple to cope with the nonlinearity of the speech signal, the AR process is generally embedded into a more elaborate model, such as the Switching Autoregressive HMM (SAR-HMM). In this thesis, we extend the SAR-HMM to jointly model the clean speech waveform and additive Gaussian white noise. This is achieved by using a Switching Linear Dynamical System (SLDS) whose internal dynamics are autoregressive. On an isolated digit recognition task where utterances have been corrupted by additive Gaussian white noise, the proposed SLDS outperforms a state-of-the-art HMM system. For more natural noise sources, at low signal-to-noise ratios (SNRs), it is also significantly more accurate than a feature-based HMM system. Inferring the clean waveform from the observed noisy signal with an SLDS is formally intractable, resulting in many approximation strategies in the literature. In this thesis, we present the Expectation Correction (EC) approximation. The algorithm has excellent numerical performance compared to a wide range of competing techniques, and provides a stable and accurate linear-time approximation which scales well to long time series such as those found in acoustic modelling.
A fundamental issue faced by models based on AR processes is that they are sensitive to variations in the amplitude of the signal. One way to overcome this limitation is to use Gain Adaptation (GA) to adjust the amplitude by maximising the likelihood of the observed signal. However, adjusting model parameters without constraint may lead to overfitting when the models are sufficiently flexible. In this thesis, we propose a statistically principled alternative based on an exact Bayesian procedure in which priors are explicitly defined on the parameters of the underlying AR process. Compared to GA, the Bayesian approach enhances recognition accuracy at high SNRs, but is slightly less accurate at low SNRs.
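
As a rough illustration of the signal model this abstract describes (an AR process for the clean waveform, observed through additive Gaussian white noise at a given SNR), a minimal Python sketch might look like the following. The function names, the zero initial conditions, the example coefficients, and the random seeds are all hypothetical choices made for illustration and are not taken from the thesis. The sketch only generates data; it does not implement the SAR-HMM, the SLDS, Expectation Correction, or Gain Adaptation.

```python
import math
import random


def simulate_ar(coeffs, n_samples, innovation_std=1.0, seed=0):
    """Draw n_samples from an AR(p) process (illustrative helper):
    s[t] = sum_k coeffs[k] * s[t-1-k] + e[t],  e[t] ~ N(0, innovation_std^2).
    """
    rng = random.Random(seed)
    p = len(coeffs)
    signal = [0.0] * p  # assumption: zero initial conditions
    for _ in range(n_samples):
        past = signal[-1:-p - 1:-1]  # last p samples, most recent first
        prediction = sum(c * s for c, s in zip(coeffs, past))
        signal.append(prediction + rng.gauss(0.0, innovation_std))
    return signal[p:]  # drop the artificial initial samples


def add_white_noise(clean, snr_db, seed=1):
    """Corrupt a clean signal with Gaussian white noise scaled to a target SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise_std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, noise_std) for s in clean]


def empirical_snr_db(clean, noisy):
    """Measure the SNR (dB) actually realised by a clean/noisy pair."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((y - s) ** 2 for s, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)
```

For example, `simulate_ar([1.5, -0.8], 5000)` yields a stable second-order process, and passing the result to `add_white_noise(clean, 10.0)` gives a noisy observation whose measured SNR should be close to 10 dB.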

Infoscience - École polytechnique fédérale de Lausanne

Robust automatic transcription of lectures

Author: Wölfel Matthias
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2009

Automatic transcription of lectures is becoming an important task. Possible applications can be found in the fields of automatic translation or summarization, information retrieval, digital libraries, education and communication research. Ideally, such systems would operate on distant recordings, freeing the presenter from wearing body-mounted microphones. This task, however, is surpassingly difficult, given that the speech signal is severely degraded by background noise and reverberation.

KITopen

Directory of Open Access Books (DOAB)

Robust Automatic Transcription of Lectures

Author: Wölfel Matthias
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009

The automatic transcription of talks, lectures, and presentations is becoming increasingly important. It is what makes possible applications such as automatic speech translation, automatic speech summarization, and targeted information retrieval in audio data, and thus easier access in digital libraries. Ideally, such a system works with a microphone that frees the speaker from having to wear one, which is the focus of this work.

KITopen