Search CORE

71 research outputs found

Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments

Author: M.L. Seltzer
R.M. Stern
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Studies on noise robust automatic speech recognition

Author: Kurimo Mikko
Palomäki Kalle J.
Remes Ulpu
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2009
Field of study

Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK

Aaltodoc Publication Archive

Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

Author: Gannot Sharon
Girin Laurent
Horaud Radu
Li Xiaofei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/02/2018
Field of study

This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, \emph{assuming known mixing filters}. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, CTF has much less taps, consequently it has less near-common zeros among channels and less computational complexity. The work proposes three speech-source recovery methods, namely: i) the multichannel inverse filtering method, i.e. the multiple input/output inverse theorem (MINT), is exploited in the CTF domain, and for the multi-source case, ii) a beamforming-like multichannel inverse filtering method applying single source MINT and using power minimization, which is suitable whenever the source CTFs are not all known, and iii) a constrained Lasso method, where the sources are recovered by minimizing the

\ell_1

-norm to impose their spectral sparsity, with the constraint that the

\ell_2

-norm fitting cost, between the microphone signals and the mixing model involving the unknown source signals, is less than a tolerance. The noise can be reduced by setting a tolerance onto the noise power. Experiments under various acoustic conditions are carried out to evaluate the three proposed methods. The comparison between them as well as with the baseline methods is presented.Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processin

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Spatial features of reverberant speech: estimation and application to recognition and diarization

Author: Peso Pablo
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/11/2016
Field of study

Distant talking scenarios, such as hands-free calling or teleconference meetings, are essential for natural and comfortable human-machine interaction and they are being increasingly used in multiple contexts. The acquired speech signal in such scenarios is reverberant and affected by additive noise. This signal distortion degrades the performance of speech recognition and diarization systems creating troublesome human-machine interactions.This thesis proposes a method to non-intrusively estimate room acoustic parameters, paying special attention to a room acoustic parameter highly correlated with speech recognition degradation: clarity index. In addition, a method to provide information regarding the estimation accuracy is proposed. An analysis of the phoneme recognition performance for multiple reverberant environments is presented, from which a confusability metric for each phoneme is derived. This confusability metric is then employed to improve reverberant speech recognition performance. Additionally, room acoustic parameters can as well be used in speech recognition to provide robustness against reverberation. A method to exploit clarity index estimates in order to perform reverberant speech recognition is introduced. Finally, room acoustic parameters can also be used to diarize reverberant speech. A room acoustic parameter is proposed to be used as an additional source of information for single-channel diarization purposes in reverberant environments. In multi-channel environments, the time delay of arrival is a feature commonly used to diarize the input speech, however the computation of this feature is affected by reverberation. A method is presented to model the time delay of arrival in a robust manner so that speaker diarization is more accurately performed.Open Acces

Lirias

Spiral - Imperial College Digital Repository

Dereverberation and Denoising Techniques for ASR Applications

Author: Fernando Santana Pacheco
Rui Seara
Publication venue: 'IntechOpen'
Publication date: 01/11/2008
Field of study

IntechOpen

Crossref

Convolutive Blind Source Separation Methods

Author: Kjems Ulrik
Larsen Jan
Parra Lucas C.
Pedersen Michael Syskind
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2008
Field of study

In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy, wherein many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks

CiteSeerX

Online Research Database In Technology