
Subband analysis for robust speech recognition in the presence of car noise

Author: Erzin Engin
Cetin A. Enis
Yardimci Yasemin
Publication venue: IEEE, Piscataway, NJ, United States
Publication date: 01/01/1995

In this paper, a new set of speech feature representations for robust speech recognition in the presence of car noise is proposed. These parameters are based on subband analysis of the speech signal. Line Spectral Frequency (LSF) representation of the Linear Prediction (LP) analysis in subbands and cepstral coefficients derived from subband analysis (SUBCEP) are introduced, and the performance of the new feature representations is compared to that of mel scale cepstral coefficients (MELCEP) in the presence of car noise. Subband analysis based parameters are observed to be more robust than the commonly employed MELCEP representations.

Bilkent University Institutional Repository

Hilbert Envelope Based Features for Far-Field Speech Recognition

Author: Ganapathy Sriram
Hermansky Hynek
Thomas Samuel
Publication venue: IDIAP
Publication date: 11/02/2010

Automatic speech recognition (ASR) systems, trained on speech signals from close-talking microphones, generally fail in recognizing far-field speech. In this paper, we present a Hilbert envelope based feature extraction technique to alleviate the artifacts introduced by room reverberations. The proposed technique is based on modeling temporal envelopes of the speech signal in narrow sub-bands using Frequency Domain Linear Prediction (FDLP). ASR experiments on far-field speech using the proposed FDLP features show significant performance improvements when compared to other robust feature extraction techniques (average relative improvement of 43% in word error rate).
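The "relative improvement" quoted in these abstracts is conventionally the error-rate reduction expressed as a percentage of the baseline error rate. A minimal sketch of that arithmetic follows; the function name and the numbers in the usage line are hypothetical and are not taken from any of the papers listed here.

```python
def relative_reduction(baseline_error, new_error):
    """Relative error-rate reduction in percent.

    Both arguments are error rates in the same unit (e.g. percent WER).
    """
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical example: a baseline WER of 20.0 and a new WER of 15.0
print(relative_reduction(20.0, 15.0))  # 25.0
```

Note that a relative figure says nothing about the absolute error rates, so two systems with the same relative reduction can differ widely in actual accuracy.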

Infoscience - École polytechnique fédérale de Lausanne

Recognition Of Reverberant Speech Using Frequency Domain Linear Prediction

Author: Ganapathy Sriram
Hermansky Hynek
Thomas Samuel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/02/2010

Performance of a typical automatic speech recognition (ASR) system severely degrades when it encounters speech from reverberant environments. Part of the reason for this degradation is the feature extraction techniques that use analysis windows which are much shorter than typical room impulse responses. We present a feature extraction technique based on modeling temporal envelopes of the speech signal in narrow sub-bands using Frequency Domain Linear Prediction (FDLP). FDLP provides an all-pole approximation of the Hilbert envelope of the signal obtained by linear prediction on the cosine transform of the signal. ASR experiments on speech data degraded with a number of room impulse responses (with varying degrees of distortion) show significant performance improvements for the proposed FDLP features when compared to other robust feature extraction techniques (average relative reduction of 24% in word error rate). Similar improvements are also obtained for far-field data that contain natural reverberation in background noise. These results are achieved without any noticeable degradation in performance for clean speech.
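The core idea stated in the abstract, linear prediction applied to the cosine transform of the signal, can be sketched in a few lines. This is an illustrative full-band version only: the abstract describes narrow sub-bands, which are omitted here, and every function name, the model order, and the test signal are assumptions made for the example, not the authors' implementation.

```python
import math


def dct_ii(x):
    """Plain O(N^2) type-II DCT; the first coefficient is scaled by sqrt(1/2)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        out.append(s * (math.sqrt(0.5) if k == 0 else 1.0))
    return out


def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(signal, order, n_points):
    """All-pole estimate of the squared temporal envelope of `signal`.

    Linear prediction is run on the DCT coefficients, so the model
    'spectrum' over angles 0..pi maps onto time 0..len(signal).
    """
    a, err = levinson(autocorr(dct_ii(signal), order), order)
    env = []
    for m in range(n_points):
        w = math.pi * m / n_points
        re = sum(a[k] * math.cos(w * k) for k in range(order + 1))
        im = -sum(a[k] * math.sin(w * k) for k in range(order + 1))
        env.append(err / (re * re + im * im))
    return env


# Hypothetical test signal: silence with a single click at sample 150 of 200.
click = [0.0] * 200
click[150] = 1.0
env = fdlp_envelope(click, order=2, n_points=200)
peak = max(range(len(env)), key=env.__getitem__)
print(peak)  # expected to land close to sample 150
```

The point of the sketch is the duality: where ordinary LP fits the spectral envelope of a time signal, LP on the DCT sequence fits the temporal envelope, so a long segment can be summarized by a small number of poles.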

Infoscience - École polytechnique fédérale de Lausanne

Las fortalezas castellanas de la Orden de Calatrava en el siglo XII

Author: Ganapathy Sriram
Hermansky Hynek
Thomas Samuel
Publication venue: Ediciones Complutense
Publication date: 01/01/1993

In this paper, we present a spectro-temporal feature extraction technique using sub-band Hilbert envelopes of relatively long segments of the speech signal. Hilbert envelopes of the sub-bands are estimated using Frequency Domain Linear Prediction (FDLP). Spectral features are derived by integrating the sub-band Hilbert envelopes in short-term frames and the temporal features are formed by converting the FDLP envelopes into modulation frequency components. These are then combined at the phoneme posterior level and are used as the input features for a phoneme recognition system. In order to improve the robustness of the proposed features to telephone speech, the sub-band temporal envelopes are gain normalized prior to feature extraction. Phoneme recognition experiments on telephone speech in the HTIMIT database show significant performance improvements for the proposed features when compared to other robust feature techniques (average relative reduction of 11% in phoneme error rate).
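The step "integrating the sub-band Hilbert envelopes in short-term frames" can be illustrated with a small sketch. The frame length, hop size, log compression, and function name below are assumptions chosen for the example; the abstract does not specify them.

```python
import math


def frame_log_energies(envelope, frame_len, hop):
    """Sum a (squared) temporal envelope over short frames and take the log.

    Returns one value per frame for a single sub-band.
    """
    feats = []
    for start in range(0, len(envelope) - frame_len + 1, hop):
        energy = sum(envelope[start:start + frame_len])
        feats.append(math.log(energy + 1e-12))  # small floor avoids log(0)
    return feats


# Hypothetical envelope and a copy scaled by a constant channel gain.
env = [1.0, 2.0, 4.0, 2.0, 1.0, 1.0, 3.0, 5.0, 3.0, 1.0]
gain = 8.0
plain = frame_log_energies(env, frame_len=4, hop=2)
scaled = frame_log_energies([gain * v for v in env], frame_len=4, hop=2)

# A constant gain shifts every log-energy feature by the same offset, log(gain).
print([round(s - p, 6) for s, p in zip(scaled, plain)])
```

The usage lines show why an unknown gain on the envelope matters: it moves every feature by the same constant, which is the kind of channel-dependent offset that normalizing the envelope gain before feature extraction is meant to remove.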

Infoscience - École polytechnique fédérale de Lausanne

Biblos-e Archivo