2,382 research outputs found

A Subband-Based SVM Front-End for Robust ASR

Author: Ager Matthew; Cvetkovic Zoran; Sollich Peter; Yousafzai Jibran
Publication date: 24/12/2013

This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) front-end that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting the appropriate SVM kernels for classification in frequency subbands and the combination of individual subband classifiers using ensemble methods are addressed. The proposed front-end is compared with state-of-the-art ASR front-ends in terms of robustness to additive noise and linear filtering. Experiments performed on the TIMIT phoneme classification task demonstrate the benefits of the proposed subband-based SVM front-end: it outperforms the standard cepstral front-end in the presence of noise and linear filtering for signal-to-noise ratios (SNR) below 12 dB. A combination of the proposed front-end with a conventional front-end such as MFCC yields further improvements over the individual front-ends across the full range of noise levels.

arXiv.org e-Print Archive

King's Research Portal

Wavelet-based techniques for speech recognition

Author: Omar Farooq (7204418)
Publication date: 01/01/2002

In this thesis, new wavelet-based techniques have been developed for the extraction of features from speech signals for the purpose of automatic speech recognition (ASR). One of the advantages of the wavelet transform over the short-time Fourier transform (STFT) is its capability to process non-stationary signals. Since speech signals are not strictly stationary, the wavelet transform is a better choice for time-frequency transformation of these signals. In addition, it has compactly supported basis functions, thereby reducing the amount of computation as opposed to the STFT, where an overlapping window is needed. [Continues.]

Loughborough University Institutional Repository

Speech errors across the lifespan

Author: Maylor Elizabeth A.; Vousden Janet I.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2006

Dell, Burger, and Svec (1997) proposed that the proportion of speech errors classified as anticipations (e.g., "moot and mouth") can be predicted solely from the overall error rate, such that the greater the error rate, the lower the anticipatory proportion (AP) of errors. We report a study examining whether this effect applies to changes in error rates that occur developmentally and as a result of ageing. Speech errors were elicited from 8- and 11-year-old children, young adults, and older adults. The error rate decreased and the AP increased from children to young adults, but neither error rate nor AP differed significantly between young and older adults. In cases where fast speech resulted in a higher error rate than slow speech, the AP was lower. Thus, there was overall support for Dell et al.'s prediction from speech error data across the lifespan.

Warwick Research Archives Portal Repository

Adaptation of the human auditory cortex to changing background noise

Author: Herrero J. L.; Khalighinejad B.; Mehta A. D.; Mesgarani N.
Publication venue: Donald and Barbara Zucker School of Medicine Academic Works
Publication date: 01/01/2019

Hofstra Northwell Academic Works (Hofstra Northwell School of Medicine)

A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms

Author: Magron Paul; Monir Nasser-Eddine; Serizel Romain
Publication date: 24/01/2024

In the intricate acoustic landscapes where speech intelligibility is challenged by noise and reverberation, multichannel speech enhancement emerges as a promising solution for individuals with hearing loss. Such algorithms are commonly evaluated at the utterance level. However, this approach overlooks the granular acoustic nuances revealed by phoneme-specific analysis, potentially obscuring key insights into their performance. This paper presents an in-depth phoneme-scale evaluation of three state-of-the-art multichannel speech enhancement algorithms. These algorithms (FasNet, MVDR, and Tango) are extensively evaluated across different noise conditions and spatial setups, employing realistic acoustic simulations with measured room impulse responses, and leveraging the diversity offered by multiple microphones in a binaural hearing setup. The study emphasizes fine-grained phoneme-level analysis, revealing that while some phonemes, like plosives, are heavily impacted by environmental acoustics and are challenging for the algorithms to handle, others, like nasals and sibilants, see substantial improvements after enhancement. These investigations demonstrate important improvements in phoneme clarity in noisy conditions, with insights that could drive the development of more personalized and phoneme-aware hearing aid technologies. Comment: This is the preprint of the paper that we submitted to the Trends in Hearing Journal.

arXiv.org e-Print Archive