A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratios (SNRs) below 12 dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front ends across the full range of noise
levels.
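As a rough illustration of what combining individual subband classifiers can look like, the sketch below sums per-class scores across subbands and picks the class with the largest total. This is only one simple ensemble rule and is an assumption made here, not necessarily the method used in the paper; the function name `combine_subband_scores`, the phoneme labels, and the score values are all hypothetical.

```python
def combine_subband_scores(subband_scores):
    """Sum per-class scores over subbands and return the winning class label.

    subband_scores: list of dicts, one per subband classifier,
    each mapping a class label to that classifier's score for the class.
    """
    totals = {}
    for scores in subband_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    # Pick the label whose summed score is largest.
    return max(totals, key=totals.get)


# Hypothetical scores from three subband classifiers for two phoneme classes.
scores = [
    {"aa": 0.9, "iy": -0.2},
    {"aa": -0.1, "iy": 0.3},
    {"aa": 0.4, "iy": 0.1},
]
predicted = combine_subband_scores(scores)
print(predicted)
```

A sum rule lets a confident subband outweigh a noisy one; a majority vote over per-subband decisions would be an alternative under the same interface.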
Wavelet-based techniques for speech recognition
In this thesis, new wavelet-based techniques have been developed for the
extraction of features from speech signals for the purpose of automatic speech
recognition (ASR). One of the advantages of the wavelet transform over the short-time
Fourier transform (STFT) is its capability to process non-stationary signals.
Since speech signals are not strictly stationary, the wavelet transform is a better
choice for time-frequency transformation of these signals. In addition, it has
compactly supported basis functions, thereby reducing the amount of
computation as opposed to the STFT, where an overlapping window is needed. [Continues.]
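To make the idea of wavelet-based feature extraction concrete, here is a minimal sketch that computes subband energies of a short frame with a multi-level Haar wavelet transform. The Haar wavelet is chosen here only because it is the simplest compactly supported basis; the thesis does not state which wavelets or features it uses, and the names `haar_dwt` and `subband_energies` as well as the sample frame are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def subband_energies(signal, levels):
    """Energy of the detail coefficients at each level, then of the final approximation."""
    energies = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in current))
    return energies


# Hypothetical 8-sample frame; its length must be divisible by 2**levels.
frame = [1.0, 1.0, 2.0, 0.0, 3.0, 3.0, 0.0, 2.0]
features = subband_energies(frame, levels=2)
print(features)
```

Because the Haar transform is orthonormal, the subband energies sum to the energy of the input frame, which gives a quick sanity check on the decomposition.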
Speech errors across the lifespan
Dell, Burger, and Svec (1997) proposed that the proportion of speech errors classified as anticipations (e.g., "moot and mouth") can be predicted solely from the overall error rate, such that the greater the error rate, the lower the anticipatory proportion (AP) of errors. We report a study examining whether this effect applies to changes in error rates that occur developmentally and as a result of ageing. Speech errors were elicited from 8- and 11-year-old children, young adults, and older adults. The error rate decreased and the AP increased from children to young adults, but neither error rate nor AP differed significantly between young and older adults. In cases where fast speech resulted in a higher error rate than slow speech, the AP was lower. Thus, there was overall support for Dell et al.'s prediction from speech error data across the lifespan.
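The anticipatory proportion can be illustrated with a short calculation. The sketch below assumes that AP is taken over anticipation and perseveration errors only, i.e. anticipations divided by anticipations plus perseverations; the abstract does not spell out the denominator, so treat that choice as an assumption. The function name and the error counts are hypothetical and are not data from the study.

```python
def anticipatory_proportion(anticipations, perseverations):
    """Share of anticipations among anticipation and perseveration errors."""
    total = anticipations + perseverations
    if total == 0:
        raise ValueError("at least one error is needed to compute the AP")
    return anticipations / total


# Hypothetical counts for illustration only.
ap = anticipatory_proportion(anticipations=30, perseverations=10)
print(ap)
```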
A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
In the intricate acoustic landscapes where speech intelligibility is
challenged by noise and reverberation, multichannel speech enhancement emerges
as a promising solution for individuals with hearing loss. Such algorithms are
commonly evaluated at the utterance level. However, this approach overlooks the
granular acoustic nuances revealed by phoneme-specific analysis, potentially
obscuring key insights into their performance. This paper presents an in-depth
phoneme-scale evaluation of three state-of-the-art multichannel speech enhancement
algorithms. These algorithms (FasNet, MVDR, and Tango) are extensively
evaluated across different noise conditions and spatial setups, employing
realistic acoustic simulations with measured room impulse responses, and
leveraging diversity offered by multiple microphones in a binaural hearing
setup. The study emphasizes the fine-grained phoneme-level analysis, revealing
that while some phonemes like plosives are heavily impacted by environmental
acoustics and challenging for the algorithms to handle, others like nasals
and sibilants see substantial improvements after enhancement. These
investigations demonstrate important improvements in phoneme clarity in noisy
conditions, with insights that could drive the development of more personalized
and phoneme-aware hearing aid technologies.
Comment: This is the preprint of the paper that we submitted to the Trends in Hearing Journal.