2,092 research outputs found
Improvements to VTS feature enhancement
By explicitly modelling the distortion of speech signals, model adaptation based on vector Taylor series (VTS) approaches has been shown to significantly improve the robustness of speech recognizers to environmental noise. However, the computational cost of VTS model adaptation (MVTS) methods hinders their wide use because they need to adapt all the HMM parameters for every utterance at runtime. In contrast, VTS feature enhancement (FVTS) methods are computationally cheaper because they neither need multiple decoding passes nor adapt all the HMM parameters. In this paper, we propose two improvements to VTS feature enhancement: updating all of the environment distortion parameters and noise adaptive training of the front-end GMM. In addition, we investigate some other performance-related issues, such as the selection of FVTS algorithms and the spectral domain from which MFCCs are extracted. As an important result of our investigation, we establish that the FVTS method can achieve accuracy comparable to that of the MVTS method at a smaller runtime cost. This makes the FVTS method an ideal candidate for real-world tasks.
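As a rough illustration of what "explicitly modelling the distortion" means in VTS methods, the sketch below uses the commonly cited log-spectral mismatch function y = x + h + log(1 + exp(n - x - h)) and its first-order Taylor expansion. The abstract does not state its exact formulation, so the domain (log mel-spectral rather than cepstral), the fixed channel term `h`, and the function names `vts_mismatch` and `vts_first_order` are assumptions made for this example only.

```python
import numpy as np

def vts_mismatch(x, n, h=0.0):
    """Noisy log-spectrum y given clean speech x, additive noise n, channel h.

    Assumed form: y = log(exp(x + h) + exp(n)), evaluated stably.
    """
    return np.logaddexp(x + h, n)

def vts_first_order(x0, n0, x, n, h=0.0):
    """First-order Taylor approximation of vts_mismatch around (x0, n0).

    The channel term h is held fixed here for simplicity (an assumption).
    """
    y0 = vts_mismatch(x0, n0, h)
    # Partial derivative dy/dx at the expansion point; dy/dn is 1 - g.
    g = 1.0 / (1.0 + np.exp(n0 - x0 - h))
    return y0 + g * (x - x0) + (1.0 - g) * (n - n0)
```

When the speech is far above the noise, `g` approaches 1 and the noisy feature tracks the clean one; when noise dominates, `g` approaches 0 and the feature tracks the noise.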
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband-based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratios (SNR) below 12 dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front-ends across the full range of noise
levels.
Noise adaptive training for subspace Gaussian mixture models
Noise adaptive training (NAT) is an effective approach for normalising the environmental distortions in the training data. This paper investigates the model-based NAT scheme using joint uncertainty decoding (JUD) for subspace Gaussian mixture models (SGMMs). A typical SGMM acoustic model has a much larger number of surface Gaussian components, which makes it computationally infeasible to compensate each Gaussian explicitly. JUD tackles the problem by sharing the compensation parameters among the Gaussians and hence reduces the computational and memory demands. For noise adaptive training, JUD is reformulated into a generative model, which leads to an efficient expectation-maximisation (EM) based algorithm to update the SGMM acoustic model parameters. We evaluated the SGMMs with NAT on the Aurora 4 database, and obtained higher recognition accuracy compared to systems without adaptive training.
Index Terms: adaptive training, noise robustness, joint uncertainty decoding, subspace Gaussian mixture model
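The parameter sharing described above can be sketched as follows. In the usual JUD formulation, every Gaussian m assigned to a regression class r is scored as |A_r| N(A_r y + b_r; mu_m, Sigma_m + Sigma_b_r), so only one transform and one variance bias are estimated per class. The code below is a simplified illustration, not the paper's SGMM implementation: it assumes diagonal transforms and diagonal covariances, and the function name and arguments are hypothetical.

```python
import numpy as np

def jud_log_likelihood(y, A_r, b_r, var_b_r, mu_m, var_m):
    """log p(y | m) under JUD, diagonal case (an assumption for brevity).

    A_r, b_r, var_b_r are shared by all Gaussians in regression class r;
    mu_m and var_m belong to the individual Gaussian m.
    """
    z = A_r * y + b_r                      # shared feature transform
    var = var_m + var_b_r                  # shared variance bias
    log_det = np.sum(np.log(np.abs(A_r)))  # Jacobian of the transform
    log_gauss = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (z - mu_m) ** 2 / var)
    return log_det + log_gauss
```

With an identity transform and zero bias (`A_r = 1`, `b_r = 0`, `var_b_r = 0`) this reduces to the ordinary Gaussian log-density, which is a convenient sanity check.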
Flavor Physics and the CKM Matrix: An Overview
I review the current status of our knowledge of CP violation and flavor
physics. I discuss where one should look for future improvements, and outline
the experimental and theoretical priorities of the field.
Comment: 11 pages. Presentation at the Fifth KEK Topical Conference,
"Frontiers in Flavor Physics", November 20-22, 2001. References added.
Environmentally robust ASR front-end for deep neural network acoustic models
This paper examines the individual and combined impacts of various front-end approaches on the performance of deep neural network (DNN) based speech recognition systems in distant talking situations, where acoustic environmental distortion degrades the recognition performance. Training of a DNN-based acoustic model consists of generation of state alignments followed by learning the network parameters. This paper first shows that the network parameters are more sensitive to the speech quality than the alignments and thus this stage requires improvement. Then, various front-end robustness approaches to addressing this problem are categorised based on functionality. The degree to which each class of approaches impacts the performance of DNN-based acoustic models is examined experimentally. Based on the results, a front-end processing pipeline is proposed for efficiently combining different classes of approaches. Using this front-end, the combined effects of different classes of approaches are further evaluated in a single distant microphone-based meeting transcription task with both speaker independent (SI) and speaker adaptive training (SAT) set-ups. By combining multiple speech enhancement results, multiple types of features, and feature transformation, the front-end shows relative performance gains of 7.24% and 9.83% in the SI and SAT scenarios, respectively, over competitive DNN-based systems using log mel-filter bank features.
This is the final version of the article. It first appeared from Elsevier via http://dx.doi.org/10.1016/j.csl.2014.11.00
Speech-derived haptic stimulation enhances speech recognition in a multi-talker background
Published: 03 October 2023
Speech understanding, while effortless in quiet conditions, is challenging in noisy environments.
Previous studies have revealed that a feasible approach to supplement speech-in-noise (SiN)
perception consists in presenting speech-derived signals as haptic input. In the current study, we
investigated whether the presentation of a vibrotactile signal derived from the speech temporal
envelope can improve SiN intelligibility in a multi-talker background for untrained, normal-hearing
listeners. We also determined if vibrotactile sensitivity, evaluated using vibrotactile detection
thresholds, modulates the extent of audio-tactile SiN improvement. In practice, we measured
participants' speech recognition in a multi-talker noise without (audio-only) and with (audio-tactile)
concurrent vibrotactile stimulation delivered in three schemes: to the left or right palm, or to
both. Averaged across the three stimulation delivery schemes, the vibrotactile stimulation led to a
significant improvement of 0.41 dB in SiN recognition when compared to the audio-only condition.
Notably, there were no significant differences observed between the improvements in these delivery
schemes. In addition, audio-tactile SiN benefit was significantly predicted by participants' vibrotactile
threshold levels and unimodal (audio-only) SiN performance. The extent of the improvement afforded
by speech-envelope-derived vibrotactile stimulation was in line with previously uncovered vibrotactile
enhancements of SiN perception in untrained listeners with no known hearing impairment. Overall,
these results highlight the potential of concurrent vibrotactile stimulation to improve SiN recognition,
especially in individuals with poor SiN perception abilities, and tentatively more so with increasing
tactile sensitivity. Moreover, they lend support to the multimodal accounts of speech perception and
research on tactile speech aid devices.
I. Sabina Răutu is supported by the Fonds pour la formation à la recherche dans l'industrie et l'agriculture (FRIA),
Fonds de la Recherche Scientifique (FRS-FNRS), Brussels, Belgium. Xavier De Tiège is Clinical Researcher at
the FRS-FNRS. This research project has been supported by the Fonds Erasme (Research convention "Les Voies
du Savoir 2", Brussels, Belgium).
Investigating The Physics Case of Running a B-Factory at the Y(5S) Resonance
We discuss the physics case of a high luminosity B-Factory running at the
Y(5S) resonance. We show that the coherence of the B meson pairs is preserved
at this resonance, and that Bs can be well distinguished from Bd and charged B
mesons. These facts make it possible to cover the physics program of a traditional
B-Factory and, at the same time, to perform complementary measurements which
are not accessible at the Y(4S). In particular we show how, despite the
experimental limitations in performing time-dependent measurements of Bs
decays, the same experimental information can be extracted, in several cases,
from the determination of time-integrated observables. In addition, a few
examples of the potential for measuring rare Bs decays are given. Finally, we
discuss how the study of Bs mesons will improve the constraints on New Physics
parameters in the Bs sector, in the context of the generalized Unitarity
Triangle analysis.
Comment: 47 pages, 22 figures