    Improvements to VTS feature enhancement

    By explicitly modelling the distortion of speech signals, model adaptation based on vector Taylor series (VTS) approaches has been shown to significantly improve the robustness of speech recognizers to environmental noise. However, the computational cost of VTS model adaptation (MVTS) methods hinders them from being widely used because they need to adapt all the HMM parameters for every utterance at runtime. In contrast, VTS feature enhancement (FVTS) methods are computationally cheaper because they need neither multiple decoding passes nor adaptation of all the HMM parameters. In this paper, we propose two improvements to VTS feature enhancement: updating all of the environment distortion parameters and noise adaptive training of the front-end GMM. In addition, we investigate some other performance-related issues such as the selection of FVTS algorithms and the spectrum domain that MFCC is extracted from. As an important result of our investigation, we establish that the FVTS method can achieve accuracy comparable to that of the MVTS method at a smaller runtime cost. This makes the FVTS method an ideal candidate for real-world tasks.

    A Subband-Based SVM Front-End for Robust ASR

    This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) front-end that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting the appropriate SVM kernels for classification in frequency subbands and the combination of individual subband classifiers using ensemble methods are addressed. The proposed front-end is compared with state-of-the-art ASR front-ends in terms of robustness to additive noise and linear filtering. Experiments performed on the TIMIT phoneme classification task demonstrate the benefits of the proposed subband-based SVM front-end: it outperforms the standard cepstral front-end in the presence of noise and linear filtering for signal-to-noise ratios (SNRs) below 12 dB. A combination of the proposed front-end with a conventional front-end such as MFCC yields further improvements over the individual front-ends across the full range of noise levels.

    Noise adaptive training for subspace Gaussian mixture models

    Noise adaptive training (NAT) is an effective approach to normalise the environmental distortions in the training data. This paper investigates the model-based NAT scheme using joint uncertainty decoding (JUD) for subspace Gaussian mixture models (SGMMs). A typical SGMM acoustic model has a much larger number of surface Gaussian components, which makes it computationally infeasible to compensate each Gaussian explicitly. JUD tackles the problem by sharing the compensation parameters among the Gaussians and hence reduces the computational and memory demands. For noise adaptive training, JUD is reformulated into a generative model, which leads to an efficient expectation-maximisation (EM) based algorithm to update the SGMM acoustic model parameters. We evaluated the SGMMs with NAT on the Aurora 4 database and obtained higher recognition accuracy compared to systems without adaptive training. Index Terms: adaptive training, noise robustness, joint uncertainty decoding, subspace Gaussian mixture model

    Flavor Physics and the CKM Matrix: An Overview

    I review the current status of our knowledge of CP violation and flavor physics. I discuss where one should look for future improvements, and outline the experimental and theoretical priorities of the field. Comment: 11 pages. Presentation at the Fifth KEK Topical Conference, "Frontiers in Flavor Physics", November 20-22, 2001. References adde

    Environmentally robust ASR front-end for deep neural network acoustic models

    This paper examines the individual and combined impacts of various front-end approaches on the performance of deep neural network (DNN) based speech recognition systems in distant talking situations, where acoustic environmental distortion degrades the recognition performance. Training of a DNN-based acoustic model consists of generation of state alignments followed by learning the network parameters. This paper first shows that the network parameters are more sensitive to the speech quality than the alignments and thus this stage requires improvement. Then, various front-end robustness approaches to addressing this problem are categorised based on functionality. The degree to which each class of approaches impacts the performance of DNN-based acoustic models is examined experimentally. Based on the results, a front-end processing pipeline is proposed for efficiently combining different classes of approaches. Using this front-end, the combined effects of different classes of approaches are further evaluated in a single distant microphone-based meeting transcription task with both speaker independent (SI) and speaker adaptive training (SAT) set-ups. By combining multiple speech enhancement results, multiple types of features, and feature transformation, the front-end shows relative performance gains of 7.24% and 9.83% in the SI and SAT scenarios, respectively, over competitive DNN-based systems using log mel-filter bank features. This is the final version of the article. It first appeared from Elsevier via http://dx.doi.org/10.1016/j.csl.2014.11.00

    Speech‑derived haptic stimulation enhances speech recognition in a multi‑talker background

    Published: 03 October 2023. Speech understanding, while effortless in quiet conditions, is challenging in noisy environments. Previous studies have revealed that a feasible approach to supplement speech-in-noise (SiN) perception consists in presenting speech-derived signals as haptic input. In the current study, we investigated whether the presentation of a vibrotactile signal derived from the speech temporal envelope can improve SiN intelligibility in a multi-talker background for untrained, normal-hearing listeners. We also determined if vibrotactile sensitivity, evaluated using vibrotactile detection thresholds, modulates the extent of audio-tactile SiN improvement. In practice, we measured participants’ speech recognition in multi-talker noise without (audio-only) and with (audio-tactile) concurrent vibrotactile stimulation delivered in three schemes: to the left or right palm, or to both. Averaged across the three stimulation delivery schemes, the vibrotactile stimulation led to a significant improvement of 0.41 dB in SiN recognition when compared to the audio-only condition. Notably, there were no significant differences observed between the improvements in these delivery schemes. In addition, audio-tactile SiN benefit was significantly predicted by participants’ vibrotactile threshold levels and unimodal (audio-only) SiN performance. The extent of the improvement afforded by speech-envelope-derived vibrotactile stimulation was in line with previously uncovered vibrotactile enhancements of SiN perception in untrained listeners with no known hearing impairment. Overall, these results highlight the potential of concurrent vibrotactile stimulation to improve SiN recognition, especially in individuals with poor SiN perception abilities, and tentatively more so with increasing tactile sensitivity. Moreover, they lend support to the multimodal accounts of speech perception and research on tactile speech aid devices. I. Sabina Răutu is supported by the Fonds pour la formation à la recherche dans l’industrie et l’agriculture (FRIA), Fonds de la Recherche Scientifique (FRS-FNRS), Brussels, Belgium. Xavier De Tiège is Clinical Researcher at the FRS-FNRS. This research project has been supported by the Fonds Erasme (Research convention “Les Voies du Savoir 2”, Brussels, Belgium).

    Investigating The Physics Case of Running a B-Factory at the Y(5S) Resonance

    We discuss the physics case of a high luminosity B-Factory running at the Y(5S) resonance. We show that the coherence of the B meson pairs is preserved at this resonance, and that Bs can be well distinguished from Bd and charged B mesons. These facts make it possible to cover the physics program of a traditional B-Factory and, at the same time, to perform complementary measurements which are not accessible at the Y(4S). In particular we show how, despite the experimental limitations in performing time-dependent measurements of Bs decays, the same experimental information can be extracted, in several cases, from the determination of time-integrated observables. In addition, a few examples of the potential for measuring rare Bs decays are given. Finally, we discuss how the study of Bs mesons will improve the constraints on New Physics parameters in the Bs sector, in the context of the generalized Unitarity Triangle analysis. Comment: 47 pages, 22 figure