
    Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

    Speech recognition in noisy and channel-distorted scenarios is often challenging because current acoustic modeling schemes do not adapt to the changes in the signal distribution caused by noise. In this work, we develop a novel acoustic modeling framework for noise-robust speech recognition based on a relevance weighting mechanism. The relevance weighting is achieved using a sub-network approach that performs feature selection. A relevance sub-network is applied to the output of the first layer of a convolutional network model operating on raw speech signals, while a second relevance sub-network is applied to the output of the second convolutional layer. The relevance weights for the first layer correspond to an acoustic filterbank selection, while the relevance weights in the second layer perform modulation filter selection. The model is trained for a speech recognition task on noisy and reverberant speech. Speech recognition experiments on multiple datasets (Aurora-4, CHiME-3, VOiCES) show that incorporating relevance weighting in the neural network architecture improves word error rates significantly (average relative improvements of 10% over the baseline systems). Comment: arXiv admin note: text overlap with arXiv:2001.0706
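The relevance weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' architecture: the one-parameter linear scorer (`scorer_w`, `scorer_b`), the per-band time average used as its input, and the softmax normalization are all assumptions made for illustration. The sketch only shows the general pattern of scoring each filterbank band and rescaling the features by the resulting weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_weights(features, scorer_w, scorer_b):
    """Return one weight per band for a T x F feature matrix.

    The scorer here is a hypothetical stand-in for a relevance
    sub-network: it maps each band's mean over time to a score.
    """
    num_frames = len(features)
    num_bands = len(features[0])
    band_means = [
        sum(frame[f] for frame in features) / num_frames
        for f in range(num_bands)
    ]
    scores = [scorer_w * m + scorer_b for m in band_means]
    return softmax(scores)


def apply_relevance(features, weights):
    """Scale every frame band-wise by the relevance weights."""
    return [[x * w for x, w in zip(frame, weights)] for frame in features]


# Two frames, three bands; the middle band carries no energy.
features = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
weights = relevance_weights(features, scorer_w=1.0, scorer_b=0.0)
weighted = apply_relevance(features, weights)
```

With a positive `scorer_w`, bands with a larger average response receive larger weights, so the silent middle band is down-weighted relative to the other two.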

    Robust speaker recognition using both vocal source and vocal tract features estimated from noisy input utterances.

    Wang, Ning. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 106-115). Abstracts in English and Chinese.

    Chapter 1: Introduction
        1.1 Introduction to Speech and Speaker Recognition
        1.2 Difficulties and Challenges of Speaker Authentication
        1.3 Objectives and Thesis Outline
    Chapter 2: Speaker Recognition System
        2.1 Baseline Speaker Recognition System Overview
            2.1.1 Feature Extraction
            2.1.2 Pattern Generation and Classification
        2.2 Performance Evaluation Metric for Different Speaker Recognition Tasks
        2.3 Robustness of Speaker Recognition System
            2.3.1 Speech Corpus: CU2C
            2.3.2 Noise Database: NOISEX-92
            2.3.3 Mismatched Training and Testing Conditions
        2.4 Summary
    Chapter 3: Speaker Recognition System using both Vocal Tract and Vocal Source Features
        3.1 Speech Production Mechanism
            3.1.1 Speech Production: An Overview
            3.1.2 Acoustic Properties of Human Speech
        3.2 Source-filter Model and Linear Predictive Analysis
            3.2.1 Source-filter Speech Model
            3.2.2 Linear Predictive Analysis for Speech Signal
        3.3 Vocal Tract Features
        3.4 Vocal Source Features
            3.4.1 Source Related Features: An Overview
            3.4.2 Source Related Features: Technical Viewpoints
        3.5 Effects of Noises on Speech Properties
        3.6 Summary
    Chapter 4: Estimation of Robust Acoustic Features for Speaker Discrimination
        4.1 Robust Speech Techniques
            4.1.1 Noise Resilience
            4.1.2 Speech Enhancement
        4.2 Spectral Subtractive-Type Preprocessing
            4.2.1 Noise Estimation
            4.2.2 Spectral Subtraction Algorithm
        4.3 LP Analysis of Noisy Speech
            4.3.1 LP Inverse Filtering: Whitening Process
            4.3.2 Magnitude Response of All-pole Filter in Noisy Condition
            4.3.3 Noise Spectral Reshaping
        4.4 Distinctive Vocal Tract and Vocal Source Feature Extraction
            4.4.1 Vocal Tract Feature Extraction
            4.4.2 Source Feature Generation Procedure
            4.4.3 Subband-specific Parameterization Method
        4.5 Summary
    Chapter 5: Speaker Recognition Tasks & Performance Evaluation
        5.1 Speaker Recognition Experimental Setup
            5.1.1 Task Description
            5.1.2 Baseline Experiments
            5.1.3 Identification and Verification Results
        5.2 Speaker Recognition using Source-tract Features
            5.2.1 Source Feature Selection
            5.2.2 Source-tract Feature Fusion
            5.2.3 Identification and Verification Results
        5.3 Performance Analysis
    Chapter 6: Conclusion
        6.1 Discussion and Conclusion
        6.2 Suggestion of Future Work

    Very Deep Convolutional Neural Networks for Robust Speech Recognition

    This paper describes the extension and optimization of our previous work on very deep convolutional neural networks (CNNs) for effective recognition of noisy speech in the Aurora 4 task. The number of convolutional layers, the filter sizes, the pooling operations and the input feature maps are all modified: the filter and pooling sizes are reduced and the dimensions of the input feature maps are extended to allow more convolutional layers to be added. Furthermore, appropriate input padding and input feature map selection strategies are developed. In addition, an adaptation framework is developed that jointly trains the very deep CNN with auxiliary i-vector and fMLLR features. These modifications give substantial word error rate reductions over the standard CNN used as baseline. Finally, the very deep CNN is combined with an LSTM-RNN acoustic model, and state-level weighted log-likelihood score combination in a joint acoustic model decoding scheme is shown to be very effective. On the Aurora 4 task, the very deep CNN achieves a WER of 8.81%, which improves to 7.99% with auxiliary feature joint training and to 7.09% with LSTM-RNN joint decoding. Comment: accepted by SLT 201
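As a rough illustration of the state-level weighted log-likelihood score combination mentioned in the abstract above, the sketch below interpolates per-frame, per-state log-likelihoods from two acoustic models. The function name, the single global weight `weight_a` and its default value are assumptions for this example; the abstract does not specify how the weights are chosen.

```python
def combine_log_likelihoods(ll_a, ll_b, weight_a=0.5):
    """Linearly interpolate two T x S matrices of state log-likelihoods.

    ll_a, ll_b: lists of frames, each frame a list of per-state
    log-likelihoods from one acoustic model (e.g. a CNN and an LSTM).
    weight_a: hypothetical interpolation weight for the first model.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(ll_a) != len(ll_b):
        raise ValueError("models must score the same number of frames")

    combined = []
    for frame_a, frame_b in zip(ll_a, ll_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("models must score the same states")
        combined.append(
            [weight_a * a + (1.0 - weight_a) * b
             for a, b in zip(frame_a, frame_b)]
        )
    return combined


# One frame, two states; the two models disagree on the better state.
cnn_scores = [[-1.0, -3.0]]
lstm_scores = [[-3.0, -1.0]]
joint = combine_log_likelihoods(cnn_scores, lstm_scores, weight_a=0.5)
```

The combined scores would then replace a single model's scores during decoding; in practice the weight would be tuned on held-out data.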

    Robust Speech Detection for Noisy Environments

    This paper presents a robust voice activity detector (VAD) based on hidden Markov models (HMMs) to improve speech recognition systems in stationary and non-stationary noise environments: inside motor vehicles (such as cars or planes) or inside buildings close to high-traffic places (such as a control tower for air traffic control (ATC)). In these environments, there is a high stationary noise level caused by vehicle motors and, additionally, there may be people speaking at a certain distance from the main speaker, producing non-stationary noise. The VAD presented in this paper is characterized by a new front-end and a noise level adaptation process that significantly increases the VAD robustness across different signal-to-noise ratios (SNRs). The feature vector used by the VAD includes the most relevant Mel Frequency Cepstral Coefficients (MFCCs), normalized log energy and delta log energy. The proposed VAD has been evaluated and compared to other well-known VADs using three databases containing different noise conditions: speech in clean environments (SNRs greater than 20 dB), speech recorded in stationary noise environments (inside or close to motor vehicles), and finally, speech in non-stationary environments (including noise from bars, television and far-field speakers). In all three cases, the detection error obtained with the proposed VAD is the lowest for all SNRs compared to Acero's VAD (the reference for this work) and other well-known VADs such as AMR, AURORA or G.729 Annex B.
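Two of the features named in the abstract above, normalized log energy and delta log energy, can be sketched as follows. This is a generic illustration rather than the paper's front-end: the frame length, hop size, peak-based normalization and the simple first-difference delta are all assumptions chosen for brevity.

```python
import math


def frame_log_energy(samples, frame_len, hop, eps=1e-10):
    """Log energy of each full frame; eps avoids log(0) on silence."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(math.log(sum(x * x for x in frame) + eps))
    return energies


def normalize_log_energy(log_energies):
    """Subtract the utterance peak so the loudest frame sits at 0."""
    peak = max(log_energies)
    return [e - peak for e in log_energies]


def delta(values):
    """First difference between consecutive frames (0 for the first)."""
    return [0.0] + [values[i] - values[i - 1] for i in range(1, len(values))]


# Four silent samples followed by four samples of constant amplitude.
samples = [0.0] * 4 + [1.0] * 4
log_e = frame_log_energy(samples, frame_len=4, hop=4)
norm_e = normalize_log_energy(log_e)
delta_e = delta(norm_e)
```

In this toy signal the second frame is the loudest, so its normalized log energy is 0 and the delta at that frame is positive, which is the kind of onset cue a VAD can exploit.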