
    Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition

    Far-field speech recognition in noisy and reverberant conditions remains a challenging problem despite recent deep learning breakthroughs. This problem is commonly addressed by acquiring a speech signal from multiple microphones and performing beamforming over them. In this paper, we propose to use a recurrent neural network with long short-term memory (LSTM) architecture to adaptively estimate real-time beamforming filter coefficients. This copes with non-stationary environmental noise and with the changing positions of the source and microphones, which result in a set of time-varying room impulse responses. The LSTM adaptive beamformer is jointly trained with a deep LSTM acoustic model to predict senone labels. Further, we use hidden units in the deep LSTM acoustic model to assist in predicting the beamforming filter coefficients. The proposed system achieves a 7.97% absolute gain over baseline systems with no beamforming on the CHiME-3 real evaluation set. Comment: in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
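
The beamforming step mentioned in this abstract can be illustrated with a minimal frequency-domain filter-and-sum sketch. It only shows how per-channel filter coefficients are applied to one frame; it is not the paper's LSTM estimator. The function name `filter_and_sum`, the data layout, and the toy values are assumptions made for illustration.

```python
def filter_and_sum(weights, channels):
    """Apply per-channel, per-bin complex weights to one multichannel frame.

    weights[m][f] and channels[m][f] are complex values for microphone m
    and frequency bin f (hypothetical layout). Returns one complex value
    per bin: y[f] = sum over m of conj(weights[m][f]) * channels[m][f].
    """
    n_bins = len(channels[0])
    return [
        sum(w[f].conjugate() * x[f] for w, x in zip(weights, channels))
        for f in range(n_bins)
    ]


# Toy frame: two microphones, two frequency bins.
x = [[1 + 1j, 2 + 0j],
     [1 - 1j, 0 + 2j]]
# Hand-picked weights; in an adaptive system these would be re-estimated
# for every frame.
w = [[0.5 + 0j, 0.5 + 0j],
     [0.5 + 0j, 0 + 0.5j]]
y = filter_and_sum(w, x)
```

In the second bin the weight on microphone 2 rotates its phase so that both channels add coherently, which is the basic effect a beamformer exploits.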

    A Parametric Replay-Based Framework for Underwater Acoustic Communication Channel Simulation

    This paper lays the foundation of an underwater acoustic channel simulation methodology that is halfway between parametric modeling and stochastic replay of at-sea measurements of channel impulse responses. The motivation behind this approach is to extend the scope of use of replay-based methods by allowing some parameterization of the channel properties while retaining some level of realism. Based on a relative entropy minimization between the original channel impulse response and the simulated one, the idea is to deliberately distort the original channel statistics in order to meet some specified constraints.

    Efficient Noise Suppression for Robust Speech Recognition

    Electrical Engineering. This thesis addresses single-microphone noise estimation for speech recognition in noisy environments. Much research has been done on environmental noise estimation; however, most methods require a voice activity detector (VAD) to estimate noise characteristics accurately. I propose two approaches for efficient noise estimation without a VAD. The first approach improves the conventional quantile-based noise estimation (QBNE) by adjusting the quantile level (QL) according to the relative amount of noise added to the target speech. Specifically, I assign one of two QLs (binary levels) according to the measured statistical moment of the log-scale power spectrum at each frequency. The second approach applies a dual-mixture parametric model to compute the likelihoods of the speech and non-speech classes. I used a dual Gaussian mixture model (GMM) and a Rayleigh mixture model (RMM) for the likelihoods. Under the assumption that speech is generally uncorrelated with environmental noise, the noise power spectrum can be estimated from the mixture model parameters of the speech-absence class. I compared the proposed methods with the conventional QBNE and a minimum-statistics-based method on a simple speech recognition task at various signal-to-noise ratio (SNR) levels. In these experiments, the proposed methods outperformed the conventional methods.
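
As a rough illustration of the conventional QBNE baseline named in this abstract, the sketch below takes, for each frequency bin, a fixed quantile of the power values observed over time as the noise estimate. It does not implement the thesis's adaptive binary quantile levels. The function name, the index convention for the quantile, and the toy data are assumptions.

```python
def quantile_noise_estimate(power_frames, q=0.5):
    """Estimate a noise power spectrum without a voice activity detector.

    power_frames: list of frames, each a list of per-bin power values.
    q: quantile level in [0, 1]; 0.5 (the median) is a common choice.
    Returns one noise power estimate per frequency bin.
    """
    if not power_frames:
        raise ValueError("need at least one frame")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    n_frames = len(power_frames)
    n_bins = len(power_frames[0])
    # Index of the q-th quantile in the sorted track (assumed convention).
    idx = min(n_frames - 1, int(q * n_frames))
    noise = []
    for k in range(n_bins):
        track = sorted(frame[k] for frame in power_frames)
        noise.append(track[idx])
    return noise


# Toy spectrogram: 5 frames, 2 bins; the large values mimic speech bursts.
frames = [
    [1.0, 2.0],
    [1.2, 2.1],
    [9.0, 2.2],
    [0.9, 8.0],
    [1.1, 1.9],
]
estimate = quantile_noise_estimate(frames, q=0.5)
```

Because speech occupies each bin only part of the time, the median stays near the noise floor even though two frames contain strong speech-like energy.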

    Wavelet q-Fisher Information for Scaling Signal Analysis

    This article first introduces the concept of wavelet q-Fisher information and then derives a closed-form expression of this quantifier for scaling signals of parameter α. It is shown that this information measure appropriately describes the complexities of scaling signals and provides further analysis flexibility with the parameter q. In the limit of q→1, wavelet q-Fisher information reduces to the standard wavelet Fisher information, and for q > 2 it reverses its behavior. Experimental results on synthesized fGn signals validate the level-shift detection capabilities of wavelet q-Fisher information. A comparative study also shows that wavelet q-Fisher information locates structural changes in correlated and anti-correlated fGn signals in a way comparable with standard breakpoint location techniques but in a fraction of the time. Finally, the application of this quantifier to H.263 encoded video signals is presented. Consejo Nacional de Ciencia y Tecnología, FOMIX-COQCY

    Bearing fault diagnosis and degradation analysis based on improved empirical mode decomposition and maximum correlated kurtosis deconvolution

    Detecting the periodic impulse signal (PIS) is the core of bearing fault diagnosis. The earlier a fault is detected, the earlier maintenance actions can be implemented. Remaining useful life (RUL) prediction, in turn, indicates when maintenance should be conducted, but good degradation features are a prerequisite for effective RUL prediction. This paper therefore concerns early fault detection and degradation feature extraction for bearings. Maximum correlated kurtosis deconvolution (MCKD) can enhance the PIS produced by a bearing fault, but it works well only when the fault is severe. The PIS produced by a weak bearing fault is typically masked by heavy noise and cannot be enhanced by MCKD. To resolve this problem, a revised empirical mode decomposition (EMD) algorithm is used to denoise the bearing fault signal before MCKD processing. The revised EMD algorithm uses a new recovering algorithm to resolve the mode mixing problem of traditional EMD, and it is superior to ensemble EMD. For degradation analysis, the correlated kurtosis (CK) value is used as a degradation indicator reflecting the health condition of the bearing. In addition to theoretical analysis, simulated bearing fault data, injected bearing fault data, real bearing fault data, and bearing degradation data are used to verify the proposed method. Simulated bearing fault data are used to illustrate the existing problems. Injected and real bearing fault data are then used to demonstrate the effectiveness of the proposed method for fault diagnosis. Finally, bearing degradation data are used to verify the degradation feature CK extracted with the proposed method. All these case studies show the effectiveness of the proposed fault diagnosis and degradation tracking method.
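
The correlated kurtosis (CK) indicator used in this abstract can be sketched as follows, assuming the commonly cited definition CK_M(T) = Σ_n (Π_{m=0..M} y[n − mT])² / (Σ_n y[n]²)^(M+1), with samples before the start of the signal treated as zero. The function name and toy signal are hypothetical, and the paper's exact variant may differ.

```python
def correlated_kurtosis(y, period, shifts=1):
    """Correlated kurtosis of signal y for an impulse period in samples.

    shifts is M in the assumed definition: each sample is multiplied with
    the samples 1..M periods earlier before squaring and summing.
    """
    energy = sum(v * v for v in y)
    if energy == 0:
        return 0.0
    numerator = 0.0
    for i in range(len(y)):
        product = 1.0
        for m in range(shifts + 1):
            j = i - m * period
            # Samples before the start of the signal count as zero.
            product *= y[j] if j >= 0 else 0.0
        numerator += product * product
    return numerator / energy ** (shifts + 1)


# Toy impulse train with one impulse every 4 samples.
impulses = [1.0 if i % 4 == 0 else 0.0 for i in range(16)]
ck_matched = correlated_kurtosis(impulses, period=4)
ck_mismatched = correlated_kurtosis(impulses, period=3)
```

CK is large only when the assumed period matches the actual impulse spacing, which is why it can track a periodic fault signature where plain kurtosis would respond to any isolated spike.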

    Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

    Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques which can improve speech recognition performance. In this paper, we present an experimental study on these linear filters in a specific speech recognition task, namely the CHiME-4 challenge, which features real recordings in multiple noisy environments. Specifically, the rank-1 MWF is employed for noise reduction, and a new constant residual noise power constraint is derived which enhances the recognition performance. To fulfill the underlying rank-1 assumption, the speech covariance matrix is reconstructed based on eigenvectors or generalized eigenvectors. Then the rank-1 constrained MWF is compared with alternative multichannel linear filters under the same framework, which involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask estimation. The proposed filter outperforms the alternatives, leading to a 40% relative Word Error Rate (WER) reduction compared with the baseline Weighted Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER reduction compared with the GEV-BAN method. The results also suggest that the speech recognition accuracy correlates more with the Mel-frequency cepstral coefficients (MFCC) feature variance than with the noise reduction or the speech distortion level. Comment: for Computer Speech and Language
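
For orientation, the rank-1 MWF referred to in this abstract is often written in the following speech-distortion-weighted form. The symbols are assumptions not taken from the abstract: Φ_s and Φ_n denote the speech and noise spatial covariance matrices, u_ref selects the reference microphone, and μ is the trade-off parameter. The paper's constant residual noise power constraint is not reproduced here.

```latex
\mathbf{w}_{\text{r1-MWF}}
  = \frac{\boldsymbol{\Phi}_n^{-1}\boldsymbol{\Phi}_s}{\mu + \lambda}\,\mathbf{u}_{\text{ref}},
\qquad
\lambda = \operatorname{tr}\!\left(\boldsymbol{\Phi}_n^{-1}\boldsymbol{\Phi}_s\right)
```

Under the rank-1 assumption on Φ_s, setting μ = 0 gives the MVDR beamformer, while larger μ trades more speech distortion for more noise reduction.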