Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition
Far-field speech recognition in noisy and reverberant conditions remains a
challenging problem despite recent deep learning breakthroughs. This problem is
commonly addressed by acquiring a speech signal from multiple microphones and
performing beamforming over them. In this paper, we propose to use a recurrent
neural network with long short-term memory (LSTM) architecture to adaptively
estimate real-time beamforming filter coefficients to cope with non-stationary
environmental noise and the dynamic nature of source and microphone positions,
which results in a set of time-varying room impulse responses. The LSTM adaptive
beamformer is jointly trained with a deep LSTM acoustic model to predict senone
labels. Further, we use hidden units in the deep LSTM acoustic model to assist
in predicting the beamforming filter coefficients. The proposed system achieves
a 7.97% absolute gain over baseline systems with no beamforming on the CHiME-3 real
evaluation set. Comment: in 2017 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP)
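For readers unfamiliar with the beamforming step the abstract above builds on, here is a minimal sketch of a fixed delay-and-sum beamformer. It is not the LSTM adaptive beamformer proposed in the paper: the function name `delay_and_sum`, the integer per-channel delays, and the toy signals are all assumptions made for illustration.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: list of equal-purpose sample lists, one per microphone.
    delays:   assumed arrival delay (in samples) of the source at each
              microphone; hypothetical inputs, not estimated here.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    # Only keep samples for which every delayed channel has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    output = []
    for n in range(length):
        aligned = [ch[n + d] for ch, d in zip(channels, delays)]
        output.append(sum(aligned) / len(channels))
    return output


# Toy example: the same source reaches microphone 1 two samples late.
mic0 = [0, 1, 0, -1, 0, 1, 0, 0]
mic1 = [0, 0, 0, 1, 0, -1, 0, 1]
enhanced = delay_and_sum([mic0, mic1], [0, 2])
```

An adaptive beamformer replaces the fixed delays and uniform weights with filter coefficients that change over time, which is the part the abstract assigns to the LSTM.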
A Parametric Replay-Based Framework for Underwater Acoustic Communication Channel Simulation
This paper lays the foundation of an underwater acoustic channel simulation methodology that is halfway between parametric modeling and stochastic replay of at-sea measurements of channel impulse responses. The motivation behind this approach is to extend the scope of replay-based methods by allowing some parameterization of the channel properties while preserving some level of realism. Based on a relative entropy minimization between the original channel impulse response and the simulated one, the idea is to deliberately distort the original channel statistics in order to meet some specified constraints.
Efficient Noise Suppression for Robust Speech Recognition
This thesis addresses single-microphone noise estimation for speech recognition in noisy environments. Much research has been done on environmental noise estimation; however, most methods require a voice activity detector (VAD) to estimate noise characteristics accurately. I propose two approaches for efficient noise estimation without a VAD. The first approach improves conventional quantile-based noise estimation (QBNE) by adjusting the quantile level (QL) according to the relative amount of noise added to the target speech. Specifically, one of two QLs (i.e., binary levels) is assigned according to the measured statistical moment of the log-scale power spectrum at each frequency. The second approach applies a dual-mixture parametric model to compute the likelihoods of the speech and non-speech classes, using a dual Gaussian mixture model (GMM) and a Rayleigh mixture model (RMM). Under the assumption that speech is generally uncorrelated with environmental noise, the noise power spectrum can be estimated from the mixture model parameters of the speech-absence class.
I compared the proposed methods with conventional QBNE and a minimum-statistics-based method on a simple speech recognition task at various signal-to-noise ratio (SNR) levels. The experimental results show that the proposed methods are superior to the conventional methods.
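As a rough illustration of the conventional QBNE baseline mentioned in the abstract above, the sketch below takes, for each frequency bin, a fixed quantile of the power values observed over time as the noise estimate. It does not implement the thesis's adaptive binary quantile levels or its mixture models; the function name, the simple index rule for the quantile, and the toy data are assumptions.

```python
def quantile_noise_estimate(power_frames, quantile_level=0.5):
    """Per-bin noise power estimate as a fixed quantile over time frames.

    power_frames:   list of frames, each a list of power values per bin.
    quantile_level: fraction in [0, 1]; 0.5 (the median) is an assumed default.
    """
    if not power_frames:
        raise ValueError("need at least one frame")
    num_frames = len(power_frames)
    num_bins = len(power_frames[0])
    # Index of the requested quantile in the sorted per-bin values.
    index = min(num_frames - 1, int(quantile_level * num_frames))
    noise = []
    for k in range(num_bins):
        values = sorted(frame[k] for frame in power_frames)
        noise.append(values[index])
    return noise


# Toy spectrogram: 5 frames, 2 bins; the large values mimic speech bursts.
frames = [
    [1.0, 10.0],
    [1.2, 0.5],
    [0.9, 0.4],
    [5.0, 0.6],
    [1.1, 0.5],
]
noise_floor = quantile_noise_estimate(frames, quantile_level=0.5)
```

Because speech occupies each bin only part of the time, a mid-range quantile tends to ignore the speech bursts without needing a VAD, which is the property the thesis builds on.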
Wavelet q-Fisher Information for Scaling Signal Analysis
This article first introduces the concept of wavelet q-Fisher information and then derives a closed-form expression of this quantifier for scaling signals of parameter α. It is shown that this information measure appropriately describes the complexities of scaling signals and provides further analysis flexibility through the parameter q. In the limit q → 1, wavelet q-Fisher information reduces to the standard wavelet Fisher information, and for q > 2 it reverses its behavior. Experimental results on synthesized fGn signals validate the level-shift detection capabilities of wavelet q-Fisher information. A comparative study also shows that wavelet q-Fisher information locates structural changes in correlated and anti-correlated fGn signals in a way comparable with standard breakpoint location techniques but in a fraction of the time. Finally, the application of this quantifier to H.263 encoded video signals is presented. Consejo Nacional de Ciencia y Tecnología FOMIX-COQCY
Bearing fault diagnosis and degradation analysis based on improved empirical mode decomposition and maximum correlated kurtosis deconvolution
Detecting the periodic impulse signal (PIS) is the core of bearing fault diagnosis. The earlier a fault is detected, the earlier maintenance actions can be implemented. In addition, remaining useful life (RUL) prediction provides important information on when maintenance should be conducted, but good degradation features are a prerequisite for effective RUL prediction. This paper therefore focuses on early fault detection and degradation feature extraction for bearings. Maximum correlated kurtosis deconvolution (MCKD) can enhance the PIS produced by a bearing fault, but it is effective only when the fault is severe. The PIS produced by a weak bearing fault is typically masked by heavy noise and cannot be enhanced by MCKD. To resolve this problem, a revised empirical mode decomposition (EMD) algorithm is used to denoise the bearing fault signal before MCKD processing. The revised EMD algorithm uses a new recovering algorithm to resolve the mode mixing problem of traditional EMD, and it is superior to ensemble EMD. For degradation analysis, the correlated kurtosis (CK) value is used as a degradation indicator to reflect the health condition of the bearing. In addition to theoretical analysis, simulated bearing fault data, injected bearing fault data, real bearing fault data, and bearing degradation data are used to verify the proposed method. Simulated bearing fault data are used to illustrate the existing problems. Injected and real bearing fault data are then used to demonstrate the effectiveness of the proposed method for fault diagnosis. Finally, bearing degradation data are used to verify the degradation feature CK extracted by the proposed method. All these case studies show the effectiveness of the proposed fault diagnosis and degradation tracking method.
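The correlated kurtosis (CK) indicator named in the abstract above can be sketched as follows, using the commonly cited definition CK_M(T) = Σ_n (Π_{m=0..M} y[n − mT])² / (Σ_n y[n]²)^(M+1), with samples before the start of the signal taken as zero. Whether the paper uses exactly this form is an assumption, and the function name and toy signal are hypothetical.

```python
def correlated_kurtosis(signal, period, shifts=1):
    """Correlated kurtosis of `signal` for an impulse period given in samples.

    shifts is M in the definition: the number of period-delayed copies
    multiplied together with the current sample.
    """
    energy = sum(v * v for v in signal)
    if energy == 0:
        raise ValueError("signal has zero energy")
    total = 0.0
    for n in range(len(signal)):
        product = 1.0
        for m in range(shifts + 1):
            idx = n - m * period
            # Samples before the start of the record are treated as zero.
            product *= signal[idx] if idx >= 0 else 0.0
        total += product * product
    return total / energy ** (shifts + 1)


# Toy signal: unit impulses every 4 samples, as a periodic fault might produce.
impulses = [1.0 if n % 4 == 0 else 0.0 for n in range(12)]
ck_matched = correlated_kurtosis(impulses, period=4)
ck_mismatched = correlated_kurtosis(impulses, period=3)
```

CK is large only when impulses repeat at the assumed period, which is why it can serve both as the MCKD objective and as a degradation indicator tied to a specific fault frequency.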
Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments
Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and
the Generalized Eigenvalue (GEV) beamformer, are popular signal processing
techniques which can improve speech recognition performance. In this paper, we
present an experimental study on these linear filters in a specific speech
recognition task, namely the CHiME-4 challenge, which features real recordings
in multiple noisy environments. Specifically, the rank-1 MWF is employed for
noise reduction and a new constant residual noise power constraint is derived
which enhances the recognition performance. To fulfill the underlying rank-1
assumption, the speech covariance matrix is reconstructed based on eigenvectors
or generalized eigenvectors. Then the rank-1 constrained MWF is evaluated with
alternative multichannel linear filters under the same framework, which
involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask
estimation. The proposed filter outperforms alternative ones, leading to a 40%
relative Word Error Rate (WER) reduction compared with the baseline Weighted
Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER
reduction compared with the GEV-BAN method. The results also suggest that the
speech recognition accuracy correlates more with the Mel-frequency cepstral
coefficients (MFCC) feature variance than with the noise reduction or the
speech distortion level. Comment: for Computer Speech and Language