Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition

Author: Erdogan Hakan
Hershey John R.
Meng Zhong
Watanabe Shinji
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/11/2017

Far-field speech recognition in noisy and reverberant conditions remains a challenging problem despite recent deep learning breakthroughs. This problem is commonly addressed by acquiring a speech signal from multiple microphones and performing beamforming over them. In this paper, we propose to use a recurrent neural network with a long short-term memory (LSTM) architecture to adaptively estimate real-time beamforming filter coefficients, in order to cope with non-stationary environmental noise and the dynamic nature of source and microphone positions, which results in a set of time-varying room impulse responses. The LSTM adaptive beamformer is jointly trained with a deep LSTM acoustic model to predict senone labels. Further, we use hidden units in the deep LSTM acoustic model to assist in predicting the beamforming filter coefficients. The proposed system achieves a 7.97% absolute gain over baseline systems with no beamforming on the CHiME-3 real evaluation set. Comment: in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
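
To make the beamforming step in this abstract concrete, here is a minimal sketch of frequency-domain filter-and-sum beamforming with coefficients that change from frame to frame. It is not the authors' network: the function name `filter_and_sum`, the single-frequency-bin layout, and the conjugated-weight convention are assumptions for illustration, and the weights are supplied by hand rather than predicted by an LSTM.

```python
def filter_and_sum(frames, weights):
    """Apply time-varying beamforming weights to one frequency bin.

    frames[t][m]  : complex STFT value of microphone m at frame t
    weights[t][m] : complex filter coefficient for microphone m at frame t
                    (hypothetical input; in the abstract these would come
                    from the LSTM, which is not modelled here)
    Returns one complex output value per frame: sum_m conj(w) * x.
    """
    output = []
    for x_t, w_t in zip(frames, weights):
        if len(x_t) != len(w_t):
            raise ValueError("each frame needs one weight per microphone")
        # Assumed convention: the weight is conjugated before multiplying.
        output.append(sum(w.conjugate() * x for w, x in zip(w_t, x_t)))
    return output


# Two microphones, two frames, equal weights (a plain average).
frames = [[1 + 1j, 1 - 1j], [2 + 0j, 0 + 2j]]
weights = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]
enhanced = filter_and_sum(frames, weights)
```

Because the weights are indexed by frame, the same routine covers both a fixed beamformer (identical weights in every frame) and an adaptive one.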

arXiv.org e-Print Archive

Crossref

A Parametric Replay-Based Framework for Underwater Acoustic Communication Channel Simulation

Author: Laot Christophe
Passerieux Jean-Michel
Socheleau François-Xavier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/09/2014

This paper lays the foundation of an underwater acoustic channel simulation methodology that is halfway between parametric modeling and stochastic replay of at-sea measurements of channel impulse responses. The motivation behind this approach is to extend the scope of use of replay-based methods by allowing some parameterization of the channel properties while complying with some level of realism. Based on a relative entropy minimization between the original channel impulse response and the simulated one, the idea is to deliberately distort the original channel statistics in order to meet some specified constraints.

HAL-Université de Bretagne Occidentale

Features for voice activity detection: a comparative analysis

Author: Gerhard Schmidt
Markus Buck
Simon Graf
Tobias Herbig
Publication venue: Springer Nature
Publication date: 01/01/2015

Springer - Publisher Connector

Efficient Noise Suppression for Robust Speech Recognition

Author: Lee Kangyeoul
Publication venue: Graduate School of UNIST
Publication date: 01/02/2013

Electrical Engineering. This thesis addresses single-microphone noise estimation for speech recognition in noisy environments. Much research has been done on environmental noise estimation; however, most methods require a voice activity detector (VAD) to estimate the noise characteristics accurately. I propose two approaches for efficient noise estimation without a VAD. The first approach aims at improving conventional quantile-based noise estimation (QBNE). I improved QBNE by adjusting the quantile level (QL) according to the relative amount of noise added to the target speech. Specifically, two different QLs, i.e., binary levels, are assigned according to the measured statistical moment of the log-scale power spectrum at each frequency. The second approach applies a dual-mixture parametric model to compute the likelihoods of the speech and non-speech classes. I used a dual Gaussian mixture model (GMM) and a Rayleigh mixture model (RMM) for the likelihoods. Under the assumption that speech is generally uncorrelated with the environmental noise, the noise power spectrum can be estimated from the mixture model parameters of the speech-absence class. I compared the proposed methods with conventional QBNE and the minimum-statistics-based method on a simple speech recognition task at various signal-to-noise ratio (SNR) levels. The experimental results show the proposed methods to be superior to the conventional methods.
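
The quantile-based estimator mentioned in this abstract can be sketched as follows. This is a generic illustration and not the thesis's implementation: the function name `quantile_noise_estimate` and the index rule are assumptions, and the per-bin quantile levels are passed in by hand because the abstract does not say how the two binary levels are chosen from the spectral moment.

```python
def quantile_noise_estimate(power_frames, quantile_levels):
    """Estimate a noise power spectrum without a voice activity detector.

    power_frames[t][k] : power of frequency bin k in frame t
    quantile_levels[k] : quantile level in [0, 1] used for bin k
                         (hypothetical input; how the levels are selected
                         is not specified in the abstract)
    Returns one noise power estimate per frequency bin.
    """
    if not power_frames:
        raise ValueError("need at least one frame")
    n_frames = len(power_frames)
    n_bins = len(power_frames[0])
    if len(quantile_levels) != n_bins:
        raise ValueError("need one quantile level per frequency bin")

    noise = []
    for k in range(n_bins):
        # Sort the power values this bin took over time.
        track = sorted(frame[k] for frame in power_frames)
        # Pick the value at the requested quantile (clamped to the last index).
        index = min(n_frames - 1, int(quantile_levels[k] * n_frames))
        noise.append(track[index])
    return noise


# Four frames, two frequency bins.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
median_estimate = quantile_noise_estimate(frames, [0.5, 0.5])
mixed_estimate = quantile_noise_estimate(frames, [0.25, 0.75])
```

The general idea is that speech does not occupy every frame of a frequency bin, so the lower part of each bin's sorted power values tends to reflect noise alone.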

ScholarWorks@UNIST

Wavelet q-Fisher Information for Scaling Signal Analysis

Author: Argaez-Xool Jesús
Manzano-Pinzón Francisco
Ramírez-Pacheco Julio
Rizo-Domínguez Luis
Torres-Román Deni
Trejo-Sánchez Joel
Publication venue: 'MDPI AG'
Publication date: 01/08/2012

This article first introduces the concept of wavelet q-Fisher information and then derives a closed-form expression of this quantifier for scaling signals of parameter α. It is shown that this information measure appropriately describes the complexities of scaling signals and provides further analysis flexibility with the parameter q. In the limit of q→1, wavelet q-Fisher information reduces to the standard wavelet Fisher information, and for q > 2 it reverses its behavior. Experimental results on synthesized fGn signals validate the level-shift detection capabilities of wavelet q-Fisher information. A comparative study also shows that wavelet q-Fisher information locates structural changes in correlated and anti-correlated fGn signals in a way comparable with standard breakpoint location techniques but at a fraction of the time. Finally, the application of this quantifier to H.263 encoded video signals is presented. Consejo Nacional de Ciencia y Tecnología; FOMIX-COQCY

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Repositorio Institucional del ITESO

Comparisons between computer simulations of room acoustical parameters and those measured in concert halls

Author: Christensen Claus Lynge
Gade Anders Christian
Rindel Jens Holger
Shiokawa Hiroyoshi
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/1999

Crossref

Online Research Database In Technology

Bearing fault diagnosis and degradation analysis based on improved empirical mode decomposition and maximum correlated kurtosis deconvolution

Author: Jianmin Zhao
Jianshe Kang
Lishan Hao
Liying Cai
Xinghui Zhang
Publication venue: 'JVE International Ltd.'
Publication date: 15/02/2015

Detecting the periodic impulse signal (PIS) is the core of bearing fault diagnosis. The earlier a fault is detected, the earlier maintenance actions can be taken. In addition, remaining useful life (RUL) prediction provides important information on when maintenance should be conducted. However, good degradation features are a prerequisite for effective RUL prediction. This paper therefore focuses on early fault detection and degradation feature extraction for bearings. Maximum correlated kurtosis deconvolution (MCKD) can enhance the PIS produced by a bearing fault, but it is effective only when the bearing has a severe fault. The PIS produced by a weak bearing fault is usually masked by heavy noise and cannot be enhanced by MCKD. To resolve this problem, a revised empirical mode decomposition (EMD) algorithm is used to denoise the bearing fault signal before MCKD processing. In the revised EMD algorithm, a new recovering algorithm is used to resolve the mode mixing problem that exists in traditional EMD, and it is superior to ensemble EMD. For degradation analysis, the correlated kurtosis (CK) value is used as a degradation indicator to reflect the health condition of the bearing. In addition to theoretical analysis, simulated bearing fault data, injected bearing fault data, real bearing fault data and bearing degradation data are used to verify the proposed method. Simulated bearing fault data are used to explain the existing problems. Then, injected bearing fault data and real bearing fault data are used to demonstrate the effectiveness of the proposed method for fault diagnosis. Finally, bearing degradation data are used to verify the degradation feature CK extracted with the proposed method. All these case studies show the effectiveness of the proposed fault diagnosis and degradation tracking method.
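
The correlated kurtosis indicator named in this abstract can be illustrated with the commonly cited definition CK_M(T) = Σ_n (Π_{m=0..M} y[n − mT])² / (Σ_n y[n]²)^(M+1). The sketch below assumes that definition, treats samples before the start of the signal as zero, and uses a hypothetical function name; the abstract itself gives no formula, so it may differ from the paper's exact variant.

```python
def correlated_kurtosis(signal, period, shift_order=1):
    """Correlated kurtosis of `signal` for an impulse period in samples.

    period      : assumed fault period T, in samples
    shift_order : number of shifts M (M = 1 is the first-shift version)
    Samples with a negative index are treated as zero (an assumption).
    """
    energy = sum(v * v for v in signal)
    if energy == 0:
        raise ValueError("signal has zero energy")

    numerator = 0.0
    for n in range(len(signal)):
        # Product of the sample with its copies delayed by T, 2T, ..., MT.
        product = 1.0
        for m in range(shift_order + 1):
            i = n - m * period
            product *= signal[i] if i >= 0 else 0.0
        numerator += product * product
    return numerator / energy ** (shift_order + 1)


# Unit impulses every 4 samples, 16 samples in total.
impulses = [1.0, 0.0, 0.0, 0.0] * 4
ck_matched = correlated_kurtosis(impulses, period=4)
ck_mismatched = correlated_kurtosis(impulses, period=3)
```

The indicator is large only when impulses repeat at the tested period, which is why it can serve both as a deconvolution objective and as a degradation feature.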

Journal of Vibroengineering

JVE International

Journal of Mechatronics and Artificial Intelligence in Engineering

Journal of Mechanical Engineering, Automation and Control Systems

Enhanced bearing fault diagnosis using integral envelope spectrum from spectral coherence normalized with feature energy

Author: Chen Bingyan
Cheng Yao
Gu Fengshou
Zhang Weihua
Publication venue: 'Elsevier BV'
Publication date: 15/02/2022

Huddersfield Research Portal

Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

Author: Serizel Romain
Vincent Emmanuel
Wang Ziteng
Yan Yonghong
Publication date: 14/11/2017

Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques which can improve speech recognition performance. In this paper, we present an experimental study on these linear filters in a specific speech recognition task, namely the CHiME-4 challenge, which features real recordings in multiple noisy environments. Specifically, the rank-1 MWF is employed for noise reduction, and a new constant residual noise power constraint is derived which enhances the recognition performance. To fulfill the underlying rank-1 assumption, the speech covariance matrix is reconstructed based on eigenvectors or generalized eigenvectors. Then the rank-1 constrained MWF is evaluated with alternative multichannel linear filters under the same framework, which involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask estimation. The proposed filter outperforms alternative ones, leading to a 40% relative Word Error Rate (WER) reduction compared with the baseline Weighted Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER reduction compared with the GEV-BAN method. The results also suggest that the speech recognition accuracy correlates more with the Mel-frequency cepstral coefficients (MFCC) feature variance than with the noise reduction or the speech distortion level. Comment: for Computer Speech and Language

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server