Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition
Far-field speech recognition in noisy and reverberant conditions remains a
challenging problem despite recent deep learning breakthroughs. This problem is
commonly addressed by acquiring a speech signal from multiple microphones and
performing beamforming over them. In this paper, we propose to use a recurrent
neural network with long short-term memory (LSTM) architecture to adaptively
estimate real-time beamforming filter coefficients to cope with non-stationary
environmental noise and the dynamic nature of source and microphone positions,
which results in a set of time-varying room impulse responses. The LSTM adaptive
beamformer is jointly trained with a deep LSTM acoustic model to predict senone
labels. Further, we use hidden units in the deep LSTM acoustic model to assist
in predicting the beamforming filter coefficients. The proposed system achieves
a 7.97% absolute gain over baseline systems with no beamforming on the CHiME-3
real evaluation set.
Comment: in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
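As a toy illustration of the filter-and-sum idea behind this paper, the sketch below runs an untrained NumPy LSTM cell with random parameters to emit per-frame, per-channel filter coefficients and applies them to multichannel input. All names, shapes, and sizes (Wi, Wo, 4 channels, 8 taps) are illustrative assumptions, not the paper's architecture, and no acoustic model or joint training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

C, T, F = 4, 50, 8                        # channels, frames, filter taps per channel
H = 16                                    # LSTM hidden size (assumed)
x = rng.standard_normal((C, T + F - 1))   # toy multichannel signal

# Toy LSTM cell with random, untrained parameters; in the paper these
# would be learned jointly with the deep LSTM acoustic model.
Wi = rng.standard_normal((4 * H, C * F + H)) * 0.1
bi = np.zeros(4 * H)
Wo = rng.standard_normal((C * F, H)) * 0.1   # hidden state -> filter coefficients

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.zeros(H)
c = np.zeros(H)
out = np.zeros(T)
for t in range(T):
    frame = x[:, t:t + F].reshape(-1)        # current context, all channels
    z = Wi @ np.concatenate([frame, h]) + bi
    i, f, g, o = np.split(z, 4)              # input/forget/cell/output gates
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    w = (Wo @ h).reshape(C, F)               # time-varying beamforming filters
    # filter-and-sum: filter each channel, then sum across channels
    # (collapsed here to one output sample per frame for brevity)
    out[t] = np.sum(w * x[:, t:t + F])
```

The point of the sketch is only the data flow: the recurrent state lets the filter coefficients adapt frame by frame to non-stationary conditions, rather than being fixed as in conventional delay-and-sum beamforming.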
Symmetric complex-valued RBF receiver for multiple-antenna aided wireless systems
A nonlinear beamforming assisted detector is proposed for multiple-antenna-aided wireless systems employing complex-valued quadrature phase-shift keying modulation. By exploiting the inherent symmetry of the optimal Bayesian detection solution, a novel complex-valued symmetric radial basis function (SRBF)-network-based detector is developed, which is capable of approaching the optimal Bayesian performance using channel-impaired training data. In the uplink case, adaptive nonlinear beamforming can be efficiently implemented by estimating the system’s channel matrix based on the least squares channel estimate. Adaptive implementation of nonlinear beamforming in the downlink case by contrast is much more challenging, and we adopt a cluster-variation enhanced clustering algorithm to directly identify the SRBF center vectors required for realizing the optimal Bayesian detector. A simulation example is included to demonstrate the achievable performance improvement by the proposed adaptive nonlinear beamforming solution over the theoretical linear minimum bit error rate beamforming benchmark.
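A minimal sketch of the RBF-centered Bayesian detection idea, under simplifying assumptions: the channel matrix is known (the paper estimates it by least squares), the centers are the noiseless channel states H s for every QPSK symbol combination, and detection picks the symbol vector with the largest Gaussian-kernel response. The 2x2 MIMO setup, sigma, and all variable names are illustrative, and the paper's symmetry-exploiting center construction is only noted in a comment, not implemented.

```python
import numpy as np

rng = np.random.default_rng(1)
qpsk = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

# Toy 2x2 MIMO channel, assumed known here.
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
sigma = 0.1

# RBF centers = all noiseless channel states H @ s.  The symmetry the paper
# exploits: rotating the transmitted QPSK vector by j rotates its center by j,
# so only one quadrant's worth of centers need be stored or learned.
symbols = np.array(np.meshgrid(qpsk, qpsk)).T.reshape(-1, 2)   # 16 combinations
centers = symbols @ H.T                                        # (16, 2)

def bayesian_detect(r):
    """Return the symbol vector whose Gaussian RBF response to r is largest."""
    k = np.exp(-np.sum(np.abs(r - centers) ** 2, axis=1) / (2 * sigma ** 2))
    return symbols[np.argmax(k)]

# Detect one noisy received vector.
s = symbols[5]
r = s @ H.T + 0.05 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
det = bayesian_detect(r)
```

With the centers placed exactly at the channel states, this maximum-kernel rule coincides with maximum-likelihood detection under Gaussian noise, which is why the SRBF network can approach the optimal Bayesian detector once its centers are identified from training data.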
Design of a Novel Antenna Array Beamformer Using Neural Networks Trained by Modified Adaptive Dispersion Invasive Weed Optimization Based Data
A new antenna array beamformer based on neural networks (NNs) is presented. The NN training is performed by using optimized data sets extracted by a novel Invasive Weed Optimization (IWO) variant called Modified Adaptive Dispersion IWO (MADIWO). The trained NN is utilized as an adaptive beamformer that makes a uniform linear antenna array steer the main lobe towards a desired signal, place respective nulls towards several interference signals and suppress the side lobe level (SLL). Initially, the NN structure is selected by training several NNs of various structures using MADIWO based data and by making a comparison among the NNs in terms of training performance. The selected NN structure is then used to construct an adaptive beamformer, which is compared to MADIWO based and ADIWO based beamformers, regarding the SLL as well as the ability to properly steer the main lobe and the nulls. The comparison is made considering several sets of random cases with different numbers of interference signals and different power levels of additive zero-mean Gaussian noise. The comparative results exhibit the advantages of the proposed beamformer.
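To make the beamforming objective concrete, the sketch below computes reference weights for a uniform linear array that give unit gain toward a desired direction and nulls toward interferers, via a minimum-norm least-squares solve. This is not the paper's NN or MADIWO method; it only illustrates, with assumed element count and angles, the main-lobe/null behavior a trained beamformer is meant to reproduce (side-lobe suppression is not constrained here).

```python
import numpy as np

N = 8                        # elements in a uniform linear array (assumed)
theta0 = 20.0                # desired main-lobe direction, degrees (assumed)
interf = [-40.0, 55.0]       # interference directions to null (assumed)

def steer(theta_deg):
    """Steering vector of an N-element ULA with half-wavelength spacing."""
    phase = np.pi * np.arange(N) * np.sin(np.radians(theta_deg))
    return np.exp(1j * phase)

# Linear constraints: unit response toward theta0, zero response toward each
# interferer.  The minimum-norm least-squares weights are only a reference
# for the behavior the trained NN beamformer should reproduce.
A = np.vstack([np.conj(steer(t)) for t in [theta0] + interf])  # (3, N)
d = np.array([1.0, 0.0, 0.0])
w = np.linalg.pinv(A) @ d

def gain_db(theta_deg):
    g = np.abs(np.conj(steer(theta_deg)) @ w)
    return 20 * np.log10(max(g, 1e-12))

print(gain_db(theta0))       # ~0 dB on the main lobe
print(gain_db(interf[0]))    # deep null toward the first interferer
```

Because the constraint system is underdetermined (3 constraints, N = 8 weights), the pseudoinverse returns the minimum-norm weights satisfying all constraints exactly, leaving the remaining degrees of freedom that an optimizer such as MADIWO can spend on side-lobe suppression.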
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that still
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and, with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks.