Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments
Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and
the Generalized Eigenvalue (GEV) beamformer, are popular signal processing
techniques which can improve speech recognition performance. In this paper, we
present an experimental study on these linear filters in a specific speech
recognition task, namely the CHiME-4 challenge, which features real recordings
in multiple noisy environments. Specifically, the rank-1 MWF is employed for
noise reduction and a new constant residual noise power constraint is derived
which enhances the recognition performance. To fulfill the underlying rank-1
assumption, the speech covariance matrix is reconstructed based on eigenvectors
or generalized eigenvectors. Then the rank-1 constrained MWF is evaluated against
alternative multichannel linear filters within the same framework, which
uses a Bidirectional Long Short-Term Memory (BLSTM) network for mask
estimation. The proposed filter outperforms the alternatives, leading to a 40%
relative Word Error Rate (WER) reduction compared with the baseline Weighted
Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER
reduction compared with the GEV-BAN method. The results also suggest that the
speech recognition accuracy correlates more with the Mel-frequency cepstral
coefficients (MFCC) feature variance than with the noise reduction or the
speech distortion level. Comment: for Computer Speech and Language
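The abstract combines three ingredients: mask-weighted covariance estimation, a rank-1 reconstruction of the speech covariance from generalized eigenvectors, and the rank-1 MWF. A minimal NumPy sketch for a single frequency bin follows, under stated assumptions: the mask is just an input array (no BLSTM is implemented), the rank-1 matrix is rebuilt from the principal generalized eigenvector (one possible reconstruction, not necessarily the paper's), and `mu` is a free speech-distortion trade-off parameter. The paper's constant residual noise power constraint is not reproduced. All function names are hypothetical.

```python
import numpy as np

def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    Y: (M, T) complex STFT frames of M microphones.
    mask: (T,) weights in [0, 1], e.g. produced by a mask estimator (assumed given).
    """
    weights = mask / np.maximum(mask.sum(), 1e-10)
    return (Y * weights) @ Y.conj().T  # sum_t w_t y_t y_t^H

def rank1_from_gev(phi_x, phi_n):
    """Rank-1 approximation of phi_x from its principal generalized eigenvector.

    With phi_x v = lam * phi_n v, the matrix lam * h h^H / (v^H phi_n v),
    h = phi_n v, keeps that eigenpair and equals phi_x if phi_x is rank-1.
    """
    vals, vecs = np.linalg.eig(np.linalg.solve(phi_n, phi_x))
    k = np.argmax(vals.real)
    lam, v = vals[k].real, vecs[:, k]
    h = phi_n @ v
    return lam * np.outer(h, h.conj()) / np.real(v.conj() @ h)

def rank1_mwf(phi_x_r1, phi_n, ref=0, mu=1.0):
    """Rank-1 (speech-distortion-weighted) MWF for reference microphone `ref`."""
    num = np.linalg.solve(phi_n, phi_x_r1)  # Phi_n^{-1} Phi_x
    return num[:, ref] / (mu + np.real(np.trace(num)))
```

The enhanced reference-channel signal is then `w.conj() @ y` per STFT frame. With `mu=1` and an exactly rank-1 speech covariance, this filter coincides with the standard MWF (Phi_x + Phi_n)^{-1} Phi_x e_ref by the Sherman-Morrison identity; larger `mu` trades more speech distortion for more noise reduction.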
Partial Relaxation Approach: An Eigenvalue-Based DOA Estimator Framework
In this paper, the partial relaxation approach is introduced and applied to
direction-of-arrival (DOA) estimation using spectral search. Unlike existing
methods such as Capon or MUSIC, which can be regarded as single-source
approximations of multi-source estimation criteria, the proposed approach
accounts for the existence of
multiple sources. At each considered direction, the manifold structure of the
remaining interfering signals impinging on the sensor array is relaxed, which
results in closed form estimates for the interference parameters. The
conventional multidimensional optimization problem reduces, thanks to this
relaxation, to a simple spectral search. Following this principle, we propose
estimators based on the Deterministic Maximum Likelihood, Weighted Subspace
Fitting and covariance fitting methods. To calculate the pseudo-spectra
efficiently, an iterative rooting scheme based on the rational function
approximation is applied to the partial relaxation methods. Simulation results
show that the proposed estimators outperform the conventional methods,
especially at low signal-to-noise ratio (SNR) and with a small number of
snapshots, irrespective of the specific structure of the sensor array, while
maintaining a computational cost comparable to that of MUSIC. Comment: This work has been submitted to IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
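The relaxation step can be illustrated for the Deterministic Maximum Likelihood criterion. If, at a candidate direction, the remaining d-1 interferers span an arbitrary (unstructured) subspace, maximizing over that subspace gives a closed form: the criterion becomes the sum of the N-d+1 smallest eigenvalues of P⊥(θ) R P⊥(θ), where P⊥(θ) projects out the candidate steering vector. The sketch below follows that reading of the principle under assumptions not taken from the abstract: a half-wavelength uniform linear array (used only for illustration, since the approach is stated to be array-agnostic), a known number of sources, and brute-force eigenvalue evaluation on a grid instead of the paper's rational-function rooting scheme. Function names are hypothetical.

```python
import numpy as np

def ula_steering(theta_deg, n_sensors):
    """Steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    n = np.arange(n_sensors)
    return np.exp(-1j * np.pi * n * np.sin(np.deg2rad(theta_deg)))

def pr_dml_spectrum(R, n_sources, grid_deg):
    """Partial-relaxation DML criterion on a grid of candidate directions.

    With the d-1 interferers relaxed to an unstructured subspace,
    f(theta) = sum of the N-d+1 smallest eigenvalues of P R P,
    where P = I - a a^H / (a^H a) for the candidate steering vector a.
    """
    N = R.shape[0]
    f = np.empty(len(grid_deg))
    for i, theta in enumerate(grid_deg):
        a = ula_steering(theta, N)
        P = np.eye(N) - np.outer(a, a.conj()) / np.real(a.conj() @ a)
        eig = np.linalg.eigvalsh(P @ R @ P)  # ascending order
        f[i] = eig[: N - n_sources + 1].sum()
    return f

def pick_minima(f, grid_deg, n_sources):
    """Grid directions of the n_sources deepest interior local minima of f."""
    idx = [i for i in range(1, len(f) - 1) if f[i] < f[i - 1] and f[i] <= f[i + 1]]
    idx = sorted(idx, key=lambda i: f[i])[:n_sources]
    return np.sort(np.asarray(grid_deg)[idx])
```

For example, with two sources at -10° and 20°, an 8-element array and the exact covariance `A @ A.conj().T + 0.01 * np.eye(8)`, the two deepest minima on a 0.5° grid fall on the true directions. Unlike MUSIC, each evaluation accounts for the second source through the relaxed interference term.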
State of the Art in Face Recognition
Notwithstanding the tremendous effort devoted to the face recognition problem, it is not yet possible to design a face recognition system whose performance comes close to that of humans. New computer vision and pattern recognition approaches need to be investigated. New knowledge and perspectives from other fields, such as psychology and neuroscience, must also be incorporated into face recognition research to design a robust system. Indeed, much more effort is required to arrive at a human-like face recognition system. This book attempts to narrow the gap between the previous state of face recognition research and its future state.
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example, in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and has traditionally been studied under the name of 'model
adaptation'. Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the 'transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research in this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field. Comment: 13 pages, APSIPA 201
Text-independent speaker recognition
This research presents a new text-independent speaker recognition system in which multivariate tools, namely Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are embedded into the recognition system after the feature extraction step. The proposed approach evaluates the performance of such a recognition system when trained and used in clean and noisy environments, with additive white Gaussian noise and convolutive noise added. Experiments were carried out to investigate the robustness of PCA and ICA using the designed approach. The application of ICA improved the performance of the speaker recognition model compared to PCA. Experimental results show that ICA enables the extraction of higher-order statistics, thereby capturing speaker-dependent statistical cues in a text-independent recognition system. The results show that ICA has better de-correlation and dimension-reduction properties than PCA. To simulate a multi-environment system, we trained our model such that every time a new speech signal was read, it was contaminated with different types of noise and stored in the database. Results also show that ICA outperforms PCA under adverse conditions. This is verified by the recognition accuracy rates obtained when the designed system was tested under different training and test SNR conditions with additive white Gaussian noise, and under different test delay conditions with an echo effect.
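The abstract describes contaminating clean speech with additive white Gaussian noise at chosen SNRs and with an echo, and reducing features with PCA before recognition. A minimal NumPy sketch of those preprocessing steps follows; it is an illustration under assumptions, not the authors' pipeline: the echo is modeled as a single delayed, attenuated copy, features are any (frames × dimensions) array, and ICA is omitted because it would need a separate estimator such as FastICA. Function names are hypothetical.

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Add white Gaussian noise scaled so the empirical SNR equals snr_db."""
    noise = rng.standard_normal(x.shape)
    p_signal = np.mean(x ** 2)
    p_noise = np.mean(noise ** 2)
    noise *= np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return x + noise

def add_echo(x, delay, gain=0.5):
    """Simple convolutive distortion: add one copy delayed by `delay` >= 1 samples."""
    y = x.astype(float).copy()
    y[delay:] += gain * x[:-delay]
    return y

def pca_project(features, n_components):
    """Project (n_frames, n_dims) features onto their top principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Scaling the drawn noise by its own empirical power makes each training or test condition hit its nominal SNR exactly rather than only on average. The PCA outputs are mutually uncorrelated, which is the second-order de-correlation that the abstract contrasts with ICA's use of higher-order statistics.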
Overcoming DoF Limitation in Robust Beamforming: A Penalized Inequality-Constrained Approach
A well-known challenge in beamforming is how to optimally utilize the degrees
of freedom (DoF) of the array to design a robust beamformer, especially when
the array DoF is smaller than the number of sources in the environment. In this
paper, we leverage the tool of constrained convex optimization and propose a
penalized inequality-constrained minimum variance (P-ICMV) beamformer to
address this challenge. Specifically, we propose a beamformer with a
well-targeted objective function and inequality constraints to achieve the
design goals. The interference constraints penalize the maximum gain of the
beamformer in any interfering direction. This can efficiently mitigate the
total interference power regardless of whether the number of interfering
sources is less than the array DoF or not. Multiple robust constraints on the
target protection and interference suppression can be introduced to increase
the robustness of the beamformer against steering vector mismatch. By
integrating the noise reduction, interference suppression, and target
protection, the proposed formulation can efficiently obtain a robust beamformer
design while optimally trading off the various design goals. When the array DoF
is smaller than the number of interferers, the proposed formulation can
effectively align the limited DoF to all of the sources to obtain the best
overall interference suppression. To numerically solve this problem, we
formulate the P-ICMV beamformer design as a convex second-order cone program
(SOCP) and propose a low complexity iterative algorithm based on the
alternating direction method of multipliers (ADMM). Three applications are
simulated to demonstrate the effectiveness of the proposed beamformer. Comment: submitted to IEEE Transactions on Signal Processing
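The P-ICMV design bounds the beamformer gain in each interfering direction. Solving the full SOCP with ADMM is beyond a short example, so the sketch below only shows what such inequality constraints measure, applied to a baseline MVDR beamformer. Assumptions: a half-wavelength uniform linear array, known interferer directions, and a hypothetical gain threshold `max_gain_db`. This is not the paper's objective, constraint set, or solver, and the function names are hypothetical.

```python
import numpy as np

def ula_steering(theta_deg, n_sensors):
    """Steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    n = np.arange(n_sensors)
    return np.exp(-1j * np.pi * n * np.sin(np.deg2rad(theta_deg)))

def mvdr_weights(R, a0):
    """Baseline MVDR: minimize w^H R w subject to w^H a0 = 1."""
    Ri_a = np.linalg.solve(R, a0)
    return Ri_a / (a0.conj() @ Ri_a)

def constraint_report(w, a0, interferer_dirs, n_sensors, max_gain_db):
    """Target gain, per-interferer gains (dB), and whether all gain bounds hold.

    A bound |w^H a_k| <= eps is a second-order cone constraint on w, which is
    why designs built from such constraints can be cast as an SOCP.
    """
    target_gain = np.abs(np.vdot(w, a0)) ** 2  # np.vdot conjugates w: w^H a0
    gains_db = [10 * np.log10(np.abs(np.vdot(w, ula_steering(t, n_sensors))) ** 2)
                for t in interferer_dirs]
    ok = all(g <= max_gain_db for g in gains_db)
    return target_gain, gains_db, ok
```

With one strong interferer and a 4-element array, MVDR places a deep null and easily meets a -40 dB bound. When the interferers outnumber the array DoF, MVDR can no longer null them all, and a checker like this exposes which bounds fail; that is the regime the penalized constraints are designed for.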
Rational invariant subspace approximations with applications
Includes bibliographical references. Subspace methods such as MUSIC, Minimum Norm, and ESPRIT have gained considerable attention due to their superior performance in sinusoidal and direction-of-arrival (DOA) estimation, but they are also known to have a high computational cost. In this paper, new fast algorithms are presented that approximate the signal and noise subspaces without requiring an exact eigendecomposition. These algorithms approximate the required subspace by applying rational and power-like methods to the data directly or to the sample covariance matrix. Several ESPRIT- and MUSIC-type methods are developed based on these approximations. Substantial computational savings can be achieved compared with eigendecomposition-based methods. These methods are shown to achieve performance comparable to that of MUSIC while requiring less computation to obtain the signal subspace matrix.
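The idea of obtaining the signal subspace without a full eigendecomposition can be illustrated with orthogonal (block power) iteration feeding least-squares ESPRIT. This is a generic power-type method, not the paper's rational approximation scheme, and it assumes a half-wavelength uniform linear array, a known number of sources `d`, and a fixed iteration count. Function names are hypothetical.

```python
import numpy as np

def power_subspace(R, d, n_iter=50, seed=0):
    """Approximate the d-dimensional signal subspace of R by orthogonal iteration.

    Repeated multiplication by R with QR re-orthonormalization converges to the
    dominant eigenspace when the d-th eigenvalue exceeds the (d+1)-th.
    """
    rng = np.random.default_rng(seed)
    N = R.shape[0]
    Q, _ = np.linalg.qr(rng.standard_normal((N, d)) + 1j * rng.standard_normal((N, d)))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(R @ Q)
    return Q

def esprit_ula(Es):
    """LS-ESPRIT DOAs in degrees, assuming a(theta)_n = exp(-j*pi*n*sin(theta))."""
    psi = np.linalg.lstsq(Es[:-1], Es[1:], rcond=None)[0]  # shift invariance
    phases = np.angle(np.linalg.eigvals(psi))  # phases equal -pi*sin(theta_k)
    return np.sort(np.rad2deg(np.arcsin(-phases / np.pi)))
```

Each iteration costs one N×N by N×d product plus a thin QR, instead of a full N×N eigendecomposition. The gap between the d-th and (d+1)-th eigenvalues sets how many iterations are needed.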