275 research outputs found

    Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

    Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques that can improve speech recognition performance. In this paper, we present an experimental study of these linear filters on a specific speech recognition task, namely the CHiME-4 challenge, which features real recordings in multiple noisy environments. Specifically, the rank-1 MWF is employed for noise reduction, and a new constant residual noise power constraint is derived that enhances recognition performance. To fulfill the underlying rank-1 assumption, the speech covariance matrix is reconstructed from eigenvectors or generalized eigenvectors. The rank-1 constrained MWF is then compared with alternative multichannel linear filters under the same framework, which involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask estimation. The proposed filter outperforms the alternatives, leading to a 40% relative Word Error Rate (WER) reduction compared with the baseline Weighted Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER reduction compared with the GEV-BAN method. The results also suggest that speech recognition accuracy correlates more with the variance of the Mel-frequency cepstral coefficient (MFCC) features than with the noise reduction or speech distortion level. Comment: for Computer Speech and Languag
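
    The "relative WER reduction" quoted above is a ratio, not a difference in percentage points. As a minimal sketch of that arithmetic: a drop from a 10% WER to a 6% WER is a 40% relative reduction. The function name and the WER values below are hypothetical illustrations and are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction of a system over a baseline, as a fraction."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical WERs in percent, chosen only to illustrate the formula;
# they are NOT results reported in the paper.
baseline = 10.0
proposed = 6.0
print(f"relative reduction: {relative_wer_reduction(baseline, proposed):.0%}")
```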

    Partial Relaxation Approach: An Eigenvalue-Based DOA Estimator Framework

    In this paper, the partial relaxation approach is introduced and applied to DOA estimation using spectral search. Unlike existing methods such as Capon or MUSIC, which can be considered single-source approximations of multi-source estimation criteria, the proposed approach accounts for the existence of multiple sources. At each considered direction, the manifold structure of the remaining interfering signals impinging on the sensor array is relaxed, which results in closed-form estimates for the interference parameters. Thanks to this relaxation, the conventional multidimensional optimization problem reduces to a simple spectral search. Following this principle, we propose estimators based on the Deterministic Maximum Likelihood, Weighted Subspace Fitting and covariance fitting methods. To calculate the pseudo-spectra efficiently, an iterative rooting scheme based on rational function approximation is applied to the partial relaxation methods. Simulation results show that the proposed estimators outperform the conventional methods, especially at low Signal-to-Noise Ratio and with a low number of snapshots, irrespective of any specific structure of the sensor array, while maintaining a computational cost comparable to that of MUSIC. Comment: This work has been submitted to IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibl

    State of the Art in Face Recognition

    Notwithstanding the tremendous effort to solve the face recognition problem, it is not yet possible to design a face recognition system whose potential is close to human performance. New computer vision and pattern recognition approaches need to be investigated. New knowledge and perspectives from other fields, such as psychology and neuroscience, must also be incorporated into the current field of face recognition to design a robust face recognition system. Indeed, much more effort is required to arrive at a human-like face recognition system. This book tries to reduce the gap between the previous state of face recognition research and the future state

    Transfer Learning for Speech and Language Processing

    Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example, in speech recognition, an acoustic model trained for one language can be used to recognize speech in another language, with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and has traditionally been studied under the name of 'model adaptation'. Recent advances in deep learning show that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and the 'transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research in this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field. Comment: 13 pages, APSIPA 201

    Text-independent speaker recognition

    This research presents a new text-independent speaker recognition system in which multivariate tools such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are embedded after the feature extraction step. The proposed approach evaluates the performance of such a recognition system when trained and used in clean and noisy environments. Additive white Gaussian noise and convolutive noise are added. Experiments were carried out to investigate the robustness of PCA and ICA using the designed approach. The application of ICA improved the performance of the speaker recognition model when compared to PCA. Experimental results show that the use of ICA enabled the extraction of higher-order statistics, thereby capturing speaker-dependent statistical cues in a text-independent recognition system. The results show that ICA has better de-correlation and dimension reduction properties than PCA. To simulate a multi-environment system, we trained our model such that every time a new speech signal was read, it was contaminated with different types of noise and stored in the database. Results also show that ICA outperforms PCA under adverse environments. This is verified by computing the recognition accuracy rates obtained when the designed system was tested under different train and test SNR conditions with additive white Gaussian noise, and under test delay conditions with an echo effect
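
    The abstract above tests under different SNR conditions with additive white Gaussian noise. A minimal sketch of how a signal can be contaminated at a chosen SNR follows; it is an illustration under stated assumptions, not the authors' procedure. The helper names (`mean_power`, `add_awgn`), the fixed seed, and the toy 200 Hz tone standing in for speech are all hypothetical.

```python
import math
import random


def mean_power(x):
    """Average power (mean of squared samples) of a real-valued sequence."""
    return sum(v * v for v in x) / len(x)


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled so the realized SNR equals snr_db.

    Returns (noisy_signal, noise). The noise is rescaled by its measured
    power, so the target SNR is met for this realization, not just on average.
    """
    rng = rng or random.Random(0)  # fixed seed: an assumption, for repeatability
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    noise = [scale * n for n in noise]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


# Toy stand-in for a speech signal: one second of a 200 Hz tone at 8 kHz.
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noisy, noise = add_awgn(clean, snr_db=10.0)
measured_snr_db = 10 * math.log10(mean_power(clean) / mean_power(noise))
print(f"measured SNR: {measured_snr_db:.2f} dB")
```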

    Overcoming DoF Limitation in Robust Beamforming: A Penalized Inequality-Constrained Approach

    A well-known challenge in beamforming is how to optimally utilize the degrees of freedom (DoF) of the array to design a robust beamformer, especially when the array DoF is smaller than the number of sources in the environment. In this paper, we leverage constrained convex optimization and propose a penalized inequality-constrained minimum variance (P-ICMV) beamformer to address this challenge. Specifically, we propose a beamformer with a well-targeted objective function and inequality constraints to achieve the design goals. The constraints on interferences penalize the maximum gain of the beamformer in any interfering direction. This can efficiently mitigate the total interference power regardless of whether the number of interfering sources is less than the array DoF or not. Multiple robust constraints on target protection and interference suppression can be introduced to increase the robustness of the beamformer against steering vector mismatch. By integrating noise reduction, interference suppression, and target protection, the proposed formulation can efficiently obtain a robust beamformer design while optimally trading off various design goals. When the array DoF is smaller than the number of interfering sources, the proposed formulation can effectively align the limited DoF to all of the sources to obtain the best overall interference suppression. To numerically solve this problem, we formulate the P-ICMV beamformer design as a convex second-order cone program (SOCP) and propose a low-complexity iterative algorithm based on the alternating direction method of multipliers (ADMM). Three applications are simulated to demonstrate the effectiveness of the proposed beamformer. Comment: submitted to IEEE Transactions on Signal Processin

    Rational invariant subspace approximations with applications

    Includes bibliographical references. Subspace methods such as MUSIC, Minimum Norm, and ESPRIT have gained considerable attention due to their superior performance in sinusoidal and direction-of-arrival (DOA) estimation, but they are also known to have a high computational cost. In this paper, new fast algorithms that approximate the signal and noise subspaces without requiring an exact eigendecomposition are presented. These algorithms approximate the required subspace using rational and power-like methods applied to the direct data or the sample covariance matrix. Several ESPRIT-type as well as MUSIC-type methods are developed based on these approximations. A substantial computational saving can be gained compared with eigendecomposition-based methods. These methods are demonstrated to have performance comparable to that of MUSIC while requiring fewer computations to obtain the signal subspace matrix
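
    To give a flavor of how a dominant subspace direction can be approximated without a full eigendecomposition, the sketch below uses plain textbook power iteration on a small real symmetric matrix. It is a swapped-in illustration and not the rational or power-like algorithms of the paper; the function names, the seeded random start vector, and the 2x2 example matrix are assumptions, and array-processing covariance matrices are in general complex Hermitian rather than real.

```python
import math
import random


def matvec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(A, num_iters=100, seed=0):
    """Approximate the dominant eigenpair of a real symmetric matrix A.

    Repeatedly applies A to a start vector and renormalizes; the eigenvalue
    estimate is the Rayleigh quotient v^T A v of the final unit vector.
    """
    rng = random.Random(seed)  # seeded random start: an assumption
    v = normalize([rng.gauss(0.0, 1.0) for _ in A])
    for _ in range(num_iters):
        v = normalize(matvec(A, v))
    eigenvalue = sum(x * y for x, y in zip(v, matvec(A, v)))
    return eigenvalue, v


# Hypothetical 2x2 "covariance" with eigenvalues 3 and 1.
R = [[2.0, 1.0],
     [1.0, 2.0]]
lam, vec = power_iteration(R)
print(lam, vec)
```

    Each iteration costs one matrix-vector product, which is the kind of saving over a full eigendecomposition that the abstract alludes to; convergence speed depends on the gap between the largest and second-largest eigenvalues.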