
    Multiple and single snapshot compressive beamforming

    For a sound field observed on a sensor array, compressive sensing (CS) reconstructs the direction-of-arrival (DOA) of multiple sources using a sparsity constraint. The DOA estimation is posed as an underdetermined problem by expressing the acoustic pressure at each sensor as a phase-lagged superposition of source amplitudes at all hypothetical DOAs. Regularizing with an $\ell_1$-norm constraint renders the problem solvable with convex optimization, and promoting sparsity gives high-resolution DOA maps. Here, the sparse source distribution is derived using maximum a posteriori (MAP) estimates for both single and multiple snapshots. CS does not require inversion of the data covariance matrix and thus works well even for a single snapshot where it gives higher resolution than conventional beamforming. For multiple snapshots, CS outperforms conventional high-resolution methods, even with coherent arrivals and at low signal-to-noise ratio. The superior resolution of CS is demonstrated with vertical array data from the SWellEx96 experiment for coherent multi-paths. Comment: In press Journal of Acoustical Society of Americ
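The single-snapshot formulation in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a uniform line array with half-wavelength spacing, a hypothetical 5-degree DOA grid, noise-free on-grid sources, and it swaps in plain proximal gradient descent (ISTA) as the convex solver. The names `steering_matrix`, `soft_threshold`, `ista_doa`, and the regularization weight `mu` are illustrative choices.

```python
import numpy as np

def steering_matrix(n_sensors, angles_deg, spacing=0.5):
    """Plane-wave steering vectors for a uniform line array (spacing in wavelengths)."""
    n = np.arange(n_sensors)[:, None]
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))[None, :]
    return np.exp(2j * np.pi * spacing * n * np.sin(theta))

def soft_threshold(x, tau):
    """Complex soft-thresholding: shrink magnitudes by tau, keep phases."""
    mag = np.abs(x)
    return x * np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)

def ista_doa(A, y, mu, n_iter=2000):
    """Minimise 0.5*||y - A x||_2^2 + mu*||x||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * mu)
    return x

# Assumed setup: 20 sensors, 35 candidate DOAs, so the system is underdetermined.
grid = np.arange(-85, 86, 5)
A = steering_matrix(20, grid)

# One noise-free snapshot with two sources placed on the grid (illustrative values).
true_angles = [-10, 20]
amplitudes = np.array([1.0, 0.8])
y = steering_matrix(20, true_angles) @ amplitudes

x_hat = ista_doa(A, y, mu=0.5)
peaks = grid[np.argsort(np.abs(x_hat))[-2:]]  # grid angles of the two largest entries
print(sorted(peaks.tolist()))
```

The $\ell_1$ penalty is what drives most entries of `x_hat` to zero; a larger `mu` gives a sparser but more biased amplitude estimate.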

    Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

    Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques which can improve speech recognition performance. In this paper, we present an experimental study on these linear filters in a specific speech recognition task, namely the CHiME-4 challenge, which features real recordings in multiple noisy environments. Specifically, the rank-1 MWF is employed for noise reduction and a new constant residual noise power constraint is derived which enhances the recognition performance. To fulfill the underlying rank-1 assumption, the speech covariance matrix is reconstructed based on eigenvectors or generalized eigenvectors. Then the rank-1 constrained MWF is evaluated with alternative multichannel linear filters under the same framework, which involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask estimation. The proposed filter outperforms alternative ones, leading to a 40% relative Word Error Rate (WER) reduction compared with the baseline Weighted Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER reduction compared with the GEV-BAN method. The results also suggest that the speech recognition accuracy correlates more with the Mel-frequency cepstral coefficients (MFCC) feature variance than with the noise reduction or the speech distortion level. Comment: for Computer Speech and Languag

    Quadratically Constrained Beamforming Robust Against Direction-of-Arrival Mismatch

    It is well known that the performance of the minimum variance distortionless response (MVDR) beamformer is very sensitive to steering vector mismatch. Such mismatches can occur as a result of direction-of-arrival (DOA) errors, local scattering, near-far spatial signature mismatch, waveform distortion, source spreading, imperfectly calibrated arrays and distorted antenna shape. In this paper, an adaptive beamformer that is robust against the DOA mismatch is proposed. This method imposes two quadratic constraints such that the magnitude responses of two steering vectors exceed unity. Then, a diagonal loading method is used to force the magnitude responses at the arrival angles between these two steering vectors to exceed unity. Therefore, this method can always force the gains at a desired range of angles to exceed a constant level while suppressing the interferences and noise. A closed-form solution to the proposed minimization problem is introduced, and the diagonal loading factor can be computed systematically by a proposed algorithm. Numerical examples show that this method has excellent signal-to-interference-plus-noise ratio performance and a complexity comparable to the standard MVDR beamformer.
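For orientation, the standard MVDR beamformer with a fixed diagonal loading term, which this abstract takes as its starting point, can be sketched as follows. This is not the two-constraint method the paper proposes, and the loading factor here is a hand-picked value rather than one computed by the paper's algorithm. The uniform line array, the 30-degree interferer, and the function names are assumptions made for illustration.

```python
import numpy as np

def steering_vector(n_sensors, angle_deg, spacing=0.5):
    """Plane-wave steering vector for a uniform line array (spacing in wavelengths)."""
    n = np.arange(n_sensors)
    return np.exp(2j * np.pi * spacing * n * np.sin(np.deg2rad(angle_deg)))

def mvdr_weights(R, a, loading=0.0):
    """w = (R + loading*I)^{-1} a / (a^H (R + loading*I)^{-1} a)."""
    R_loaded = R + loading * np.eye(R.shape[0])
    Ri_a = np.linalg.solve(R_loaded, a)  # avoids forming the explicit inverse
    return Ri_a / (a.conj() @ Ri_a)

n_sensors = 10
a_look = steering_vector(n_sensors, 0.0)   # assumed look direction
a_int = steering_vector(n_sensors, 30.0)   # assumed interferer direction

# Interference-plus-noise covariance: one strong interferer plus unit-power white noise.
R = 100.0 * np.outer(a_int, a_int.conj()) + np.eye(n_sensors)

w = mvdr_weights(R, a_look, loading=1.0)   # loading value chosen by hand
gain_look = abs(w.conj() @ a_look)         # distortionless constraint: equals 1
gain_int = abs(w.conj() @ a_int)           # response toward the interferer
print(gain_look, gain_int)
```

The distortionless constraint holds only at the assumed look direction; if the true DOA differs from it, the signal itself can be partly suppressed, which is the sensitivity the abstract describes.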

    Parametric high resolution techniques for radio astronomical imaging

    The increased sensitivity of future radio telescopes will result in requirements for higher dynamic range within the image as well as better resolution and immunity to interference. In this paper we propose a new matrix formulation of the imaging equation in the cases of non-coplanar arrays and polarimetric measurements. Then we improve our parametric imaging techniques in terms of resolution and estimation accuracy. This is done by enhancing both the MVDR parametric imaging, introducing alternative dirty images and by introducing better power estimates based on least squares, with positive semi-definite constraints. We also discuss the use of robust Capon beamforming and semi-definite programming for solving the self-calibration problem. Additionally, we provide statistical analysis of the bias of the MVDR beamformer for the case of a moving array, which serves as a first step in analyzing iterative approaches such as CLEAN and the techniques proposed in this paper. Finally, we demonstrate a full deconvolution process based on the parametric imaging techniques and show its improved resolution and sensitivity compared to the CLEAN method. Comment: To appear in IEEE Journal of Selected Topics in Signal Processing, Special issue on Signal Processing for Astronomy and space research. 30 page

    Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

    This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted through estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime when recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on large datasets of CHiME-4 and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) or perceptual evaluation of speech quality (PESQ), respectively). Moreover, word error rate (WER) achieved by a baseline automatic speech recognition system is evaluated, for which the enhancement method serves as a front-end solution. The results indicate that the proposed method is robust with respect to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for a block length of 250 ms. Comment: 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusion