Multiple and single snapshot compressive beamforming
For a sound field observed on a sensor array, compressive sensing (CS)
reconstructs the direction-of-arrival (DOA) of multiple sources using a
sparsity constraint. The DOA estimation is posed as an underdetermined problem
by expressing the acoustic pressure at each sensor as a phase-lagged
superposition of source amplitudes at all hypothetical DOAs. Regularizing with
an ℓ1-norm constraint renders the problem solvable with convex
optimization, and promoting sparsity gives high-resolution DOA maps. Here, the
sparse source distribution is derived using maximum a posteriori (MAP)
estimates for both single and multiple snapshots. CS does not require inversion
of the data covariance matrix and thus works well even for a single snapshot
where it gives higher resolution than conventional beamforming. For multiple
snapshots, CS outperforms conventional high-resolution methods, even with
coherent arrivals and at low signal-to-noise ratio. The superior resolution of
CS is demonstrated with vertical array data from the SWellEx96 experiment for
coherent multi-paths.
Comment: In press, Journal of the Acoustical Society of America
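The single-snapshot idea above (a phase-lagged dictionary over hypothetical DOAs plus an ℓ1 penalty) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MAP derivation or solver: it assumes a half-wavelength uniform line array, a 1-degree DOA grid and noiseless data, and it swaps in ISTA (iterative soft-thresholding, a proximal-gradient method) for a general convex optimization package. The function names and the penalty value `mu` are hypothetical.

```python
import numpy as np


def steering_matrix(n_sensors, angles_deg, spacing=0.5):
    """Columns are the phase-lagged replicas a(theta) of a uniform line array.

    spacing is the sensor spacing in wavelengths (assumed half-wavelength).
    """
    m = np.arange(n_sensors)[:, None]
    sin_theta = np.sin(np.deg2rad(angles_deg))[None, :]
    return np.exp(-2j * np.pi * spacing * m * sin_theta)


def conventional_beamformer(A, y):
    """Bartlett power |a^H y|^2 / M^2 at every hypothetical DOA."""
    return np.abs(A.conj().T @ y) ** 2 / A.shape[0] ** 2


def cs_beamformer_ista(A, y, mu, n_iter=3000):
    """Minimize ||y - A x||_2^2 + mu * ||x||_1 with ISTA (proximal gradient)."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the data-term gradient
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        z = x - step * 2.0 * (A.conj().T @ (A @ x - y))  # gradient step on the data term
        mag = np.abs(z)
        # Complex soft-thresholding: shrink each magnitude by step*mu, keep its phase.
        x = np.where(mag > step * mu, (1.0 - step * mu / np.maximum(mag, 1e-12)) * z, 0.0)
    return x


# Two sources on a 1-degree grid, one noiseless snapshot, 20 sensors (all assumed values).
grid = np.arange(-90.0, 91.0, 1.0)
A = steering_matrix(20, grid)
x_true = np.zeros(grid.size, dtype=complex)
x_true[np.argmin(np.abs(grid + 20.0))] = 1.0
x_true[np.argmin(np.abs(grid - 15.0))] = 0.7j
y = A @ x_true
x_cs = cs_beamformer_ista(A, y, mu=0.1)
p_cbf = conventional_beamformer(A, y)
```

Note that no data covariance matrix is formed or inverted, which is why the same code applies to a single snapshot; the multiple-snapshot case would replace the vector `y` by a matrix of snapshots and the ℓ1 penalty by a row-wise (group) sparsity penalty.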
Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments
Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and
the Generalized Eigenvalue (GEV) beamformer, are popular signal processing
techniques which can improve speech recognition performance. In this paper, we
present an experimental study on these linear filters in a specific speech
recognition task, namely the CHiME-4 challenge, which features real recordings
in multiple noisy environments. Specifically, the rank-1 MWF is employed for
noise reduction and a new constant residual noise power constraint is derived
which enhances the recognition performance. To fulfill the underlying rank-1
assumption, the speech covariance matrix is reconstructed based on eigenvectors
or generalized eigenvectors. The rank-1 constrained MWF is then compared with
alternative multichannel linear filters under the same framework, which
involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask
estimation. The proposed filter outperforms alternative ones, leading to a 40%
relative Word Error Rate (WER) reduction compared with the baseline Weighted
Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER
reduction compared with the GEV-BAN method. The results also suggest that the
speech recognition accuracy correlates more with the Mel-frequency cepstral
coefficients (MFCC) feature variance than with the noise reduction or the
speech distortion level.
Comment: For Computer Speech and Language
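The rank-1 reconstruction and the rank-1 MWF described above can be sketched for a single frequency bin as follows. This is a minimal sketch under assumptions: the raw speech covariance is taken as the noisy minus the noise covariance, the rank-1 model is rebuilt from the principal generalized eigenvector via Cholesky whitening, and the filter is the generic speech-distortion-weighted rank-1 MWF with a trade-off parameter `mu` (`mu = 0` gives MVDR). The paper's constant residual noise power constraint is not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def rank1_speech_covariance(phi_y, phi_n):
    """Rank-1 speech covariance from the principal generalized eigenvector of
    (phi_y - phi_n, phi_n); exact when the true speech covariance is rank 1."""
    L = np.linalg.cholesky(phi_n)                # phi_n = L L^H
    L_inv = np.linalg.inv(L)
    phi_x = phi_y - phi_n                        # raw speech covariance estimate
    C = L_inv @ phi_x @ L_inv.conj().T           # noise-whitened speech covariance
    C = 0.5 * (C + C.conj().T)                   # enforce Hermitian symmetry
    eigvals, U = np.linalg.eigh(C)               # eigenvalues in ascending order
    lam, u = eigvals[-1], U[:, -1]
    # v = L^{-H} u is the generalized eigenvector with v^H phi_n v = 1,
    # so phi_n v = L u is proportional to the acoustic transfer function.
    h = L @ u
    return lam * np.outer(h, h.conj())


def rank1_mwf(phi_x_r1, phi_n, mu=1.0, ref=0):
    """w = phi_n^{-1} phi_x e_ref / (mu + tr(phi_n^{-1} phi_x)) for rank-1 phi_x."""
    num = np.linalg.solve(phi_n, phi_x_r1)       # phi_n^{-1} phi_x
    return num[:, ref] / (mu + np.real(np.trace(num)))


# Synthetic bin: 4 microphones, random transfer function, random noise covariance.
rng = np.random.default_rng(0)
M = 4
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
phi_n = B @ B.conj().T + M * np.eye(M)
phi_x = 2.0 * np.outer(h, h.conj())
phi_x_r1 = rank1_speech_covariance(phi_x + phi_n, phi_n)
w = rank1_mwf(phi_x_r1, phi_n, mu=0.0)
```

With estimated (full-rank) covariances, `phi_y - phi_n` is no longer rank 1, and the eigen-decomposition step is what enforces the rank-1 assumption the filter formula relies on.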
Quadratically Constrained Beamforming Robust Against Direction-of-Arrival Mismatch
It is well known that the performance of the minimum variance distortionless response (MVDR) beamformer is very sensitive to steering vector mismatch. Such mismatches can occur as a result of direction-of-arrival (DOA) errors, local scattering, near-far spatial signature mismatch, waveform distortion, source spreading, imperfectly calibrated arrays and distorted antenna shape. In this paper, an adaptive beamformer that is robust against DOA mismatch is proposed. This method imposes two quadratic constraints such that the magnitude responses of two steering vectors exceed unity. Then, a diagonal loading method is used to force the magnitude responses at the arrival angles between these two steering vectors to exceed unity. Therefore, this method can always force the gains over a desired range of angles to exceed a constant level while suppressing the interferences and noise. A closed-form solution to the proposed minimization problem is introduced, and the diagonal loading factor can be computed systematically by a proposed algorithm. Numerical examples show that this method has excellent signal-to-interference-plus-noise ratio performance and a complexity comparable to that of the standard MVDR beamformer.
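Why diagonal loading helps against DOA mismatch can be sketched with the standard diagonally loaded MVDR beamformer. This is an illustration of the general mechanism only, not the paper's two-quadratic-constraint formulation or its algorithm for the loading factor: the half-wavelength line array, the source powers and angles, and the hand-picked loading value are all assumptions.

```python
import numpy as np


def ula_steering(n_sensors, theta_deg):
    """Steering vector of a half-wavelength uniform line array (assumed geometry)."""
    m = np.arange(n_sensors)
    return np.exp(1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))


def loaded_mvdr(R, a, gamma):
    """MVDR weights (R + gamma I)^{-1} a / (a^H (R + gamma I)^{-1} a)."""
    R_loaded = R + gamma * np.eye(R.shape[0])
    Ri_a = np.linalg.solve(R_loaded, a)
    return Ri_a / (a.conj() @ Ri_a)


def gain(w, theta_deg):
    """Magnitude response |w^H a(theta)| of the beamformer."""
    return np.abs(w.conj() @ ula_steering(w.size, theta_deg))


# Desired signal actually at 3 degrees (20 dB), interferer at 40 degrees (30 dB),
# unit white noise; the beamformer is steered to the presumed DOA of 0 degrees.
M = 10
a_s, a_i = ula_steering(M, 3.0), ula_steering(M, 40.0)
R = 100 * np.outer(a_s, a_s.conj()) + 1000 * np.outer(a_i, a_i.conj()) + np.eye(M)
a0 = ula_steering(M, 0.0)
w_mvdr = loaded_mvdr(R, a0, gamma=0.0)
w_loaded = loaded_mvdr(R, a0, gamma=1000.0)
```

Without loading, the MVDR beamformer treats the mismatched desired signal as interference and places a deep notch on it (self-nulling); with heavy loading the response at 3 degrees stays close to that of the presumed direction, at the price of weaker interference suppression. Choosing the loading level systematically, rather than by hand as here, is exactly what the paper's algorithm addresses.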
Parametric high resolution techniques for radio astronomical imaging
The increased sensitivity of future radio telescopes will result in
requirements for higher dynamic range within the image as well as better
resolution and immunity to interference. In this paper we propose a new matrix
formulation of the imaging equation in the cases of non-coplanar arrays and
polarimetric measurements. Then we improve our parametric imaging techniques in
terms of resolution and estimation accuracy. This is done both by enhancing the
MVDR parametric imaging through alternative dirty images and by introducing
better power estimates based on least squares with positive semi-definite
constraints. We also discuss the use of robust Capon beamforming
and semi-definite programming for solving the self-calibration problem.
Additionally, we provide a statistical analysis of the bias of the MVDR beamformer
for the case of a moving array, which serves as a first step in analyzing
iterative approaches such as CLEAN and the techniques proposed in this paper.
Finally we demonstrate a full deconvolution process based on the parametric
imaging techniques and show its improved resolution and sensitivity compared to
the CLEAN method.
Comment: To appear in IEEE Journal of Selected Topics in Signal Processing,
Special Issue on Signal Processing for Astronomy and Space Research. 30 pages
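The contrast between the classical dirty image and the MVDR (Capon) dirty image mentioned above can be sketched on a toy 1-D array. This is a minimal sketch under assumptions: a 16-antenna line array at half-wavelength spacing, two point sources, white receiver noise, and an exactly known array covariance `R`; it does not reproduce the paper's non-coplanar or polarimetric matrix formulation, and the function names are hypothetical.

```python
import numpy as np


def steering_matrix(positions, l_grid):
    """Array response exp(-2j*pi*x*l) for antenna positions x (in wavelengths)
    and direction cosines l; one column per image pixel."""
    return np.exp(-2j * np.pi * np.outer(positions, l_grid))


def classical_dirty_image(R, A):
    """Conventional (matched-filter) image a^H R a / M^2 at each pixel."""
    M = A.shape[0]
    return np.real(np.einsum("mk,mn,nk->k", A.conj(), R, A)) / M**2


def mvdr_dirty_image(R, A):
    """MVDR (Capon) image 1 / (a^H R^{-1} a) at each pixel."""
    Ri_A = np.linalg.solve(R, A)
    return 1.0 / np.real(np.sum(A.conj() * Ri_A, axis=0))


# Toy example: 16 antennas, sources at l = 0.1 (power 1) and l = -0.3 (power 0.5).
positions = 0.5 * np.arange(16)
sources_l, powers, noise = np.array([0.1, -0.3]), np.array([1.0, 0.5]), 0.01
A_src = steering_matrix(positions, sources_l)
R = A_src @ np.diag(powers) @ A_src.conj().T + noise * np.eye(positions.size)
l_grid = np.linspace(-1.0, 1.0, 401)
A = steering_matrix(positions, l_grid)
img_classical = classical_dirty_image(R, A)
img_mvdr = mvdr_dirty_image(R, A)
```

By the Cauchy-Schwarz inequality, the MVDR image never exceeds the classical one pixel by pixel, which is why it suppresses sidelobes and gives sharper peaks while still estimating the source power at the source positions.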
Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on large datasets of CHiME-4 and on another
dataset featuring a moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, the word error rate (WER) achieved by a baseline automatic speech
recognition system, for which the enhancement method serves as a front end, is
evaluated. The results indicate that the proposed method is robust with respect
to the short length of the processed block. Significant improvements in terms of
these criteria and WER are observed even for a block length of 250 ms.
Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in IET Signal Processing journal. Original results
unchanged, additional experiments presented, refined discussion and
conclusion
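Block-online processing, where each block is enhanced using only its own statistics, can be sketched for one STFT frequency bin as follows. This is a sketch under assumptions: the per-frame speech mask stands in for the DNN-based VAD output, the relative transfer function (RTF) is estimated by covariance whitening (one common estimator, not necessarily the one used in the paper), the beamformer is an MVDR toward the estimated RTF, post-filtering is omitted, and the pass-through fallback for blocks without both speech and noise frames is a hypothetical choice.

```python
import numpy as np


def covariance(X):
    """Sample spatial covariance of an (M, T) block of STFT frames of one bin."""
    return X @ X.conj().T / X.shape[1]


def rtf_covariance_whitening(phi_y, phi_n, ref=0):
    """RTF from the principal eigenvector of the noise-whitened noisy covariance."""
    L = np.linalg.cholesky(phi_n)                  # phi_n = L L^H
    L_inv = np.linalg.inv(L)
    C = L_inv @ phi_y @ L_inv.conj().T             # whitened noisy covariance
    C = 0.5 * (C + C.conj().T)                     # guard against round-off asymmetry
    _, U = np.linalg.eigh(C)                       # eigenvalues in ascending order
    g = L @ U[:, -1]                               # de-whitened principal eigenvector
    return g / g[ref]                              # normalize to the reference microphone


def process_block(X, speech_mask, ref=0, eps=1e-6):
    """Enhance one (M, T) block using only that block's statistics.

    speech_mask is a boolean per-frame VAD decision (assumed given, e.g. by a DNN).
    Returns the MVDR output and the RTF estimate (None if the block is unusable).
    """
    if speech_mask.all() or not speech_mask.any():
        return X[ref].astype(complex), None        # hypothetical fallback: reference channel
    M = X.shape[0]
    phi_n = covariance(X[:, ~speech_mask]) + eps * np.eye(M)
    phi_y = covariance(X[:, speech_mask])
    h = rtf_covariance_whitening(phi_y, phi_n, ref)
    Ri_h = np.linalg.solve(phi_n, h)
    w = Ri_h / (h.conj() @ Ri_h)                   # MVDR toward the estimated RTF
    return w.conj() @ X, h


def block_online(X, speech_mask, block_len):
    """Process consecutive blocks independently, as in block-online operation."""
    out = np.empty(X.shape[1], dtype=complex)
    for start in range(0, X.shape[1], block_len):
        blk = slice(start, start + block_len)
        out[blk], _ = process_block(X[:, blk], speech_mask[blk])
    return out
```

Shorter blocks track a moving speaker more closely but leave fewer frames for the covariance estimates, which is the accuracy trade-off the abstract studies; the small `eps` loading keeps the noise covariance invertible when a block contains only a few noise frames.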