32 research outputs found
Covariance Blocking and Whitening Method for Successive Relative Transfer Function Vector Estimation in Multi-Speaker Scenarios
This paper addresses the challenge of estimating the relative transfer
function (RTF) vectors of multiple speakers in a noisy and reverberant
environment. More specifically, we consider a scenario where two speakers
activate successively. In this scenario, the RTF vector of the first speaker
can be estimated in a straightforward way and the main challenge lies in
estimating the RTF vector of the second speaker during segments where both
speakers are simultaneously active. To estimate the RTF vector of the second
speaker, the so-called blind oblique projection (BOP) method determines the
oblique projection operator that optimally blocks the second speaker. Instead
of blocking the second speaker, in this paper we propose a covariance blocking
and whitening (CBW) method, which first blocks the first speaker and applies
whitening using the estimated noise covariance matrix and then estimates the
RTF vector of the second speaker based on a singular value decomposition. When
using the estimated RTF vectors of both speakers in a linearly constrained
minimum variance beamformer, simulation results using real-world recordings for
multiple speaker positions demonstrate that the proposed CBW method outperforms
the conventional BOP and covariance whitening methods in terms of
signal-to-interferer-and-noise ratio improvement.
Comment: IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics (WASPAA), New Paltz, NY, USA, Oct 22-25, 202
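As a rough illustration of the whitening-plus-decomposition step mentioned in the abstract, the sketch below implements the conventional covariance whitening (CW) estimator for a single active speaker, not the proposed CBW method. It assumes NumPy is available, that the noise covariance is known exactly, and that the speaker contributes a rank-one term to the noisy covariance. The function name `estimate_rtf_cw` and the synthetic data are illustrative assumptions only.

```python
import numpy as np


def estimate_rtf_cw(R_y, R_n, ref=0):
    """Covariance-whitening RTF estimate for a single active speaker.

    R_y: (M, M) noisy-speech covariance, R_n: (M, M) noise covariance.
    Returns the RTF vector normalised so that element `ref` equals 1.
    """
    L = np.linalg.cholesky(R_n)                 # R_n = L L^H
    L_inv = np.linalg.inv(L)
    R_w = L_inv @ R_y @ L_inv.conj().T          # whitened covariance
    eigvals, eigvecs = np.linalg.eigh(R_w)      # eigenvalues in ascending order
    v = eigvecs[:, -1]                          # principal eigenvector
    h = L @ v                                   # undo the whitening
    return h / h[ref]                           # normalise to the reference mic


# Synthetic check (assumed data): known RTF vector plus coloured noise.
rng = np.random.default_rng(0)
M = 4
a = np.concatenate(
    ([1.0], rng.standard_normal(M - 1) + 1j * rng.standard_normal(M - 1))
)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_n = A @ A.conj().T + M * np.eye(M)            # Hermitian positive definite
R_y = 2.0 * np.outer(a, a.conj()) + R_n         # rank-one speech term + noise
a_hat = estimate_rtf_cw(R_y, R_n)
```

With exact covariances the estimate recovers the true RTF vector up to rounding. With estimated covariances, and with two simultaneously active speakers as in the scenario above, the result would only be approximate.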
Two-Channel Speech Enhancement and Implementation Considerations: Noise Reduction and Speech Quality
Overcoming DoF Limitation in Robust Beamforming: A Penalized Inequality-Constrained Approach
A well-known challenge in beamforming is how to optimally utilize the degrees
of freedom (DoF) of the array to design a robust beamformer, especially when
the array DoF is smaller than the number of sources in the environment. In this
paper, we leverage the tool of constrained convex optimization and propose a
penalized inequality-constrained minimum variance (P-ICMV) beamformer to
address this challenge. Specifically, we propose a beamformer with a
well-targeted objective function and inequality constraints to achieve the
design goals. The constraints on interferences penalize the maximum gain of the
beamformer at any interfering direction. This can efficiently mitigate the
total interference power regardless of whether the number of interfering
sources is less than the array DoF or not. Multiple robust constraints on the
target protection and interference suppression can be introduced to increase
the robustness of the beamformer against steering vector mismatch. By
integrating the noise reduction, interference suppression, and target
protection, the proposed formulation can efficiently obtain a robust beamformer
design while optimally trading off various design goals. When the array DoF is
smaller than the number of interferences, the proposed formulation can
effectively align the limited DoF to all of the sources to obtain the best
overall interference suppression. To numerically solve this problem, we
formulate the P-ICMV beamformer design as a convex second-order cone program
(SOCP) and propose a low-complexity iterative algorithm based on the
alternating direction method of multipliers (ADMM). Three applications are
simulated to demonstrate the effectiveness of the proposed beamformer.
Comment: submitted to IEEE Transactions on Signal Processin
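For context, the equality-constrained baseline that inequality-constrained designs of this kind relax is the classical LCMV beamformer. The block below states that standard problem and its closed-form solution. The notation is assumed here and not taken from the paper: $\mathbf{w}$ for the weights, $\mathbf{R}$ for the positive-definite noise-plus-interference covariance, $\mathbf{C}$ for an $M \times K$ full-column-rank constraint matrix, and $\mathbf{g}$ for the desired responses. This is not the P-ICMV formulation itself.

```latex
\begin{align}
  \min_{\mathbf{w} \in \mathbb{C}^{M}} \; & \mathbf{w}^{H} \mathbf{R}\, \mathbf{w}
  \quad \text{s.t.} \quad \mathbf{C}^{H} \mathbf{w} = \mathbf{g}, \\
  \mathbf{w}_{\mathrm{LCMV}} &= \mathbf{R}^{-1} \mathbf{C}
  \left( \mathbf{C}^{H} \mathbf{R}^{-1} \mathbf{C} \right)^{-1} \mathbf{g}.
\end{align}
```

Each linearly independent equality constraint uses one DoF, so with $M$ sensors at most $M-1$ interferers can be nulled exactly while one constraint preserves the target. Bounding the gain with inequalities instead avoids this hard limit, which is the situation the abstract targets.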
Broadband adaptive beamforming with low complexity and frequency invariant response
This thesis proposes different methods to reduce the computational complexity and increase the adaptation rate of adaptive broadband beamformers, using the generalised sidelobe canceller (GSC) structure as an example. The GSC is an alternative implementation of the linearly constrained minimum variance beamformer, which can utilise well-known adaptive filtering algorithms, such as the least mean square (LMS) or the recursive least squares (RLS) algorithm, to perform unconstrained adaptive optimisation.
A direct DFT implementation, by which broadband signals are decomposed into frequency bins and processed by independent narrowband beamforming algorithms, is thought to be computationally optimal. However, this setup fails to converge to the time-domain minimum mean square error (MMSE) if signal components are not aligned to frequency bins, resulting in a large worst-case error. To mitigate this problem of the so-called independent frequency bin (IFB) processor, overlap-save based GSC beamforming structures have been explored. These systems address the minimisation of the time-domain MMSE with a significant reduction in computational complexity compared to time-domain implementations, and show better convergence behaviour than the IFB beamformer. By studying the effects that the blocking matrix has on the adaptive process for the overlap-save beamformer, several modifications are carried out to enhance both the simplicity of the algorithm and its convergence speed. These modifications result in a GSC beamformer with significantly lower computational complexity than the time-domain approach while offering similar convergence characteristics.
In certain applications, especially in the areas of acoustics, there is a need to maintain constant resolution across a wide operating spectrum that may extend across several octaves. Constant beamwidth is difficult to attain, particularly if uniformly spaced linear sensor arrays are employed for beamforming, since spatial resolution is inversely proportional to both the array aperture and the frequency. A scaled aperture arrangement is introduced for the subband-based GSC beamformer to achieve near-uniform resolution across a wide spectrum, whereby an octave-invariant design is achieved. This structure can also be operated in conjunction with adaptive beamforming algorithms. Frequency-dependent tapering of the sensor signals is proposed in combination with the overlap-save GSC structure in order to achieve an overall frequency-invariant characteristic. An adaptive version is also proposed for the frequency-invariant overlap-save GSC beamformer.
Broadband adaptive beamforming algorithms based on the family of least mean squares (LMS) algorithms are known to exhibit slow convergence if the input signal is correlated. To improve the convergence of the GSC when based on LMS-type algorithms, we propose the use of a broadband eigenvalue decomposition (BEVD) to decorrelate the input of the adaptive algorithm in the spatial dimension, for which an increase in convergence speed can be demonstrated over other decorrelating measures, such as the Karhunen-Loeve transform. In order to address the remaining temporal correlation after BEVD processing, this approach is combined with subband decomposition through the use of oversampled filter banks. The resulting spatially and temporally decorrelated GSC beamformer provides further enhanced convergence speed over spatial or temporal decorrelation methods on their own.
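The GSC idea described above (a fixed beamformer, a blocking branch that removes the target, and an unconstrained adaptive filter) can be sketched in a few lines. This is a deliberately minimal time-domain toy, not the overlap-save or subband structures of the thesis. It assumes two sensors, a target that arrives identically at both, a single-tap LMS filter, and synthetic white signals. The function name `gsc_lms`, the step size and the interferer gains are all illustrative assumptions.

```python
import random


def gsc_lms(x1, x2, mu=0.005):
    """Two-sensor GSC with a single-tap LMS noise canceller.

    Assumes the target arrives identically at both sensors (broadside).
    Returns the enhanced output samples and the final adaptive weight.
    """
    w = 0.0
    out = []
    for a, b in zip(x1, x2):
        d = 0.5 * (a + b)      # fixed beamformer: passes the target unchanged
        u = a - b              # blocking branch: target cancels, interferer remains
        e = d - w * u          # GSC output, also the LMS error signal
        w += mu * e * u        # unconstrained LMS weight update
        out.append(e)
    return out, w


# Synthetic signals (assumed): white target and white interferer.
random.seed(0)
N = 20000
s = [random.gauss(0.0, 1.0) for _ in range(N)]      # target
n = [random.gauss(0.0, 1.0) for _ in range(N)]      # interferer
x1 = [si + ni for si, ni in zip(s, n)]              # interferer gain 1.0
x2 = [si + 0.5 * ni for si, ni in zip(s, n)]        # interferer gain 0.5
y, w = gsc_lms(x1, x2)
```

With these assumed gains the fixed branch carries 0.75 of the interferer and the blocking branch 0.5 of it, so the interference-free weight is 1.5, and the adapted weight should settle near that value. Because the blocking branch contains no target, the adaptation is unconstrained, which is what lets the GSC reuse standard LMS or RLS updates.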
Robust acoustic beamforming in the presence of channel propagation uncertainties
Beamforming is a popular multichannel signal processing technique used in conjunction with microphone arrays to spatially filter a sound field. Conventional optimal beamformers assume that the propagation channels between each source and microphone pair are a deterministic function of the source and microphone geometry. However, in real acoustic environments, there are several mechanisms that give rise to unpredictable variations in the phases and amplitudes of the propagation channels. In the presence of these uncertainties the performance of beamformers degrades. Robust beamformers are designed to reduce this performance degradation. However, robust beamformers rely on tuning parameters that are not closely related to the array geometry.
By modeling the uncertainty in the acoustic channels explicitly, we can derive more accurate expressions for the source-microphone channel variability. As such, we are able to derive beamformers that are well suited to acoustic applications in realistic environments. Through experiments we validate the acoustic channel models, and through simulations we show the performance gains of the associated robust beamformer.
Furthermore, by modeling the speech short-time Fourier transform coefficients, we are able to design a beamformer framework in the power domain. By utilising spectral subtraction we are able to see performance benefits over ideal conventional beamformers. Including the channel uncertainty models in the weight design improves robustness.
Real-time Microphone Array Processing for Sound-field Analysis and Perceptually Motivated Reproduction
This thesis details real-time implementations of sound-field analysis and perceptually motivated reproduction methods for visualisation and auralisation purposes. For the former, various methods for visualising the relative distribution of sound energy from one point in space are investigated and contrasted, including a novel reformulation of the cross-pattern coherence (CroPaC) algorithm, which integrates a new side-lobe suppression technique. For auralisation applications, listening tests were conducted to compare ambisonics reproduction with a novel headphone formulation of the directional audio coding (DirAC) method. The results indicate that the side-lobe suppressed CroPaC method offers greater spatial selectivity in reverberant conditions compared with other popular approaches, and that the new DirAC formulation yields higher perceived spatial accuracy when compared to the ambisonics method.
Multimodal methods for blind source separation of audio sources
The enhancement of the performance of frequency domain convolutive
blind source separation (FDCBSS) techniques when applied to the
problem of separating audio sources recorded in a room environment
is the focus of this thesis. This challenging application is termed the
cocktail party problem and the ultimate aim would be to build a machine
which matches the ability of a human being to solve this task.
Human beings exploit both their eyes and their ears in solving this task
and hence they adopt a multimodal approach, i.e. they exploit both
audio and video modalities. New multimodal methods for blind source
separation of audio sources are therefore proposed in this work as a
step towards realizing such a machine.
The geometry of the room environment is initially exploited to improve
the separation performance of an FDCBSS algorithm. The positions
of the human speakers are monitored by video cameras and this
information is incorporated within the FDCBSS algorithm in the form
of constraints added to the underlying cross-power spectral density
matrix-based cost function which measures separation performance. [Continues.