
    DNN-Based Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

    Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-variance-distortionless-response (MVDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, it has been shown that multi-frame approaches achieve a substantial noise reduction with hardly any speech distortion, provided that an accurate estimate of the correlation matrices and especially the speech interframe correlation (IFC) vector is available. Typical estimation procedures for the correlation matrices and the speech IFC vector require an estimate of the speech presence probability (SPP) in each time-frequency bin. In this paper, we propose to use a bi-directional long short-term memory deep neural network (DNN) to estimate a speech mask and a noise mask for each time-frequency bin, from which two different SPP estimates are derived. Aiming at a robust performance, the DNN is trained for various noise types and signal-to-noise ratios. Experimental results show that the multi-frame MVDR filter in combination with the proposed data-driven SPP estimator yields an increased speech quality compared to a state-of-the-art model-based estimator.
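
    As a concrete illustration of the filter described above, the following NumPy sketch applies a multi-frame MVDR filter w = (R_n^{-1} γ) / (γ^H R_n^{-1} γ) in a single time-frequency bin; the variable names are illustrative, and the SPP-based estimation of the noise correlation matrix R_n and the IFC vector γ is assumed to have been carried out elsewhere.

        import numpy as np

        def multiframe_mvdr(y, R_n, gamma):
            """Apply a multi-frame MVDR filter in a single time-frequency bin.

            y:     (N,) complex STFT coefficients of N consecutive time frames
            R_n:   (N, N) estimated noise correlation matrix
            gamma: (N,) estimated speech interframe correlation (IFC) vector
            """
            # w = R_n^{-1} gamma / (gamma^H R_n^{-1} gamma): minimizes the output
            # variance while leaving the component in the gamma direction undistorted
            z = np.linalg.solve(R_n, gamma)
            w = z / (gamma.conj() @ z)
            return w.conj() @ y        # filtered output w^H y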

    Comparison of Binaural RTF-Vector-Based Direction of Arrival Estimation Methods Exploiting an External Microphone

    In this paper we consider a binaural hearing aid setup where, in addition to the head-mounted microphones, an external microphone is available. For this setup, we investigate the performance of several relative transfer function (RTF) vector estimation methods to estimate the direction of arrival (DOA) of the target speaker in a noisy and reverberant acoustic environment. In particular, we consider the state-of-the-art covariance whitening (CW) and covariance subtraction (CS) methods, either incorporating the external microphone or not, and the recently proposed spatial coherence (SC) method, which requires the external microphone. To estimate the DOA from the estimated RTF vector, we propose to minimize the frequency-averaged Hermitian angle between the estimated head-mounted RTF vector and a database of prototype head-mounted RTF vectors. Experimental results with stationary and moving speech sources in a reverberant environment with diffuse-like noise show that the SC method outperforms the CS method and yields a similar DOA estimation accuracy as the CW method at a lower computational complexity.
    Comment: Submitted to EUSIPCO 202
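
    A minimal sketch of the DOA selection step described above, assuming a precomputed database of prototype RTF vectors; the Hermitian angle between complex vectors a and b is arccos(|a^H b| / (‖a‖ ‖b‖)), and all names are illustrative.

        import numpy as np

        def estimate_doa_index(rtf_est, rtf_db):
            """Select the prototype direction minimizing the frequency-averaged
            Hermitian angle to the estimated RTF vector.

            rtf_est: (F, M) estimated head-mounted RTF vectors (F bins, M mics)
            rtf_db:  (D, F, M) prototype RTF vectors for D candidate directions
            """
            num = np.abs(np.einsum('dfm,fm->df', rtf_db.conj(), rtf_est))
            den = (np.linalg.norm(rtf_db, axis=2)
                   * np.linalg.norm(rtf_est, axis=1)[None, :])
            angles = np.arccos(np.clip(num / den, 0.0, 1.0))  # Hermitian angle per bin
            return int(np.argmin(angles.mean(axis=1)))        # best direction index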

    Covariance Blocking and Whitening Method for Successive Relative Transfer Function Vector Estimation in Multi-Speaker Scenarios

    This paper addresses the challenge of estimating the relative transfer function (RTF) vectors of multiple speakers in a noisy and reverberant environment. More specifically, we consider a scenario where two speakers become active successively. In this scenario, the RTF vector of the first speaker can be estimated in a straightforward way, and the main challenge lies in estimating the RTF vector of the second speaker during segments where both speakers are simultaneously active. To estimate the RTF vector of the second speaker, the so-called blind oblique projection (BOP) method determines the oblique projection operator that optimally blocks the second speaker. Instead of blocking the second speaker, in this paper we propose a covariance blocking and whitening (CBW) method, which first blocks the first speaker, applies whitening using the estimated noise covariance matrix, and then estimates the RTF vector of the second speaker based on a singular value decomposition. When using the estimated RTF vectors of both speakers in a linearly constrained minimum variance beamformer, simulation results using real-world recordings for multiple speaker positions demonstrate that the proposed CBW method outperforms the conventional BOP and covariance whitening methods in terms of signal-to-interferer-and-noise ratio improvement.
    Comment: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct 22-25, 202
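
    The following NumPy sketch only illustrates the building blocks named by the method, blocking the first speaker via a null-space basis, whitening via a Cholesky factor of the noise covariance, and a final singular value decomposition; it is a rough sketch under these assumptions, not a reproduction of the paper's exact formulation, and all names are illustrative.

        import numpy as np

        def cbw_rtf_sketch(R_y, R_n, a1, ref=0):
            """Estimate the second speaker's RTF vector from the noisy covariance
            R_y (both speakers active), the noise covariance R_n, and the first
            speaker's RTF vector a1 (illustrative sketch only)."""
            # Blocking matrix B (M x M-1) with B^H a1 = 0: its columns span the
            # orthogonal complement of a1, suppressing the first speaker.
            _, _, Vh = np.linalg.svd(a1[None, :].conj())
            B = Vh[1:, :].conj().T
            # Whitening with a Cholesky factor of the noise covariance.
            L = np.linalg.cholesky(R_n)
            Li = np.linalg.inv(L)
            # Since a1^H B = 0, the blocked and whitened matrix is dominated by the
            # second speaker; its dominant left singular vector points in the
            # direction of the whitened RTF vector of the second speaker.
            U = np.linalg.svd(Li @ R_y @ B)[0]
            a2 = L @ U[:, 0]            # de-whiten
            return a2 / a2[ref]         # normalize to the reference microphone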

    Relative Transfer Function Vector Estimation for Acoustic Sensor Networks Exploiting Covariance Matrix Structure

    In many multi-microphone algorithms for noise reduction, an estimate of the relative transfer function (RTF) vector of the target speaker is required. The state-of-the-art covariance whitening (CW) method estimates the RTF vector as the principal eigenvector of the whitened noisy covariance matrix, where whitening is performed using an estimate of the noise covariance matrix. In this paper, we consider an acoustic sensor network consisting of multiple microphone nodes. Assuming uncorrelated noise between the nodes but not within the nodes, we propose two RTF vector estimation methods that leverage the block-diagonal structure of the noise covariance matrix. The first method modifies the CW method by considering only the diagonal blocks of the estimated noise covariance matrix. In contrast, the second method only considers the off-diagonal blocks of the noisy covariance matrix, but cannot be solved using a simple eigenvalue decomposition. When applying the estimated RTF vector in a minimum variance distortionless response beamformer, simulation results for real-world recordings in a reverberant environment with multiple noise sources show that the modified CW method performs slightly better than the CW method in terms of signal-to-noise ratio (SNR) improvement, while the off-diagonal selection method outperforms a biased RTF vector estimate obtained as the principal eigenvector of the noisy covariance matrix.
    Comment: Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 202
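
    A minimal NumPy sketch of the first (modified CW) method under the stated block-diagonal noise assumption; the node sizes and all names are illustrative.

        import numpy as np

        def modified_cw_rtf(R_y, R_n, node_sizes, ref=0):
            """Estimate the RTF vector using only the diagonal blocks (one per
            node) of the estimated noise covariance matrix.

            R_y: (M, M) noisy covariance, R_n: (M, M) noise covariance,
            node_sizes: microphones per node, e.g. [2, 2, 3] (illustrative)
            """
            # Keep only the intra-node (diagonal) blocks of the noise covariance.
            R_bd = np.zeros_like(R_n)
            start = 0
            for n in node_sizes:
                R_bd[start:start + n, start:start + n] = \
                    R_n[start:start + n, start:start + n]
                start += n
            L = np.linalg.cholesky(R_bd)          # whitening factor
            Li = np.linalg.inv(L)
            Rw = Li @ R_y @ Li.conj().T           # whitened noisy covariance
            _, V = np.linalg.eigh(Rw)             # ascending eigenvalues
            a = L @ V[:, -1]                      # de-whiten principal eigenvector
            return a / a[ref]                     # RTF relative to reference mic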

    Geometry-aware DoA Estimation using a Deep Neural Network with mixed-data input features

    Unlike model-based direction of arrival (DoA) estimation algorithms, supervised learning-based DoA estimation algorithms based on deep neural networks (DNNs) are usually trained for one specific microphone array geometry, resulting in poor performance when applied to a different array geometry. In this paper we illustrate the fundamental difference between supervised learning-based and model-based algorithms that leads to this sensitivity. Aiming at a supervised learning-based DoA estimation algorithm that generalizes well to different array geometries, we propose a geometry-aware DoA estimation algorithm. The algorithm uses a fully connected DNN and takes mixed data as input features, namely the time lags maximizing the generalized cross-correlation with phase transform (GCC-PHAT) and the microphone coordinates, which are assumed to be known. Experimental results for a reverberant scenario demonstrate the flexibility of the proposed algorithm towards different array geometries and show that the proposed algorithm outperforms model-based algorithms such as steered response power with phase transform (SRP-PHAT).
    Comment: Submitted to ICASSP 202
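
    The mixed-data input can be sketched as follows: pairwise GCC-PHAT time lags concatenated with the (known) microphone coordinates. This is a minimal NumPy sketch with illustrative names, not the paper's exact feature pipeline.

        import numpy as np

        def gcc_phat_lag(x1, x2):
            """Time lag (in samples) maximizing the GCC-PHAT between two signals."""
            n = len(x1) + len(x2)
            cross = np.fft.rfft(x1, n) * np.conj(np.fft.rfft(x2, n))
            cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)  # phase transform
            shift = n // 2
            cc = np.concatenate((cc[-shift:], cc[:shift + 1]))     # center zero lag
            return int(np.argmax(np.abs(cc))) - shift

        def build_features(signals, coords):
            """Concatenate pairwise GCC-PHAT lags with microphone coordinates.

            signals: list of M 1-D arrays, coords: (M, 3) microphone positions
            """
            lags = [gcc_phat_lag(signals[i], signals[j])
                    for i in range(len(signals))
                    for j in range(i + 1, len(signals))]
            return np.concatenate([np.asarray(lags, dtype=float), coords.ravel()])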

    Modeling of Speech-dependent Own Voice Transfer Characteristics for Hearables with In-ear Microphones

    Hearables often contain an in-ear microphone, which may be used to capture the own voice of the user. However, due to ear canal occlusion the in-ear microphone mostly records body-conducted speech, which suffers from band-limitation effects and is subject to an amplification of low-frequency content. These transfer characteristics are assumed to vary both with speech content and between individual talkers. It is therefore desirable to have an accurate model of the own voice transfer characteristics between the microphones of a hearable. Such a model can be used, e.g., to simulate a large amount of in-ear recordings to train supervised learning-based algorithms aiming at compensating for the own voice transfer characteristics. In this paper we propose a speech-dependent system identification model based on phoneme recognition. Using recordings from a prototype hearable, the modeling accuracy is evaluated in terms of technical measures, and we investigate the robustness of the transfer characteristic models to utterance or talker mismatch. Simulation results show that the proposed speech-dependent model is preferable for simulating in-ear recordings compared to a speech-independent model, and that it generalizes better to new utterances than an adaptive filtering-based model. Additionally, we find that talker-averaged models generalize better to different talkers than individual models.
    Comment: 18 pages, 11 figures; extended version of arXiv:2309.08294 (more detailed description of the problem, additional models considered, more systematic evaluation conducted on a different, larger dataset)
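
    As a rough illustration of the speech-dependent modeling idea, the sketch below estimates one relative transfer function per phoneme from aligned outer-microphone and in-ear STFTs and applies it frame by frame to simulate in-ear speech; this is a plausible reading under stated assumptions, not the paper's exact system identification, and all names are illustrative.

        import numpy as np

        def fit_phoneme_filters(S_outer, S_inear, phoneme_ids):
            """Estimate one relative transfer function per phoneme by averaging
            cross- and auto-spectra over all frames carrying that label.

            S_outer, S_inear: (F, T) STFTs of simultaneous recordings
            phoneme_ids: (T,) frame-wise phoneme labels from a recognizer
            """
            filters = {}
            for p in np.unique(phoneme_ids):
                sel = phoneme_ids == p
                cross = np.mean(S_inear[:, sel] * np.conj(S_outer[:, sel]), axis=1)
                auto = np.mean(np.abs(S_outer[:, sel]) ** 2, axis=1)
                filters[p] = cross / (auto + 1e-12)   # (F,) per-phoneme estimate
            return filters

        def simulate_inear(S_outer, phoneme_ids, filters):
            """Apply the phoneme-specific filters frame by frame to synthesize an
            in-ear STFT from an outer-microphone STFT."""
            S_sim = np.empty_like(S_outer)
            for t, p in enumerate(phoneme_ids):
                S_sim[:, t] = filters[p] * S_outer[:, t]  # assumes p seen in fitting
            return S_sim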