    DNN-Based Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

    Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-variance-distortionless-response (MVDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, multi-frame approaches have been shown to achieve substantial noise reduction with hardly any speech distortion, provided that accurate estimates of the correlation matrices, and especially of the speech interframe correlation (IFC) vector, are available. Typical procedures for estimating the correlation matrices and the speech IFC vector require an estimate of the speech presence probability (SPP) in each time-frequency bin. In this paper, we propose to use a bi-directional long short-term memory deep neural network (DNN) to estimate a speech mask and a noise mask for each time-frequency bin, from which two different SPP estimates are derived. To achieve robust performance, the DNN is trained for various noise types and signal-to-noise ratios. Experimental results show that the multi-frame MVDR filter in combination with the proposed data-driven SPP estimator yields higher speech quality than a state-of-the-art model-based estimator.
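    As a rough illustration of the multi-frame MVDR filter described above, the sketch below works on a single frequency bin: it computes the speech IFC vector from the noisy and noise correlation matrices by subtraction, and then the distortionless filter from the noise correlation matrix and that IFC vector. The SPP-controlled recursive smoothing in `update_noise_corr`, the smoothing constant `alpha`, and all function names are assumptions for illustration, not the paper's estimator; in the paper the SPP would come from the DNN-estimated masks.

```python
import numpy as np


def update_noise_corr(phi_n, y_vec, spp, alpha=0.9):
    """Assumed SPP-controlled recursive update of the noise correlation matrix.

    y_vec stacks the current and N-1 previous STFT coefficients of one bin.
    When speech is likely present (spp near 1) the estimate is held fixed.
    """
    alpha_eff = alpha + (1.0 - alpha) * spp
    return alpha_eff * phi_n + (1.0 - alpha_eff) * np.outer(y_vec, y_vec.conj())


def ifc_vector(phi_y, phi_n):
    """Speech IFC vector: first column of Phi_x = Phi_y - Phi_n, scaled so gamma[0] = 1."""
    phi_x = phi_y - phi_n
    return phi_x[:, 0] / phi_x[0, 0]


def mfmvdr_weights(phi_n, gamma_x):
    """Multi-frame MVDR filter w = Phi_n^{-1} gamma / (gamma^H Phi_n^{-1} gamma)."""
    a = np.linalg.solve(phi_n, gamma_x)  # avoids forming the inverse explicitly
    return a / np.vdot(gamma_x, a)       # enforces w^H gamma_x = 1 (distortionless)


def mfmvdr_output(w, y_vec):
    """Enhanced STFT coefficient of the current frame: w^H y."""
    return np.vdot(w, y_vec)
```

    The distortionless constraint `w^H gamma_x = 1` is what keeps the speech component that is correlated with the current frame undistorted, which is why the accuracy of the IFC vector (and hence of the SPP) matters so much.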

    Associations of Migration, Socioeconomic Position and Social Relations With Depressive Symptoms – Analyses of the German National Cohort Baseline Data

    Objectives: We analyze whether the prevalence of depressive symptoms differs among various migrant and non-migrant populations in Germany, and to what extent these differences can be attributed to socioeconomic position (SEP) and social relations. Methods: The German National Cohort health study (NAKO) is a prospective multicenter cohort study (N = 204,878). Migration background (assessed based on the citizenship and country of birth of both participants and their parents) was used as the independent variable; age, sex, Social Network Index, availability of emotional support, SEP (relative income position and educational status), and employment status were introduced as covariates; and depressive symptoms (PHQ-9) served as the dependent variable in logistic regression models. Results: Increased odds ratios for depressive symptoms were found in all migrant subgroups compared to non-migrants and varied by region of origin. The elevated odds ratios decreased when SEP and social relations were included, and the attenuations varied across migrant subgroups. Conclusion: The gap in depressive symptoms can partly be attributed to SEP and social relations, with variations between migrant subgroups. The integration paradox is likely to contribute to the explanation of these results. Future studies need to consider heterogeneity among migrant subgroups whenever possible.
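    For readers less familiar with the logistic regression quantities mentioned above, the following minimal sketch converts a fitted coefficient and its standard error into an odds ratio with a Wald 95% confidence interval, and computes one common way of expressing how much an elevated odds ratio shrinks once covariates are added (the percentage reduction in excess odds, OR - 1). The abstract does not state which attenuation measure the study used; the function names and input numbers are hypothetical and are not NAKO results.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with a Wald confidence interval exp(beta +/- z*se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def excess_odds_attenuation(or_base, or_adjusted):
    """Percentage of the excess odds (OR - 1) removed by adding covariates.

    Assumes or_base > 1, i.e., an elevated odds ratio in the base model.
    """
    return 100.0 * (or_base - or_adjusted) / (or_base - 1.0)


# Hypothetical coefficients for one migrant subgroup versus non-migrants,
# before and after adding SEP and social-relation covariates
or_base, lo, hi = odds_ratio_ci(beta=0.53, se=0.08)
or_adj, _, _ = odds_ratio_ci(beta=0.34, se=0.08)
print(f"base OR {or_base:.2f} [{lo:.2f}, {hi:.2f}], adjusted OR {or_adj:.2f}")
print(f"attenuation of excess odds: {excess_odds_attenuation(or_base, or_adj):.0f}%")
```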

    Joint Multi-Channel Dereverberation and Noise Reduction Using a Unified Convolutional Beamformer With Sparse Priors

    Recently, the convolutional weighted power minimization distortionless response (WPD) beamformer was proposed, which unifies multi-channel weighted prediction error dereverberation and minimum power distortionless response beamforming. To optimize the convolutional filter, the desired speech component is modeled with a time-varying Gaussian model, which promotes the sparsity of the desired speech component in the short-time Fourier transform domain compared to the noisy microphone signals. In this paper, we generalize the convolutional WPD beamformer by using an lp-norm cost function, introducing an adjustable shape parameter that makes it possible to control the sparsity of the desired speech component. Experiments based on the REVERB challenge dataset show that the proposed method outperforms the conventional convolutional WPD beamformer in terms of objective speech quality metrics. Comment: ITG Conference on Speech Communication
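    The sketch below illustrates, for a single frequency bin, one way a convolutional WPD filter can be combined with an lp-norm cost via iteratively reweighted least squares (IRLS): each iteration computes the weighted covariance of the stacked (current plus delayed) microphone frames, solves for the distortionless filter, and updates the per-frame weights from the output. The IRLS weighting `|d|^(2-p)`, the known steering (relative transfer function) vector `v`, the prediction delay, the number of taps, and all names are assumptions; the paper's exact update rules may differ. Setting `p = 0` in this weighting gives weights `|d|^2`, i.e., the time-varying Gaussian (power) weighting of the conventional WPD beamformer.

```python
import numpy as np


def stack_frames(Y, delay, taps):
    """Stack the current frame with `taps` past frames starting `delay` frames back.

    Y has shape (M, T): STFT coefficients of M microphones in one frequency bin.
    Returns an array of shape (M * (taps + 1), T); missing past frames are zero.
    """
    M, T = Y.shape
    blocks = [Y]
    for k in range(delay, delay + taps):
        shifted = np.zeros_like(Y)
        if k < T:
            shifted[:, k:] = Y[:, :T - k]
        blocks.append(shifted)
    return np.vstack(blocks)


def wpd_lp(Y, v, p=0.5, delay=3, taps=5, iters=5, eps=1e-8):
    """Convolutional WPD filter with an assumed IRLS treatment of an lp-norm cost."""
    M, T = Y.shape
    Ybar = stack_frames(Y, delay, taps)
    # The distortionless constraint only involves the current (non-delayed) frame
    vbar = np.concatenate([v, np.zeros(M * taps, dtype=complex)])
    lam = np.maximum(np.abs(Y[0]) ** 2, eps)  # initial weights from the reference mic
    for _ in range(iters):
        R = (Ybar / lam) @ Ybar.conj().T / T  # weighted spatio-temporal covariance
        a = np.linalg.solve(R, vbar)
        w = a / np.vdot(vbar, a)              # enforces w^H vbar = 1
        d = w.conj() @ Ybar                   # output: desired speech estimate
        lam = np.maximum(np.abs(d) ** 2, eps) ** ((2.0 - p) / 2.0)  # IRLS weights |d|^(2-p)
    return d, w
```

    Smaller values of `p` make low-energy frames count more heavily in the covariance, which corresponds to assuming a sparser desired speech component.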

    Joint estimation of RETF vector and power spectral densities for speech enhancement based on alternating least squares

    The multi-channel Wiener filter (MWF) is a well-known multi-microphone speech enhancement technique that aims to improve the quality of recorded speech signals in noisy and reverberant environments. Assuming that reverberation and ambient noise can be modeled as a diffuse sound field and that the spatial coherence of the residual noise is known, the MWF requires estimates of the relative early transfer function (RETF) vector of the target speaker as well as of the power spectral densities (PSDs) of the target, diffuse, and residual noise components. RETF vector and PSD estimation are often decoupled, i.e., one quantity is estimated independently of the other. In this paper, we propose to jointly estimate the RETF vector and all PSDs by minimizing the Frobenius norm of a model-based error matrix using an alternating least squares method. Experimental results for different dynamic acoustic scenarios with a moving speaker show that the proposed method leads to better MWF performance than a state-of-the-art method based on covariance whitening.
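    A minimal sketch of the alternating least squares idea for one frequency bin: the noisy covariance is modeled as `phi_s * a a^H + phi_d * Gamma_d + phi_v * Gamma_v`, and the procedure alternates between a linear least-squares fit of the three PSDs for a fixed RETF vector and a rank-1 (principal eigenvector) fit of the RETF vector for fixed diffuse and residual noise PSDs; both steps reduce the Frobenius norm of the model error over their own block of variables. The exact error matrix, initialization, and constraints used in the paper may differ; the function names, the nonnegativity clipping, and the eigenvector initialization are assumptions.

```python
import numpy as np


def _fit_psd(phi_y, a, gamma_d, gamma_v):
    """Least-squares PSDs [phi_s, phi_d, phi_v] for a fixed RETF vector a."""
    A = np.stack([np.outer(a, a.conj()).ravel(), gamma_d.ravel(), gamma_v.ravel()], axis=1)
    # The PSDs are real, so fit real and imaginary parts jointly
    A_real = np.vstack([A.real, A.imag])
    b = np.concatenate([phi_y.ravel().real, phi_y.ravel().imag])
    psd = np.linalg.lstsq(A_real, b, rcond=None)[0]
    return np.maximum(psd, 0.0)  # crude nonnegativity projection


def als_retf_psd(phi_y, gamma_d, gamma_v, iters=50, a_init=None):
    """Jointly fit phi_y ~ phi_s a a^H + phi_d gamma_d + phi_v gamma_v.

    gamma_d, gamma_v: known (M, M) coherence matrices of the diffuse and
    residual noise. Returns the RETF vector a (a[0] = 1, first microphone as
    reference) and the PSDs [phi_s, phi_d, phi_v].
    """
    if a_init is None:
        u = np.linalg.eigh(phi_y)[1][:, -1]  # principal eigenvector as a start
        a = u / u[0]                         # assumes u[0] != 0
    else:
        a = np.asarray(a_init, dtype=complex)
    psd = _fit_psd(phi_y, a, gamma_d, gamma_v)
    for _ in range(iters):
        # RETF step: best rank-1 fit of what the noise model does not explain
        E = phi_y - psd[1] * gamma_d - psd[2] * gamma_v
        u = np.linalg.eigh(E)[1][:, -1]
        a = u / u[0]
        # PSD step: the model is linear in the PSDs once a is fixed
        psd = _fit_psd(phi_y, a, gamma_d, gamma_v)
    return a, psd
```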

    Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures

    Target speaker extraction aims at extracting the target speaker from a mixture of multiple speakers by exploiting auxiliary information about the target speaker. In this paper, we consider a complete time-domain target speaker extraction system consisting of a speaker embedder network and a speaker separator network, which are jointly trained in an end-to-end learning process. We propose two different architectures for the speaker separator network, both based on the convolution-augmented transformer (conformer). The first architecture uses stacks of conformer and external feed-forward blocks (Conformer-FFN), while the second uses stacks of temporal convolutional network (TCN) and conformer blocks (TCN-Conformer). Experimental results for 2-speaker mixtures, 3-speaker mixtures, and noisy 2-speaker mixtures show that, among the proposed separator networks, the TCN-Conformer significantly improves target speaker extraction performance compared to the Conformer-FFN and a TCN-based baseline system. Comment: submitted to IWAENC 202
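    To make the architecture terms above more concrete, the following NumPy sketch shows the forward pass of a single conformer block in its commonly described macaron layout (half-step feed-forward, self-attention, convolution module, half-step feed-forward, final layer normalization), applied to separator features that were first conditioned on a speaker embedding. It is heavily simplified: random untrained weights, a single attention head without positional encoding, no batch normalization or dropout, and no learned layer-norm parameters. The multiplicative speaker conditioning, the feature sizes, and all names are hypothetical choices for illustration; the abstract does not say how the embedder output enters the separator.

```python
import numpy as np

rng = np.random.default_rng(0)


def layer_norm(x, eps=1e-5):
    # Per-frame normalization over the feature axis (no learned gain/bias)
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def swish(x):
    return x * sigmoid(x)


def init(d_in, d_out):
    return rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)


class ConformerBlock:
    def __init__(self, d, ff_mult=4, kernel=15):
        self.ff1 = (init(d, ff_mult * d), init(ff_mult * d, d))
        self.ff2 = (init(d, ff_mult * d), init(ff_mult * d, d))
        self.wq, self.wk, self.wv, self.wo = (init(d, d) for _ in range(4))
        self.pw1 = init(d, 2 * d)  # pointwise conv feeding the GLU
        self.dw = rng.standard_normal((kernel, d)) / np.sqrt(kernel)  # depthwise kernel
        self.pw2 = init(d, d)

    def ffn(self, x, w):
        return swish(layer_norm(x) @ w[0]) @ w[1]

    def attention(self, x):
        # Single-head scaled dot-product self-attention over all frames
        h = layer_norm(x)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        s = q @ k.T / np.sqrt(q.shape[-1])
        p = np.exp(s - s.max(axis=-1, keepdims=True))
        return (p / p.sum(axis=-1, keepdims=True)) @ v @ self.wo

    def conv_module(self, x):
        T, d = x.shape
        h = layer_norm(x) @ self.pw1
        h = h[:, :d] * sigmoid(h[:, d:])  # gated linear unit
        K = self.dw.shape[0]
        hp = np.pad(h, ((K // 2, K // 2), (0, 0)))  # "same" padding in time (odd K)
        h = sum(hp[i:i + T] * self.dw[i] for i in range(K))  # depthwise conv over time
        return swish(h) @ self.pw2

    def __call__(self, x):
        x = x + 0.5 * self.ffn(x, self.ff1)
        x = x + self.attention(x)
        x = x + self.conv_module(x)
        x = x + 0.5 * self.ffn(x, self.ff2)
        return layer_norm(x)


def condition_on_speaker(x, spk_emb, w_spk):
    # Hypothetical multiplicative adaptation: scale each feature by a projected embedding
    return x * (spk_emb @ w_spk)


x = rng.standard_normal((100, 64))  # 100 frames, 64 features (hypothetical sizes)
emb = rng.standard_normal(32)       # stands in for the speaker embedder output
block = ConformerBlock(64)
y = block(condition_on_speaker(x, emb, init(32, 64)))
print(y.shape)  # (100, 64)
```

    A TCN-Conformer stack in the sense of the abstract would interleave such blocks with TCN blocks; the exact stacking and sizes used in the paper are not given here.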