Search CORE

8 research outputs found

DNN-Based Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Author: Doclo Simon
Fischer Dörte
Tammen Marvin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/05/2019

Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-variance-distortionless-response (MVDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, it has been shown that multi-frame approaches achieve a substantial noise reduction with hardly any speech distortion, provided that an accurate estimate of the correlation matrices and especially the speech interframe correlation (IFC) vector is available. Typical estimation procedures for the correlation matrices and the speech IFC vector require an estimate of the speech presence probability (SPP) in each time-frequency bin. In this paper, we propose to use a bi-directional long short-term memory deep neural network (DNN) to estimate a speech mask and a noise mask for each time-frequency bin, from which two different SPP estimates are derived. Aiming at robust performance, the DNN is trained on various noise types and signal-to-noise ratios. Experimental results show that the multi-frame MVDR filter in combination with the proposed data-driven SPP estimator yields an increased speech quality compared to a state-of-the-art model-based estimator.

arXiv.org e-Print Archive
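
As an illustration of the multi-frame MVDR filter mentioned in the abstract above, the following is a minimal sketch of the textbook MVDR weight formula, h = Φ⁻¹γ / (γᴴΦ⁻¹γ), applied to a stack of neighboring time frames in one frequency bin. The abstract does not state this formula, so the form used here, the function names (`solve_linear`, `mfmvdr_weights`, `apply_filter`) and the example values are assumptions made for illustration, not the authors' implementation. In the paper's setting, the correlation matrix and the speech IFC vector would be estimated with the help of the DNN-based SPP; here they are simply given.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are not modified.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    z = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = acc / aug[r][r]
    return z


def mfmvdr_weights(phi, gamma):
    """Textbook MVDR weights h = phi^{-1} gamma / (gamma^H phi^{-1} gamma).

    phi   : N x N correlation matrix over N neighboring frames (assumed given)
    gamma : length-N speech interframe correlation vector (assumed given)
    """
    z = solve_linear(phi, gamma)  # phi^{-1} gamma
    denom = sum(g.conjugate() * zi for g, zi in zip(gamma, z))  # gamma^H phi^{-1} gamma
    return [zi / denom for zi in z]


def apply_filter(weights, frames):
    """Filter output h^H y for one time-frequency bin."""
    return sum(w.conjugate() * y for w, y in zip(weights, frames))


# Hypothetical 3-frame example values (not taken from the paper).
phi_example = [
    [2.0 + 0j, 0.5 + 0.2j, 0.1 + 0j],
    [0.5 - 0.2j, 2.0 + 0j, 0.5 + 0.2j],
    [0.1 + 0j, 0.5 - 0.2j, 2.0 + 0j],
]
gamma_example = [1.0 + 0j, 0.8 - 0.1j, 0.5 + 0.2j]
h_example = mfmvdr_weights(phi_example, gamma_example)
```

Applying `apply_filter(h_example, gamma_example)` returns a value numerically equal to 1, which is the distortionless constraint: the component of the input that is correlated with the current speech frame passes through unchanged.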

Associations of Migration, Socioeconomic Position and Social Relations With Depressive Symptoms – Analyses of the German National Cohort Baseline Data

Objectives: We analyze whether the prevalence of depressive symptoms differs among various migrant and non-migrant populations in Germany and to what extent these differences can be attributed to socioeconomic position (SEP) and social relations. Methods: The German National Cohort health study (NAKO) is a prospective multicenter cohort study (N = 204,878). Migration background (assessed based on citizenship and country of birth of both participant and parents) was used as the independent variable; age, sex, Social Network Index, the availability of emotional support, SEP (relative income position and educational status) and employment status were introduced as covariates; and depressive symptoms (PHQ-9) served as the dependent variable in logistic regression models. Results: Increased odds ratios of depressive symptoms were found in all migrant subgroups compared to non-migrants and varied by region of origin. Elevated odds ratios decreased when SEP and social relations were included. Attenuations varied across migrant subgroups. Conclusion: The gap in depressive symptoms can partly be attributed to SEP and social relations, with variations between migrant subgroups. The integration paradox is likely to contribute to the explanation of the results. Future studies need to consider heterogeneity among migrant subgroups whenever possible.

Directory of Open Access Journals

MDC Repository

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)

Joint Multi-Channel Dereverberation and Noise Reduction Using a Unified Convolutional Beamformer With Sparse Priors

Author: Doclo Simon
Gode Henri
Tammen Marvin
Publication date: 03/06/2021

Recently, the convolutional weighted power minimization distortionless response (WPD) beamformer was proposed, which unifies multi-channel weighted prediction error dereverberation and minimum power distortionless response beamforming. To optimize the convolutional filter, the desired speech component is modeled with a time-varying Gaussian model, which promotes the sparsity of the desired speech component in the short-time Fourier transform domain compared to the noisy microphone signals. In this paper, we generalize the convolutional WPD beamformer by using an lp-norm cost function, introducing an adjustable shape parameter that makes it possible to control the sparsity of the desired speech component. Experiments based on the REVERB challenge dataset show that the proposed method outperforms the conventional convolutional WPD beamformer in terms of objective speech quality metrics. Comment: ITG Conference on Speech Communicatio

arXiv.org e-Print Archive
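
To make the role of the shape parameter in the lp-norm cost function above more concrete, the following is a small sketch of the reweighting step commonly used in iteratively reweighted least squares for lp-norm costs: the weighted power sum of |d_t|² / λ_t, with λ_t = |d_t|^(2-p) taken from the previous estimate, reproduces the lp cost sum of |d_t|^p. Whether the paper uses exactly this rule is an assumption; the function names (`lp_weights`, `weighted_power`) and the `floor` parameter are hypothetical and introduced only for this illustration.

```python
def lp_weights(estimates, p, floor=1e-8):
    """Reweighting rule lambda_t = |d_t|^(2 - p) (assumed, standard IRLS form).

    estimates : complex STFT coefficients of the current desired-speech estimate
    p         : shape parameter of the lp-norm cost
    floor     : small lower bound on |d_t|^2 to avoid division by zero later
    """
    return [max(abs(d) ** 2, floor) ** (1.0 - p / 2.0) for d in estimates]


def weighted_power(estimates, weights):
    """Weighted power cost sum_t |d_t|^2 / lambda_t."""
    return sum(abs(d) ** 2 / lam for d, lam in zip(estimates, weights))


# Hypothetical example coefficients (not taken from the paper).
d_example = [3 + 4j, 1 + 0j, 0.5j]
weights_p1 = lp_weights(d_example, p=1.0)
cost_p1 = weighted_power(d_example, weights_p1)  # equals sum of |d_t|^1 here
```

Under this assumed rule, p = 2 gives unit weights (plain power minimization), while p = 0 gives λ_t = |d_t|², i.e. the instantaneous power of the current estimate. Smaller values of p down-weight high-energy bins more strongly, which is the usual mechanism by which such costs promote sparsity.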

Joint estimation of RETF vector and power spectral densities for speech enhancement based on alternating least squares

Author: Doclo Simon
Kodrasi Ina
Tammen Marvin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/02/2020

The multi-channel Wiener filter (MWF) is a well-known multi-microphone speech enhancement technique, aiming at improving the quality of the recorded speech signals in noisy and reverberant environments. Assuming that reverberation and ambient noise can be modeled as a diffuse sound field and the spatial coherence of the residual noise is known, the MWF requires estimates of the relative early transfer function (RETF) vector of the target speaker as well as the power spectral densities (PSDs) of the target, diffuse and residual noise components. RETF vector and PSD estimation is often decoupled, i.e., one quantity is estimated independently of the other. In this paper, we propose to jointly estimate the RETF vector and all PSDs by minimizing the Frobenius norm of a model-based error matrix using an alternating least squares method. Experimental results using different dynamic acoustic scenarios with a moving speaker show that the proposed method leads to a better MWF performance than a state-of-the-art method based on covariance whitening.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Alternating Least Squares-Based Joint Estimation of RETFs and PSDs for Multi-Channel Speech Enhancement

Author: Doclo Simon
Kodrasi Ina
Tammen Marvin
Publication venue: Deutsche Gesellschaft für Akustik
Publication date: 01/01/2019

Publikationsserver der RWTH Aachen University

Iterative alternating least-squares approach to jointly estimate the RETFs and the diffuse PSD

Author: Doclo Simon
Kodrasi Ina
Tammen Marvin
Publication date: 22/01/2019

Infoscience - École polytechnique fédérale de Lausanne

Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures

Author: Doclo Simon
Rollwage Christian
Sinha Ragini
Tammen Marvin
Publication date: 27/05/2022

Target speaker extraction aims at extracting the target speaker from a mixture of multiple speakers by exploiting auxiliary information about the target speaker. In this paper, we consider a complete time-domain target speaker extraction system consisting of a speaker embedder network and a speaker separator network, which are jointly trained in an end-to-end learning process. We propose two different architectures for the speaker separator network, both based on the convolution-augmented transformer (conformer). The first architecture uses stacks of conformer and external feed-forward blocks (Conformer-FFN), while the second architecture uses stacks of temporal convolutional network (TCN) and conformer blocks (TCN-Conformer). Experimental results for 2-speaker mixtures, 3-speaker mixtures, and noisy mixtures of 2 speakers show that, among the proposed separator networks, the TCN-Conformer significantly improves the target speaker extraction performance compared to the Conformer-FFN and a TCN-based baseline system. Comment: submitted to IWAENC 202

arXiv.org e-Print Archive