
    Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation

    This paper deals with speech enhancement in dual-microphone smartphones using beamforming combined with postfiltering techniques. The performance of these algorithms relies on good estimates of the acoustic channel and of the speech and noise statistics. In this work, we present a speech enhancement system that combines an extended Kalman filter framework for estimating the relative transfer function (RTF) between the microphones with a novel speech presence probability estimator intended to track the variability of the noise statistics. The available dual-channel information is exploited to obtain more reliable estimates of the clean speech statistics. Noise reduction is further improved by postfiltering techniques that take advantage of the speech presence estimate. Our proposal is evaluated in different reverberant and noisy environments, with the smartphone used in both close-talk and far-talk positions. The experimental results show that, compared with other state-of-the-art approaches, our system achieves greater noise reduction, lower speech distortion, and better speech intelligibility. Funding: Spanish MINECO/FEDER Project TEC2016-80141-P; Spanish Ministry of Education through the National Program FPU under Grant FPU15/0416.
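The abstract does not spell out its speech presence probability (SPP) estimator. As a generic illustration of how an SPP can steer noise tracking, the sketch below uses the common SPP-controlled recursive averaging rule (as in MCRA-type estimators), not the paper's dual-channel estimator. The helper names `update_noise_psd` and `track_noise`, the smoothing constant `alpha`, and the assumption that an SPP value is supplied per frequency bin and frame are all hypothetical.

```python
import numpy as np


def update_noise_psd(noise_psd, noisy_power, spp, alpha=0.95):
    """One frame of SPP-controlled recursive noise PSD smoothing (hypothetical helper).

    noise_psd:   previous noise PSD estimate, one value per frequency bin
    noisy_power: current noisy periodogram |Y|^2, one value per bin
    spp:         speech presence probability in [0, 1], one value per bin
    alpha:       smoothing constant used when speech is surely absent
    """
    spp = np.clip(spp, 0.0, 1.0)
    # With speech likely present (spp -> 1) the effective smoothing factor -> 1,
    # so the noise estimate is frozen; with speech absent it falls back to alpha.
    alpha_eff = alpha + (1.0 - alpha) * spp
    return alpha_eff * noise_psd + (1.0 - alpha_eff) * noisy_power


def track_noise(noisy_power, spp, alpha=0.95):
    """Apply the update frame by frame; inputs have shape (frames, bins)."""
    noise = noisy_power[0].copy()  # initialise from the first frame
    out = np.empty_like(noisy_power)
    for t in range(noisy_power.shape[0]):
        noise = update_noise_psd(noise, noisy_power[t], spp[t], alpha)
        out[t] = noise
    return out
```

Freezing the update during likely speech is what lets such an estimator follow non-stationary noise in speech pauses without absorbing speech energy into the noise estimate.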

    Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise

    We consider the problem of simultaneously reducing acoustic echo, reverberation and noise. In real scenarios, these distortion sources may occur together, and reducing them requires combining the corresponding distortion-specific filters. Because these filters interact with each other, they must be jointly optimized. We propose to model the target and residual signals after linear echo cancellation and dereverberation within a multichannel Gaussian modeling framework and to jointly represent their spectra by means of a neural network. We develop an iterative block-coordinate ascent algorithm to update all the filters. We evaluate our system on real recordings of acoustic echo, reverberation and noise acquired with a smart speaker in various situations. In terms of overall distortion, the proposed approach outperforms both a cascade of the individual approaches and a joint reduction approach that does not rely on a spectral model of the target and residual signals.

    Contributions to Wideband Hands-Free Systems and Their Evaluation (Beiträge zu breitbandigen Freisprechsystemen und ihrer Evaluation)

    This work deals with the advancement of wideband hands-free systems (HFSs) for monophonic and stereophonic applications and makes innovative contributions to the corresponding field of quality evaluation. The proposed HFS approaches are based on frequency-domain adaptive filtering for system identification, making use of Kalman theory and state-space modeling. Functional enhancement modules are developed that each improve one or more key quality aspects while aiming not to harm the others, so that the modules can be combined flexibly depending on the needs at hand. The enhanced monophonic HFS is evaluated according to automotive ITU-T recommendations (Rec. P.1110/P.1130) to confirm its targeted efficacy. Furthermore, a novel methodology and technical framework are introduced to improve the prototyping and evaluation process of automotive hands-free (HF) and in-car communication (ICC) systems. The monophonic HFS in several configurations serves as the device under test (DUT) and is thoroughly investigated, demonstrating both the DUT's satisfactory performance and the advantages of the proposed development process. Because current methods for evaluating HFSs under dynamic (time-variant) conditions often still lack flexibility, reproducibility, and accuracy, this work introduces “Car in a Box” (CiaB) as a novel, improved system for this demanding task. CiaB enhances the development process by performing high-resolution system identification of dynamic electro-acoustic systems. The extracted dynamic impulse response trajectories can then be applied to arbitrary input signals in a synthesis operation, so that a realistic dynamic auralization of a car cabin interior is available for HFS evaluation in the laboratory. It is shown that this system improves evaluation flexibility while guaranteeing reproducibility. In addition, the accuracy of evaluation methods can be increased because exact, realistic impulse response trajectories are available at every instant of the measurement as a so-called “ground truth” reference. If CiaB is included in an automotive evaluation setup, the actual vehicle (an acoustic car interior prototype) no longer needs to be present at this stage of development. Hence, CiaB may ease the HFS development process: dynamic acoustic replicas of an arbitrary number of car cabin interiors can be provided to multiple developers simultaneously. With CiaB, developers of speech enhancement systems therefore have an evaluation environment at hand that can adequately replace the real environment.
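The idea of Kalman-based system identification in the frequency domain can be illustrated, in a heavily simplified form, by an independent scalar Kalman filter per frequency bin that tracks an echo-path coefficient `W` under a first-order Markov (random-walk) state model. This is a sketch under stated assumptions, not the thesis's algorithm: it assumes a single multiplicative tap per bin (i.e., the echo path is short relative to the frame, so no overlap-save partitioning is modeled), fixed noise variances, and the names `fdkf_identify`, `A`, `q` and `r` are hypothetical.

```python
import numpy as np


def fdkf_identify(X, Y, A=0.999, q=1e-4, r=1e-2):
    """Per-bin scalar Kalman filter for echo-path identification (simplified sketch).

    X: (frames, bins) complex STFT of the loudspeaker (far-end) signal
    Y: (frames, bins) complex STFT of the microphone signal
    Assumed model: W_t = A * W_{t-1} + process noise (variance q),
                   Y_t = X_t * W_t + observation noise (variance r).
    Returns the final estimate of W, one complex coefficient per bin.
    """
    n_frames, n_bins = X.shape
    W = np.zeros(n_bins, dtype=complex)  # state estimate (echo path per bin)
    P = np.ones(n_bins)                  # state error variance per bin
    for t in range(n_frames):
        # Prediction step of the first-order Markov state model.
        W = A * W
        P = A ** 2 * P + q
        # Innovation: microphone signal minus predicted echo.
        e = Y[t] - X[t] * W
        S = np.abs(X[t]) ** 2 * P + r    # innovation variance
        K = P * np.conj(X[t]) / S        # Kalman gain
        # Correction step.
        W = W + K * e
        P = (1.0 - np.real(K * X[t])) * P
    return W
```

A larger `q` relative to `r` makes the filter track a changing echo path faster at the cost of noisier estimates; `A` below 1 slowly pulls the estimate toward zero when there is no excitation.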

    On the application of minimum noise tracking to cancel cosine shaped residual noise

    It has recently been shown that, for coherence-based dual-microphone array speech enhancement systems, cross-spectral subtraction is an efficient technique for reducing correlated noise components. The zero-phase filtering criterion employed in these methods is derived from the standard coherence function, modified to incorporate the noise cross-power spectrum between the two channels. However, coherence-based filters have had limited success when speech processing is carried out under relatively harsh acoustic conditions (SNR below -5 dB) or when the speech and noise sources are closely spaced. We propose an effective alternative method based on a phase-based filtering criterion, in which the cross-power spectrum of the noisy signals received on the two channels is replaced by its real part. A variant of the running minimum noise tracking procedure is then applied to the estimated speech spectrum as an adaptive postfilter, reducing the cosine-shaped power spectrum of the remaining residual musical noise to a minimum spectral floor. Using this adaptive postfilter, a soft-decision scheme is implemented to control the amount of noise suppression. Preliminary results from experiments on real speech signals show improved performance of the proposed method over the coherence-based approaches. These results also show that it performs well on speech while producing less spectral distortion, even in severe noise conditions.
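The running-minimum idea behind such a postfilter can be sketched as follows: recursively smooth the short-time power in each bin, take the minimum over a sliding window of recent frames as the noise-floor estimate, and derive a soft gain from the ratio of floor to smoothed power. This is a generic minimum-tracking sketch, not the paper's variant; the function name `min_tracking_postfilter`, the window length, the smoothing constant, the spectral-subtraction-style gain and the gain floor `g_min` are assumptions, and the bias compensation used in practical minimum-statistics estimators is omitted.

```python
import numpy as np


def min_tracking_postfilter(power, alpha=0.9, window=50, g_min=0.1):
    """Running-minimum noise floor and a soft-decision gain (illustrative sketch).

    power: (frames, bins) short-time power spectrum of the enhanced signal
    Returns (noise_floor, gain), both of shape (frames, bins).
    """
    n_frames = power.shape[0]
    smoothed = np.empty_like(power)
    s = power[0].copy()
    for t in range(n_frames):
        # First-order recursive smoothing of the power in each bin.
        s = alpha * s + (1.0 - alpha) * power[t]
        smoothed[t] = s

    noise_floor = np.empty_like(power)
    for t in range(n_frames):
        # Minimum of the smoothed power over the last `window` frames (no bias correction).
        start = max(0, t - window + 1)
        noise_floor[t] = smoothed[start:t + 1].min(axis=0)

    # Soft gain: close to 1 where the smoothed power is well above the floor,
    # never below g_min so that residual noise is attenuated rather than zeroed.
    ratio = noise_floor / np.maximum(smoothed, 1e-12)
    gain = np.maximum(1.0 - ratio, g_min)
    return noise_floor, gain
```

Because the floor can never exceed the smoothed power, the gain stays in `[g_min, 1]`; bins that hover near the floor (such as residual musical-noise components) are pushed down to `g_min`.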

    Unbiased coherent-to-diffuse ratio estimation for dereverberation

    We investigate the estimation of the time- and frequency-dependent coherent-to-diffuse ratio (CDR) from the measured spatial coherence between two omnidirectional microphones. We illustrate the relationship between several known CDR estimators using a geometric interpretation in the complex plane, discuss the problem of estimator bias, and propose unbiased versions of the estimators. Furthermore, we show that knowledge of either the direction of arrival (DOA) of the target source or the coherence of the noise field is sufficient for unbiased CDR estimation. Finally, we apply the CDR estimators to the problem of dereverberation, using the automatic speech recognition word error rate as the objective performance measure.
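The coherence model behind CDR estimation can be made concrete. Under the usual assumption that the observed coherence is a power-weighted mix of a coherent direct component and a diffuse component, Gamma_x = (CDR * Gamma_s + Gamma_n) / (CDR + 1), which can be solved for CDR = (Gamma_n - Gamma_x) / (Gamma_x - Gamma_s). The sketch below implements this straightforward inversion for a known DOA and an ideal spherically diffuse noise field between two omnidirectional microphones. It is not one of the paper's unbiased estimators and may be biased when the coherence is estimated from noisy data; the function names, microphone spacing and speed of sound are illustrative assumptions.

```python
import numpy as np


def diffuse_coherence(f, d, c=343.0):
    """Spatial coherence of a spherically diffuse field for two omni mics spaced d metres."""
    # np.sinc(x) = sin(pi x) / (pi x), so this equals sin(2 pi f d / c) / (2 pi f d / c).
    return np.sinc(2.0 * f * d / c)


def direct_coherence(f, d, theta, c=343.0):
    """Coherence of a single plane wave arriving from angle theta (radians from the array axis)."""
    tdoa = d * np.cos(theta) / c
    return np.exp(1j * 2.0 * np.pi * f * tdoa)


def cdr_from_coherence(gamma_x, gamma_s, gamma_n):
    """Invert gamma_x = (cdr * gamma_s + gamma_n) / (cdr + 1) for cdr.

    Illustrative estimator only: the real part is taken and negative values are
    clipped to zero, since a power ratio cannot be negative.
    """
    cdr = (gamma_n - gamma_x) / (gamma_x - gamma_s)
    return np.maximum(np.real(cdr), 0.0)
```

In practice `gamma_x` would come from recursively smoothed auto- and cross-power spectra of the two microphone signals, and a dereverberation gain would then be derived per bin from the CDR estimate.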