35 research outputs found
Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation
This paper deals with speech enhancement in dual-microphone smartphones using
beamforming along with postfiltering techniques. The performance of these algorithms relies on
a good estimation of the acoustic channel and speech and noise statistics. In this work we present
a speech enhancement system that combines the estimation of the relative transfer function (RTF)
between microphones using an extended Kalman filter framework with a novel speech presence
probability estimator intended to track the noise statisticsâ variability. The available dual-channel
information is exploited to obtain more reliable estimates of clean speech statistics. Noise reduction
is further improved by means of postfiltering techniques that take advantage of the speech presence
estimation. Our proposal is evaluated in different reverberant and noisy environments when the
smartphone is used in both close-talk and far-talk positions. The experimental results show that our
system achieves improvements in terms of noise reduction, low speech distortion and better speech
intelligibility compared to other state-of-the-art approaches.Spanish MINECO/FEDER Project TEC2016-80141-PSpanish
Ministry of Education through the National Program FPU under Grant FPU15/0416
Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise
We consider the problem of simultaneous reduction of acoustic echo,
reverberation and noise. In real scenarios, these distortion sources may occur
simultaneously and reducing them implies combining the corresponding
distortion-specific filters. As these filters interact with each other, they
must be jointly optimized. We propose to model the target and residual signals
after linear echo cancellation and dereverberation using a multichannel
Gaussian modeling framework and to jointly represent their spectra by means of
a neural network. We develop an iterative block-coordinate ascent algorithm to
update all the filters. We evaluate our system on real recordings of acoustic
echo, reverberation and noise acquired with a smart speaker in various
situations. The proposed approach outperforms in terms of overall distortion a
cascade of the individual approaches and a joint reduction approach which does
not rely on a spectral model of the target and residual signals
BeitrÀge zu breitbandigen Freisprechsystemen und ihrer Evaluation
This work deals with the advancement of wideband hands-free systems (HFSâs) for mono- and stereophonic cases of application. Furthermore, innovative contributions to the corr. field of quality evaluation are made. The proposed HFS approaches are based on frequency-domain adaptive filtering for system identification, making use of Kalman theory and state-space modeling. Functional enhancement modules are developed in this work, which improve one or more of key quality aspects, aiming at not to harm others. In so doing, these modules can be combined in a flexible way, dependent on the needs at hand. The enhanced monophonic HFS is evaluated according to automotive ITU-T recommendations, to prove its customized efficacy. Furthermore, a novel methodology and techn. framework are introduced in this work to improve the prototyping and evaluation process of automotive HF and in-car-communication (ICC) systems. The monophonic HFS in several configurations hereby acts as device under test (DUT) and is thoroughly investigated, which will show the DUTâs satisfying performance, as well as the advantages of the proposed development process. As current methods for the evaluation of HFSâs in dynamic conditions oftentimes still lack flexibility, reproducibility, and accuracy, this work introduces âCar in a Boxâ (CiaB) as a novel, improved system for this demanding task. It is able to enhance the development process by performing high-resolution system identification of dynamic electro-acoustical systems. The extracted dyn. impulse response trajectories are then applicable to arbitrary input signals in a synthesis operation. A realistic dynamic automotive auralization of a car cabin interior is available for HFS evaluation. It is shown that this system improves evaluation flexibility at guaranteed reproducibility. In addition, the accuracy of evaluation methods can be increased by having access to exact, realistic imp. resp. trajectories acting as a so-called âground truthâ reference. If CiaB is included into an automotive evaluation setup, there is no need for an acoustical car interior prototype to be present at this stage of development. Hency, CiaB may ease the HFS development process. Dynamic acoustic replicas may be provided including an arbitrary number of acoustic car cabin interiors for multiple developers simultaneously. With CiaB, speech enh. system developers therefore have an evaluation environment at hand, which can adequately replace the real environment.Diese Arbeit beschĂ€ftigt sich mit der Weiterentwicklung breitbandiger Freisprechsysteme fĂŒr mono-/stereophone AnwendungsfĂ€lle und liefert innovative BeitrĂ€ge zu deren QualitĂ€tsmessung. Die vorgestellten Verfahren basieren auf im Frequenzbereich adaptierenden Algorithmen zur Systemidentifikation gemÀà Kalman-Theorie in einer Zustandsraumdarstellung. Es werden funktionale Erweiterungsmodule dahingehend entwickelt, dass mindestens eine QualitĂ€tsanforderung verbessert wird, ohne andere eklatant zu verletzen. Diese nach Anforderung flexibel kombinierbaren algorithmischen Erweiterungen werden gemÀà Empfehlungen der ITU-T (Rec. P.1110/P.1130) in vorwiegend automotiven Testszenarien getestet und somit deren zielgerichtete Wirksamkeit bestĂ€tigt. Es wird eine Methodensammlung und ein technisches System zur verbesserten Prototypentwicklung/Evaluation von automotiven Freisprech- und Innenraumkommunikationssystemen vorgestellt und beispielhaft mit dem monophonen Freisprechsystem in diversen Ausbaustufen zur Anwendung gebracht. Daraus entstehende Vorteile im Entwicklungs- und Testprozess von Sprachverbesserungssystem werden dargelegt und messtechnisch verifiziert. Bestehende Messverfahren zum Verhalten von Freisprechsystemen in zeitvarianten Umgebungen zeigten bisher oft nur ein unzureichendes MaĂ an FlexibilitĂ€t, Reproduzierbarkeit und Genauigkeit. Daher wird hier das âCar in a Boxâ-Verfahren (CiaB) entwickelt und vorgestellt, mit dem zeitvariante elektro-akustische Systeme technisch identifiziert werden können. So gewonnene dynamische Impulsantworten können im Labor in einer Syntheseoperation auf beliebige Eingangsignale angewandt werden, um realistische Testsignale unter dyn. Bedingungen zu erzeugen. Bei diesem Vorgehen wird ein hohes MaĂ an FlexibilitĂ€t bei garantierter Reproduzierbarkeit erlangt. Es wird gezeigt, dass die Genauigkeit von darauf basierenden Evaluationsverfahren zudem gesteigert werden kann, da mit dem Vorliegen von exakten, realen Impulsantworten zu jedem Zeitpunkt der Messung eine sogenannte âground truthâ als Referenz zur VerfĂŒgung steht. Bei der Einbindung von CiaB in einen Messaufbau fĂŒr automotive Freisprechsysteme ist es bedeutsam, dass zu diesem Zeitpunkt das eigentliche Fahrzeug nicht mehr benötigt wird. Es wird gezeigt, dass eine dyn. Fahrzeugakustikumgebung, wie sie im Entwicklungsprozess von automotiven Sprachverbesserungsalgorithmen benötigt wird, in beliebiger Anzahl vollstĂ€ndig und mind. gleichwertig durch CiaB ersetzt werden kann
On the application of minimum noise tracking to cancel cosine shaped residual noise
It has been shown recently that for coherence based dual microphone array speech enhancement systems, cross-spectral subtraction is an efficient technique aimed to reduce the correlated noise components. The zero-phase filtering criterion employed in these methods is derived from the standard coherence function that is modified to
incorporate the noise cross power spectrum between the two channels. However, there has been limited success at applying coherence based filters when speech processing is carried out under relatively harsh acoustic conditions (SNR below -5dB) or when the speech and noise sources are closely spaced. We propose an alternative method that is effective, and that attempts to use a phase-based filtering criterion by substituting the cross power spectrum of the noisy signals received on the two channels by its real part. Then, a variant of the running minimum noise tracking procedure is applied on the estimated speech spectrum as an adaptive
postfiltering to reduce the cosine shaped power spectrum of the remaining residual musical noise to a minimum spectral floor. Using that adaptive postfilter, a softdecision
scheme is implemented to control the amount of noise suppression. Our preliminary results based on experiments conducted on real speech signals show an improved performance of the proposed method over the coherence based
approaches. These results also show that it performs well on speech while producing less spectral distortion even in severe noisy conditions
Unbiased coherent-to-diffuse ratio estimation for dereverberation
We investigate the estimation of the time- and frequency-dependent coherent-to-diffuse ratio (CDR) from the measured spatial coherence between two omnidirectional microphones. We illustrate the relationship between several known CDR es-timators using a geometric interpretation in the complex plane, discuss the problem of estimator bias, and propose unbiased versions of the estimators. Furthermore, we show that knowl-edge of either the direction of arrival (DOA) of the target source or the coherence of the noise field is sufficient for an unbiased CDR estimation. Finally, we apply the CDR estimators to the problem of dereverberation, using automatic speech recognition word error rate as objective performance measure