60 research outputs found

    Acoustic Echo Suppression Techniques Based on Spectral and Temporal Correlations

    Doctoral thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2016. Kim Nam Soo (김남수). In the past decades, a number of approaches have been dedicated to acoustic echo cancellation and suppression, which reduce the negative effects of acoustic echo, namely the acoustic coupling between the loudspeaker and microphone in a room. In particular, the increasing use of full-duplex telecommunication systems has led to the requirement of faster and more reliable acoustic echo cancellation algorithms. The solutions have been based on adaptive filters, but these filters have to be long enough to cover most of the echo signal, and the linear filtering in these algorithms may be limited in its ability to remove the echo signal in various environments. In this thesis, a novel stereophonic acoustic echo suppression (SAES) technique based on spectral and temporal correlations is proposed in the short-time Fourier transform (STFT) domain. Unlike traditional stereophonic acoustic echo cancellation, the proposed algorithm estimates the echo spectra in the STFT domain and uses a Wiener filter to suppress echo without performing any explicit double-talk detection. The proposed approach takes account of interdependencies among components in adjacent time frames and frequency bins, which enables more accurate estimation of the echo signals. Due to the limitations of power amplifiers or loudspeakers, the echo signals captured by the microphones are not in a linear relationship with the far-end signals even when the echo path is perfectly linear. The nonlinear components of the echo cannot be successfully removed by a linear acoustic echo canceller. The remaining echo components in the output of acoustic echo suppression (AES) can be further suppressed by applying residual echo suppression (RES) algorithms. In this thesis, we propose an optimal RES gain estimation based on a deep neural network (DNN) exploiting both the far-end and the AES output signals in all frequency bins.
A DNN structure is introduced as a regression function representing the complex nonlinear mapping from these signals to the optimal RES gain. Because of the capability of the DNN, the full-band spectro-temporal correlations can be considered while finding the nonlinear function. The proposed method does not require any explicit double-talk detector to deal with single-talk and double-talk situations. One of the well-known approaches for nonlinear acoustic echo cancellation is adaptive Volterra filtering, and various algorithms based on the Volterra filter have been proposed to describe the characteristics of nonlinear echo, showing better performance than conventional linear filtering. However, their performance may be unsatisfactory, since these algorithms cannot consider the full correlation of the nonlinear relationship between the input signal and the far-end signal in the time-frequency domain. In this thesis, we propose a novel DNN-based approach for nonlinear acoustic echo suppression (NAES), extending the proposed RES algorithm. Instead of estimating the residual gain for suppressing the nonlinear echo components, the proposed algorithm directly recovers the near-end speech signal through a gain estimated by the DNN from the input and far-end signals. For echo-aware training, the a priori and a posteriori signal-to-echo ratios (SER) are introduced as additional inputs of the DNN for tracking changes in the echo signal. In addition, multi-task learning (MTL) is combined with the DNN-based NAES incorporating echo-aware training for robustness. In the proposed system, an additional task of double-talk detection is jointly trained with the primary task of gain estimation for NAES. Through the MTL framework, the DNN can learn good representations that suppress more echo in single-talk periods and improve the gain estimates in double-talk periods.
In addition, the proposed NAES using echo-aware training and MTL with double-talk detection makes the DNN more robust in various conditions. The proposed techniques show significantly better performance than the conventional AES methods in both single- and double-talk periods. As pre-processing for various applications such as speech recognition and speech enhancement, these approaches can help to transmit clean speech and provide acceptable communication in real full-duplex environments.
Contents:
Chapter 1 Introduction: 1.1 Background; 1.2 Scope of thesis
Chapter 2 Conventional Approaches for Acoustic Echo Suppression: 2.1 Single Channel Acoustic Echo Cancellation and Suppression; 2.1.1 Single Channel Acoustic Echo Cancellation; 2.1.2 Adaptive Filters for Acoustic Echo Cancellation; 2.1.3 Acoustic Echo Suppression Based on Spectral Modification; 2.2 Residual Echo Suppression; 2.2.1 Spectral Feature-based Nonlinear Residual Echo Suppression; 2.3 Stereophonic Acoustic Echo Cancellation; 2.4 Wiener Filtering for Stereophonic Acoustic Echo Suppression
Chapter 3 Stereophonic Acoustic Echo Suppression Incorporating Spectro-Temporal Correlations: 3.1 Introduction; 3.2 Linear Time-Invariant Systems in the STFT Domain with Crossband Filtering; 3.3 Enhanced SAES (ESAES) Utilizing Spectro-Temporal Correlations; 3.3.1 Problem Formulation; 3.3.2 Estimation of Extended PSD Matrices, Echo Spectra, and Gain Function; 3.3.3 Complexity of the Proposed ESAES Algorithm; 3.4 Experimental Results; 3.5 Summary
Chapter 4 Nonlinear Residual Echo Suppression Based on Deep Neural Network: 4.1 Introduction; 4.2 A Brief Review on RES; 4.3 Deep Neural Networks; 4.4 Nonlinear RES using Deep Neural Network; 4.5 Experimental Results; 4.5.1 Combination with Stereophonic Acoustic Echo Suppression; 4.6 Summary
Chapter 5 Enhanced Deep Learning Frameworks for Nonlinear Acoustic Echo Suppression: 5.1 Introduction; 5.2 DNN-based Nonlinear Acoustic Echo Suppression using Echo Aware Training; 5.3 Multi-Task Learning for NAES; 5.4 Experimental Results; 5.5 Summary
Chapter 6 Conclusions
Bibliography
Abstract in Korean (요약)
Docto

    System approach to robust acoustic echo cancellation through semi-blind source separation based on independent component analysis

    We live in a dynamic world full of noises and interferences. The conventional acoustic echo cancellation (AEC) framework based on the least mean square (LMS) algorithm by itself lacks the ability to handle many secondary signals that interfere with the adaptive filtering process, e.g., local speech and background noise. In this dissertation, we build a foundation for what we refer to as the system approach to signal enhancement as we focus on the AEC problem. We first propose the residual echo enhancement (REE) technique, which utilizes the error recovery nonlinearity (ERN) to "enhance" the filter estimation error prior to the filter adaptation. The single-channel AEC problem can be viewed as a special case of semi-blind source separation (SBSS) where one of the source signals is partially known, i.e., the far-end microphone signal that generates the near-end acoustic echo. SBSS optimized via independent component analysis (ICA) leads to the system combination of the LMS algorithm with the ERN that allows for continuous and stable adaptation even during double talk. Second, we extend the system perspective to the decorrelation problem for AEC, where we show that the REE procedure can be applied effectively in a multi-channel AEC (MCAEC) setting to indirectly assist the recovery of AEC performance lost due to inter-channel correlation, known generally as the "non-uniqueness" problem. We develop a novel, computationally efficient technique of frequency-domain resampling (FDR) that effectively alleviates the non-uniqueness problem directly while introducing minimal distortion to signal quality and statistics. We also apply the system approach to the multi-delay filter (MDF), which suffers from the inter-block correlation problem. Finally, we generalize the MCAEC problem in the SBSS framework and discuss many issues related to the implementation of an SBSS system.
We propose a constrained batch-online implementation of SBSS that stabilizes the convergence behavior even in the worst-case scenario of a single far-end talker along with the non-uniqueness condition on the far-end mixing system. The proposed techniques are developed from a pragmatic standpoint, motivated by real-world problems in acoustic and audio signal processing. Generalization of the orthogonality principle to the system level of an AEC problem allows us to relate AEC to source separation, which seeks to maximize the independence, and hence implicitly the orthogonality, not only between the error signal and the far-end signal but among all signals involved. The system approach, for which the REE paradigm is just one realization, enables the encompassing of many traditional signal enhancement techniques in an analytically consistent yet practically effective manner for solving the enhancement problem in a very noisy and disruptive acoustic mixing environment.
PhD. Committee Chair: Biing-Hwang Juang; Committee Member: Brani Vidakovic; Committee Member: David V. Anderson; Committee Member: Jeff S. Shamma; Committee Member: Xiaoli M

    In Car Audio

    This chapter presents implementations of advanced in-car audio applications. The system is composed of three main applications concerning the in-car listening and communication experience. Starting from a high-level description of the algorithms, several implementations at different levels of hardware abstraction are presented, along with empirical results on both the design process and the performance achieved.

    Acoustic Echo Suppression Method Based on a Recurrent Neural Network and a Clustering Algorithm

    The article addresses the problem of acoustic echo suppression using a neural network that estimates an ideal binary mask (IBM) from features extracted from a mixture of near-end and far-end signals. The novelty of the proposed method lies in the use of a clustering algorithm in addition to the bidirectional recurrent neural network (BLSTM). To evaluate the EM, Mean-Shift, and k-Means clustering algorithms, the models were trained and tested on the TIMIT database. For each model, the ERLE, PESQ, and STOI metrics were calculated to characterize its quality. The EM and Mean-Shift clustering algorithms proved ineffective compared to the BLSTM alone at a signal-to-echo ratio of 10 dB. At a signal-to-echo ratio of 6 dB, BLSTM+Mean-Shift yielded a marginal improvement in the PESQ metric compared to the BLSTM alone. The experimental results show that the proposed BLSTM model combined with the k-Means algorithm is more effective than a pure BLSTM for echo suppression in double-talk scenarios. At a signal-to-echo ratio of 10 dB, the STOI metric, which characterizes speech intelligibility, improved by 7%, and the PESQ metric, which characterizes the quality of speech restoration, improved by 18.8%.
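
For readers unfamiliar with ERLE (echo return loss enhancement), one of the metrics named in the abstract above, the following is a minimal sketch of its usual definition: the ratio, in dB, of the microphone signal power to the residual power after echo suppression, measured over a segment that contains echo only. The function name `erle_db`, the `eps` guard, and the toy signals are illustrative assumptions and are not taken from the article.

```python
import math


def erle_db(mic, residual, eps=1e-12):
    """Return ERLE in dB: 10*log10(mean mic power / mean residual power).

    `mic` and `residual` are equal-rate sample sequences from a segment
    that is assumed to contain echo only (no near-end speech).
    `eps` is a small hypothetical guard against division by zero.
    """
    p_mic = sum(x * x for x in mic) / len(mic)
    p_res = sum(x * x for x in residual) / len(residual)
    return 10.0 * math.log10((p_mic + eps) / (p_res + eps))


# Toy example: the residual has one tenth of the microphone amplitude,
# i.e. one hundredth of its power, so the ERLE is close to 20 dB.
mic = [1.0, -1.0, 1.0, -1.0]
residual = [0.1, -0.1, 0.1, -0.1]
print(round(erle_db(mic, residual), 2))
```

A higher value means more echo was removed; the measure says nothing about near-end speech distortion, which is why the abstract pairs it with PESQ and STOI.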

    Suppressing acoustic echo in a sampled auditory envelope space

    Stereo Acoustic Echo Control Using A Simplified Echo Path Model

    In hands-free tele- or video-communication, acoustic echoes arise due to the coupling between the loudspeakers and microphones. Removing the undesired acoustic echoes is much more challenging for stereo or multi-channel telecommunication systems than for mono systems because of the non-uniqueness problem. While non-uniqueness can be prevented by introducing independent distortions into the left and right loudspeaker signals, stereo echo cancellation is still more challenging than mono echo cancellation in terms of convergence speed and computational complexity. The proposed stereo echo control algorithm circumvents the non-uniqueness problem by using simplified echo path models consisting of delays and short-time spectral modification. It is shown that for reasonably symmetric systems the left and right echo path models are similar enough that a single echo path model can be used for estimating the total echo power spectrum and a gain filter for removing the echo from the microphone channels. The proposed algorithm is also applicable to multi-channel systems, and its computational complexity is very low.
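
The abstract above mentions a gain filter derived from an estimated echo power spectrum. As a rough illustration only, the sketch below uses a generic power-subtraction gain per frequency bin; it is not the paper's actual gain rule, and the function name `echo_gain`, the `floor` and `over` parameters, and the example values are all hypothetical.

```python
def echo_gain(mic_power, echo_power, floor=0.1, over=1.0):
    """Per-bin power-subtraction gain from an estimated echo power spectrum.

    mic_power[k]  : short-time power of the microphone signal in bin k
    echo_power[k] : estimated echo power in bin k
    floor         : hypothetical lower bound that limits musical noise
    over          : hypothetical over-subtraction factor
    The returned gains are meant to scale the microphone power spectrum.
    """
    gains = []
    for p_mic, p_echo in zip(mic_power, echo_power):
        if p_mic <= 0.0:
            # No signal energy in this bin: fall back to the gain floor.
            gains.append(floor)
            continue
        g = 1.0 - over * p_echo / p_mic
        gains.append(max(g, floor))
    return gains


# Three bins: mild echo, echo estimate above the mic power, no echo.
gains = echo_gain([4.0, 1.0, 2.0], [1.0, 2.0, 0.0])
print(gains)
```

Because the gain depends only on power estimates and not on a converged adaptive filter, a scheme of this kind avoids identifying the individual left and right echo paths, which is consistent with the abstract's point about circumventing non-uniqueness.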

    Contributions to Wideband Hands-Free Systems and Their Evaluation

    This work deals with the advancement of wideband hands-free systems (HFSs) for monophonic and stereophonic applications. Furthermore, innovative contributions are made to the corresponding field of quality evaluation. The proposed HFS approaches are based on frequency-domain adaptive filtering for system identification, making use of Kalman theory and state-space modeling. Functional enhancement modules are developed that improve one or more key quality aspects without harming the others. These modules can therefore be combined flexibly, depending on the needs at hand. The enhanced monophonic HFS is evaluated according to automotive ITU-T recommendations (Rec. P.1110/P.1130) to demonstrate its targeted efficacy. Furthermore, a novel methodology and technical framework are introduced to improve the prototyping and evaluation process of automotive hands-free and in-car communication (ICC) systems. The monophonic HFS in several configurations acts as the device under test (DUT) and is thoroughly investigated, demonstrating the DUT's satisfactory performance as well as the advantages of the proposed development process. As current methods for the evaluation of HFSs in dynamic conditions often still lack flexibility, reproducibility, and accuracy, this work introduces "Car in a Box" (CiaB) as a novel, improved system for this demanding task. It enhances the development process by performing high-resolution system identification of dynamic electro-acoustical systems. The extracted dynamic impulse response trajectories can then be applied to arbitrary input signals in a synthesis operation, making a realistic dynamic auralization of a car cabin interior available for HFS evaluation. It is shown that this system improves evaluation flexibility with guaranteed reproducibility. In addition, the accuracy of evaluation methods can be increased by having access to exact, realistic impulse response trajectories acting as a so-called "ground truth" reference. If CiaB is included in an automotive evaluation setup, no acoustical car interior prototype needs to be present at this stage of development. Hence, CiaB may ease the HFS development process. Dynamic acoustic replicas of an arbitrary number of car cabin interiors may be provided to multiple developers simultaneously. With CiaB, developers of speech enhancement systems therefore have an evaluation environment at hand that can adequately replace the real environment.

    A Framework for Speech Enhancement with Ad Hoc Microphone Arrays
