286 research outputs found

    System approach to robust acoustic echo cancellation through semi-blind source separation based on independent component analysis

    Get PDF
    We live in a dynamic world full of noises and interferences. The conventional acoustic echo cancellation (AEC) framework based on the least mean square (LMS) algorithm by itself lacks the ability to handle many secondary signals that interfere with the adaptive filtering process, e.g., local speech and background noise. In this dissertation, we build a foundation for what we refer to as the system approach to signal enhancement as we focus on the AEC problem. We first propose the residual echo enhancement (REE) technique that utilizes the error recovery nonlinearity (ERN) to "enhances" the filter estimation error prior to the filter adaptation. The single-channel AEC problem can be viewed as a special case of semi-blind source separation (SBSS) where one of the source signals is partially known, i.e., the far-end microphone signal that generates the near-end acoustic echo. SBSS optimized via independent component analysis (ICA) leads to the system combination of the LMS algorithm with the ERN that allows for continuous and stable adaptation even during double talk. Second, we extend the system perspective to the decorrelation problem for AEC, where we show that the REE procedure can be applied effectively in a multi-channel AEC (MCAEC) setting to indirectly assist the recovery of lost AEC performance due to inter-channel correlation, known generally as the "non-uniqueness" problem. We develop a novel, computationally efficient technique of frequency-domain resampling (FDR) that effectively alleviates the non-uniqueness problem directly while introducing minimal distortion to signal quality and statistics. We also apply the system approach to the multi-delay filter (MDF) that suffers from the inter-block correlation problem. Finally, we generalize the MCAEC problem in the SBSS framework and discuss many issues related to the implementation of an SBSS system. We propose a constrained batch-online implementation of SBSS that stabilizes the convergence behavior even in the worst case scenario of a single far-end talker along with the non-uniqueness condition on the far-end mixing system. The proposed techniques are developed from a pragmatic standpoint, motivated by real-world problems in acoustic and audio signal processing. Generalization of the orthogonality principle to the system level of an AEC problem allows us to relate AEC to source separation that seeks to maximize the independence, hence implicitly the orthogonality, not only between the error signal and the far-end signal, but rather, among all signals involved. The system approach, for which the REE paradigm is just one realization, enables the encompassing of many traditional signal enhancement techniques in analytically consistent yet practically effective manner for solving the enhancement problem in a very noisy and disruptive acoustic mixing environment.PhDCommittee Chair: Biing-Hwang Juang; Committee Member: Brani Vidakovic; Committee Member: David V. Anderson; Committee Member: Jeff S. Shamma; Committee Member: Xiaoli M

    Change prediction for low complexity combined beamforming and acoustic echo cancellation

    Get PDF
    Time-variant beamforming (BF) and acoustic echo cancellation (AEC) are two techniques that are frequently employed for improving the quality of hands-free speech communication. However, the combined application of both is quite challenging as it either introduces high computational complexity or insufficient tracking. We propose a new method to improve the performance of the low-complexity beamformer first (BF-first) structure, which we call change prediction(ChaP). ChaP gathers information on several BF changes to predict the effective impulse response seen by the AEC after the next BF change. To account for uncertain data and convergence states in the predictions, reliability measures are introduced to improve ChaP in realistic scenarios

    Adaptive Algorithms for Intelligent Acoustic Interfaces

    Get PDF
    Modern speech communications are evolving towards a new direction which involves users in a more perceptive way. That is the immersive experience, which may be considered as the “last-mile” problem of telecommunications. One of the main feature of immersive communications is the distant-talking, i.e. the hands-free (in the broad sense) speech communications without bodyworn or tethered microphones that takes place in a multisource environment where interfering signals may degrade the communication quality and the intelligibility of the desired speech source. In order to preserve speech quality intelligent acoustic interfaces may be used. An intelligent acoustic interface may comprise multiple microphones and loudspeakers and its peculiarity is to model the acoustic channel in order to adapt to user requirements and to environment conditions. This is the reason why intelligent acoustic interfaces are based on adaptive filtering algorithms. The acoustic path modelling entails a set of problems which have to be taken into account in designing an adaptive filtering algorithm. Such problems may be basically generated by a linear or a nonlinear process and can be tackled respectively by linear or nonlinear adaptive algorithms. In this work we consider such modelling problems and we propose novel effective adaptive algorithms that allow acoustic interfaces to be robust against any interfering signals, thus preserving the perceived quality of desired speech signals. As regards linear adaptive algorithms, a class of adaptive filters based on the sparse nature of the acoustic impulse response has been recently proposed. We adopt such class of adaptive filters, named proportionate adaptive filters, and derive a general framework from which it is possible to derive any linear adaptive algorithm. Using such framework we also propose some efficient proportionate adaptive algorithms, expressly designed to tackle problems of a linear nature. On the other side, in order to address problems deriving from a nonlinear process, we propose a novel filtering model which performs a nonlinear transformations by means of functional links. Using such nonlinear model, we propose functional link adaptive filters which provide an efficient solution to the modelling of a nonlinear acoustic channel. Finally, we introduce robust filtering architectures based on adaptive combinations of filters that allow acoustic interfaces to more effectively adapt to environment conditions, thus providing a powerful mean to immersive speech communications

    Adaptive Algorithms for Intelligent Acoustic Interfaces

    Get PDF
    Modern speech communications are evolving towards a new direction which involves users in a more perceptive way. That is the immersive experience, which may be considered as the “last-mile” problem of telecommunications. One of the main feature of immersive communications is the distant-talking, i.e. the hands-free (in the broad sense) speech communications without bodyworn or tethered microphones that takes place in a multisource environment where interfering signals may degrade the communication quality and the intelligibility of the desired speech source. In order to preserve speech quality intelligent acoustic interfaces may be used. An intelligent acoustic interface may comprise multiple microphones and loudspeakers and its peculiarity is to model the acoustic channel in order to adapt to user requirements and to environment conditions. This is the reason why intelligent acoustic interfaces are based on adaptive filtering algorithms. The acoustic path modelling entails a set of problems which have to be taken into account in designing an adaptive filtering algorithm. Such problems may be basically generated by a linear or a nonlinear process and can be tackled respectively by linear or nonlinear adaptive algorithms. In this work we consider such modelling problems and we propose novel effective adaptive algorithms that allow acoustic interfaces to be robust against any interfering signals, thus preserving the perceived quality of desired speech signals. As regards linear adaptive algorithms, a class of adaptive filters based on the sparse nature of the acoustic impulse response has been recently proposed. We adopt such class of adaptive filters, named proportionate adaptive filters, and derive a general framework from which it is possible to derive any linear adaptive algorithm. Using such framework we also propose some efficient proportionate adaptive algorithms, expressly designed to tackle problems of a linear nature. On the other side, in order to address problems deriving from a nonlinear process, we propose a novel filtering model which performs a nonlinear transformations by means of functional links. Using such nonlinear model, we propose functional link adaptive filters which provide an efficient solution to the modelling of a nonlinear acoustic channel. Finally, we introduce robust filtering architectures based on adaptive combinations of filters that allow acoustic interfaces to more effectively adapt to environment conditions, thus providing a powerful mean to immersive speech communications

    In Car Audio

    Get PDF
    This chapter presents implementations of advanced in Car Audio Applications. The system is composed by three main different applications regarding the In Car listening and communication experience. Starting from a high level description of the algorithms, several implementations on different levels of hardware abstraction are presented, along with empirical results on both the design process undergone and the performance results achieved

    Beiträge zu breitbandigen Freisprechsystemen und ihrer Evaluation

    Get PDF
    This work deals with the advancement of wideband hands-free systems (HFS’s) for mono- and stereophonic cases of application. Furthermore, innovative contributions to the corr. field of quality evaluation are made. The proposed HFS approaches are based on frequency-domain adaptive filtering for system identification, making use of Kalman theory and state-space modeling. Functional enhancement modules are developed in this work, which improve one or more of key quality aspects, aiming at not to harm others. In so doing, these modules can be combined in a flexible way, dependent on the needs at hand. The enhanced monophonic HFS is evaluated according to automotive ITU-T recommendations, to prove its customized efficacy. Furthermore, a novel methodology and techn. framework are introduced in this work to improve the prototyping and evaluation process of automotive HF and in-car-communication (ICC) systems. The monophonic HFS in several configurations hereby acts as device under test (DUT) and is thoroughly investigated, which will show the DUT’s satisfying performance, as well as the advantages of the proposed development process. As current methods for the evaluation of HFS’s in dynamic conditions oftentimes still lack flexibility, reproducibility, and accuracy, this work introduces “Car in a Box” (CiaB) as a novel, improved system for this demanding task. It is able to enhance the development process by performing high-resolution system identification of dynamic electro-acoustical systems. The extracted dyn. impulse response trajectories are then applicable to arbitrary input signals in a synthesis operation. A realistic dynamic automotive auralization of a car cabin interior is available for HFS evaluation. It is shown that this system improves evaluation flexibility at guaranteed reproducibility. In addition, the accuracy of evaluation methods can be increased by having access to exact, realistic imp. resp. trajectories acting as a so-called “ground truth” reference. If CiaB is included into an automotive evaluation setup, there is no need for an acoustical car interior prototype to be present at this stage of development. Hency, CiaB may ease the HFS development process. Dynamic acoustic replicas may be provided including an arbitrary number of acoustic car cabin interiors for multiple developers simultaneously. With CiaB, speech enh. system developers therefore have an evaluation environment at hand, which can adequately replace the real environment.Diese Arbeit beschäftigt sich mit der Weiterentwicklung breitbandiger Freisprechsysteme für mono-/stereophone Anwendungsfälle und liefert innovative Beiträge zu deren Qualitätsmessung. Die vorgestellten Verfahren basieren auf im Frequenzbereich adaptierenden Algorithmen zur Systemidentifikation gemäß Kalman-Theorie in einer Zustandsraumdarstellung. Es werden funktionale Erweiterungsmodule dahingehend entwickelt, dass mindestens eine Qualitätsanforderung verbessert wird, ohne andere eklatant zu verletzen. Diese nach Anforderung flexibel kombinierbaren algorithmischen Erweiterungen werden gemäß Empfehlungen der ITU-T (Rec. P.1110/P.1130) in vorwiegend automotiven Testszenarien getestet und somit deren zielgerichtete Wirksamkeit bestätigt. Es wird eine Methodensammlung und ein technisches System zur verbesserten Prototypentwicklung/Evaluation von automotiven Freisprech- und Innenraumkommunikationssystemen vorgestellt und beispielhaft mit dem monophonen Freisprechsystem in diversen Ausbaustufen zur Anwendung gebracht. Daraus entstehende Vorteile im Entwicklungs- und Testprozess von Sprachverbesserungssystem werden dargelegt und messtechnisch verifiziert. Bestehende Messverfahren zum Verhalten von Freisprechsystemen in zeitvarianten Umgebungen zeigten bisher oft nur ein unzureichendes Maß an Flexibilität, Reproduzierbarkeit und Genauigkeit. Daher wird hier das „Car in a Box“-Verfahren (CiaB) entwickelt und vorgestellt, mit dem zeitvariante elektro-akustische Systeme technisch identifiziert werden können. So gewonnene dynamische Impulsantworten können im Labor in einer Syntheseoperation auf beliebige Eingangsignale angewandt werden, um realistische Testsignale unter dyn. Bedingungen zu erzeugen. Bei diesem Vorgehen wird ein hohes Maß an Flexibilität bei garantierter Reproduzierbarkeit erlangt. Es wird gezeigt, dass die Genauigkeit von darauf basierenden Evaluationsverfahren zudem gesteigert werden kann, da mit dem Vorliegen von exakten, realen Impulsantworten zu jedem Zeitpunkt der Messung eine sogenannte „ground truth“ als Referenz zur Verfügung steht. Bei der Einbindung von CiaB in einen Messaufbau für automotive Freisprechsysteme ist es bedeutsam, dass zu diesem Zeitpunkt das eigentliche Fahrzeug nicht mehr benötigt wird. Es wird gezeigt, dass eine dyn. Fahrzeugakustikumgebung, wie sie im Entwicklungsprozess von automotiven Sprachverbesserungsalgorithmen benötigt wird, in beliebiger Anzahl vollständig und mind. gleichwertig durch CiaB ersetzt werden kann

    Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

    Full text link
    Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic values and wide application. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, the Bayesian-based filter can solve the problem of data association, audio-visual fusion and track management. In this paper, we conduct a comprehensive overview of audio-visual speaker tracking. To our knowledge, this is the first extensive survey over the past five years. We introduce the family of Bayesian filters and summarize the methods for obtaining audio-visual measurements. In addition, the existing trackers and their performance on AV16.3 dataset are summarized. In the past few years, deep learning techniques have thrived, which also boosts the development of audio visual speaker tracking. The influence of deep learning techniques in terms of measurement extraction and state estimation is also discussed. At last, we discuss the connections between audio-visual speaker tracking and other areas such as speech separation and distributed speaker tracking

    Single- and multi-microphone speech dereverberation using spectral enhancement

    Get PDF
    In speech communication systems, such as voice-controlled systems, hands-free mobile telephones, and hearing aids, the received microphone signals are degraded by room reverberation, background noise, and other interferences. This signal degradation may lead to total unintelligibility of the speech and decreases the performance of automatic speech recognition systems. In the context of this work reverberation is the process of multi-path propagation of an acoustic sound from its source to one or more microphones. The received microphone signal generally consists of a direct sound, reflections that arrive shortly after the direct sound (commonly called early reverberation), and reflections that arrive after the early reverberation (commonly called late reverberation). Reverberant speech can be described as sounding distant with noticeable echo and colouration. These detrimental perceptual effects are primarily caused by late reverberation, and generally increase with increasing distance between the source and microphone. Conversely, early reverberations tend to improve the intelligibility of speech. In combination with the direct sound it is sometimes referred to as the early speech component. Reduction of the detrimental effects of reflections is evidently of considerable practical importance, and is the focus of this dissertation. More specifically the dissertation deals with dereverberation techniques, i.e., signal processing techniques to reduce the detrimental effects of reflections. In the dissertation, novel single- and multimicrophone speech dereverberation algorithms are developed that aim at the suppression of late reverberation, i.e., at estimation of the early speech component. This is done via so-called spectral enhancement techniques that require a specific measure of the late reverberant signal. This measure, called spectral variance, can be estimated directly from the received (possibly noisy) reverberant signal(s) using a statistical reverberation model and a limited amount of a priori knowledge about the acoustic channel(s) between the source and the microphone(s). In our work an existing single-channel statistical reverberation model serves as a starting point. The model is characterized by one parameter that depends on the acoustic characteristics of the environment. We show that the spectral variance estimator that is based on this model, can only be used when the source-microphone distance is larger than the so-called critical distance. This is, crudely speaking, the distance where the direct sound power is equal to the total reflective power. A generalization of the statistical reverberation model in which the direct sound is incorporated is developed. This model requires one additional parameter that is related to the ratio between the direct sound energy and the sound energy of all reflections. The generalized model is used to derive a novel spectral variance estimator. When the novel estimator is used for dereverberation rather than the existing estimator, and the source-microphone distance is smaller than the critical distance, the dereverberation performance is significantly increased. Single-microphone systems only exploit the temporal and spectral diversity of the received signal. Reverberation, of course, also induces spatial diversity. To additionally exploit this diversity, multiple microphones must be used, and their outputs must be combined by a suitable spatial processor such as the so-called delay and sum beamformer. It is not a priori evident whether spectral enhancement is best done before or after the spatial processor. For this reason we investigate both possibilities, as well as a merge of the spatial processor and the spectral enhancement technique. An advantage of the latter option is that the spectral variance estimator can be further improved. Our experiments show that the use of multiple microphones affords a significant improvement of the perceptual speech quality. The applicability of the theory developed in this dissertation is demonstrated using a hands-free communication system. Since hands-free systems are often used in a noisy and reverberant environment, the received microphone signal does not only contain the desired signal but also interferences such as room reverberation that is caused by the desired source, background noise, and a far-end echo signal that results from a sound that is produced by the loudspeaker. Usually an acoustic echo canceller is used to cancel the far-end echo. Additionally a post-processor is used to suppress background noise and residual echo, i.e., echo which could not be cancelled by the echo canceller. In this work a novel structure and post-processor for an acoustic echo canceller are developed. The post-processor suppresses late reverberation caused by the desired source, residual echo, and background noise. The late reverberation and late residual echo are estimated using the generalized statistical reverberation model. Experimental results convincingly demonstrate the benefits of the proposed system for suppressing late reverberation, residual echo and background noise. The proposed structure and post-processor have a low computational complexity, a highly modular structure, can be seamlessly integrated into existing hands-free communication systems, and affords a significant increase of the listening comfort and speech intelligibility
    • …
    corecore