456 research outputs found

    System approach to robust acoustic echo cancellation through semi-blind source separation based on independent component analysis

    We live in a dynamic world full of noises and interferences. The conventional acoustic echo cancellation (AEC) framework based on the least mean square (LMS) algorithm by itself lacks the ability to handle many secondary signals that interfere with the adaptive filtering process, e.g., local speech and background noise. In this dissertation, we build a foundation for what we refer to as the system approach to signal enhancement as we focus on the AEC problem. We first propose the residual echo enhancement (REE) technique that utilizes the error recovery nonlinearity (ERN) to "enhance" the filter estimation error prior to the filter adaptation. The single-channel AEC problem can be viewed as a special case of semi-blind source separation (SBSS) where one of the source signals is partially known, i.e., the far-end microphone signal that generates the near-end acoustic echo. SBSS optimized via independent component analysis (ICA) leads to the system combination of the LMS algorithm with the ERN that allows for continuous and stable adaptation even during double talk. Second, we extend the system perspective to the decorrelation problem for AEC, where we show that the REE procedure can be applied effectively in a multi-channel AEC (MCAEC) setting to indirectly assist the recovery of lost AEC performance due to inter-channel correlation, known generally as the "non-uniqueness" problem. We develop a novel, computationally efficient technique of frequency-domain resampling (FDR) that effectively alleviates the non-uniqueness problem directly while introducing minimal distortion to signal quality and statistics. We also apply the system approach to the multi-delay filter (MDF) that suffers from the inter-block correlation problem. Finally, we generalize the MCAEC problem in the SBSS framework and discuss many issues related to the implementation of an SBSS system. 
We propose a constrained batch-online implementation of SBSS that stabilizes the convergence behavior even in the worst-case scenario of a single far-end talker along with the non-uniqueness condition on the far-end mixing system. The proposed techniques are developed from a pragmatic standpoint, motivated by real-world problems in acoustic and audio signal processing. Generalization of the orthogonality principle to the system level of an AEC problem allows us to relate AEC to source separation that seeks to maximize the independence, hence implicitly the orthogonality, not only between the error signal and the far-end signal, but among all signals involved. The system approach, for which the REE paradigm is just one realization, enables the encompassing of many traditional signal enhancement techniques in an analytically consistent yet practically effective manner for solving the enhancement problem in a very noisy and disruptive acoustic mixing environment. PhD. Committee Chair: Biing-Hwang Juang; Committee Member: Brani Vidakovic; Committee Member: David V. Anderson; Committee Member: Jeff S. Shamma; Committee Member: Xiaoli M
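For orientation, the conventional LMS-type echo canceller that the abstract takes as its baseline can be sketched as below. This is a minimal normalized-LMS (NLMS) illustration under stated assumptions, not the dissertation's REE/ERN or SBSS method: the function name, tap count, step size, and the synthetic echo path are all hypothetical, and the toy signal contains no near-end speech or noise, which is exactly the benign case in which plain LMS behaves well.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Adapt an FIR estimate of the echo path; return (residual, weights)."""
    w = [0.0] * num_taps        # adaptive filter weights (echo-path estimate)
    x_buf = [0.0] * num_taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))   # estimated echo
        e = d - y                                      # error (residual echo)
        norm = sum(xi * xi for xi in x_buf) + eps      # input energy
        gain = step * e / norm
        w = [wi + gain * xi for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual, w


# Toy setup (assumption): white far-end signal through a short echo path.
random.seed(0)
true_path = [0.6, -0.3, 0.1]
far = [random.gauss(0.0, 1.0) for _ in range(2000)]
mic = [
    sum(h * far[n - k] for k, h in enumerate(true_path) if n - k >= 0)
    for n in range(len(far))
]
err, w = nlms_echo_canceller(far, mic)
```

With local speech or noise added to `mic`, the error `e` no longer reflects only the filter mismatch, which is the interference problem the abstract describes.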

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user’s signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk. 
Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique
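The decomposition onto "the union of two nonnegative bases" described in the abstract can be illustrated with a small sketch. It assumes a Euclidean cost with standard multiplicative updates and keeps both bases fixed while fitting only the activations; the thesis may well use a different cost function or training procedure. All names and the tiny 4-bin spectra are hypothetical toy values.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sq_error(v, w, h):
    """Squared Frobenius error between V and W H."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def fit_activations(v, w, num_iter=200, eps=1e-12):
    """Multiplicative updates for H >= 0 with the basis W held fixed."""
    wt = transpose(w)
    wtv = matmul(wt, v)
    wtw = matmul(wt, w)
    h = [[1.0] * len(v[0]) for _ in range(len(w[0]))]
    for _ in range(num_iter):
        wtwh = matmul(wtw, h)
        h = [[h[r][c] * wtv[r][c] / (wtwh[r][c] + eps)
              for c in range(len(h[0]))] for r in range(len(h))]
    return h


# Hypothetical magnitude-spectrum bases over 4 frequency bins.
echo_basis = [[1.0, 0.0], [0.2, 1.0], [0.0, 0.1], [0.0, 0.0]]  # from far-end speech
speaker_basis = [[0.0], [0.0], [1.0], [0.5]]                    # trained a priori
w = [e_row + s_row for e_row, s_row in zip(echo_basis, speaker_basis)]

h_true = [[0.5, 1.0, 0.2], [0.3, 0.1, 0.8], [1.0, 0.4, 0.6]]   # atoms x frames
v = matmul(w, h_true)                                           # mixture magnitudes

h = fit_activations(v, w)
speaker_est = matmul(speaker_basis, h[2:])  # keep only the near-end speaker atoms
```

Because the echo basis is rebuilt from the far-end signal rather than adapted over time, a sketch like this has no convergence phase of the kind an adaptive filter needs, which is consistent with the abstract's claim about immediate echo mitigation.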

    TRANSMISSION PERFORMANCE OPTIMIZATION IN FIBER-WIRELESS ACCESS NETWORKS USING MACHINE LEARNING TECHNIQUES

    The objective of this dissertation is to enhance the transmission performance in the fiber-wireless access network through mitigating the vital system limitations of both analog radio over fiber (A-RoF) and digital radio over fiber (D-RoF), with machine learning techniques being systematically implemented. The first thrust is improving the spectral efficiency for the optical transmission in the D-RoF to support the delivery of the massive number of bits from digitized radio signals. Advanced digital modulation schemes like PAM8, discrete multi-tone (DMT), and probabilistic shaping are investigated and implemented, while they may introduce severe nonlinear impairments on the low-cost optical intensity-modulation-direct-detection (IMDD) based D-RoF link with a limited dynamic range. An efficient deep neural network (DNN) equalizer/decoder to mitigate the nonlinear degradation is therefore designed and experimentally verified. Besides, we design a neural network based digital predistortion (DPD) to mitigate the nonlinear impairments from the whole link, which can be integrated into a transmitter with more processing resources and power than a receiver in an access network. Another thrust is to proactively mitigate the complex interferences in radio access networks (RANs). The composition of signals from different licensed systems and unlicensed transmitters creates an unprecedentedly complex interference environment that cannot be solved by conventional pre-defined network planning. In response to the challenges, a proactive interference avoidance scheme using reinforcement learning is proposed and experimentally verified in a mmWave-over-fiber platform. Apart from external sources, interference may also arise internally from a local transmitter as the self-interference (SI) that occupies the same time and frequency block as the signal of interest (SOI). 
Different from the conventional subtraction-based SI cancellation scheme, we design an efficient dual-input DNN (DI-DNN) based canceller which simultaneously cancels the SI and recovers the SOI. Ph.D.

    Single- and multi-microphone speech dereverberation using spectral enhancement

    In speech communication systems, such as voice-controlled systems, hands-free mobile telephones, and hearing aids, the received microphone signals are degraded by room reverberation, background noise, and other interferences. This signal degradation may lead to total unintelligibility of the speech and decrease the performance of automatic speech recognition systems. In the context of this work, reverberation is the process of multi-path propagation of an acoustic sound from its source to one or more microphones. The received microphone signal generally consists of a direct sound, reflections that arrive shortly after the direct sound (commonly called early reverberation), and reflections that arrive after the early reverberation (commonly called late reverberation). Reverberant speech can be described as sounding distant with noticeable echo and colouration. These detrimental perceptual effects are primarily caused by late reverberation, and generally increase with increasing distance between the source and microphone. Conversely, early reverberations tend to improve the intelligibility of speech. In combination with the direct sound it is sometimes referred to as the early speech component. Reduction of the detrimental effects of reflections is evidently of considerable practical importance, and is the focus of this dissertation. More specifically the dissertation deals with dereverberation techniques, i.e., signal processing techniques to reduce the detrimental effects of reflections. In the dissertation, novel single- and multi-microphone speech dereverberation algorithms are developed that aim at the suppression of late reverberation, i.e., at estimation of the early speech component. This is done via so-called spectral enhancement techniques that require a specific measure of the late reverberant signal. 
This measure, called spectral variance, can be estimated directly from the received (possibly noisy) reverberant signal(s) using a statistical reverberation model and a limited amount of a priori knowledge about the acoustic channel(s) between the source and the microphone(s). In our work an existing single-channel statistical reverberation model serves as a starting point. The model is characterized by one parameter that depends on the acoustic characteristics of the environment. We show that the spectral variance estimator that is based on this model can only be used when the source-microphone distance is larger than the so-called critical distance. This is, crudely speaking, the distance where the direct sound power is equal to the total reflective power. A generalization of the statistical reverberation model in which the direct sound is incorporated is developed. This model requires one additional parameter that is related to the ratio between the direct sound energy and the sound energy of all reflections. The generalized model is used to derive a novel spectral variance estimator. When the novel estimator is used for dereverberation rather than the existing estimator, and the source-microphone distance is smaller than the critical distance, the dereverberation performance is significantly increased. Single-microphone systems only exploit the temporal and spectral diversity of the received signal. Reverberation, of course, also induces spatial diversity. To additionally exploit this diversity, multiple microphones must be used, and their outputs must be combined by a suitable spatial processor such as the so-called delay and sum beamformer. It is not a priori evident whether spectral enhancement is best done before or after the spatial processor. For this reason we investigate both possibilities, as well as a merge of the spatial processor and the spectral enhancement technique. 
An advantage of the latter option is that the spectral variance estimator can be further improved. Our experiments show that the use of multiple microphones affords a significant improvement of the perceptual speech quality. The applicability of the theory developed in this dissertation is demonstrated using a hands-free communication system. Since hands-free systems are often used in a noisy and reverberant environment, the received microphone signal does not only contain the desired signal but also interferences such as room reverberation that is caused by the desired source, background noise, and a far-end echo signal that results from a sound that is produced by the loudspeaker. Usually an acoustic echo canceller is used to cancel the far-end echo. Additionally a post-processor is used to suppress background noise and residual echo, i.e., echo which could not be cancelled by the echo canceller. In this work a novel structure and post-processor for an acoustic echo canceller are developed. The post-processor suppresses late reverberation caused by the desired source, residual echo, and background noise. The late reverberation and late residual echo are estimated using the generalized statistical reverberation model. Experimental results convincingly demonstrate the benefits of the proposed system for suppressing late reverberation, residual echo and background noise. The proposed structure and post-processor have low computational complexity and a highly modular structure, can be seamlessly integrated into existing hands-free communication systems, and afford a significant increase in listening comfort and speech intelligibility
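The two quantities the abstract leans on, a one-parameter decay model and the critical distance, can be made concrete with a short sketch. It assumes an exponential-decay reverberation model parameterised by the reverberation time T60 and the common Sabine-based approximation for the critical distance; whether these match the dissertation's exact model is an assumption, and the function names, room volume, and frame settings are hypothetical.

```python
import math


def decay_rate(t60_s):
    """Decay constant delta (1/s) such that exp(-2 * delta * T60) = 1e-6 (60 dB)."""
    return 3.0 * math.log(10.0) / t60_s


def critical_distance(volume_m3, t60_s, directivity=1.0):
    """Approximate distance (m) where direct and reverberant power are equal."""
    return 0.057 * math.sqrt(directivity * volume_m3 / t60_s)


def late_reverb_variance(reverb_psd, t60_s, frame_shift_s, late_frames):
    """Estimate late-reverberant spectral variance for one frequency bin.

    Scales the (already smoothed) PSD observed `late_frames` frames earlier by
    the energy decay over that interval. This one-parameter form ignores the
    direct path, which is why it is only reasonable beyond the critical distance.
    """
    factor = math.exp(-2.0 * decay_rate(t60_s) * late_frames * frame_shift_s)
    return [factor * reverb_psd[l - late_frames] if l >= late_frames else 0.0
            for l in range(len(reverb_psd))]


# Hypothetical room: 5 m x 4 m x 3 m with T60 = 0.5 s.
d_c = critical_distance(60.0, 0.5)
late = late_reverb_variance([1.0] * 6, t60_s=0.5, frame_shift_s=0.016, late_frames=3)
```

In a room like this the critical distance comes out well under one metre, so a hands-free device on a desk can easily sit inside it, the regime in which the abstract reports that the generalized estimator helps.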

    Magnetic resonance imaging of brain tissue abnormalities: transverse relaxation time in autism and Tourette syndrome and development of a novel whole-brain myelin mapping technique

    The transverse relaxation time (T2) is a fundamental parameter of magnetic resonance imaging sensitive to tissue microstructure and water content, thus offering a non-invasive approach to evaluate abnormalities of brain tissue in-vivo. Prevailing hypotheses of two childhood psychiatric disorders were tested using quantitative T2 imaging and automated region of interest (ROI) analyses. In autism, the under-connectivity theory, which proposes aberrant connectivity within white matter (WM), was assessed, finding T2 to be elevated in the frontal and parietal lobes, while dividing whole brain data into neurodevelopmentally relevant WM ROIs found increased T2 in bridging and radiate WM. In Tourette syndrome, tissue abnormalities of deep gray matter structures implicated in the symptomology of this disorder were evaluated and increased T2 of the caudate was found. Despite the sensitivity of quantitative T2 measurements to underlying pathophysiology, interpretation remains difficult. However, in WM, the compartmentalization of distinct water environments may lead to the detection of multi-exponential T2 decay. The metric of interest is principally the myelin water fraction (MWF), which is the proportion of the MRI signal arising from water trapped within layers of the myelin sheath. As a proof of concept study, the ability to measure the MWF based on T2* decay was evaluated and compared to MWF measurements obtained from T2 decay. Data were analysed using both non-negative least squares and a two-pool model. Signal losses near sources of magnetic field inhomogeneity, such as the sinuses, rendered T2* components inseparable, invalidating this approach for whole brain MWF measurements. However, this study demonstrated the suitability of a two-pool model to calculate the MWF in WM. A novel approach, based on the multi-component gradient echo sampling of spin echoes (mcGESSE) and a two-pool model of WM, is proposed and its feasibility demonstrated using simulations. 
The in-vivo implementation of mcGESSE followed, with reproducibility of MWF measurements being assessed and the potential of an accelerated protocol using parallel imaging being investigated. While further work is needed to assess data quality, this approach shows great potential to obtain whole brain MWF data within a clinically relevant scan time
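As a rough illustration of how a two-pool model yields a myelin water fraction, the sketch below fits two exponential pools with fixed relaxation times to a decay curve by linear least squares and reports the short-T2 share of the signal. It is not the mcGESSE method from the abstract: the fixed relaxation times (15 ms and 80 ms), the echo times, and the function name are illustrative assumptions only.

```python
import math


def two_pool_mwf(echo_times_ms, signal, t2_myelin_ms=15.0, t2_ie_ms=80.0):
    """Fit S(t) = A_my * exp(-t/T2_my) + A_ie * exp(-t/T2_ie); return A_my / (A_my + A_ie)."""
    a = [math.exp(-t / t2_myelin_ms) for t in echo_times_ms]   # myelin-water pool
    b = [math.exp(-t / t2_ie_ms) for t in echo_times_ms]       # intra/extracellular pool
    # Normal equations of the two-column least-squares problem.
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(x * y for x, y in zip(a, b))
    sas = sum(x * s for x, s in zip(a, signal))
    sbs = sum(y * s for y, s in zip(b, signal))
    det = saa * sbb - sab * sab
    amp_my = max((sas * sbb - sab * sbs) / det, 0.0)   # clamp to non-negative
    amp_ie = max((saa * sbs - sab * sas) / det, 0.0)
    total = amp_my + amp_ie
    return amp_my / total if total > 0.0 else 0.0


# Synthetic noise-free decay with a known fraction of 0.12 (assumption).
times = [10.0 * k for k in range(1, 33)]
decay = [0.12 * math.exp(-t / 15.0) + 0.88 * math.exp(-t / 80.0) for t in times]
mwf = two_pool_mwf(times, decay)
```

When the two decay rates become hard to tell apart, the normal equations become ill-conditioned, which is one way to picture the abstract's remark that T2* components were rendered inseparable near field inhomogeneities.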

    Dirty RF Signal Processing for Mitigation of Receiver Front-end Non-linearity

    Today's wireless communication systems place high requirements on the radio's hardware that are largely mutually exclusive, such as low power consumption, wide bandwidth, and high linearity. Achieving sufficient linearity, among other analogue characteristics, is a challenging issue in practical transceiver design. The focus of this thesis is on wideband receiver RF front-ends for software defined radio technology, which has become commercially available in recent years. Practical challenges and limitations are being revealed in real-world experiments with these radios. One of the main problems is ensuring sufficient RF performance of the front-end over a wide bandwidth. The thesis covers the analysis and mitigation of receiver non-linearity of typical direct-conversion receiver architectures, among other RF impairments. The main focus is on DSP-based algorithms for mitigating non-linearly induced interference, an approach also known as "Dirty RF" signal processing techniques. The conceived digital feedforward mitigation algorithm is verified through extensive simulations, RF measurements, and implementation in real hardware. Various studies are carried out that bridge the gap between theory and practical applicability of this approach, especially with the aim of integrating that technique into real devices. 
To this end, an advanced baseband behavioural model is developed that matches direct-conversion receiver architectures as closely as possible, and thus considers all generated distortions at RF and baseband. In addition, the algorithm's performance is verified under challenging fading conditions. Moreover, the thesis presents a resource-efficient real-time implementation of the proposed solution on an FPGA. Finally, different use cases are covered in the thesis, including spectrum monitoring or sensing, GSM downlink reception, and GSM-based passive radar. It is shown that non-linear distortions can be successfully mitigated at system level in the digital domain, thereby decreasing the bit error rate of distorted modulated signals and reducing the amount of non-linearly induced interference. Finally, the effective linearity of the front-end is increased substantially. Thus, the proper operation of a low-cost radio in the presence of receiver non-linearity is possible. Due to the flexible design, the algorithm is generally applicable to wideband receivers and is not restricted to software defined radios

    System Identification with Applications in Speech Enhancement

    With the increasing popularity of hands-free telephony on mobile portable devices and the rapid development of voice over internet protocol, identification of acoustic systems has become desirable for compensating distortions introduced to speech signals during transmission, and hence enhancing the speech quality. The objective of this research is to develop system identification algorithms for speech enhancement applications including network echo cancellation and speech dereverberation. A supervised adaptive algorithm for sparse system identification is developed for network echo cancellation. Based on the framework of the selective-tap updating scheme on the normalized least mean squares algorithm, the MMax and sparse partial update tap-selection strategies are exploited in the frequency domain to achieve fast convergence performance with low computational complexity. Through demonstrating how the sparseness of the network impulse response varies in the transformed domain, the multidelay filtering structure is incorporated to reduce the algorithmic delay. Blind identification of SIMO acoustic systems for speech dereverberation in the presence of common zeros is then investigated. First, the problem of common zeros is defined and extended to include the presence of near-common zeros. Two clustering algorithms are developed to quantify the number of these zeros so as to facilitate the study of their effect on blind system identification and speech dereverberation. To mitigate this effect, two algorithms are developed: a two-stage algorithm based on channel decomposition that identifies common and non-common zeros sequentially, and a forced spectral diversity approach that combines spectral shaping filters and channel undermodelling to derive a modified system that leads to improved dereverberation performance. 
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based dereverberation techniques. Comprehensive simulations and discussions demonstrate the effectiveness of the aforementioned algorithms. A discussion on possible directions of prospective research on system identification techniques concludes this thesis
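The selective-tap idea mentioned in the abstract can be sketched in its simplest time-domain form: at each sample, only the taps aligned with the largest-magnitude input samples are updated (the MMax criterion). The thesis applies tap selection in the frequency domain with a multidelay structure, so this is a simplified illustration under stated assumptions; the function name, filter length, number of updated taps, and the sparse toy response are hypothetical.

```python
import random


def mmax_nlms(x_sig, d_sig, num_taps=8, num_update=4, step=0.5, eps=1e-8):
    """NLMS in which only the `num_update` taps with the largest |x| are adapted."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps      # input history, newest first
    for x, d in zip(x_sig, d_sig):
        buf = [x] + buf[:-1]
        e = d - sum(wi * xi for wi, xi in zip(w, buf))
        norm = sum(xi * xi for xi in buf) + eps
        # MMax selection: indices of the largest-magnitude input samples.
        selected = sorted(range(num_taps), key=lambda i: abs(buf[i]),
                          reverse=True)[:num_update]
        for i in selected:
            w[i] += step * e * buf[i] / norm
    return w


# Sparse toy impulse response (assumption), noise-free white input.
random.seed(1)
sparse_path = [0.0, 0.0, 0.8, 0.0, 0.0, -0.4, 0.0, 0.0]
x_sig = [random.gauss(0.0, 1.0) for _ in range(5000)]
d_sig = [
    sum(h * x_sig[n - k] for k, h in enumerate(sparse_path) if n - k >= 0)
    for n in range(len(x_sig))
]
w_est = mmax_nlms(x_sig, d_sig)
```

Updating half of the taps halves the number of coefficient updates per sample, at the cost of the sorting step and somewhat slower convergence than full NLMS.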

    Aeronautical engineering: A special bibliography, supplement 29, March 1973

    This special bibliography lists 410 reports, articles, and other documents introduced into the NASA scientific and technical information system in February 1972