26 research outputs found

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative to existing acoustic echo reduction approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user’s signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short-Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk.
Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum-phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better-quality output speech than a comparable inversion technique.
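
The decomposition onto the union of two nonnegative bases described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `separate_with_fixed_bases`, the Euclidean cost with its multiplicative update, and the soft-mask reconstruction are all assumptions, since the abstract does not state which cost function or reconstruction rule is used. The two bases are taken as given inputs here; forming the echo basis from far-end speech and training the speaker basis are outside the sketch.

```python
import numpy as np

def separate_with_fixed_bases(V, W_echo, W_speech, n_iter=200, eps=1e-12):
    """Estimate the near-end speech magnitude from a mixture spectrogram.

    V        : nonnegative magnitude STFT of the microphone signal (freq x frames)
    W_echo   : nonnegative basis for the echo (freq x k_echo), assumed given
    W_speech : nonnegative basis for the near-end speaker (freq x k_speech), assumed given
    """
    # Union of the two bases; both stay fixed, only the activations H are learned.
    W = np.hstack([W_echo, W_speech])
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps

    for _ in range(n_iter):
        # Multiplicative update for the Euclidean cost (an assumed choice);
        # it keeps H nonnegative because every factor is nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)

    k = W_echo.shape[1]
    echo_est = W_echo @ H[:k]
    speech_est = W_speech @ H[k:]

    # Soft mask (assumed reconstruction rule): share of each bin explained by speech.
    mask = speech_est / (echo_est + speech_est + eps)
    return mask * V, H
```

To return to the time domain, the masked magnitude would typically be combined with the phase of the microphone STFT before inversion; that step is omitted here.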

    Acoustic Echo Cancellation and their Application in ADF

    In this paper, we present an overview of the principle, structure, and applications of echo cancellation, and of the kinds of application that improve system performance. Echo is a process in which a delayed and distorted version of the original sound or voice signal is reflected back to the source. For the acoustic echo canceller, further study is required to improve tracking speed and reduce computational complexity. Because of the increased processing requirements, widespread implementation had to wait for advances in LSI, after which VLSI echo cancellers appeared. DOI: 10.17762/ijritcc2321-8169.150513

    Sparseness-controlled adaptive algorithms for supervised and unsupervised system identification

    In single-channel hands-free telephony, the acoustic coupling between the loudspeaker and the microphone can be strong, and this generates echoes that can degrade user experience. Therefore, effective acoustic echo cancellation (AEC) is necessary to maintain a stable system and hence improve the perceived voice quality of a call. Traditionally, adaptive filters have been deployed in acoustic echo cancellers to estimate the acoustic impulse responses (AIRs) using adaptive algorithms. The performance of a range of well-known algorithms is studied in the context of both AEC and network echo cancellation (NEC). The study presents insights into their tracking performance under both time-invariant and time-varying system conditions. In the context of AEC, the level of sparseness in AIRs can vary greatly in a mobile environment. When the response is strongly sparse, convergence of conventional approaches is poor. Drawing on techniques originally developed for NEC, a class of time-domain and frequency-domain AEC algorithms is proposed that not only works well in both sparse and dispersive circumstances, but also adapts dynamically to the level of sparseness using a new sparseness-controlled approach. Because the early part of the acoustic echo path is sparse while the late reverberant part is dispersive, a novel adaptive filter structure consisting of two time-domain partition blocks is proposed so that a different adaptive algorithm can be used for each part. By controlling the mixing parameter for each partitioned block separately, with the block lengths controlled adaptively, the proposed partitioned block algorithm works well in both sparse and dispersive time-varying circumstances. New insight into the tracking performance of improved proportionate NLMS (IPNLMS) is presented by deriving an expression for the mean-square error.
By employing the framework for both sparse and dispersive time-varying echo paths, this work validates the analytic results in practical simulations for AEC. Time-domain, second-order-statistics-based blind SIMO identification algorithms, which exploit the cross-relation method, are investigated, and a technique with proportionate step-size control for both sparse and dispersive system identification is also developed.
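
For readers unfamiliar with the building blocks named in this abstract, the sketch below shows the commonly published form of IPNLMS together with a widely used sparseness measure for impulse responses. It is an illustration under assumptions, not the sparseness-controlled or partitioned-block algorithms proposed in the work: the function names, the default parameter values (`mu`, `alpha`, `delta`, `eps`), and the fixed `alpha` are all choices made here for the example.

```python
import numpy as np

def sparseness(h, eps=1e-12):
    """Sparseness of an impulse response: near 1 for a single tap, 0 for uniform taps."""
    L = h.size
    l1 = np.sum(np.abs(h))
    l2 = np.sqrt(np.sum(h ** 2))
    return (L / (L - np.sqrt(L))) * (1.0 - l1 / (np.sqrt(L) * l2 + eps))

def ipnlms(x, d, L, mu=0.5, alpha=0.0, delta=1e-6, eps=1e-8):
    """Identify an L-tap echo path from far-end signal x and microphone signal d.

    alpha in [-1, 1) trades off NLMS-like (alpha near -1) and
    proportionate (alpha near 1) behaviour; it is held fixed in this sketch.
    """
    h = np.zeros(L)
    e = np.zeros(len(x))
    x_buf = np.zeros(L)  # x_buf[k] holds x[n - k]
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        e[n] = d[n] - h @ x_buf
        # Per-tap gains: a uniform term plus a term proportional to |h|.
        k = (1 - alpha) / (2 * L) + (1 + alpha) * np.abs(h) / (2 * np.sum(np.abs(h)) + eps)
        kx = k * x_buf
        h = h + mu * e[n] * kx / (x_buf @ kx + delta)
    return h, e
```

A sparseness-controlled variant would, roughly speaking, use a measure like `sparseness` on the current estimate to adjust how proportionate the update is; the exact rule used in the work is not given in the abstract.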

    Development and applications of adaptive IIR and subband filters

    Adaptive infinite impulse response (IIR) filtering is a challenging research area. Identifiers and equalizers are among the most essential digital signal processing devices for digital communication systems. In this study, we consider an IIR channel for both system identification and channel equalization. We focus on four different approaches: Least Mean Square (LMS), Recursive Least Squares (RLS), Genetic Algorithm (GA), and Subband Adaptive Filter (SAF). The performance of conventional LMS- and RLS-based IIR system identification and channel equalization is evaluated through computer simulations. The convergence speed and the ability of a population-based algorithm, the Genetic Algorithm, to locate the global optimum solution are also reported.
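
As a rough illustration of LMS-based IIR system identification, the sketch below uses the equation-error formulation, in which past values of the observed output stand in for the filter's own feedback so that the problem becomes linear in the coefficients. The abstract does not say which formulation the study uses, so this choice, the function name `equation_error_lms`, and the step size are assumptions.

```python
import numpy as np

def equation_error_lms(x, d, n_a, n_b, mu=0.01):
    """Fit d[n] ~ sum_i a[i] * d[n-1-i] + sum_j b[j] * x[n-j] with LMS.

    x : input signal, d : observed output of the unknown IIR system
    n_a : number of feedback coefficients, n_b : number of feedforward coefficients
    """
    theta = np.zeros(n_a + n_b)
    start = max(n_a, n_b - 1)
    for n in range(start, len(x)):
        past_d = d[n - n_a:n][::-1]            # d[n-1], ..., d[n-n_a]
        recent_x = x[n - n_b + 1:n + 1][::-1]  # x[n], ..., x[n-n_b+1]
        u = np.concatenate([past_d, recent_x])
        err = d[n] - theta @ u                 # equation error
        theta = theta + mu * err * u           # LMS update
    return theta[:n_a], theta[n_a:]
```

With noise-free observations this converges to the true coefficients of a matching-order system; with measurement noise the equation-error estimate is known to be biased, which is one reason output-error and population-based searches are also studied.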