18 research outputs found

    Algorithms and structures for long adaptive echo cancellers

    Get PDF
    The main theme of this thesis is adaptive echo cancellation. Two novel independent approaches are proposed for the design of long echo cancellers with improved performance. In the first approach, we present a novel structure for bulk delay estimation in long echo cancellers which considerably reduces the amount of excess error. The miscalculation of the delay between the near-end and the far-end sections is one of the main causes of this excess error. Two analyses, based on the Least Mean Squares (LMS) algorithm, are presented where certain shapes for the transitions between the end of the near-end section and the beginning of the far-end one are considered. Transient and steady-state behaviours and convergence conditions for the proposed algorithm are studied. Comparisons between the algorithms developed for each transition are presented, and the simulation results agree well with the theoretical derivations. In the second approach, a generalised performance index is proposed for the design of the echo canceller. The proposed algorithm consists of simultaneously applying the LMS algorithm to the near-end section and the Least Mean Fourth (LMF) algorithm to the far-end section of the echo canceller. This combination results in a substantial improvement of the performance of the proposed scheme over both the LMS and other algorithms proposed for comparison. In this approach, the proposed algorithm will be henceforth called the Least Mean Mixed-Norm (LMMN) algorithm. The advantages of the LMMN algorithm over previously reported ones are two folds: it leads to a faster convergence and results in a smaller misadjustment error. Finally, the convergence properties of the LMMN algorithm are derived and the simulation results confirm the superior performance of this proposed algorithm over other well known algorithms

    Fast initialization of Nyquist echo cancellers using circular convolution technique

    Get PDF
    For full-duplex high-speed data transmission over the two-wire line using the same frequency band, it is required to sufficiently suppress the echo. The use of a conventional adaptation method may take a long time to train the echo canceler. Fast training can be achieved by initializing the coefficients of the echo canceler with an estimate of the impulse response of the echo path. In this letter, we propose a method for fast initialization of the echo canceler by using a circular convolution technique. The proposed method enables the use of real-valued training signals instead of complex-valued ones, resulting in significant reduction of the initialization time as well as the implementation complexity. Finally, the performance of the proposed method is analyzed and verified by computer simulation

    System Identification with Applications in Speech Enhancement

    No full text
    As the increasing popularity of integrating hands-free telephony on mobile portable devices and the rapid development of voice over internet protocol, identification of acoustic systems has become desirable for compensating distortions introduced to speech signals during transmission, and hence enhancing the speech quality. The objective of this research is to develop system identification algorithms for speech enhancement applications including network echo cancellation and speech dereverberation. A supervised adaptive algorithm for sparse system identification is developed for network echo cancellation. Based on the framework of selective-tap updating scheme on the normalized least mean squares algorithm, the MMax and sparse partial update tap-selection strategies are exploited in the frequency domain to achieve fast convergence performance with low computational complexity. Through demonstrating how the sparseness of the network impulse response varies in the transformed domain, the multidelay filtering structure is incorporated to reduce the algorithmic delay. Blind identification of SIMO acoustic systems for speech dereverberation in the presence of common zeros is then investigated. First, the problem of common zeros is defined and extended to include the presence of near-common zeros. Two clustering algorithms are developed to quantify the number of these zeros so as to facilitate the study of their effect on blind system identification and speech dereverberation. To mitigate such effect, two algorithms are developed where the two-stage algorithm based on channel decomposition identifies common and non-common zeros sequentially; and the forced spectral diversity approach combines spectral shaping filters and channel undermodelling for deriving a modified system that leads to an improved dereverberation performance. Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subbandbased dereverberation techniques. Comprehensive simulations and discussions demonstrate the effectiveness of the aforementioned algorithms. A discussion on possible directions of prospective research on system identification techniques concludes this thesis

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique

    Feedback suppression in digital hearing instruments

    Get PDF

    Novel implementation technique for a wavelet-based broadband signal detection system

    Get PDF
    This thesis reports on the design, simulation and implementation of a novel Implementation for a Wavelet-based Broadband Signal Detection System. There is a strong interest in methods of increasing the resolution of sonar systems for the detection of targets at sea. A novel implementation of a wideband active sonar signal detection system is proposed in this project. In the system the Continuous Wavelet Transform is used for target motion estimation and an Adaptive-Network-based Fuzzy inference System (ANFIS) is adopted to minimize the noise effect on target detection. A local optimum search algorithm is introduced in this project to reduce the computation load of the Continuous Wavelet Transform and make it suitable for practical applications. The proposed system is realized on a Xilinx University Program Virtex-II Pro Development System which contains a Virtex II pro XC2VP30 FPGA chip with 2 powerPC 405 cores. Testing for single target detection and multiple target detection shows the proposed system is able to accurately locate targets under reverberation-limited underwater environment with a Signal-Noise-Ratio of up to -30db, with location error less than 10 meters and velocity estimation error less than 1 knot. In the proposed system the combination of CWT and local optimum search algorithm significantly saves the computation time for CWT and make it more practical to real applications. Also the implementation of ANFIS on the FPGA board indicates in the future a real-time ANFIS operation with VLSI implementation would be possible

    Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

    Get PDF
    Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression

    Acoustic event detection and localization using distributed microphone arrays

    Get PDF
    Automatic acoustic scene analysis is a complex task that involves several functionalities: detection (time), localization (space), separation, recognition, etc. This thesis focuses on both acoustic event detection (AED) and acoustic source localization (ASL), when several sources may be simultaneously present in a room. In particular, the experimentation work is carried out with a meeting-room scenario. Unlike previous works that either employed models of all possible sound combinations or additionally used video signals, in this thesis, the time overlapping sound problem is tackled by exploiting the signal diversity that results from the usage of multiple microphone array beamformers. The core of this thesis work is a rather computationally efficient approach that consists of three processing stages. In the first, a set of (null) steering beamformers is used to carry out diverse partial signal separations, by using multiple arbitrarily located linear microphone arrays, each of them composed of a small number of microphones. In the second stage, each of the beamformer output goes through a classification step, which uses models for all the targeted sound classes (HMM-GMM, in the experiments). Then, in a third stage, the classifier scores, either being intra- or inter-array, are combined using a probabilistic criterion (like MAP) or a machine learning fusion technique (fuzzy integral (FI), in the experiments). The above-mentioned processing scheme is applied in this thesis to a set of complexity-increasing problems, which are defined by the assumptions made regarding identities (plus time endpoints) and/or positions of sounds. In fact, the thesis report starts with the problem of unambiguously mapping the identities to the positions, continues with AED (positions assumed) and ASL (identities assumed), and ends with the integration of AED and ASL in a single system, which does not need any assumption about identities or positions. The evaluation experiments are carried out in a meeting-room scenario, where two sources are temporally overlapped; one of them is always speech and the other is an acoustic event from a pre-defined set. Two different databases are used, one that is produced by merging signals actually recorded in the UPC¿s department smart-room, and the other consists of overlapping sound signals directly recorded in the same room and in a rather spontaneous way. From the experimental results with a single array, it can be observed that the proposed detection system performs better than either the model based system or a blind source separation based system. Moreover, the product rule based combination and the FI based fusion of the scores resulting from the multiple arrays improve the accuracies further. On the other hand, the posterior position assignment is performed with a very small error rate. Regarding ASL and assuming an accurate AED system output, the 1-source localization performance of the proposed system is slightly better than that of the widely-used SRP-PHAT system, working in an event-based mode, and it even performs significantly better than the latter one in the more complex 2-source scenario. Finally, though the joint system suffers from a slight degradation in terms of classification accuracy with respect to the case where the source positions are known, it shows the advantage of carrying out the two tasks, recognition and localization, with a single system, and it allows the inclusion of information about the prior probabilities of the source positions. It is worth noticing also that, although the acoustic scenario used for experimentation is rather limited, the approach and its formalism were developed for a general case, where the number and identities of sources are not constrained

    Novel implementation technique for a wavelet-based broadband signal detection system

    Get PDF
    This thesis reports on the design, simulation and implementation of a novel Implementation for a Wavelet-based Broadband Signal Detection System. There is a strong interest in methods of increasing the resolution of sonar systems for the detection of targets at sea. A novel implementation of a wideband active sonar signal detection system is proposed in this project. In the system the Continuous Wavelet Transform is used for target motion estimation and an Adaptive-Network-based Fuzzy inference System (ANFIS) is adopted to minimize the noise effect on target detection. A local optimum search algorithm is introduced in this project to reduce the computation load of the Continuous Wavelet Transform and make it suitable for practical applications. The proposed system is realized on a Xilinx University Program Virtex-II Pro Development System which contains a Virtex II pro XC2VP30 FPGA chip with 2 powerPC 405 cores. Testing for single target detection and multiple target detection shows the proposed system is able to accurately locate targets under reverberation-limited underwater environment with a Signal-Noise-Ratio of up to -30db, with location error less than 10 meters and velocity estimation error less than 1 knot. In the proposed system the combination of CWT and local optimum search algorithm significantly saves the computation time for CWT and make it more practical to real applications. Also the implementation of ANFIS on the FPGA board indicates in the future a real-time ANFIS operation with VLSI implementation would be possible.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    corecore