Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function
This paper addresses the problems of blind channel identification and
multichannel equalization for speech dereverberation and noise reduction. The
time-domain cross-relation method is not suitable for blind room impulse
response identification, due to the near-common zeros of the long impulse
responses. We extend the cross-relation method to the short-time Fourier
transform (STFT) domain, in which the time-domain impulse responses are
approximately represented by the convolutive transfer functions (CTFs) with
far fewer coefficients. The CTFs suffer from the common zeros caused by the
oversampled STFT. We propose to identify CTFs based on the STFT with the
oversampled signals and the critically sampled CTFs, which is a good compromise
between the frequency aliasing of the signals and the common zeros problem of
CTFs. In addition, a normalization of the CTFs is proposed to remove the gain
ambiguity across sub-bands. In the STFT domain, the identified CTFs are used for
multichannel equalization, in which the sparsity of speech signals is
exploited. We propose to perform inverse filtering by minimizing the
$\ell_1$-norm of the source signal with the relaxed $\ell_2$-norm fitting error
between the microphone signals and the convolution of the estimated source
signal and the CTFs used as a constraint. This method is advantageous in that
the noise can be reduced by relaxing the $\ell_2$-norm to a tolerance
corresponding to the noise power, and the tolerance can be automatically set.
The experiments confirm the efficiency of the proposed method even under
conditions with high reverberation levels and intense noise. Comment: 13 pages, 5 figures, 5 tables
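For orientation, the cross-relation idea and the sparsity-promoting inverse filtering step described in this abstract can be written in a generic form. The notation is assumed for illustration and is not taken from the paper: $x_i$ is the STFT-domain signal of microphone $i$ in one frequency band, $h_i$ the corresponding CTF, $s$ the source signal, $*$ convolution along the frame index, and $\epsilon$ a tolerance tied to the noise power. Summing the fitting error over the $I$ channels is also an assumption.

```latex
% Cross-relation between channels i and j (noise-free case):
x_i * h_j = x_j * h_i

% Sparse inverse filtering with a relaxed fitting-error constraint:
\hat{s} = \arg\min_{s} \|s\|_1
\quad \text{s.t.} \quad
\sum_{i=1}^{I} \| x_i - h_i * s \|_2^2 \le \epsilon
```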
System Identification with Applications in Speech Enhancement
With the increasing popularity of integrating hands-free telephony on mobile portable devices
and the rapid development of voice over internet protocol, identification of acoustic
systems has become desirable for compensating distortions introduced to speech signals
during transmission, and hence enhancing the speech quality. The objective of this research
is to develop system identification algorithms for speech enhancement applications
including network echo cancellation and speech dereverberation.
A supervised adaptive algorithm for sparse system identification is developed for
network echo cancellation. Based on the framework of selective-tap updating scheme
on the normalized least mean squares algorithm, the MMax and sparse partial update
tap-selection strategies are exploited in the frequency domain to achieve fast convergence
performance with low computational complexity. Through demonstrating how
the sparseness of the network impulse response varies in the transformed domain, the
multidelay filtering structure is incorporated to reduce the algorithmic delay.
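As a rough illustration of MMax tap selection, the following time-domain sketch updates only the taps whose input samples currently have the largest magnitude. It is not the frequency-domain, multidelay algorithm developed in the thesis; the function names, step size, and synthetic sparse echo path are assumptions made for the example.

```python
import numpy as np

def mmax_nlms(x, d, num_taps, num_update, mu=0.5, eps=1e-8):
    """NLMS in which only the num_update taps with the largest |input| are updated."""
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)              # most recent input samples, newest first
    err = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        err[n] = d[n] - w @ buf           # a priori error
        # MMax selection: indices of the num_update largest-magnitude inputs
        sel = np.argpartition(np.abs(buf), -num_update)[-num_update:]
        w[sel] += mu * err[n] * buf[sel] / (buf @ buf + eps)
    return w, err

def demo(seed=0):
    """Identify a hypothetical sparse 32-tap echo path from white noise input."""
    rng = np.random.default_rng(seed)
    h = np.zeros(32)
    h[[3, 10, 17]] = [1.0, -0.5, 0.25]    # assumed sparse impulse response
    x = rng.standard_normal(20000)
    d = np.convolve(x, h)[:len(x)]        # noise-free echo signal
    w, _ = mmax_nlms(x, d, num_taps=32, num_update=8)
    return h, w
```

Calling `demo()` returns the assumed true response and its estimate. Updating 8 of 32 taps per sample reduces the update cost at the price of somewhat slower convergence than full NLMS.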
Blind identification of SIMO acoustic systems for speech dereverberation in the
presence of common zeros is then investigated. First, the problem of common zeros is
defined and extended to include the presence of near-common zeros. Two clustering algorithms
are developed to quantify the number of these zeros so as to facilitate the study
of their effect on blind system identification and speech dereverberation. To mitigate such
effect, two algorithms are developed where the two-stage algorithm based on channel
decomposition identifies common and non-common zeros sequentially; and the forced
spectral diversity approach combines spectral shaping filters and channel undermodelling
for deriving a modified system that leads to an improved dereverberation performance.
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based
dereverberation techniques. Comprehensive simulations and discussions demonstrate
the effectiveness of the aforementioned algorithms. A discussion on possible directions
of prospective research on system identification techniques concludes this thesis.
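The role of common zeros in blind SIMO identification can be seen in a small noise-free, two-channel cross-relation sketch. This is the textbook cross-relation approach, not one of the algorithms developed in the thesis; the helper names and the use of full-length convolutions are assumptions made for the example.

```python
import numpy as np

def conv_matrix(x, L):
    """Return C such that C @ h equals np.convolve(x, h) for any h of length L."""
    C = np.zeros((len(x) + L - 1, L))
    for j in range(L):
        C[j:j + len(x), j] = x
    return C

def cross_relation_identify(x1, x2, L):
    """Estimate two length-L channels (up to a common scale) from their outputs.

    Uses x2 * h1 - x1 * h2 = 0: the stacked vector [h1, h2] lies in the
    null space of A = [C(x2), -C(x1)].
    """
    A = np.hstack([conv_matrix(x2, L), -conv_matrix(x1, L)])
    _, sv, Vt = np.linalg.svd(A, full_matrices=False)
    g = Vt[-1]                            # right singular vector of the smallest singular value
    return g[:L], g[L:], sv
```

When the two channels share no zeros, the null space is one-dimensional and the estimate matches the true channels up to a scale factor. If the channels share a zero, the null space grows and the solution is no longer unique, which is the difficulty the abstract refers to.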
Adaptive interference suppression for DS-CDMA systems based on interpolated FIR filters with adaptive interpolators in multipath channels
In this work we propose an adaptive linear receiver structure based on interpolated finite impulse response (FIR) filters with adaptive interpolators for direct sequence code division multiple access (DS-CDMA) systems in multipath channels. The interpolated minimum mean-squared error (MMSE) and the interpolated constrained minimum variance (CMV) solutions are described for a novel scheme where the interpolator is rendered time-varying in order to mitigate multiple access interference (MAI) and multiple-path propagation effects. Based upon the interpolated MMSE and CMV solutions we present computationally efficient stochastic gradient (SG) and exponentially weighted recursive least squares (RLS) type algorithms for both receiver and interpolator filters in the supervised and blind modes of operation. A convergence analysis of the algorithms and a discussion of the convergence properties of the method are carried out for both modes of operation. Simulation experiments for a downlink scenario show that the proposed structures achieve BER convergence and steady-state performance superior to those of previously reported reduced-rank receivers, at lower complexity.
Multiuser MIMO-OFDM for Next-Generation Wireless Systems
This overview portrays the 40-year evolution of orthogonal frequency division multiplexing (OFDM) research. The amelioration of powerful multicarrier OFDM arrangements with multiple-input multiple-output (MIMO) systems has numerous benefits, which are detailed in this treatise. We continue by highlighting the limitations of conventional detection and channel estimation techniques designed for multiuser MIMO OFDM systems in the so-called rank-deficient scenarios, where the number of users supported or the number of transmit antennas employed exceeds the number of receiver antennas. This is often encountered in practice, unless we limit the number of users granted access in the base station's or radio port's coverage area. Following a historical perspective on the associated design problems and their state-of-the-art solutions, the second half of this treatise details a range of classic multiuser detectors (MUDs) designed for MIMO-OFDM systems and characterizes their achievable performance. A further section aims for identifying novel cutting-edge genetic algorithm (GA)-aided detector solutions, which have found numerous applications in wireless communications in recent years. In an effort to stimulate the cross pollination of ideas across the machine learning, optimization, signal processing, and wireless communications research communities, we will review the broadly applicable principles of various GA-assisted optimization techniques, which were recently proposed also for employment in multiuser MIMO OFDM. In order to stimulate new research, we demonstrate that the family of GA-aided MUDs is capable of achieving a near-optimum performance at the cost of a significantly lower computational complexity than that imposed by their optimum maximum-likelihood (ML) MUD aided counterparts. The paper is concluded by outlining a range of future research options that may find their way into next-generation wireless systems.
An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony
In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk.
Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
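The decomposition onto the union of two nonnegative bases described in this abstract can be sketched as follows. The sketch assumes both bases are already given and uses the standard Euclidean multiplicative update for the activations followed by a soft mask; these choices, and all function names, are assumptions for illustration rather than the thesis's exact procedure.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Fit nonnegative activations H so that W @ H approximates V, with W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for the Frobenius-norm objective
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def separate(V, W_echo, W_speech, n_iter=200):
    """Estimate the near-end speech magnitude spectrogram from the mixture V."""
    W = np.hstack([W_echo, W_speech])     # union of the two bases
    H = nmf_activations(V, W, n_iter)
    k = W_echo.shape[1]
    V_echo = W_echo @ H[:k]               # part explained by the echo basis
    V_speech = W_speech @ H[k:]           # part explained by the speech basis
    mask = V_speech / (V_echo + V_speech + 1e-12)
    return mask * V                       # soft-masked mixture
```

Here `V` is a nonnegative magnitude spectrogram (frequency bins by frames), and `W_echo` and `W_speech` are nonnegative basis matrices with the same number of rows as `V`.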