191 research outputs found
Multichannel Online Dereverberation based on Spectral Magnitude Inverse Filtering
This paper addresses the problem of multichannel online dereverberation. The
proposed method is carried out in the short-time Fourier transform (STFT)
domain, and for each frequency band independently. In the STFT domain, the
time-domain room impulse response is approximately represented by the
convolutive transfer function (CTF). The multichannel CTFs are adaptively
identified based on the cross-relation method, and using the recursive least
square criterion. Instead of the complex-valued CTF convolution model, we use a
nonnegative convolution model between the STFT magnitude of the source signal
and the CTF magnitude, which is just a coarse approximation of the former
model, but is shown to be more robust against the CTF perturbations. Based on
this nonnegative model, we propose an online STFT magnitude inverse filtering
method. The inverse filters of the CTF magnitude are formulated based on the
multiple-input/output inverse theorem (MINT), and adaptively estimated based on
the gradient descent criterion. Finally, the inverse filtering is applied to
the STFT magnitude of the microphone signals, obtaining an estimate of the STFT
magnitude of the source signal. Experiments regarding both speech enhancement
and automatic speech recognition are conducted, which demonstrate that the
proposed method can effectively suppress reverberation, even for the difficult
case of a moving speaker.Comment: Paper submitted to IEEE/ACM Transactions on Audio, Speech and
Language Processing. IEEE Signal Processing Letters, 201
Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function
This paper addresses the problem of speech separation and enhancement from
multichannel convolutive and noisy mixtures, \emph{assuming known mixing
filters}. We propose to perform the speech separation and enhancement task in
the short-time Fourier transform domain, using the convolutive transfer
function (CTF) approximation. Compared to time-domain filters, CTF has much
less taps, consequently it has less near-common zeros among channels and less
computational complexity. The work proposes three speech-source recovery
methods, namely: i) the multichannel inverse filtering method, i.e. the
multiple input/output inverse theorem (MINT), is exploited in the CTF domain,
and for the multi-source case, ii) a beamforming-like multichannel inverse
filtering method applying single source MINT and using power minimization,
which is suitable whenever the source CTFs are not all known, and iii) a
constrained Lasso method, where the sources are recovered by minimizing the
-norm to impose their spectral sparsity, with the constraint that the
-norm fitting cost, between the microphone signals and the mixing model
involving the unknown source signals, is less than a tolerance. The noise can
be reduced by setting a tolerance onto the noise power. Experiments under
various acoustic conditions are carried out to evaluate the three proposed
methods. The comparison between them as well as with the baseline methods is
presented.Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language
Processin
Single-Channel Speech Dereverberation using Subband Network with A Reverberation Time Shortening Target
This work proposes a subband network for single-channel speech
dereverberation, and also a new learning target based on reverberation time
shortening (RTS). In the time-frequency domain, we propose to use a subband
network to perform dereverberation for different frequency bands independently.
The time-domain convolution can be well decomposed to subband convolutions,
thence it is reasonable to train the subband network to perform subband
deconvolution. The learning target for dereverberation is usually set as the
direct-path speech or optionally with some early reflections. This type of
target suddenly truncates the reverberation, and thus it may not be suitable
for network training, and leads to a large prediction error. In this work, we
propose a RTS learning target to suppress reverberation and meanwhile maintain
the exponential decaying property of reverberation, which will ease the network
training, and thus reduce the prediction error and signal distortions.
Experiments show that the subband network can achieve outstanding
dereverberation performance, and the proposed target has a smaller prediction
error than the target of direct-path speech and early reflections.Comment: Submitted to INTERSPEECH 202
System Identification with Applications in Speech Enhancement
As the increasing popularity of integrating hands-free telephony on mobile portable devices
and the rapid development of voice over internet protocol, identification of acoustic
systems has become desirable for compensating distortions introduced to speech signals
during transmission, and hence enhancing the speech quality. The objective of this research
is to develop system identification algorithms for speech enhancement applications
including network echo cancellation and speech dereverberation.
A supervised adaptive algorithm for sparse system identification is developed for
network echo cancellation. Based on the framework of selective-tap updating scheme
on the normalized least mean squares algorithm, the MMax and sparse partial update
tap-selection strategies are exploited in the frequency domain to achieve fast convergence
performance with low computational complexity. Through demonstrating how
the sparseness of the network impulse response varies in the transformed domain, the
multidelay filtering structure is incorporated to reduce the algorithmic delay.
Blind identification of SIMO acoustic systems for speech dereverberation in the
presence of common zeros is then investigated. First, the problem of common zeros is
defined and extended to include the presence of near-common zeros. Two clustering algorithms
are developed to quantify the number of these zeros so as to facilitate the study
of their effect on blind system identification and speech dereverberation. To mitigate such
effect, two algorithms are developed where the two-stage algorithm based on channel
decomposition identifies common and non-common zeros sequentially; and the forced
spectral diversity approach combines spectral shaping filters and channel undermodelling
for deriving a modified system that leads to an improved dereverberation performance.
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subbandbased
dereverberation techniques. Comprehensive simulations and discussions demonstrate
the effectiveness of the aforementioned algorithms. A discussion on possible directions
of prospective research on system identification techniques concludes this thesis
- …