8,483 research outputs found
Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function
This paper addresses the problem of speech separation and enhancement from
multichannel convolutive and noisy mixtures, \emph{assuming known mixing
filters}. We propose to perform the speech separation and enhancement task in
the short-time Fourier transform domain, using the convolutive transfer
function (CTF) approximation. Compared to time-domain filters, CTF has much
less taps, consequently it has less near-common zeros among channels and less
computational complexity. The work proposes three speech-source recovery
methods, namely: i) the multichannel inverse filtering method, i.e. the
multiple input/output inverse theorem (MINT), is exploited in the CTF domain,
and for the multi-source case, ii) a beamforming-like multichannel inverse
filtering method applying single source MINT and using power minimization,
which is suitable whenever the source CTFs are not all known, and iii) a
constrained Lasso method, where the sources are recovered by minimizing the
-norm to impose their spectral sparsity, with the constraint that the
-norm fitting cost, between the microphone signals and the mixing model
involving the unknown source signals, is less than a tolerance. The noise can
be reduced by setting a tolerance onto the noise power. Experiments under
various acoustic conditions are carried out to evaluate the three proposed
methods. The comparison between them as well as with the baseline methods is
presented.Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language
Processin
- …