1,109 research outputs found
A Fast Algorithm For Sparse Multichannel Blind Deconvolution
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Petrobras.
We have addressed blind deconvolution in a multichannel framework. Recently, a robust Bayesian solution to this problem, called sparse multichannel blind deconvolution (SMBD), was proposed in the literature with interesting results. However, its computational complexity can be high. We have proposed a fast algorithm based on minimum entropy deconvolution, which is considerably less expensive. We designed the deconvolution filter to minimize a normalized version of the hybrid l1/l2-norm loss function. This is in contrast to SMBD, in which the hybrid l1/l2-norm function is used as a regularization term to directly determine the deconvolved signal. Results with synthetic data showed that the performance of the obtained deconvolution filter was similar to that of a filter obtained in a supervised framework. Similar results were also obtained on a real marine data set for both techniques.
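To make the idea of a normalized hybrid l1/l2 loss concrete, here is a minimal sketch. It assumes one commonly used form of the hybrid l1/l2 function, sum(sqrt(x^2 + eps^2) - eps), and assumes normalization by the l2 norm of the signal; the abstract gives neither formula, so the function names, the `eps` parameter, and the normalization are illustrative and may differ from the paper's definition.

```python
import math


def hybrid_l1_l2(x, eps=1e-3):
    """Assumed hybrid l1/l2 form: quadratic near zero, linear for large |x|."""
    return sum(math.sqrt(v * v + eps * eps) - eps for v in x)


def normalized_hybrid_loss(x, eps=1e-3):
    """Hybrid loss of the signal after scaling it to unit l2 norm (assumption).

    The scaling makes the value independent of the signal amplitude, so a
    filter cannot lower the loss simply by shrinking its output.
    """
    energy = math.sqrt(sum(v * v for v in x))
    if energy == 0.0:
        raise ValueError("signal has zero energy")
    return hybrid_l1_l2([v / energy for v in x], eps)


# Hypothetical test signals: one isolated spike versus a flat, dense trace.
spiky = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
dense = [1.0] * 8

spiky_loss = normalized_hybrid_loss(spiky)
dense_loss = normalized_hybrid_loss(dense)
```

Under these assumptions the spiky signal scores lower than the dense one, which is why minimizing such a loss over the filter coefficients favors sparse deconvolved outputs.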
Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings
We tackle the multi-party speech recovery problem by modeling the
acoustics of reverberant rooms. Our approach exploits structured sparsity
models to perform room modeling and speech recovery. We propose a scheme for
characterizing the room acoustics from the unknown competing speech sources
relying on localization of the early images of the speakers by sparse
approximation of the spatial spectra of the virtual sources in a free-space
model. The images are then clustered exploiting the low-rank structure of the
spectro-temporal components belonging to each source. This enables us to
identify the early support of the room impulse response function and its unique
map to the room geometry. To further tackle the ambiguity of the reflection
ratios, we propose a novel formulation of the reverberation model and estimate
the absorption coefficients through a convex optimization exploiting a joint
sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech
representation. The acoustic parameters are then incorporated for separating
individual speech signals through either structured sparse recovery or inverse
filtering of the acoustic channels. The experiments conducted on real data
recordings demonstrate the effectiveness of the proposed approach for
multi-party speech recovery and recognition.
Comment: 31 pages
Subspace Methods for Joint Sparse Recovery
We propose robust and efficient algorithms for the joint sparse recovery
problem in compressed sensing, which simultaneously recover the supports of
jointly sparse signals from their multiple measurement vectors obtained through
a common sensing matrix. In a favorable situation, the unknown matrix, which
consists of the jointly sparse signals, has linearly independent nonzero rows.
In this case, the MUSIC (MUltiple SIgnal Classification) algorithm, originally
proposed by Schmidt for the direction of arrival problem in sensor array
processing and later proposed and analyzed for joint sparse recovery by Feng
and Bresler, provides a guarantee with the minimum number of measurements. We
focus instead on the unfavorable but practically significant case of
rank deficiency or ill-conditioning. This situation arises with a limited number of
measurement vectors, or with highly correlated signal components. In this case
MUSIC fails, and in practice none of the existing methods can consistently
approach the fundamental limit. We propose subspace-augmented MUSIC (SA-MUSIC),
which improves on MUSIC so that the support is reliably recovered under such
unfavorable conditions. Combined with subspace-based greedy algorithms also
proposed and analyzed in this paper, SA-MUSIC provides a computationally
efficient algorithm with a performance guarantee. The performance guarantees
are given in terms of a version of the restricted isometry property. In particular,
we also present a non-asymptotic perturbation analysis of the signal subspace
estimation that has been missing in the previous study of MUSIC.
Comment: submitted to IEEE Transactions on Information Theory, revised version
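The favorable full-row-rank case described above can be illustrated with a small sketch of MUSIC-style support recovery. This is not SA-MUSIC and not the authors' implementation: it assumes noise-free measurements, so a Gram-Schmidt basis of the measurement vectors stands in for the eigen-decomposition used in practice, and the matrix sizes, seed, and all function names are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def remove_components(vec, basis):
    """Subtract from vec its projection onto each orthonormal basis vector."""
    r = list(vec)
    for q in basis:
        c = dot(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def orthonormal_basis(columns, tol=1e-10):
    """Modified Gram-Schmidt; numerically dependent columns are skipped."""
    basis = []
    for col in columns:
        r = remove_components(col, basis)
        norm = dot(r, r) ** 0.5
        if norm > tol:
            basis.append([ri / norm for ri in r])
    return basis


def music_support(A_cols, Y_cols, k):
    """Pick the k sensing-matrix columns closest to the signal subspace."""
    basis = orthonormal_basis(Y_cols)  # spans range(Y) in the noise-free case
    scores = []
    for j, a in enumerate(A_cols):
        r = remove_components(a, basis)
        scores.append((dot(r, r) / dot(a, a), j))  # relative residual energy
    return sorted(j for _, j in sorted(scores)[:k])


# Hypothetical problem: 8 measurements, 20 atoms, 3 active rows, 5 snapshots.
random.seed(0)
m, n, k, num_snapshots = 8, 20, 3, 5
A_cols = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]
true_support = [2, 7, 15]
Y_cols = []
for _ in range(num_snapshots):
    coeffs = {j: random.gauss(0, 1) for j in true_support}
    Y_cols.append([sum(coeffs[j] * A_cols[j][i] for j in true_support)
                   for i in range(m)])

estimated_support = music_support(A_cols, Y_cols, k)
```

With five snapshots and three active rows the signal matrix has full row rank, so the columns on the true support lie inside the span of the measurements and have near-zero residual. When the snapshots are too few or too correlated, that span has dimension below k and this test no longer identifies the support, which is the regime the abstract targets.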
Convolutive Blind Source Separation Methods
In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy in which many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks.
Sparse Nonlinear MIMO Filtering and Identification
In this chapter, system identification algorithms for sparse nonlinear multi-input multi-output (MIMO) systems are developed. These algorithms are potentially useful in a variety of application areas, including digital transmission systems incorporating power amplifier(s) along with multiple antennas, cognitive processing, adaptive control of nonlinear multivariable systems, and multivariable biological systems. Sparsity is a key constraint imposed on the model. The presence of sparsity is often dictated by physical considerations, as in wireless fading channel estimation. In other cases it appears as a pragmatic modelling approach that seeks to cope with the curse of dimensionality, which is particularly acute in nonlinear systems such as Volterra-type series. Three identification approaches are discussed: conventional identification based on both input and output samples; semi-blind identification, which places emphasis on minimal input resources; and blind identification, in which only output samples are available along with a priori information on the input characteristics. Based on this taxonomy, a variety of algorithms, existing and new, are studied and evaluated by simulation.
Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on the large CHiME-4 datasets and on another
dataset featuring a moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, the word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
with respect to the short length of the processed block. Significant improvements
in terms of the criteria and WER are observed even for a block length of 250
ms.
Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in IET Signal Processing journal. Original results
unchanged, additional experiments presented, refined discussion and
conclusion