An adaptive stereo basis method for convolutive blind audio source separation
NOTICE: this is the author's version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in PUBLICATION, [71, 10-12, June 2008] DOI:neucom.2007.08.02
Jointly Tracking and Separating Speech Sources Using Multiple Features and the generalized labeled multi-Bernoulli Framework
This paper proposes a novel joint multi-speaker tracking-and-separation
method based on the generalized labeled multi-Bernoulli (GLMB) multi-target
tracking filter, using sound mixtures recorded by microphones. Standard
multi-speaker tracking algorithms usually only track speaker locations, and
ambiguity occurs when speakers are spatially close. The proposed multi-feature
GLMB tracking filter treats the set of vectors of associated speaker features
(location, pitch and sound) as the multi-target multi-feature observation,
characterizes the transitioning features with corresponding transition models and
an overall likelihood function, and thus jointly tracks and separates each
multi-feature speaker while addressing the spatial ambiguity problem. Numerical
evaluation verifies that the proposed method can correctly track the locations of
multiple speakers while separating their speech signals.
Pyroomacoustics: A Python package for audio room simulations and array processing algorithms
We present pyroomacoustics, a software package aimed at the rapid development
and testing of audio array processing algorithms. The content of the package
can be divided into three main components: an intuitive Python object-oriented
interface to quickly construct different simulation scenarios involving
multiple sound sources and microphones in 2D and 3D rooms; a fast C
implementation of the image source model for general polyhedral rooms to
efficiently generate room impulse responses and simulate the propagation
between sources and receivers; and finally, reference implementations of
popular algorithms for beamforming, direction finding, and adaptive filtering.
Together, they form a package with the potential to speed up the time to market
of new algorithms by significantly reducing the implementation overhead in the
performance evaluation step.
Comment: 5 pages, 5 figures, describes a software package
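The image source model mentioned in the abstract can be illustrated with a minimal sketch. This is not the pyroomacoustics implementation or its API: it is a toy, pure-Python version restricted to a 2D rectangular room and first-order reflections. The function names, the speed of sound, the 1/(4*pi*d) attenuation, and the frequency-independent wall absorption are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_images(source, room):
    """Return the source and its four first-order mirror images.

    The room is assumed to be the rectangle [0, lx] x [0, ly].
    """
    x, y = source
    lx, ly = room
    return [
        (x, y),           # direct path (the real source)
        (-x, y),          # mirrored across the wall x = 0
        (2 * lx - x, y),  # mirrored across the wall x = lx
        (x, -y),          # mirrored across the wall y = 0
        (x, 2 * ly - y),  # mirrored across the wall y = ly
    ]


def impulse_response(source, mic, room, fs, absorption=0.3):
    """Build a sparse room impulse response from the image sources.

    Each image contributes one tap at its rounded propagation delay.
    Reflected paths are scaled by sqrt(1 - absorption), an assumed
    energy-absorption model that is the same for all walls.
    """
    reflection = math.sqrt(1.0 - absorption)
    taps = []
    for index, image in enumerate(first_order_images(source, room)):
        distance = math.hypot(image[0] - mic[0], image[1] - mic[1])
        wall_gain = 1.0 if index == 0 else reflection
        gain = wall_gain / (4.0 * math.pi * distance)
        delay = round(distance / SPEED_OF_SOUND * fs)
        taps.append((delay, gain))

    rir = [0.0] * (max(delay for delay, _ in taps) + 1)
    for delay, gain in taps:
        rir[delay] += gain  # coincident arrivals add up
    return rir


if __name__ == "__main__":
    rir = impulse_response((1.0, 1.0), (4.0, 3.0), (5.0, 4.0), fs=16000)
    print([i for i, v in enumerate(rir) if v != 0.0])
```

Higher reflection orders would mirror the images again across each wall; the sketch stops at first order to stay short.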
Efficient Interferer Cancelation based on Geometrical Information of the Reverberant Environment
ISSN (online) 2219-5491
5 pages total
Pagani, P.; Riva, Davide; Antonacci, Fabio; Prandi, Giorgio; Tagliasacchi, Marco; Sarti, Augusto; Tubaro, S.
Uplink beamforming for the FDD mode of UTRA
This paper presents link-level simulation results for the evaluation of adaptive antennas in the uplink of the FDD mode of UTRA (UMTS terrestrial radio access). Two families of algorithms were initially considered, the basic difference between them being their ability or inability to suppress the contribution from W-CDMA directional interfering sources. Two distinct schemes were established as representatives of each family, and their performance was evaluated in the presence of some illustrative interfering scenarios. The results show that time-reference beamforming algorithms suffer from severe beam pattern distortion when applied as such. This in turn causes severe performance degradation in terms of raw BER, especially at high SINR levels. It is shown that these shortcomings are essentially caused by the uplink multiplexing of the traffic channel, which is seen by the base station as a powerful interfering source coming from the direction of arrival of the desired user.
Peer Reviewed. Postprint (published version)
Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function
This paper addresses the problem of speech separation and enhancement from
multichannel convolutive and noisy mixtures, assuming known mixing
filters. We propose to perform the speech separation and enhancement task in
the short-time Fourier transform domain, using the convolutive transfer
function (CTF) approximation. Compared to time-domain filters, the CTF has far
fewer taps; consequently, it has fewer near-common zeros among channels and lower
computational complexity. The work proposes three speech-source recovery
methods, namely: i) the multichannel inverse filtering method, i.e. the
multiple input/output inverse theorem (MINT), is exploited in the CTF domain,
and for the multi-source case, ii) a beamforming-like multichannel inverse
filtering method applying single source MINT and using power minimization,
which is suitable whenever the source CTFs are not all known, and iii) a
constrained Lasso method, where the sources are recovered by minimizing the
$\ell_1$-norm to impose their spectral sparsity, with the constraint that the
$\ell_2$-norm fitting cost, between the microphone signals and the mixing model
involving the unknown source signals, is less than a tolerance. The noise can
be reduced by setting a tolerance onto the noise power. Experiments under
various acoustic conditions are carried out to evaluate the three proposed
methods. The comparison between them as well as with the baseline methods is
presented.
Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
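The constrained Lasso of method iii) can be written schematically as below. This is a sketch under assumptions, not the paper's exact formulation: the symbols are introduced here for illustration, with x the stacked microphone STFT coefficients, s the stacked unknown source coefficients, A a linear operator standing for the CTF mixing model, and epsilon the tolerance on the fitting cost. The usual Lasso pairing of an l1 sparsity term with an l2 fitting cost is assumed, and whether the fitting cost is squared is not stated in the abstract.

```latex
\hat{\mathbf{s}} \;=\; \arg\min_{\mathbf{s}} \; \|\mathbf{s}\|_{1}
\quad \text{subject to} \quad
\|\mathbf{x} - \mathbf{A}\,\mathbf{s}\|_{2} \;\le\; \epsilon
```

Under this reading, setting epsilon according to the noise power keeps the noise in the residual rather than in the recovered sources.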
Spatio-Temporal processing for Optimum Uplink-Downlink WCDMA Systems
The capacity of a cellular system is limited by two different phenomena, namely
multipath fading and multiple access interference (MAI). A two-dimensional (2-D)
receiver combats both of these by processing the signal in both the spatial and temporal
domains. An ideal 2-D receiver would perform joint space-time processing, but at the
price of high computational complexity. In this research we investigate a computationally
simpler technique termed a Beamformer-Rake. In a Beamformer-Rake, the output of a
beamformer is fed into a succeeding temporal processor to take advantage of both the
beamformer and the Rake receiver. Wireless service providers throughout the world are
working to introduce third generation (3G) and beyond-3G cellular services that will
provide higher data rates and better spectral efficiency. Wideband CDMA (WCDMA)
has been widely accepted as one of the air interfaces for 3G. A Beamformer-Rake
receiver can be an effective solution to provide the enhanced receiver capabilities
needed to achieve the required performance of a WCDMA system.
We consider three different Pilot Symbol Assisted (PSA) beamforming techniques:
the Direct Matrix Inversion (DMI), Least-Mean Square (LMS) and Recursive Least Square
(RLS) adaptive algorithms. The Geometrically Based Single Bounce (GBSB) statistical
circular channel model is considered, which is more suitable for array processing and
conducive to RAKE combining. The performance of the Beamformer-Rake receiver is
evaluated in this channel model, for the uplink of the WCDMA system, as a function
of the number of antenna elements and RAKE fingers. It is shown that
the Beamformer-Rake receiver outperforms the conventional RAKE receiver and the
conventional beamformer by a significant margin. We also develop and optimize a
mathematical formulation for the output Signal to Interference plus Noise Ratio (SINR)
of a Beamformer-Rake receiver.
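As an illustration of pilot-assisted LMS beamforming, the following is a minimal sketch under strong simplifying assumptions; it is not the receiver evaluated in the thesis. It assumes a narrowband uniform linear array with half-wavelength spacing, BPSK pilot symbols, one interferer, and white noise, with no spreading, multipath, or Rake stage. All function names, angles, and parameter values are hypothetical.

```python
import cmath
import math
import random


def steering_vector(theta_deg, n_elements):
    """Array response of a half-wavelength-spaced uniform linear array (assumed)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * m) for m in range(n_elements)]


def simulate(n_elements=4, n_pilots=2000, user_deg=0.0,
             interferer_deg=40.0, noise_std=0.1, seed=0):
    """Generate array snapshots: desired pilots + one interferer + noise."""
    rng = random.Random(seed)
    a_user = steering_vector(user_deg, n_elements)
    a_int = steering_vector(interferer_deg, n_elements)
    snapshots, pilots = [], []
    for _ in range(n_pilots):
        d = rng.choice((-1.0, 1.0))  # known pilot symbol of the desired user
        i = rng.choice((-1.0, 1.0))  # unknown symbol of the interferer
        x = [d * au + i * ai
             + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
             for au, ai in zip(a_user, a_int)]
        snapshots.append(x)
        pilots.append(d)
    return snapshots, pilots


def lms_beamformer(snapshots, pilots, mu):
    """Complex LMS: y = w^H x, e = d - y, w <- w + mu * x * conj(e)."""
    w = [0j] * len(snapshots[0])
    for x, d in zip(snapshots, pilots):
        y = sum(wm.conjugate() * xm for wm, xm in zip(w, x))
        e = d - y
        w = [wm + mu * xm * e.conjugate() for wm, xm in zip(w, x)]
    return w


def array_gain(w, theta_deg):
    """Magnitude of the beamformer response toward a given direction."""
    a = steering_vector(theta_deg, len(w))
    return abs(sum(wm.conjugate() * am for wm, am in zip(w, a)))


if __name__ == "__main__":
    snapshots, pilots = simulate()
    w = lms_beamformer(snapshots, pilots, mu=0.02)
    print("gain toward user:      ", round(array_gain(w, 0.0), 3))
    print("gain toward interferer:", round(array_gain(w, 40.0), 3))
```

After training on the pilots, the weights should pass the desired direction with a gain close to one while placing a null near the interferer. DMI and RLS would replace the stochastic-gradient update with a sample-covariance inversion and a recursive least-squares update, respectively.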
In this research we also develop, simulate and evaluate the SINR and Signal to Noise
Ratio (Eb/No) performance of an adaptive beamforming technique in the WCDMA
system for the downlink. The performance is then compared with that of an omnidirectional antenna
system. Simulation shows that the best performance can be achieved when all the mobiles
with the same Angle-of-Arrival (AOA) and different distances from the base station are formed in
one beam.
Generalized DOA and Source Number Estimation Techniques for Acoustics and Radar
The purpose of this thesis is to highlight gaps in the field of direction of arrival (DOA) estimation and to propose building blocks for continued solution development in the area. Current methods are reviewed and their pitfalls are emphasized. DOA estimators are compared to each other for use on a conformal microphone array which receives impulsive, wideband signals. Further, many DOA estimators rely on knowing the number of source signals prior to DOA estimation. Though techniques exist to estimate this number, they lack robustness for certain signal types, particularly in the case where multiple radar targets exist in the same range bin. A deep neural network approach is proposed and evaluated for this particular case. The studies detailed in this thesis are specific to acoustic and radar applications of DOA estimation.