13 research outputs found
A coupled HMM for solving the permutation problem in frequency domain BSS
Permutation of the outputs at different frequency bins remains a major problem in convolutive blind source separation (BSS). In this work, a coupled hidden Markov model (CHMM) exploits the psychoacoustic characteristics of signals to mitigate such permutation. A joint diagonalization algorithm for convolutive BSS, which incorporates a non-unitary penalty term within the cross-power spectrum-based cost function in the frequency domain, is used. The proposed CHMM system couples a number of conventional HMMs, equal to the number of outputs, by making state transitions in each model dependent not only on its own previous state but also on some aspects of the state of the other models. A number of simulation studies demonstrate that this method substantially reduces the permutation effect.
Variable step-size sign natural gradient algorithm for sequential blind source separation
A novel variable step-size sign natural gradient algorithm (VS-S-NGA) for online blind separation of independent sources is presented. A sign operator for the adaptation of the separation model is obtained from the derivation of a generalized dynamic separation model. A variable step size is also derived to better match the dynamics of the input signals and the unmixing matrix. The proposed sign algorithm is appealing in practice because of its computational simplicity. Experimental results verify its superior convergence performance over the conventional NGA in both stationary and nonstationary environments.
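As a rough illustration of the kind of update the abstract refers to, the sketch below implements one online step of the conventional natural gradient algorithm and an assumed sign variant that keeps only the elementwise sign of the update direction. The function name `nga_step`, the `tanh` score function, and the fixed step size `mu` are illustrative assumptions; the variable step-size rule and the exact sign operator derived in the paper are not reproduced here.

```python
import numpy as np

def nga_step(W, x, mu=0.01, use_sign=False):
    """One online natural-gradient update of the unmixing matrix W.

    x is a single observation vector. tanh is an assumed score function
    (a common choice for super-Gaussian sources), not one taken from the paper.
    """
    y = W @ x                                       # current source estimate
    n = W.shape[0]
    # Conventional natural-gradient direction: (I - f(y) y^T) W
    G = (np.eye(n) - np.outer(np.tanh(y), y)) @ W
    if use_sign:
        # Assumed sign variant: keep only the sign of each entry,
        # so every element of W moves by at most mu per step.
        G = np.sign(G)
    return W + mu * G
```

With `use_sign=True` each update costs no multiplications by the gradient magnitude, which is one way to read the computational simplicity mentioned in the abstract.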
Adaptive signal processing techniques for clutter removal in radar-based navigation systems
The problem of background clutter remains a major challenge in radar-based navigation, particularly because of its time-varying statistical properties. Adaptive solutions for clutter removal are therefore sought which meet the demanding convergence and accuracy requirements of the navigation application. In this paper, a new structure which combines blind source separation (BSS) and adaptive interference cancellation (AIC) is proposed to solve the problem more accurately without prior statistical knowledge of the sea clutter. Simulation studies confirm that the new algorithms outperform previously proposed adaptive schemes for such processing.
A multiplicative algorithm for convolutive non-negative matrix factorization based on squared euclidean distance
Using the convolutive nonnegative matrix factorization (NMF) model due to Smaragdis, we develop a novel algorithm for matrix decomposition based on the squared Euclidean distance criterion. The algorithm features new formally derived learning rules and an efficient update for the reconstructed nonnegative matrix. Performance comparisons in terms of computational load and audio onset detection accuracy indicate the advantage of the Euclidean distance criterion over the Kullback–Leibler divergence criterion.
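For orientation, the convolutive NMF model and the squared Euclidean criterion mentioned above can be written as follows. The notation (the symbols $\mathbf{V}$, $\boldsymbol{\Lambda}$, $\mathbf{W}(t)$, $\mathbf{H}$, $T$ and the shift arrow) is an assumed convention for this sketch and may differ from the paper's.

```latex
\mathbf{V} \approx \boldsymbol{\Lambda}
  = \sum_{t=0}^{T-1} \mathbf{W}(t)\, \overset{t\rightarrow}{\mathbf{H}},
\qquad
D_{\mathrm{E}} = \tfrac{1}{2}\, \bigl\lVert \mathbf{V} - \boldsymbol{\Lambda} \bigr\rVert_F^{2}
```

Here $\overset{t\rightarrow}{\mathbf{H}}$ denotes $\mathbf{H}$ with its columns shifted $t$ positions to the right and zero-filled on the left, and all factors are constrained to be nonnegative. Setting $T = 1$ recovers standard NMF.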
Penalty function-based joint diagonalization approach for convolutive blind separation of nonstationary sources
A new approach for convolutive blind source separation (BSS) is proposed which explicitly exploits the second-order nonstationarity of signals and operates in the frequency domain. The algorithm accommodates a penalty function within the cross-power spectrum-based cost function and thereby converts the separation problem into a joint diagonalization problem with unconstrained optimization. This leads to a new member of the family of joint diagonalization criteria and a modification of the search direction of the gradient-based descent algorithm. Using this approach, not only can the degenerate solution induced by a null unmixing matrix and the effect of large errors within the elements of covariance matrices at low-frequency bins be automatically removed, but a unifying view of joint diagonalization with unitary or nonunitary constraint is also provided. Numerical experiments verify the performance of the new method. They show that a suitable penalty function may lead the algorithm to faster convergence and better performance for the separation of convolved speech signals, in particular in terms of shape preservation and amplitude ambiguity reduction, as compared with conventional second-order based algorithms for convolutive mixtures that exploit signal nonstationarity.
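A generic form consistent with this description is sketched below. The symbols ($\mathbf{R}_X(\omega,k)$ for the cross-power spectrum matrix of the mixtures at frequency bin $\omega$ and time block $k$, $\operatorname{off}(\cdot)$ for the operator that zeroes the diagonal, $\kappa$ for the penalty weight, and $U$ for the penalty function) are assumptions made for illustration and are not taken from the paper.

```latex
J\bigl(\mathbf{W}(\omega)\bigr)
  = \sum_{k=1}^{K} \Bigl\lVert \operatorname{off}\bigl(
      \mathbf{W}(\omega)\, \mathbf{R}_X(\omega,k)\, \mathbf{W}^{H}(\omega)
    \bigr) \Bigr\rVert_F^{2}
  + \kappa\, U\bigl(\mathbf{W}(\omega)\bigr)
```

One possible penalty for a unitary constraint would be $U(\mathbf{W}) = \lVert \mathbf{W}\mathbf{W}^{H} - \mathbf{I} \rVert_F^{2}$. With such a term and $\kappa > 0$, the trivial choice $\mathbf{W}(\omega) = \mathbf{0}$ zeroes the first term but is penalized by the second, which is one way to see how the degenerate solution can be excluded without a hard constraint.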
Non-negative matrix factorization for note onset detection of audio signals
A novel approach using non-negative matrix factorization (NMF) for onset detection of musical notes from audio signals is presented. Unlike most conventional approaches, the proposed method exploits a new detection function constructed from the linear temporal bases obtained from a non-negative matrix decomposition of musical spectra. Both first-order difference and psychoacoustically motivated relative difference functions of the temporal profile are considered. As the approach works directly on the input data, no prior knowledge or statistical information is required. The practical issue of choosing the factorization rank is also examined experimentally. Numerical examples are provided to show the performance of the proposed method.
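The pipeline described above can be sketched as follows, under several assumptions: the factorization uses the standard multiplicative updates for the squared Euclidean distance, the temporal profile is taken as the sum of the rows of the activation matrix `H`, and the relative difference is the first-order difference divided by the current profile value. The function names and these specific choices are illustrative and may differ from the paper's detection function.

```python
import numpy as np

def nmf_euclidean(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Standard multiplicative-update NMF (squared Euclidean distance)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps   # spectral bases
    H = rng.random((rank, V.shape[1])) + eps   # temporal bases (activations)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def onset_detection_functions(H, eps=1e-12):
    """First-order and relative difference of the summed temporal profile."""
    profile = H.sum(axis=0)                       # assumed overall temporal profile
    diff = np.diff(profile, prepend=profile[0])   # first-order difference
    rel = diff / (profile + eps)                  # assumed relative difference
    return diff, rel
```

A typical use would pass a magnitude spectrogram as `V`, choose a small `rank`, and pick peaks of either detection function above a threshold as onset candidates.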
Blind separation of convolutive mixtures of cyclostationary sources using an extended natural gradient method
An online adaptive blind source separation algorithm for the separation of convolutive mixtures of cyclostationary source signals is proposed. The algorithm is derived by applying natural gradient iterative learning to a novel cost function defined according to the wide-sense cyclostationarity of the signals. The efficiency of the algorithm is supported by simulations, which show that the proposed algorithm has improved performance for the separation of convolved cyclostationary signals in terms of convergence speed and waveform similarity measurement, as compared to the conventional natural gradient algorithm for convolutive mixtures.
Exploitation of source nonstationarity in underdetermined blind source separation with advanced clustering techniques
The problem of blind source separation (BSS) is investigated. Following the assumption that the time-frequency (TF) distributions of the input sources do not overlap, a quadratic TF representation is used to exploit the sparsity of the statistically nonstationary sources. However, separation performance is shown to be limited by the selection of a certain threshold in classifying the eigenvectors of the TF matrices drawn from the observation mixtures. Two methods are therefore proposed, based on recently introduced advanced clustering techniques, namely Gap statistics and self-splitting competitive learning (SSCL), to mitigate the problem of eigenvector classification. The novel integration of these two approaches successfully overcomes the problem of artificial sources induced by insufficient knowledge of the threshold and enables automatic determination of the number of active sources over the observation. The separation performance is thereby greatly improved. Practical consequences of violating the TF orthogonality assumption in the current approach are also studied, which motivates the proposal of a new solution robust to violation of orthogonality. In this new method, the TF plane is partitioned into appropriate blocks and source separation is carried out in a block-by-block manner. Numerical experiments with linear chirp signals and Gaussian minimum shift keying (GMSK) signals are included which support the improved performance of the proposed approaches.
Video-aided model-based source separation in real reverberant rooms
Source separation algorithms that utilize only audio data can perform poorly if multiple sources or reverberation are present. In this paper we therefore propose a video-aided model-based source separation algorithm for a two-channel reverberant recording in which the sources are assumed static. By exploiting cues from video, we first localize individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, namely the interaural phase difference and the interaural level difference, as well as the mixing vectors are probabilistically modeled. The models make use of the source direction information and are evaluated at discrete time-frequency points. The model parameters are refined with the well-known expectation-maximization (EM) algorithm. The algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that by utilizing the visual modality the proposed algorithm can produce better time-frequency masks, thereby giving improved source estimates. We provide experimental results that test the proposed algorithm in different scenarios, compare it with other audio-only and audio-visual algorithms, and show improved performance on both synthetic and real data. We also include dereverberation-based pre-processing in our algorithm in order to suppress the late reverberant components of the observed stereo mixture and further enhance the overall output of the algorithm. This advantage makes our algorithm a suitable candidate for use in under-determined, highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
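The two interaural cues and the masking step named in the abstract can be illustrated with the minimal sketch below. It assumes `L` and `R` are complex short-time Fourier transforms of the left and right channels with identical shapes; the function names are hypothetical, and the probabilistic modeling, video-based localization, and EM refinement from the paper are not reproduced.

```python
import numpy as np

def interaural_cues(L, R, eps=1e-12):
    """Interaural level difference (dB) and phase difference (radians)
    at every time-frequency point of a two-channel STFT."""
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    ipd = np.angle(L * np.conj(R))   # phase of L relative to R, in (-pi, pi]
    return ild, ipd

def apply_mask(mask, X):
    """Weight each time-frequency point of X by a mask with values in [0, 1]."""
    return mask * X
```

In a full system, one mask per source would be applied to the mixture STFT and each masked result inverted back to the time domain to obtain the source estimates.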