Denoising Deep Neural Networks Based Voice Activity Detection
Recently, deep-belief-network (DBN) based voice activity detection (VAD)
has been proposed. It is powerful in fusing the advantages of multiple
features, and achieves state-of-the-art performance. However, the deep
layers of the DBN-based VAD do not show an apparent superiority to the
shallower layers. In this paper, we propose a denoising-deep-neural-network
(DDNN) based VAD to address the aforementioned problem. Specifically, we
pre-train a deep neural network in a special unsupervised denoising greedy
layer-wise mode, and then fine-tune the whole network in a supervised way by
the common back-propagation algorithm. In the pre-training phase, we take the
noisy speech signals as the visible layer and try to extract a new feature that
minimizes the reconstruction cross-entropy loss between the noisy speech
signals and their corresponding clean speech signals. Experimental results show
that the proposed DDNN-based VAD not only outperforms the DBN-based VAD but
also shows an apparent performance improvement of the deep layers over
shallower layers.
Comment: This paper has been accepted by IEEE ICASSP-2013, and will be
published online after May, 201
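The denoising pre-training step described in this abstract can be illustrated with a single layer. The following is a minimal sketch, not the authors' implementation: it assumes one sigmoid hidden layer with tied encoder/decoder weights, features scaled to [0, 1], and plain stochastic gradient descent. The names `DenoisingLayer` and `pretrain` are hypothetical. The layer reads the noisy features and is trained to reconstruct the clean ones under a cross-entropy loss.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(target, output, eps=1e-12):
    """Reconstruction cross-entropy between clean targets and the reconstruction."""
    return -sum(t * math.log(o + eps) + (1 - t) * math.log(1 - o + eps)
                for t, o in zip(target, output))


class DenoisingLayer:
    """Hypothetical single layer: noisy input -> hidden feature -> clean reconstruction."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = random.Random(seed)
        # Tied weights (an assumption): W[j][i] links visible unit i to hidden unit j.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_visible)]
                  for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        self.b_v = [0.0] * n_visible

    def encode(self, v):
        return [sigmoid(sum(w * x for w, x in zip(row, v)) + b)
                for row, b in zip(self.W, self.b_h)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + self.b_v[i])
                for i in range(len(self.b_v))]

    def train_step(self, noisy, clean, lr):
        h = self.encode(noisy)
        r = self.decode(h)
        loss = cross_entropy(clean, r)
        # Sigmoid output with cross-entropy loss: output delta is (reconstruction - clean).
        d_out = [ri - ci for ri, ci in zip(r, clean)]
        # Backpropagate through the decoder to the hidden pre-activations.
        d_hid = [h[j] * (1 - h[j]) * sum(self.W[j][i] * d_out[i]
                                         for i in range(len(d_out)))
                 for j in range(len(h))]
        for j in range(len(h)):
            for i in range(len(noisy)):
                # Tied weights receive gradient from both decoder and encoder paths.
                self.W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * noisy[i])
            self.b_h[j] -= lr * d_hid[j]
        for i in range(len(d_out)):
            self.b_v[i] -= lr * d_out[i]
        return loss


def pretrain(layer, pairs, epochs=100, lr=0.5):
    """Train on (noisy, clean) pairs; return the mean loss of each epoch."""
    history = []
    for _ in range(epochs):
        total = sum(layer.train_step(noisy, clean, lr) for noisy, clean in pairs)
        history.append(total / len(pairs))
    return history
```

In a greedy layer-wise scheme, the output of `encode` would serve as the input to the next layer, and the stacked network would then be fine-tuned with supervised back-propagation.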
Sequential joint signal detection and signal-to-noise ratio estimation
The sequential analysis of the problem of joint signal detection and
signal-to-noise ratio (SNR) estimation for a linear Gaussian observation model
is considered. The problem is posed as an optimization setup where the goal is
to minimize the number of samples required to achieve the desired (i) type I
and type II error probabilities and (ii) mean squared error performance. This
optimization problem is reduced to a more tractable formulation by transforming
the observed signal and noise sequences to a single sequence of Bernoulli
random variables; joint detection and estimation is then performed on the
Bernoulli sequence. This transformation renders the problem easily solvable,
and results in a sufficient statistic that is computationally simpler than the
one based on the (untransformed) observation sequences. Experimental results
demonstrate the advantages of the proposed method, making it feasible for
applications having strict constraints on data storage and computation.
Comment: 5 pages, Proceedings of IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), 201
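To make the idea of sequential testing on a Bernoulli sequence concrete, here is a minimal sketch of Wald's classical sequential probability ratio test (SPRT). It is a stand-in for illustration, not the joint detection and estimation procedure of the paper: it covers detection only, and the hypothesized success probabilities `p0` and `p1` and the function name `sprt_bernoulli` are assumptions. The stopping thresholds are Wald's approximations derived from the target type I error `alpha` and type II error `beta`.

```python
import math


def sprt_bernoulli(samples, p0, p1, alpha, beta):
    """Wald's SPRT for H0: p = p0 versus H1: p = p1 on a stream of 0/1 samples.

    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 once the LLR reaches this
    lower = math.log(beta / (1 - alpha))   # accept H0 once the LLR falls to this
    step_one = math.log(p1 / p0)           # LLR increment for a sample equal to 1
    step_zero = math.log((1 - p1) / (1 - p0))  # LLR increment for a sample equal to 0
    llr = 0.0
    n = 0
    for x in samples:
        n += 1
        llr += step_one if x else step_zero
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n
```

Because each sample adds one of only two fixed increments, the running log-likelihood ratio depends only on the sample count and the number of ones, which illustrates why a Bernoulli sequence admits a cheap sufficient statistic.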
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments
We address the problem of online localization and tracking of multiple moving
speakers in reverberant environments. The paper has the following
contributions. We use the direct-path relative transfer function (DP-RTF), an
inter-channel feature that encodes acoustic information robust against
reverberation, and we propose an online algorithm well suited for estimating
DP-RTFs associated with moving audio sources. Another crucial ingredient of the
proposed method is its ability to properly assign DP-RTFs to audio-source
directions. Towards this goal, we adopt a maximum-likelihood formulation and we
propose to use an exponentiated gradient (EG) to efficiently update
source-direction estimates starting from their currently available values. The
problem of multiple speaker tracking is computationally intractable because the
number of possible associations between observed source directions and physical
speakers grows exponentially with time. We adopt a Bayesian framework and we
propose a variational approximation of the posterior filtering distribution
associated with multiple speaker tracking, as well as an efficient variational
expectation-maximization (VEM) solver. The proposed online localization and
tracking method is thoroughly evaluated using two datasets that contain
recordings performed in real environments.
Comment: IEEE Journal of Selected Topics in Signal Processing, 201
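The exponentiated gradient (EG) update mentioned in this abstract can be sketched generically. The example below is an illustration under stated assumptions, not the paper's algorithm: it assumes the source-direction estimate is a vector of non-negative weights over a grid of candidate directions that sums to one, and that per-frame likelihoods for each candidate are already available. The names `eg_step`, `neg_log_likelihood_grad`, and `fit_weights` are hypothetical. Each step starts from the current weights, rescales them multiplicatively, and renormalizes, so the estimate stays on the probability simplex.

```python
import math


def eg_step(weights, grad, lr):
    """One exponentiated-gradient step for minimization on the probability simplex."""
    scaled = [w * math.exp(-lr * g) for w, g in zip(weights, grad)]
    total = sum(scaled)
    return [s / total for s in scaled]


def neg_log_likelihood_grad(weights, likelihoods):
    """Gradient of the mean negative log-likelihood of a mixture.

    likelihoods[t][k] is the (assumed given) likelihood of frame t under candidate k.
    """
    grad = [0.0] * len(weights)
    for row in likelihoods:
        mix = sum(w * l for w, l in zip(weights, row))
        for k, l in enumerate(row):
            grad[k] -= l / mix
    return [g / len(likelihoods) for g in grad]


def fit_weights(likelihoods, n_iter=200, lr=0.5):
    """Maximum-likelihood mixture weights via repeated EG steps from a uniform start."""
    k = len(likelihoods[0])
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        grad = neg_log_likelihood_grad(weights, likelihoods)
        weights = eg_step(weights, grad, lr)
    return weights
```

In an online setting, one would apply a small number of `eg_step` calls per incoming frame, starting from the previous weights rather than from a uniform vector.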