    Denoising Deep Neural Networks Based Voice Activity Detection

    Recently, deep-belief-network (DBN) based voice activity detection (VAD) has been proposed. It effectively fuses the advantages of multiple features and achieves state-of-the-art performance. However, the deeper layers of the DBN-based VAD show no apparent advantage over the shallower layers. In this paper, we propose a denoising-deep-neural-network (DDNN) based VAD to address this problem. Specifically, we pre-train a deep neural network in an unsupervised, denoising, greedy layer-wise manner, and then fine-tune the whole network in a supervised way with the standard back-propagation algorithm. In the pre-training phase, we take the noisy speech signals as the visible layer and extract a new feature that minimizes the reconstruction cross-entropy loss between the noisy speech signals and their corresponding clean speech signals. Experimental results show that the proposed DDNN-based VAD not only outperforms the DBN-based VAD but also shows an apparent performance improvement of the deep layers over the shallower layers.
    Comment: This paper has been accepted by IEEE ICASSP-2013, and will be published online after May, 201
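    A minimal sketch of one denoising pre-training layer, in Python with only the standard library, may help make the noisy-in, clean-target idea concrete. Everything specific here is an assumption rather than the paper's configuration: the hypothetical `DenoisingLayer` class uses sigmoid units, tied encoder/decoder weights, plain per-sample gradient descent, and features already scaled to [0, 1]. The layer encodes a noisy feature vector and is trained so that its decoded output matches the corresponding clean vector under a reconstruction cross-entropy loss.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cross_entropy(target, recon, eps=1e-12):
    # Reconstruction cross-entropy between a clean target and a reconstruction,
    # both assumed (for this sketch) to lie in [0, 1] elementwise.
    return -sum(t * math.log(r + eps) + (1.0 - t) * math.log(1.0 - r + eps)
                for t, r in zip(target, recon))


class DenoisingLayer:
    """Hypothetical one-layer denoising autoencoder with tied weights.

    Assumptions (not from the paper): sigmoid units, tied weights,
    small uniform initialisation, plain per-sample gradient descent.
    """

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = random.Random(seed)
        # W[j][i] connects visible unit i to hidden unit j.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_visible)]
                  for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        self.b_v = [0.0] * n_visible

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W, self.b_h)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + b)
                for i, b in enumerate(self.b_v)]

    def loss(self, noisy, clean):
        # Encode the NOISY input, score the reconstruction against the CLEAN target.
        return cross_entropy(clean, self.decode(self.encode(noisy)))

    def train_step(self, noisy, clean, lr=0.1):
        h = self.encode(noisy)
        r = self.decode(h)
        # Sigmoid output + cross-entropy: gradient w.r.t. output pre-activation is r - clean.
        d_v = [ri - ci for ri, ci in zip(r, clean)]
        # Back-propagate to the hidden pre-activations (uses W before the update).
        d_h = [sum(self.W[j][i] * d_v[i] for i in range(len(d_v))) * h[j] * (1.0 - h[j])
               for j in range(len(h))]
        # Tied weights receive both the decoder and the encoder gradient.
        for j in range(len(h)):
            for i in range(len(noisy)):
                self.W[j][i] -= lr * (d_v[i] * h[j] + d_h[j] * noisy[i])
        for i, g in enumerate(d_v):
            self.b_v[i] -= lr * g
        for j, g in enumerate(d_h):
            self.b_h[j] -= lr * g
        return cross_entropy(clean, r)  # loss measured before this update
```

    Calling `layer.train_step(noisy_vec, clean_vec)` repeatedly over paired frames should drive the loss down toward reconstructing each clean frame from its noisy version. In a greedy layer-wise scheme, the hidden activations of a trained layer would become the input of the next layer before supervised fine-tuning; the abstract does not say which targets the deeper layers use, so this sketch stops at a single layer.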

    Sequential joint signal detection and signal-to-noise ratio estimation

    The sequential analysis of joint signal detection and signal-to-noise ratio (SNR) estimation for a linear Gaussian observation model is considered. The problem is posed as an optimization setup whose goal is to minimize the number of samples required to achieve the desired (i) type I and type II error probabilities and (ii) mean squared error performance. This optimization problem is reduced to a more tractable formulation by transforming the observed signal and noise sequences into a single sequence of Bernoulli random variables; joint detection and estimation is then performed on the Bernoulli sequence. This transformation renders the problem easily solvable and yields a computationally simpler sufficient statistic than the one based on the (untransformed) observation sequences. Experimental results demonstrate the advantages of the proposed method, making it feasible for applications with strict constraints on data storage and computation.
    Comment: 5 pages, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 201
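    The benefit of working on a Bernoulli sequence can be illustrated with Wald's sequential probability ratio test (SPRT) on a 0/1 stream. This is a stand-in, not the paper's procedure: the function `sprt_bernoulli`, the simple hypotheses `p0` and `p1`, and Wald's approximate thresholds are assumptions; the stopping rule here targets only the type I and type II error probabilities (not the MSE requirement); and mapping the estimated success probability back to an SNR value depends on the transformation, which the abstract does not give.

```python
import math


def sprt_bernoulli(samples, p0, p1, alpha, beta):
    """Wald's SPRT for H0: p = p0 versus H1: p = p1 on a Bernoulli stream.

    Returns (decision, n_used, p_hat) where decision is "H0", "H1", or None
    if the stream ends before either threshold is crossed. p_hat is the
    sample mean of the samples consumed (a plain estimate, assumed here).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 when the LLR reaches this
    lower = math.log(beta / (1 - alpha))   # accept H0 when the LLR falls to this
    llr = 0.0
    ones = 0  # running count of ones: all the state the test needs
    n = 0
    for n, x in enumerate(samples, start=1):
        ones += x
        # Per-sample log-likelihood ratio of a Bernoulli observation.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n, ones / n
        if llr <= lower:
            return "H0", n, ones / n
    return None, n, (ones / n if n else float("nan"))
```

    For example, with `p0=0.5`, `p1=0.8`, and `alpha=beta=0.05`, a stream of all ones stops with "H1" after 7 samples, and a stream of all zeros stops with "H0" after 4. The only state carried between samples is the count of ones and the sample index, which hints at why a Bernoulli reduction keeps storage and computation small.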

    Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

    We address the problem of online localization and tracking of multiple moving speakers in reverberant environments. The paper makes the following contributions. We use the direct-path relative transfer function (DP-RTF), an inter-channel feature that encodes acoustic information robust against reverberation, and we propose an online algorithm well suited for estimating DP-RTFs associated with moving audio sources. Another crucial ingredient of the proposed method is its ability to properly assign DP-RTFs to audio-source directions. To this end, we adopt a maximum-likelihood formulation and propose using an exponentiated gradient (EG) to efficiently update source-direction estimates starting from their currently available values. The problem of multiple speaker tracking is computationally intractable because the number of possible associations between observed source directions and physical speakers grows exponentially with time. We adopt a Bayesian framework and propose a variational approximation of the posterior filtering distribution associated with multiple speaker tracking, as well as an efficient variational expectation-maximization (VEM) solver. The proposed online localization and tracking method is thoroughly evaluated on two datasets containing recordings made in real environments.
    Comment: IEEE Journal of Selected Topics in Signal Processing, 201
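    An exponentiated-gradient (EG) step can be sketched for a simplified case: mixture weights over a fixed grid of candidate source directions, updated multiplicatively to increase the log-likelihood of the observed features. The direction grid, the per-direction likelihoods (passed in precomputed), the step size `eta`, and the function name `eg_update` are hypothetical; the paper's DP-RTF observation model and its exact update are not reproduced here.

```python
import math


def eg_update(weights, likelihoods, eta=0.1):
    """One exponentiated-gradient ascent step on a mixture log-likelihood.

    weights:     current direction weights w_d (non-negative, summing to 1).
    likelihoods: likelihoods[n][d] = p(observation n | direction d),
                 assumed to be computed elsewhere by some observation model.
    Objective:   L(w) = sum_n log(sum_d w_d * likelihoods[n][d]).
    """
    n_dirs = len(weights)
    grad = [0.0] * n_dirs
    for row in likelihoods:
        mix = sum(w * p for w, p in zip(weights, row))
        if mix <= 0.0:
            continue  # observation unexplained by any direction; skip it
        for d in range(n_dirs):
            # dL/dw_d contribution of this observation.
            grad[d] += row[d] / mix
    # Multiplicative update followed by renormalisation keeps w on the simplex.
    unnorm = [w * math.exp(eta * g) for w, g in zip(weights, grad)]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

    Warm-starting each call from the previous frame's weights loosely mirrors the idea of updating source-direction estimates from their currently available values. The multiplicative form keeps the weights non-negative and summing to one without a separate projection step, which is one reason EG suits weights that live on a probability simplex.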