2,989 research outputs found
A sticky HDP-HMM with application to speaker diarization
We consider the problem of speaker diarization, the problem of segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that we
are not allowed to assume knowledge of the number of people participating in
the meeting. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006)
1566--1581]. Although the basic HDP-HMM tends to over-segment the audio
data---creating redundant states and rapidly switching among them---we describe
an augmented HDP-HMM that provides effective control over the switching rate.
We also show that this augmentation makes it possible to treat emission
distributions nonparametrically. To scale the resulting architecture to
realistic diarization problems, we develop a sampling algorithm that employs a
truncated approximation of the Dirichlet process to jointly resample the full
state sequence, greatly improving mixing rates. Working with a benchmark NIST
data set, we show that our Bayesian nonparametric architecture yields
state-of-the-art speaker diarization results.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS395 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Discovering Clusters in Motion Time-Series Data
A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions that were observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches for HMM-based clustering employ a k-means formulation, where each sequence is assigned to only a single HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than a single HMM with some probability, and the hard decision about the sequence class membership can be deferred until a later time when such a decision is required. Experiments with simulated data demonstrate the benefit of using this EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications.Office of Naval Research (N000140310108, N000140110444); National Science Foundation (IIS-0208876, CAREER Award 0133825
An Open Source C++ Implementation of Multi-Threaded Gaussian Mixture Models, k-Means and Expectation Maximisation
Modelling of multivariate densities is a core component in many signal
processing, pattern recognition and machine learning applications. The
modelling is often done via Gaussian mixture models (GMMs), which use
computationally expensive and potentially unstable training algorithms. We
provide an overview of a fast and robust implementation of GMMs in the C++
language, employing multi-threaded versions of the Expectation Maximisation
(EM) and k-means training algorithms. Multi-threading is achieved through
reformulation of the EM and k-means algorithms into a MapReduce-like framework.
Furthermore, the implementation uses several techniques to improve numerical
stability and modelling accuracy. We demonstrate that the multi-threaded
implementation achieves a speedup of an order of magnitude on a recent 16 core
machine, and that it can achieve higher modelling accuracy than a previously
well-established publically accessible implementation. The multi-threaded
implementation is included as a user-friendly class in recent releases of the
open source Armadillo C++ linear algebra library. The library is provided under
the permissive Apache~2.0 license, allowing unencumbered use in commercial
products
Multi-candidate missing data imputation for robust speech recognition
The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large vocabulary speech recognizers is hampered by a large computational burden. The likelihood evaluations imply solving many constrained least squares (CLSQ) optimization problems. As an alternative, researchers have proposed frontend MDT or have made oversimplifying independence assumptions for the backend acoustic model. In this article, we propose a fast Multi-Candidate (MC) approach that solves the per-Gaussian CLSQ problems approximately by selecting the best from a small set of candidate solutions, which are generated as the MDT solutions on a reduced set of cluster Gaussians. Experiments show that the MC MDT runs equally fast as the uncompensated recognizer while achieving the accuracy of the full backend optimization approach. The experiments also show that exploiting the more accurate acoustic model of the backend does pay off in terms of accuracy when compared to frontend MDT. © 2012 Wang and Van hamme; licensee Springer.Wang Y., Van hamme H., ''Multi-candidate missing data imputation for robust speech recognition'', EURASIP journal on audio, speech, and music processing, vol. 17, 20 pp., 2012.status: publishe
- …