A sticky HDP-HMM with application to speaker diarization
We consider the problem of speaker diarization: segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that the
number of people participating in the meeting cannot be assumed known in
advance. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006)
1566--1581]. Although the basic HDP-HMM tends to over-segment the audio
data---creating redundant states and rapidly switching among them---we describe
an augmented HDP-HMM that provides effective control over the switching rate.
We also show that this augmentation makes it possible to treat emission
distributions nonparametrically. To scale the resulting architecture to
realistic diarization problems, we develop a sampling algorithm that employs a
truncated approximation of the Dirichlet process to jointly resample the full
state sequence, greatly improving mixing rates. Working with a benchmark NIST
data set, we show that our Bayesian nonparametric architecture yields
state-of-the-art speaker diarization results.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS395 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
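The "sticky" augmentation described above can be illustrated with a short sketch. Under a finite truncated approximation of the Dirichlet process, each transition row of the HDP-HMM is drawn from a Dirichlet distribution whose concentration parameters mix a shared base measure beta with extra mass kappa on the self-transition entry; large kappa suppresses the rapid state switching of the basic HDP-HMM. The function name and parameter choices below are illustrative, not taken from the paper:

```python
import random

def sticky_transition_rows(beta, alpha, kappa, seed=0):
    """Sketch of the sticky HDP-HMM transition prior under a finite
    (truncated) approximation: row j is drawn from
    Dirichlet(alpha * beta + kappa * e_j), so the extra mass kappa on the
    diagonal entry biases each state toward self-transitions.

    beta is the shared base measure over the L truncated states; alpha is
    the concentration parameter; kappa is the sticky self-transition bias.
    (Illustrative sketch, not the authors' implementation.)
    """
    rng = random.Random(seed)
    L = len(beta)
    rows = []
    for j in range(L):
        # Dirichlet draw via normalized Gamma variates.
        conc = [alpha * b + (kappa if i == j else 0.0)
                for i, b in enumerate(beta)]
        draws = [rng.gammavariate(c, 1.0) for c in conc]
        total = sum(draws)
        rows.append([d / total for d in draws])
    return rows

# With a large kappa the self-transition probabilities dominate.
beta = [0.2] * 5
rows = sticky_transition_rows(beta, alpha=1.0, kappa=50.0)
```

Setting kappa = 0 recovers the transition prior of the basic HDP-HMM, which is exactly the regime in which the over-segmentation described above appears.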
Interleaved Factorial Non-Homogeneous Hidden Markov Models for Energy Disaggregation
To reduce energy demand in households it is useful to know which electrical
appliances are in use at what times. Monitoring individual appliances is costly
and intrusive, whereas data on overall household electricity use is more easily
obtained. In this paper, we consider the energy disaggregation problem where a
household's electricity consumption is disaggregated into the component
appliances. The factorial hidden Markov model (FHMM) is a natural model to fit
this data. We enhance this generic model by introducing two constraints on the
state sequence of the FHMM. The first is to use a non-homogeneous Markov chain,
modelling how appliance usage varies over the day, and the other is to enforce
that at most one chain changes state at each time step. This yields a new model
which we call the interleaved factorial non-homogeneous hidden Markov model
(IFNHMM). We evaluated the ability of this model to perform disaggregation in
an ultra-low frequency setting, over a data set of 251 English households. In
this new setting, the IFNHMM outperforms the FHMM in terms of recovering the
energy used by the component appliances, because stronger constraints have
been imposed on the states of the hidden Markov chains. Interestingly, we find
that model performance varies significantly across households,
underscoring the importance of using larger-scale data in the disaggregation
problem.
Comment: 5 pages, 1 figure, conference, The NIPS workshop on Machine Learning
for Sustainability, Lake Tahoe, NV, USA, 201
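The two constraints above can be sketched generatively: each appliance is one Markov chain, the transition row may depend on the time step (non-homogeneity), and at each step at most one chain is allowed to change state (interleaving), so the aggregate reading changes due to at most one appliance at a time. All names and parameter values here are illustrative, not from the paper:

```python
import random

def sample_ifnhmm(trans, power, T, seed=0):
    """Generative sketch of the interleaved factorial non-homogeneous HMM
    constraints: at each time step only one chain (chosen uniformly at
    random here) is permitted to change state.

    trans(k, s, t) returns the transition row for chain k currently in
    state s at time t (non-homogeneous: it may vary over the day);
    power[k][s] is the power draw of appliance k in state s.
    """
    rng = random.Random(seed)
    n_chains = len(power)
    states = [0] * n_chains               # all appliances start in state 0 ("off")
    history, aggregate = [], []
    for t in range(T):
        k = rng.randrange(n_chains)       # only chain k may switch at step t
        row = trans(k, states[k], t)
        u, acc = rng.random(), 0.0
        for s, p in enumerate(row):       # inverse-CDF draw from the row
            acc += p
            if u < acc:
                states[k] = s
                break
        history.append(tuple(states))
        aggregate.append(sum(power[c][states[c]] for c in range(n_chains)))
    return history, aggregate

# Two hypothetical appliances with two states each (off/on).
trans = lambda k, s, t: [0.9, 0.1] if s == 0 else [0.1, 0.9]
power = [[0.0, 100.0], [0.0, 2000.0]]     # e.g. fridge, kettle (illustrative)
history, aggregate = sample_ifnhmm(trans, power, T=200)
```

Because only one chain moves per step, consecutive state vectors in `history` differ in at most one position; inference under the IFNHMM exploits exactly this restriction of the joint state space.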
Metropolis Sampling
Monte Carlo (MC) sampling methods are widely applied in Bayesian inference,
system simulation and optimization problems. The Markov Chain Monte Carlo
(MCMC) algorithms are a well-known class of MC methods which generate a Markov
chain with the desired invariant distribution. In this document, we focus on
the Metropolis-Hastings (MH) sampler, which can be considered the atom of
MCMC techniques, introducing the basic notions and different properties. We
describe in detail all the elements involved in the MH algorithm and the most
relevant variants. Several improvements and recent extensions proposed in the
literature are also briefly discussed, providing a quick but thorough
overview of the current landscape of Metropolis-based sampling.
Comment: Wiley StatsRef-Statistics Reference Online, 201
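A minimal random-walk Metropolis sketch makes the mechanism concrete: propose a perturbed state, then accept it with probability min(1, pi(x')/pi(x)). With a symmetric Gaussian proposal the Hastings correction cancels, so only the (unnormalized) target ratio is needed; the target and tuning values below are illustrative:

```python
import math
import random

def metropolis_hastings(log_target, proposal_std, x0, n_samples, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    Because the proposal q(x'|x) is symmetric, the acceptance probability
    reduces to min(1, pi(x') / pi(x)), computed here in log space for
    numerical stability. log_target may be unnormalized.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        x_prop = x + rng.gauss(0.0, proposal_std)
        log_alpha = log_target(x_prop) - log_target(x)
        if math.log(rng.random()) < log_alpha:
            x = x_prop                     # accept; otherwise keep current x
        samples.append(x)
    return samples

# Target: standard normal, known only up to a constant.
log_target = lambda x: -0.5 * x * x
samples = metropolis_hastings(log_target, proposal_std=1.0,
                              x0=5.0, n_samples=20000)
burned = samples[5000:]                    # discard burn-in
mean = sum(burned) / len(burned)
```

The chain is started far from the mode (x0 = 5) to show that, after burn-in, the empirical moments approach those of the invariant distribution regardless of initialization.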