
    A sticky HDP-HMM with application to speaker diarization

    We consider the problem of speaker diarization: segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566--1581]. Although the basic HDP-HMM tends to over-segment the audio data---creating redundant states and rapidly switching among them---we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.
    Comment: Published at http://dx.doi.org/10.1214/10-AOAS395 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
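
    The abstract does not spell out the augmentation, but in the sticky HDP-HMM literature the switching-rate control is a self-transition bias \kappa added to each row of the HDP prior on transition distributions. A sketch in that standard notation (\beta is the global weight vector, \alpha the concentration parameter, \delta_j a point mass at state j):

        \pi_j \mid \alpha, \kappa, \beta \;\sim\; \mathrm{DP}\!\left(\alpha + \kappa,\ \frac{\alpha\beta + \kappa\,\delta_j}{\alpha + \kappa}\right),
        \qquad z_t \mid z_{t-1} = j \;\sim\; \pi_j

    Setting \kappa = 0 recovers the original HDP-HMM of Teh et al., while larger \kappa places more prior mass on self-transitions and damps the rapid state switching described above.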

    Interleaved Factorial Non-Homogeneous Hidden Markov Models for Energy Disaggregation

    To reduce energy demand in households, it is useful to know which electrical appliances are in use at what times. Monitoring individual appliances is costly and intrusive, whereas data on overall household electricity use is more easily obtained. In this paper, we consider the energy disaggregation problem, in which a household's total electricity consumption is decomposed into the contributions of its component appliances. The factorial hidden Markov model (FHMM) is a natural model for these data. We enhance this generic model by introducing two constraints on the state sequence of the FHMM. The first is to use a non-homogeneous Markov chain, modelling how appliance usage varies over the day; the second is to enforce that at most one chain changes state at each time step. This yields a new model which we call the interleaved factorial non-homogeneous hidden Markov model (IFNHMM). We evaluated the ability of this model to perform disaggregation in an ultra-low-frequency setting, over a data set of 251 English households. In this new setting, the IFNHMM outperforms the FHMM in terms of recovering the energy used by the component appliances, because stronger constraints are imposed on the states of the hidden Markov chains. Interestingly, we find that the variability in model performance across households is significant, underscoring the importance of using larger-scale data in the disaggregation problem.
    Comment: 5 pages, 1 figure, conference, The NIPS workshop on Machine Learning for Sustainability, Lake Tahoe, NV, USA, 201
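
    To make the interleaving constraint concrete, here is a small illustrative sketch (not the authors' implementation; the function name and toy transition matrices are hypothetical): joint transitions of a factorial HMM that would switch more than one chain at a time are given zero probability, and the remaining mass is renormalized.

        import itertools
        import numpy as np

        def interleaved_transition_probs(trans, state):
            # trans: list of per-chain transition matrices, trans[c][i, j] = P(chain c moves i -> j)
            # state: tuple holding the current state of every chain
            n_chains = len(trans)
            probs = {}
            # Enumerate candidate joint states that differ from `state` in at most one chain.
            for cand in itertools.product(*(range(T.shape[0]) for T in trans)):
                changed = sum(cand[c] != state[c] for c in range(n_chains))
                if changed > 1:
                    continue  # interleaving constraint: at most one chain may switch per step
                p = 1.0
                for c in range(n_chains):
                    p *= trans[c][state[c], cand[c]]
                probs[cand] = p
            # Renormalize over the restricted support so the probabilities sum to one.
            total = sum(probs.values())
            return {s: p / total for s, p in probs.items()}

        # Two binary appliance chains (on/off); in the non-homogeneous case these
        # matrices would additionally depend on the time of day.
        T0 = np.array([[0.9, 0.1], [0.2, 0.8]])
        T1 = np.array([[0.95, 0.05], [0.3, 0.7]])
        print(interleaved_transition_probs([T0, T1], state=(0, 1)))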

    Metropolis Sampling

    Monte Carlo (MC) sampling methods are widely applied in Bayesian inference, system simulation and optimization problems. Markov chain Monte Carlo (MCMC) algorithms are a well-known class of MC methods that generate a Markov chain with the desired invariant distribution. In this document, we focus on the Metropolis-Hastings (MH) sampler, which can be considered the atom of MCMC techniques, and introduce its basic notions and properties. We describe in detail all the elements involved in the MH algorithm and its most relevant variants. Several improvements and recent extensions proposed in the literature are also briefly discussed, providing a quick but exhaustive overview of the current landscape of Metropolis-based sampling.
    Comment: Wiley StatsRef-Statistics Reference Online, 201
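
    As a minimal illustration of the sampler discussed here (a generic sketch, not the article's own code; names such as log_target and proposal_sample are hypothetical):

        import numpy as np

        def metropolis_hastings(log_target, proposal_sample, proposal_logpdf, x0, n_samples, rng):
            # log_target(x): log of the (possibly unnormalized) target density
            # proposal_sample(x): draw a candidate x' given the current state x
            # proposal_logpdf(x_to, x_from): log q(x_to | x_from)
            x = x0
            samples = []
            for _ in range(n_samples):
                x_prop = proposal_sample(x)
                # Acceptance log-ratio: target ratio corrected by the proposal asymmetry.
                log_alpha = (log_target(x_prop) - log_target(x)
                             + proposal_logpdf(x, x_prop) - proposal_logpdf(x_prop, x))
                if np.log(rng.uniform()) < log_alpha:
                    x = x_prop  # accept; otherwise the chain stays where it is
                samples.append(x)
            return np.array(samples)

        # Random-walk Metropolis targeting a standard normal: the Gaussian proposal is
        # symmetric, so the proposal correction term cancels to zero.
        rng = np.random.default_rng(0)
        draws = metropolis_hastings(
            log_target=lambda x: -0.5 * x ** 2,
            proposal_sample=lambda x: x + 0.5 * rng.normal(),
            proposal_logpdf=lambda x_to, x_from: 0.0,
            x0=0.0,
            n_samples=5000,
            rng=rng,
        )

    Whatever proposal is used, the accept/reject step above is what makes the target the invariant distribution of the resulting chain.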