A sticky HDP-HMM with application to speaker diarization
We consider speaker diarization, the problem of segmenting an
audio recording of a meeting into temporal segments corresponding to individual
speakers. The problem is rendered particularly difficult by the fact that we
are not allowed to assume knowledge of the number of people participating in
the meeting. To address this problem, we take a Bayesian nonparametric approach
to speaker diarization that builds on the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006)
1566–1581]. Although the basic HDP-HMM tends to over-segment the audio
data, creating redundant states and rapidly switching among them, we describe
an augmented HDP-HMM that provides effective control over the switching rate.
We also show that this augmentation makes it possible to treat emission
distributions nonparametrically. To scale the resulting architecture to
realistic diarization problems, we develop a sampling algorithm that employs a
truncated approximation of the Dirichlet process to jointly resample the full
state sequence, greatly improving mixing rates. Working with a benchmark NIST
data set, we show that our Bayesian nonparametric architecture yields
state-of-the-art speaker diarization results.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org), at http://dx.doi.org/10.1214/10-AOAS395
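A minimal sketch of the two ingredients described above, in Python with NumPy. It assumes the weak-limit (finite truncation level L) form of the sticky prior, in which a self-transition bonus kappa is added to row j's Dirichlet concentration, and it resamples the whole state sequence in one block by forward filtering and backward sampling. The truncation level, hyperparameter values, the use of beta as the initial-state distribution and the fixed 1-D Gaussian emission means are illustrative assumptions; resampling of beta, emission parameters and hyperparameters is omitted, so this is one step of a sampler, not the full method.

```python
import numpy as np


def sample_sticky_transitions(L, alpha, gamma, kappa, rng):
    """Weak-limit draw of a sticky HDP-HMM transition matrix (truncation L).

    beta ~ Dir(gamma/L, ..., gamma/L) are shared global state weights.
    Row j is pi_j ~ Dir(alpha * beta + kappa * e_j): the extra kappa on the
    diagonal favours self-transitions and so slows down state switching.
    """
    beta = rng.dirichlet(np.full(L, gamma / L))
    pi = np.empty((L, L))
    for j in range(L):
        conc = alpha * beta
        conc[j] += kappa
        pi[j] = rng.dirichlet(conc)
    return beta, pi


def ffbs(log_lik, pi, pi0, rng):
    """Blocked resampling of z_1..z_T given transitions pi, initial
    distribution pi0 and log_lik[t, k] = log p(y_t | z_t = k)."""
    T, L = log_lik.shape
    # Rescale each row before exponentiating to avoid underflow.
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    filt = np.empty((T, L))
    a = pi0 * lik[0]
    filt[0] = a / a.sum()
    for t in range(1, T):  # forward filtering: p(z_t | y_1..y_t)
        a = (filt[t - 1] @ pi) * lik[t]
        filt[t] = a / a.sum()
    z = np.empty(T, dtype=int)
    z[-1] = rng.choice(L, p=filt[-1])
    for t in range(T - 2, -1, -1):  # backward sampling of the full sequence
        w = filt[t] * pi[:, z[t + 1]]
        z[t] = rng.choice(L, p=w / w.sum())
    return z


rng = np.random.default_rng(0)
L = 5
beta, pi = sample_sticky_transitions(L, alpha=5.0, gamma=5.0, kappa=20.0, rng=rng)
# Toy 1-D data with two regimes; unit-variance Gaussian emissions.
y = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])
means = np.linspace(-3.0, 3.0, L)  # hypothetical fixed emission means
log_lik = -0.5 * (y[:, None] - means[None, :]) ** 2
z = ffbs(log_lik, pi, beta, rng)
```

Larger kappa concentrates each row of pi on its diagonal, which is the lever for suppressing rapid switching; kappa = 0 recovers the non-sticky weak-limit HDP-HMM.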
Statistical Models of Reconstructed Phase Spaces for Signal Classification
This paper introduces a novel approach to the analysis and classification of time series signals using statistical models of reconstructed phase spaces. With sufficient dimension, such reconstructed phase spaces are, with probability one, guaranteed to be topologically equivalent to the state dynamics of the generating system, and, therefore, may contain information that is absent from analysis and classification methods rooted in linear assumptions. Parametric and nonparametric distributions are introduced as statistical representations over the multidimensional reconstructed phase space, with classification accomplished through methods such as Bayes maximum likelihood and artificial neural networks (ANNs). The technique is demonstrated on heart arrhythmia classification and speech recognition. This new approach is shown to be a viable and effective alternative to traditional signal classification approaches, particularly for signals with strong nonlinear characteristics.
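To make the phase-space idea concrete, the following Python sketch embeds a scalar series with time delays, fits one full-covariance Gaussian per class over the embedded points, and picks the class with the highest total log-likelihood. The embedding dimension, lag, Gaussian class model and the treatment of embedded points as independent are simplifying assumptions for illustration; the nonparametric densities and neural-network classifiers mentioned above are not shown.

```python
import numpy as np


def delay_embed(x, dim, lag):
    """Time-delay embedding: row n is [x[n], x[n+lag], ..., x[n+(dim-1)*lag]]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * lag
    if n <= 0:
        raise ValueError("series too short for this dim/lag")
    return np.stack([x[i * lag : i * lag + n] for i in range(dim)], axis=1)


def fit_gaussian(points):
    """Mean and (slightly regularised) covariance of the embedded points."""
    mu = points.mean(axis=0)
    cov = np.cov(points, rowvar=False) + 1e-6 * np.eye(points.shape[1])
    return mu, cov


def log_likelihood(points, mu, cov):
    """Sum of Gaussian log-densities, treating embedded points as independent."""
    d = points.shape[1]
    diff = points - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (maha.sum() + len(points) * (logdet + d * np.log(2.0 * np.pi)))


def classify(x, models, dim, lag):
    """Maximum-likelihood class label for the series x."""
    pts = delay_embed(x, dim, lag)
    scores = {label: log_likelihood(pts, mu, cov) for label, (mu, cov) in models.items()}
    return max(scores, key=scores.get)


rng = np.random.default_rng(1)
t = np.arange(2000)
train = {  # hypothetical training signals, one per class
    "slow": np.sin(0.05 * t) + 0.1 * rng.standard_normal(t.size),
    "fast": np.sin(0.5 * t) + 0.1 * rng.standard_normal(t.size),
}
models = {k: fit_gaussian(delay_embed(v, dim=3, lag=2)) for k, v in train.items()}
test_sig = np.sin(0.5 * t[:500] + 1.0) + 0.1 * rng.standard_normal(500)
pred = classify(test_sig, models, dim=3, lag=2)
```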
Bayesian Nonparametric Hidden Semi-Markov Models
There is much interest in the Hierarchical Dirichlet Process Hidden Markov
Model (HDP-HMM) as a natural Bayesian nonparametric extension of the ubiquitous
Hidden Markov Model for learning from sequential and time-series data. However,
in many settings the HDP-HMM's strict Markovian constraints are undesirable,
particularly if we wish to learn or encode non-geometric state durations. We
can extend the HDP-HMM to capture such structure by drawing upon
explicit-duration semi-Markovianity, which has been developed mainly in the
parametric frequentist setting, to allow construction of highly interpretable
models that admit natural prior information on state durations.
In this paper we introduce the explicit-duration Hierarchical Dirichlet
Process Hidden semi-Markov Model (HDP-HSMM) and develop sampling algorithms for
efficient posterior inference. The methods we introduce also provide new
techniques for sampling inference in the finite Bayesian HSMM. Our modular Gibbs
sampling methods can be embedded in samplers for larger hierarchical Bayesian
models, adding semi-Markov chain modeling as another tool in the Bayesian
inference toolbox. We demonstrate the utility of the HDP-HSMM and our inference
methods in both synthetic and real experiments.
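In an HMM, a state with self-transition probability p has a geometric duration, P(d) = (1 - p) p^(d-1). The explicit-duration alternative can be sketched generatively: draw a duration for the current state, emit that many copies, then jump using a transition matrix with zero diagonal. In this Python sketch the shifted-Poisson duration family, the transition matrix and the rates are hypothetical choices, and it covers only the finite generative model, not the HDP prior or the posterior samplers developed in the paper.

```python
import numpy as np


def sample_hsmm(pi, dur_rates, init, T, rng):
    """Generate a length-T state sequence from an explicit-duration HSMM.

    pi: transition matrix with zero diagonal, so dwell time is governed only
        by the duration distribution, not by self-transitions.
    dur_rates: per-state Poisson rates; durations are 1 + Poisson(rate), an
        assumed family that keeps every segment at least one step long.
    """
    dur_rates = np.asarray(dur_rates, dtype=float)
    L = len(dur_rates)
    if not np.allclose(np.diag(pi), 0.0):
        raise ValueError("explicit-duration transitions must have zero diagonal")
    z = []
    state = int(rng.choice(L, p=init))
    while len(z) < T:
        d = 1 + int(rng.poisson(dur_rates[state]))  # segment length
        z.extend([state] * d)
        state = int(rng.choice(L, p=pi[state]))     # always a different state
    return np.array(z[:T])


rng = np.random.default_rng(0)
pi = np.array([[0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5],
               [0.5, 0.5, 0.0]])
z = sample_hsmm(pi, dur_rates=[10.0, 30.0, 5.0], init=np.full(3, 1 / 3), T=500, rng=rng)
```

Because the durations are drawn explicitly, any distribution over positive integers could replace the shifted Poisson, which is what allows non-geometric dwell times.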
A new class of wavelet networks for nonlinear system identification
A new class of wavelet networks (WNs) is proposed for nonlinear system identification. In the new networks, the model structure for a high-dimensional system is chosen to be a superposition of a number of functions with fewer variables. By expanding each function using truncated wavelet decompositions, the multivariate nonlinear networks can be converted into linear-in-the-parameters regressions, which can be solved using least-squares type methods. An efficient model term selection approach based upon a forward orthogonal least squares (OLS) algorithm and the error reduction ratio (ERR) is applied to solve the linear-in-the-parameters problem in the present study. The main advantage of the new WN is that it combines the attractive features of multiscale wavelet decompositions with the capability of traditional neural networks. By adopting the analysis of variance (ANOVA) expansion, WNs can now handle nonlinear identification problems in high dimensions.
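The term selection step can be sketched as greedy forward orthogonal least squares: each remaining candidate regressor is orthogonalised against the terms already chosen, scored by its error reduction ratio ERR = (w^T y)^2 / ((w^T w)(y^T y)), and the best one is added. This Python sketch assumes the candidate matrix P has already been built (from truncated wavelet expansions of the ANOVA sub-functions, which are not reproduced here) and uses random columns only as stand-in data.

```python
import numpy as np


def forward_ols_err(P, y, n_terms):
    """Greedy forward OLS term selection ranked by error reduction ratio.

    P: (N, M) candidate regressors; y: (N,) target.
    Returns selected column indices, their ERR values and LS coefficients.
    """
    N, M = P.shape
    yy = y @ y
    selected, errs, basis = [], [], []
    for _ in range(n_terms):
        best, best_err, best_w = None, -1.0, None
        for m in range(M):
            if m in selected:
                continue
            w = P[:, m].astype(float)
            for q in basis:  # modified Gram-Schmidt against chosen terms
                w -= (q @ w) / (q @ q) * q
            ww = w @ w
            if ww < 1e-12:
                continue  # numerically dependent on terms already chosen
            err = (w @ y) ** 2 / (ww * yy)
            if err > best_err:
                best, best_err, best_w = m, err, w
        if best is None:
            break
        selected.append(best)
        errs.append(best_err)
        basis.append(best_w)
    if not selected:
        return [], [], np.empty(0)
    theta, *_ = np.linalg.lstsq(P[:, selected], y, rcond=None)
    return selected, errs, theta


rng = np.random.default_rng(0)
P = rng.standard_normal((200, 20))  # stand-in for a matrix of wavelet terms
y = 2.0 * P[:, 3] - 1.5 * P[:, 7]
selected, errs, theta = forward_ols_err(P, y, n_terms=2)
```

With noise-free data lying in the span of the chosen terms, the ERR values sum to one; a common stopping rule is to add terms until 1 - sum(ERR) falls below a tolerance.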
On Nonparametric Guidance for Learning Autoencoder Representations
Unsupervised discovery of latent representations, in addition to being useful
for density modeling, visualisation and exploratory data analysis, is also
increasingly important for learning features relevant to discriminative tasks.
Autoencoders, in particular, have proven to be an effective way to learn latent
codes that reflect meaningful variations in data. A continuing challenge,
however, is guiding an autoencoder toward representations that are useful for
particular tasks. A complementary challenge is to find codes that are invariant
to irrelevant transformations of the data. The most common way of introducing
such problem-specific guidance in autoencoders has been through the
incorporation of a parametric component that ties the latent representation to
the label information. In this work, we argue that a preferable approach relies
instead on a nonparametric guidance mechanism. Conceptually, it ensures that
there exists a function that can predict the label information, without
explicitly instantiating that function. The superiority of this guidance
mechanism is confirmed on two datasets. In particular, this approach is able to
incorporate invariance information (lighting, elevation, etc.) from the small
NORB object recognition dataset and yields state-of-the-art performance for a
single-layer, non-convolutional network.
Comment: 9 pages, 12 figures
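One way to read "a function that can predict the label information, without explicitly instantiating that function" is to score the codes by the marginal likelihood of the labels under a Gaussian process regression from codes to labels, since a GP integrates over predictors instead of fitting one. The Python sketch below assumes that interpretation, with a hypothetical RBF kernel, noise level, single tanh encoder layer and weighting lam; it only evaluates the combined objective, and gradient-based training is omitted.

```python
import numpy as np


def rbf_kernel(Z, lengthscale=1.0):
    """Squared-exponential kernel matrix between all pairs of rows of Z."""
    sq_norms = np.sum(Z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale ** 2)


def gp_neg_log_marginal(Z, Y, noise=0.1, lengthscale=1.0):
    """-log p(Y | Z) for independent GP regressions from codes Z to each
    label column of Y. The predictor is integrated out, never instantiated."""
    N, C = Y.shape
    K = rbf_kernel(Z, lengthscale) + noise * np.eye(N)
    chol = np.linalg.cholesky(K)   # K = chol @ chol.T
    A = np.linalg.solve(chol, Y)   # sum(A**2) = trace(Y^T K^{-1} Y)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return 0.5 * np.sum(A ** 2) + 0.5 * C * logdet + 0.5 * N * C * np.log(2.0 * np.pi)


def guided_ae_loss(W_enc, W_dec, X, Y, lam=1.0):
    """Reconstruction error of a one-layer tanh autoencoder plus lam times the
    GP guidance term on its codes. Returns (total, reconstruction)."""
    Z = np.tanh(X @ W_enc)
    X_hat = Z @ W_dec
    recon = np.sum((X - X_hat) ** 2)
    return recon + lam * gp_neg_log_marginal(Z, Y), recon


rng = np.random.default_rng(0)
N, D, H, C = 60, 10, 3, 3
labels = np.arange(N) % C
Y = np.eye(C)[labels]                                      # one-hot labels
X = rng.normal(size=(N, D)) + 2.0 * np.eye(C, D)[labels]   # class-shifted toy inputs
W_enc = 0.1 * rng.normal(size=(D, H))
W_dec = 0.1 * rng.normal(size=(H, D))
total, recon = guided_ae_loss(W_enc, W_dec, X, Y, lam=1.0)
```

Codes that cluster by label make the GP term small, so minimising the combined objective pushes the encoder toward label-predictive codes without committing to a particular parametric classifier.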