
    A Method of Moments for Mixture Models and Hidden Markov Models

    Mixture models are a fundamental tool in applied statistics and machine learning for treating data drawn from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm), which are prone to failure, while existing consistent methods are unattractive because their computational and sample complexity typically scale exponentially with the number of mixture components. This work develops an efficient method of moments approach to parameter estimation for a broad class of high-dimensional mixture models with many components, including multi-view mixtures of Gaussians (such as mixtures of axis-aligned Gaussians) and hidden Markov models. The new method yields rigorous unsupervised learning guarantees for mixture models that previous works did not achieve; and, because of its simplicity, it offers a viable alternative to EM for practical deployment.
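    The paper's actual algorithm is spectral and multi-view, which is beyond a short snippet, but the moment-matching idea it builds on can be illustrated in one dimension. A minimal sketch, assuming an equal-weight two-component Gaussian mixture with unit variances (a toy setting chosen for illustration, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: equal-weight mixture of N(mu1, 1) and N(mu2, 1)
mu1_true, mu2_true = -2.0, 3.0
n = 100_000
z = rng.integers(0, 2, size=n)
x = rng.normal(np.where(z == 0, mu1_true, mu2_true), 1.0)

# Moment equations for this model:
#   E[X]   = (mu1 + mu2) / 2
#   E[X^2] = 1 + (mu1^2 + mu2^2) / 2
m1 = x.mean()
m2 = (x ** 2).mean()
s = 2.0 * m1                      # mu1 + mu2
p = 2.0 * m1 ** 2 - (m2 - 1.0)    # mu1 * mu2
mu_est = np.roots([1.0, -s, p])   # roots of t^2 - s*t + p
print(sorted(mu_est.real))        # approximately [-2.0, 3.0]
```

    The first two empirical moments pin down the sum and product of the two means, so the means fall out as roots of a quadratic; the paper's contribution is making this kind of moment inversion work efficiently for high-dimensional, many-component mixtures and HMMs.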

    Tensor decompositions for learning latent variable models

    This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) which exploits a certain tensor structure in their low-order observable moments (typically, of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
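    A plain, non-robust variant of the tensor power iteration with deflation fits in a few lines. A sketch assuming an exactly orthogonally decomposable symmetric tensor (the robust version analyzed in the paper adds careful restart counts and perturbation bounds, which are omitted here):

```python
import numpy as np

def tensor_power_method(T, n_components, n_iters=100, n_restarts=10, seed=0):
    """Recover an orthogonal decomposition T = sum_i lam_i v_i (x) v_i (x) v_i
    by power iteration with deflation (plain, non-robust variant)."""
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    eigvals, eigvecs = [], []
    for _ in range(n_components):
        best_lam, best_u = -np.inf, None
        for _ in range(n_restarts):              # restarts guard against bad inits
            u = rng.normal(size=d)
            u /= np.linalg.norm(u)
            for _ in range(n_iters):             # u <- T(I, u, u), renormalized
                u = np.einsum('ijk,j,k->i', T, u, u)
                u /= np.linalg.norm(u)
            lam = np.einsum('ijk,i,j,k->', T, u, u, u)
            if lam > best_lam:
                best_lam, best_u = lam, u
        eigvals.append(best_lam)
        eigvecs.append(best_u)
        # Deflate: subtract the recovered rank-one term before the next sweep
        T = T - best_lam * np.einsum('i,j,k->ijk', best_u, best_u, best_u)
    return np.array(eigvals), np.array(eigvecs)

# Build a small orthogonally decomposable tensor and recover its spectrum
d = 5
V = np.linalg.qr(np.random.default_rng(1).normal(size=(d, d)))[0][:, :3].T
lams = np.array([3.0, 2.0, 1.0])
T = sum(l * np.einsum('i,j,k->ijk', v, v, v) for l, v in zip(lams, V))
print(tensor_power_method(T, 3)[0])  # approximately [3., 2., 1.]
```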

    Geometric Learning of Hidden Markov Models via a Method of Moments Algorithm

    We present a novel algorithm for learning the parameters of hidden Markov models (HMMs) in a geometric setting where the observations take values in Riemannian manifolds. In particular, we lift a recent second-order method of moments algorithm that incorporates non-consecutive correlations to a more general setting where observations lie in a Riemannian symmetric space of non-positive curvature and the observation likelihoods are Riemannian Gaussians. The resulting algorithm decouples into a Riemannian Gaussian mixture model estimation step followed by a sequence of convex optimization procedures. We demonstrate through examples that the learner can be significantly faster and more numerically accurate than existing learners.
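    The Riemannian machinery does not reduce to a short snippet, but the shape of the second stage can be sketched in the Euclidean case: once a mixture model supplies component labels, a transition matrix is recoverable from consecutive-label co-occurrences under simplex constraints. A hypothetical sketch (the function names and the Euclidean simplification are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def transition_from_pairs(P_pair, pi):
    """Estimate a row-stochastic transition matrix A from the empirical joint
    P_pair[i, j] ~ P(h_t = i, h_{t+1} = j), using P_pair = diag(pi) @ A."""
    A = P_pair / pi[:, None]                            # unconstrained solution
    return np.apply_along_axis(project_simplex, 1, A)   # enforce simplex rows

pi = np.array([0.5, 0.3, 0.2])
A_true = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.3, 0.3, 0.4]])
print(transition_from_pairs(np.diag(pi) @ A_true, pi))  # recovers A_true
```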

    Introduction to finite mixtures

    Mixture models have been around for over 150 years, as an intuitively simple and practical tool for enriching the collection of probability distributions available for modelling data. In this chapter we describe the basic ideas of the subject, present several alternative representations and perspectives on these models, and discuss some of the elements of inference about the unknowns in the models. Our focus is on the simplest set-up, that of finite mixture models, but we also discuss how various simplifying assumptions can be relaxed to generate the rich landscape of modelling and inference ideas traversed in the rest of this book. (A chapter prepared for the forthcoming Handbook of Mixture Analysis.)
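    As a concrete instance of two standard representations such a chapter covers, the mixture density itself and the equivalent latent-allocation view used for simulation and inference, here is a minimal two-component Gaussian example (the weights and parameters are arbitrary illustrations):

```python
import numpy as np
from scipy.stats import norm

# A finite mixture: f(x) = sum_k w_k * f_k(x), with w_k >= 0 summing to 1
weights = np.array([0.3, 0.7])
means, sds = np.array([-1.0, 2.0]), np.array([0.5, 1.0])

def mixture_pdf(x):
    return sum(w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, sds))

# The latent-allocation view: draw a component label, then an observation
rng = np.random.default_rng(0)
z = rng.choice(2, size=1000, p=weights)   # latent component labels
x = rng.normal(means[z], sds[z])          # observations given the labels
```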

    A Spectral Algorithm for Latent Dirichlet Allocation

    The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated by multiple latent factors (e.g., the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem: estimating the topic probability vectors (the distributions over words for each topic) when only the words are observed and the corresponding topics are hidden. We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e., third-order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low-order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVD operations are carried out on k × k matrices, where k is the number of latent factors (e.g., the number of topics), rather than in the d-dimensional observed space (typically d ≫ k).
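    ECA's exact two-SVD procedure is specific to the paper, but the dimensionality reduction it relies on, projecting moment statistics into the k-dimensional latent space with a whitening transform, is a standard spectral step that can be sketched (an eigendecomposition stands in here for the paper's SVDs):

```python
import numpy as np

def whiten(M2, k):
    """Find W with W.T @ M2 @ W = I_k from the top-k eigenpairs of the
    pairwise moment matrix M2 (symmetric positive semidefinite, rank >= k)."""
    vals, vecs = np.linalg.eigh(M2)
    idx = np.argsort(vals)[::-1][:k]      # top-k eigenpairs
    U, s = vecs[:, idx], vals[idx]
    return U / np.sqrt(s)                 # d x k whitening matrix

# Example: whiten a rank-2 "pairs" moment; the whitened higher-order moments
# would then be handed to a decomposition routine.
rng = np.random.default_rng(0)
B = rng.normal(size=(6, 2))
W = whiten(B @ B.T, 2)
print(np.round(W.T @ (B @ B.T) @ W, 6))   # ~ 2 x 2 identity
```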

    Bayesian Portfolio Selection in a Markov Switching Gaussian Mixture Model

    Departure from normality poses implementation barriers to Markowitz mean-variance portfolio selection. When assets are affected by common and idiosyncratic shocks, the distribution of asset returns may exhibit Markov switching regimes and have a Gaussian mixture distribution conditional on each regime. The model is estimated in a Bayesian framework using the Gibbs sampler. An application to global portfolio diversification is also discussed. Keywords: Portfolio; Bayesian; Hidden Markov Model; Gaussian Mixture.
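    The paper's full sampler also draws the Markov-switching regimes; as a sketch of one ingredient only, here is the standard Gibbs update for mixture-component labels given current parameters (the function name and interface are illustrative, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm

def sample_labels(x, weights, means, sds, rng):
    """One Gibbs step: draw each latent label from its full conditional,
    p(z_i = k | x_i, theta) proportional to w_k * N(x_i; mu_k, sd_k)."""
    logp = np.log(weights) + norm.logpdf(x[:, None], means, sds)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(weights), p=row) for row in p])

rng = np.random.default_rng(0)
x = rng.normal([-1.0, -1.0, 2.0, 2.0], 0.3)
print(sample_labels(x, np.array([0.5, 0.5]), np.array([-1.0, 2.0]),
                    np.array([0.3, 0.3]), rng))   # likely [0 0 1 1]
```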

    Spectral Sequence Motif Discovery

    Sequence motif discovery tools play a central role in several fields of computational biology. In the framework of transcription factor binding studies, motif finding algorithms of increasingly high performance are required to process the big datasets produced by new high-throughput sequencing technologies. Most existing algorithms are computationally demanding and often cannot support the large size of new experimental data. We present a new motif discovery algorithm built on a recent machine learning technique referred to as the Method of Moments. Based on spectral decompositions, this method is robust under model misspecification and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. In a few minutes, we can process datasets of hundreds of thousands of sequences and extract motif profiles that match those computed by various state-of-the-art algorithms.
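    The abstract does not spell out the algorithm, so any snippet here is necessarily a guess at the flavor of the statistics involved: spectral motif methods decompose low-order moments of sequence data, such as cross-position letter co-occurrence matrices. A hypothetical sketch of that statistic (the function and the toy sequences are mine, not the paper's):

```python
import numpy as np

ALPHABET = "ACGT"

def pair_moment(seqs, i, j):
    """Empirical second-order moment between positions i and j: a 4 x 4
    matrix of letter co-occurrence frequencies across sequences."""
    M = np.zeros((4, 4))
    for s in seqs:
        M[ALPHABET.index(s[i]), ALPHABET.index(s[j])] += 1.0
    return M / len(seqs)

# Low-rank structure in such cross-position moments is what a spectral
# decomposition (here, an SVD) would expose.
seqs = ["ACGTAC", "ACGTTT", "TTGTAC"]
U, s, Vt = np.linalg.svd(pair_moment(seqs, 0, 1))
```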