Learning High-Dimensional Mixtures of Graphical Models
We consider unsupervised estimation of mixtures of discrete graphical models,
where the class variable corresponding to the mixture components is hidden and
each mixture component over the observed variables can have a potentially
different Markov graph structure and parameters. We propose a novel approach
for estimating the mixture components, and our output is a tree-mixture model
which serves as a good approximation to the underlying graphical model mixture.
Our method is efficient when the union graph, which is the union of the Markov
graphs of the mixture components, has sparse vertex separators between any pair
of observed variables. This includes tree mixtures and mixtures of bounded
degree graphs. For such models, we prove that our method correctly recovers the
union graph structure and the tree structures corresponding to
maximum-likelihood tree approximations of the mixture components. The sample
and computational complexities of our method scale as poly(p, r) for an
r-component mixture of p-variate graphical models. We further extend our
results to the case where the union graph has sparse local separators between
any pair of observed variables, such as mixtures of locally tree-like graphs,
and the mixture components are in the regime of correlation decay.
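The maximum-likelihood tree approximation mentioned above is classically computed by the Chow-Liu procedure: estimate pairwise mutual information from samples, then take a maximum-weight spanning tree. The sketch below is an illustrative implementation for binary variables, not the paper's algorithm; the function name `chow_liu_tree` and the use of Kruskal's algorithm are choices made here for clarity.

```python
import numpy as np
from itertools import combinations

def chow_liu_tree(samples):
    """Maximum-likelihood tree approximation (Chow-Liu): a maximum-weight
    spanning tree over empirical pairwise mutual information.
    Assumes binary (0/1) variables; samples has shape (n, p)."""
    n, p = samples.shape
    edges = []
    for i, j in combinations(range(p), 2):
        # Empirical mutual information between variables i and j
        mi = 0.0
        for a in (0, 1):
            for b in (0, 1):
                pij = np.mean((samples[:, i] == a) & (samples[:, j] == b))
                pi = np.mean(samples[:, i] == a)
                pj = np.mean(samples[:, j] == b)
                if pij > 0:
                    mi += pij * np.log(pij / (pi * pj))
        edges.append((mi, i, j))
    # Kruskal's algorithm on edges sorted by decreasing weight
    parent = list(range(p))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for mi, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On data generated from a Markov chain x0 → x1 → x2, the recovered tree is the chain itself, since the direct-neighbor mutual information dominates.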
Learning a Small Mixture of Trees
The problem of approximating a given probability distribution by a simpler distribution plays an important role in several areas of machine learning, for example variational inference and classification. Within this context, we consider the task of learning a mixture of tree distributions. Although mixtures of trees can be learned by minimizing the KL-divergence using an EM algorithm, the algorithm's success depends heavily on the initialization. We propose an efficient strategy for obtaining a good initial set of trees that attempts to cover the entire observed distribution by minimizing the α-divergence with α = ∞. We formulate the problem using the fractional covering framework and present a convergent sequential algorithm that only relies on solving a convex program at each iteration. Compared to previous methods, our approach results in a significantly smaller mixture of trees that provides similar or better accuracies. We demonstrate the usefulness of our approach by learning pictorial structures for face recognition.
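The EM alternation whose initialization sensitivity the abstract discusses can be sketched on a simplified stand-in: a mixture of product-of-Bernoulli components in place of tree components (the E-step and M-step structure is the same; only the per-component model is simplified). The function name and parameters below are illustrative, not from the paper.

```python
import numpy as np

def em_bernoulli_mixture(X, theta0, weights0, iters=50):
    """EM for a mixture of product-of-Bernoulli distributions -- a
    simplified stand-in for a mixture of trees, illustrating the
    E-step/M-step alternation whose fixed point depends on the
    initial parameters (theta0, weights0).
    X: (n, p) binary data; theta0: (k, p) initial Bernoulli means."""
    theta, w = theta0.copy(), weights0.copy()
    for _ in range(iters):
        # E-step: per-sample responsibilities from component log-likelihoods
        logp = (X[:, None, :] * np.log(theta[None] + 1e-12)
                + (1 - X[:, None, :]) * np.log(1 - theta[None] + 1e-12)).sum(-1)
        logp += np.log(w + 1e-12)[None]
        r = np.exp(logp - logp.max(1, keepdims=True))
        r /= r.sum(1, keepdims=True)
        # M-step: re-estimate mixture weights and per-coordinate means
        w = r.mean(0)
        theta = (r.T @ X) / (r.sum(0)[:, None] + 1e-12)
    return theta, w
```

With a full mixture of trees, the M-step would instead refit a weighted Chow-Liu tree per component; the covering-based initialization the abstract proposes would supply `theta0`-like starting trees so this loop does not collapse onto a poor local optimum.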