Forest Density Estimation

Author: Gu Haijie
Gupta Anupam
Lafferty John
Liu Han
Wasserman Larry
Xu Min
Publication date: 01/01/2010

We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest-structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held-out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models. Comment: Extended version of earlier paper titled "Tree density estimation
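The forest-selection step mentioned in this abstract can be illustrated with a minimal sketch of Kruskal's algorithm for a maximum-weight spanning forest. This is an illustration under stated assumptions, not the authors' implementation: the function name `max_weight_forest`, the `max_edges` cap, and the toy weights are hypothetical, and the weights merely stand in for mutual information estimates that the paper derives from kernel density estimates.

```python
def max_weight_forest(num_nodes, weights, max_edges=None):
    """Kruskal's algorithm for a maximum-weight spanning forest.

    `weights` maps an edge (i, j) to a score. In this sketch the score
    stands in for an estimated mutual information between variables i and j
    (hypothetical numbers, not real estimates).
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    # Visit edges from the largest weight to the smallest.
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        # Stop at non-positive weights or once the edge budget is used up.
        if w <= 0 or (max_edges is not None and len(forest) >= max_edges):
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not create a cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest


# Hypothetical pairwise scores for five variables.
toy_weights = {(0, 1): 0.9, (1, 2): 0.7, (0, 2): 0.6, (2, 3): 0.05, (3, 4): -0.1}
print(max_weight_forest(5, toy_weights))               # [(0, 1), (1, 2), (2, 3)]
print(max_weight_forest(5, toy_weights, max_edges=2))  # [(0, 1), (1, 2)]
```

The `max_edges` argument is one simple way to expose forest size as a tuning parameter; in a data-splitting scheme it could be chosen by scoring each candidate forest on held-out data.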

arXiv.org e-Print Archive

CiteSeerX

Learning Latent Tree Graphical Models

Author: Anandkumar Animashree
Choi Myung Jin
Tan Vincent Y. F.
Willsky Alan S.
Publication date: 14/09/2010

We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world datasets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups dataset.
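The idea of identifying sibling groups from information distances can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes distances that are exactly additive along the paths of the tree, and it uses the test that the difference d(i,k) - d(j,k) is the same for every other observed node k when i and j are siblings or a leaf and its parent. The function name `classify_pair` and the toy tree (a hidden node joined to observed nodes 0, 1 and 2, with observed node 3 hanging off node 2) are hypothetical.

```python
def classify_pair(d, i, j, others, tol=1e-9):
    """Classify the pair (i, j) from additive tree distances.

    `d` maps frozenset({a, b}) to the distance between observed nodes a and b.
    The rule assumed here: phi_k = d(i, k) - d(j, k) is constant over all
    other nodes k exactly when i and j share a parent or one is the parent
    of the other.
    """
    phis = [d[frozenset((i, k))] - d[frozenset((j, k))] for k in others]
    d_ij = d[frozenset((i, j))]
    if any(abs(p - phis[0]) > tol for p in phis):
        return "neither"
    phi = phis[0]
    if abs(phi - d_ij) <= tol:
        return "i is a leaf and j is its parent"
    if abs(phi + d_ij) <= tol:
        return "j is a leaf and i is its parent"
    return "siblings"


# Toy tree with hypothetical edge lengths:
# h-0: 1.0, h-1: 2.0, h-2: 1.5, 2-3: 0.5, where h is hidden.
d = {
    frozenset((0, 1)): 3.0, frozenset((0, 2)): 2.5, frozenset((0, 3)): 3.0,
    frozenset((1, 2)): 3.5, frozenset((1, 3)): 4.0, frozenset((2, 3)): 0.5,
}
print(classify_pair(d, 0, 1, [2, 3]))  # siblings
print(classify_pair(d, 3, 2, [0, 1]))  # i is a leaf and j is its parent
print(classify_pair(d, 0, 2, [1, 3]))  # neither
```

With distances estimated from samples, the exact equality checks would have to be replaced by thresholds, which this sketch does not attempt.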

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Caltech Authors

Latent tree models

Author: Zwiernik Piotr
Publication date: 02/08/2017

Latent tree models are graphical models defined on trees, in which only a subset of variables is observed. They were first discussed by Judea Pearl as tree-decomposable distributions to generalise star-decomposable distributions such as the latent class model. Latent tree models, or their submodels, are widely used in phylogenetic analysis, network tomography, computer vision, causal modeling, and data clustering. They also contain other well-known classes of models, such as hidden Markov models, the Brownian motion tree model, the Ising model on a tree, and many popular models used in phylogenetics. This article offers a concise introduction to the theory of latent tree models. We emphasise the role of tree metrics in the structural description of this model class, in designing learning algorithms, and in understanding fundamental limits of what can be learned and when.
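The tree metrics mentioned in this abstract are characterised by the classical four-point condition: for any four points, the two largest of the three pairwise sums d(i,j) + d(k,l), d(i,k) + d(j,l), d(i,l) + d(j,k) are equal. The check below is a minimal sketch under the assumption that the input is already a metric; the function name `is_tree_metric` and both toy distance tables are hypothetical.

```python
from itertools import combinations


def is_tree_metric(d, nodes, tol=1e-9):
    """Check the four-point condition on every quadruple of nodes.

    `d` maps frozenset({a, b}) to the distance between a and b and is
    assumed to be a metric already (symmetric, triangle inequality).
    """
    def dist(a, b):
        return d[frozenset((a, b))]

    for i, j, k, l in combinations(nodes, 4):
        sums = sorted([
            dist(i, j) + dist(k, l),
            dist(i, k) + dist(j, l),
            dist(i, l) + dist(j, k),
        ])
        # The two largest sums must coincide for a tree metric.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Path lengths between four nodes of a small tree (hypothetical values).
tree_d = {
    frozenset((0, 1)): 3.0, frozenset((0, 2)): 2.5, frozenset((0, 3)): 3.0,
    frozenset((1, 2)): 3.5, frozenset((1, 3)): 4.0, frozenset((2, 3)): 0.5,
}
# Shortest-path distances on a 4-cycle with unit edges, which is not a tree.
cycle_d = {
    frozenset((0, 1)): 1.0, frozenset((1, 2)): 1.0, frozenset((2, 3)): 1.0,
    frozenset((0, 3)): 1.0, frozenset((0, 2)): 2.0, frozenset((1, 3)): 2.0,
}
print(is_tree_metric(tree_d, [0, 1, 2, 3]))   # True
print(is_tree_metric(cycle_d, [0, 1, 2, 3]))  # False
```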

arXiv.org e-Print Archive

Crossref

Inferring differentiation pathways from gene expression

Author: Alexander Schliep
Christoph Hafemeister
Ivan G. Costa
Stefan Roepcke
Publication venue: Oxford University Press
Publication date: 01/01/2008

Motivation: The regulation of proliferation and differentiation of embryonic and adult stem cells into mature cells is central to developmental biology. Gene expression measured in distinguishable developmental stages helps to elucidate underlying molecular processes. In previous work we showed that functional gene modules, which act distinctly in the course of development, can be represented by a mixture of trees. In general, the similarities in the gene expression programs of cell populations reflect the similarities in the differentiation path