    Markov Network Structure Learning via Ensemble-of-Forests Models

    Real-world systems typically feature a variety of dependency types and topologies that complicate model selection for probabilistic graphical models. We introduce the ensemble-of-forests model, a generalization of the ensemble-of-trees model. Our model enables structure learning of Markov random fields (MRFs) with multiple connected components and arbitrary potentials. We present two approximate inference techniques for this model and demonstrate their performance on synthetic data. Our results suggest that the ensemble-of-forests approach can accurately recover sparse, possibly disconnected MRF topologies, even in the presence of non-Gaussian dependencies and/or low sample size. We applied the ensemble-of-forests model to learn the structure of perturbed signaling networks of immune cells and found that these frequently exhibit non-Gaussian dependencies with disconnected MRF topologies. In summary, we expect that the ensemble-of-forests model will enable MRF structure learning in other high-dimensional real-world settings that are governed by non-trivial dependencies. Comment: 13 pages, 6 figure

    Tree cumulants and the geometry of binary tree models

    In this paper we investigate undirected discrete graphical tree models in which all variables are binary, the leaves represent the observable variables, and all inner nodes are unobserved. A novel approach based on the theory of partially ordered sets allows us to obtain a convenient parametrization of this model class. The construction of the proposed coordinate system mirrors the combinatorial definition of cumulants. A simple product-like form of the resulting parametrization gives insight into identifiability issues associated with this model class. In particular, we provide necessary and sufficient conditions for such a model to be identified up to the switching of labels of the inner nodes. When these conditions hold, we give explicit formulas for the parameters of the model. Whenever the model fails to be identified, we use the new parametrization to describe the geometry of the unidentified parameter space. We illustrate these results using a simple example. Comment: Published at http://dx.doi.org/10.3150/10-BEJ338 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm

    Forest Density Estimation

    We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest-structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held-out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum-weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models. Comment: Extended version of earlier paper titled "Tree density estimation
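
    The abstract above mentions applying Kruskal's algorithm to obtain a forest. As a rough illustration only, the following sketch runs Kruskal's algorithm in its maximum-weight form over a set of scored edges. The function name `max_weight_forest`, the toy edge weights, and the rule of stopping at non-positive weights are all assumptions made for this example; they are not the paper's estimator, which scores edges from kernel density estimates and selects the forest on held-out data.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def max_weight_forest(n_nodes: int, weights: Dict[Edge, float]) -> List[Edge]:
    """Kruskal's algorithm for a maximum-weight spanning forest.

    Assumption for this sketch: edges with non-positive weight are never
    added, so the result may be disconnected (a forest, not a tree).
    """
    parent = list(range(n_nodes))  # union-find parent pointers

    def find(x: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest: List[Edge] = []
    # Visit edges from heaviest to lightest.
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if w <= 0:
            break  # all remaining edges are non-positive as well
        ru, rv = find(u), find(v)
        if ru != rv:  # adding the edge does not create a cycle
            parent[ru] = rv
            forest.append((u, v))
    return forest


# Hypothetical edge scores standing in for estimated pairwise dependence.
toy_weights = {(0, 1): 0.9, (1, 2): 0.5, (0, 2): 0.7, (2, 3): -0.1}
print(max_weight_forest(4, toy_weights))
```

    In this toy input, node 3 ends up isolated because its only edge has a negative score, which is how a disconnected forest can arise under the stopping rule assumed here.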

    Overlap Removal of Dimensionality Reduction Scatterplot Layouts

    Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional data items in many different areas. Despite their popularity, scatterplots suffer from occlusion, especially when markers convey information, making it hard for users to estimate the sizes of item groups and, more importantly, potentially obscuring items that are critical to the analysis at hand. Different strategies have been devised to address this issue: either producing overlap-free layouts directly, which lack the power of contemporary DR techniques to uncover interesting data patterns, or eliminating overlaps as a post-processing step. Despite the good results of post-processing techniques, the best methods typically expand or distort the scatterplot area, sometimes reducing marker sizes to unreadable dimensions and thereby defeating the purpose of removing overlaps. This paper presents a novel post-processing strategy for removing overlaps in DR layouts that faithfully preserves the original layout's characteristics and the markers' sizes. We show, through an extensive comparative evaluation over multiple metrics, that the proposed strategy surpasses the state of the art in overlap removal while being 2 or 3 orders of magnitude faster for large datasets. Comment: 11 pages and 9 figure

    Tridiagonalized GUE matrices are a matrix model for labeled mobiles

    It is well known that the number of planar maps with prescribed vertex degree distribution and suitable labeling can be represented as the leading coefficient of the $\frac{1}{N}$-expansion of a joint cumulant of traces of powers of an $N$-by-$N$ GUE matrix. Here we undertake the calculation of this leading coefficient in a different way. Firstly, we tridiagonalize the GUE matrix in the manner of Trotter and Dumitriu-Edelman and then alter it by conjugation to make the subdiagonal identically equal to $1$. Secondly, we apply the cluster expansion technique (specifically, the Brydges-Kennedy-Abdesselam-Rivasseau formula) from rigorous statistical mechanics. Thirdly, by sorting through the terms of the expansion thus generated we arrive at an alternate interpretation for the leading coefficient related to factorizations of the long cycle $(12\cdots n)\in S_n$. Finally, we reconcile the group-theoretical objects emerging from our calculation with the labeled mobiles of Bouttier-Di Francesco-Guitter. Comment: 42 pages, LaTeX, 17 figures. The present paper completely supersedes arXiv1203.3185 in terms of methods but addresses a different proble