Markov Network Structure Learning via Ensemble-of-Forests Models
Real world systems typically feature a variety of different dependency types
and topologies that complicate model selection for probabilistic graphical
models. We introduce the ensemble-of-forests model, a generalization of the
ensemble-of-trees model. Our model enables structure learning of Markov random
fields (MRF) with multiple connected components and arbitrary potentials. We
present two approximate inference techniques for this model and demonstrate
their performance on synthetic data. Our results suggest that the
ensemble-of-forests approach can accurately recover sparse, possibly
disconnected MRF topologies, even in the presence of non-Gaussian dependencies
and/or low sample size. We applied the ensemble-of-forests model to learn the
structure of perturbed signaling networks of immune cells and found that these
frequently exhibit non-Gaussian dependencies with disconnected MRF topologies.
In summary, we expect that the ensemble-of-forests model will enable MRF
structure learning in other high dimensional real world settings that are
governed by non-trivial dependencies. Comment: 13 pages, 6 figures
Tree cumulants and the geometry of binary tree models
In this paper we investigate undirected discrete graphical tree models when
all the variables in the system are binary, where leaves represent the
observable variables and where all the inner nodes are unobserved. A novel
approach based on the theory of partially ordered sets allows us to obtain a
convenient parametrization of this model class. The construction of the
proposed coordinate system mirrors the combinatorial definition of cumulants. A
simple product-like form of the resulting parametrization gives insight into
identifiability issues associated with this model class. In particular, we
provide necessary and sufficient conditions for such a model to be identified
up to the switching of labels of the inner nodes. When these conditions hold,
we give explicit formulas for the parameters of the model. Whenever the model
fails to be identified, we use the new parametrization to describe the geometry
of the unidentified parameter space. We illustrate these results using a simple
example. Comment: Published at http://dx.doi.org/10.3150/10-BEJ338 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
Forest Density Estimation
We study graph estimation and density estimation in high dimensions, using a
family of density estimators based on forest structured undirected graphical
models. For density estimation, we do not assume the true distribution
corresponds to a forest; rather, we form kernel density estimates of the
bivariate and univariate marginals, and apply Kruskal's algorithm to estimate
the optimal forest on held out data. We prove an oracle inequality on the
excess risk of the resulting estimator relative to the risk of the best forest.
For graph estimation, we consider the problem of estimating forests with
restricted tree sizes. We prove that finding a maximum weight spanning forest
with restricted tree size is NP-hard, and develop an approximation algorithm
for this problem. Viewing the tree size as a complexity parameter, we then
select a forest using data splitting, and prove bounds on excess risk and
structure selection consistency of the procedure. Experiments with simulated
data and microarray data indicate that the methods are a practical alternative
to Gaussian graphical models. Comment: Extended version of earlier paper titled "Tree density estimation"
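The forest-selection step described in this abstract can be sketched with Kruskal's algorithm run as a maximum-weight forest search. This is only an illustration, not the authors' implementation: the function name `max_weight_forest`, the toy edge weights, and the rule of stopping at the first non-positive weight (on the assumption that weights are held-out mutual-information estimates, which can be negative) are all hypothetical choices made here.

```python
def max_weight_forest(num_nodes, weighted_edges):
    """Greedy (Kruskal-style) maximum-weight forest.

    weighted_edges: iterable of (weight, u, v) tuples. The weights are
    assumed to be held-out estimates of pairwise mutual information
    (an assumption of this sketch, not taken from the abstract).
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    # Visit edges from the largest weight to the smallest.
    for weight, u, v in sorted(weighted_edges, reverse=True):
        if weight <= 0:
            # Remaining edges cannot increase the total weight, so the
            # result may stay disconnected (a forest, not a spanning tree).
            break
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding the edge creates no cycle
            parent[root_u] = root_v
            forest.append((u, v, weight))
    return forest


# Toy example with made-up weights on five variables.
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 0, 2), (0.5, 3, 4), (-0.1, 2, 3)]
forest = max_weight_forest(5, edges)
print(forest)
```

In this toy input the edge (0, 2) is skipped because it would close a cycle, and the negative-weight edge (2, 3) is never added, so the result has two components.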
Overlap Removal of Dimensionality Reduction Scatterplot Layouts
Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous
visualization tool for analyzing multidimensional data items in many
different areas. Despite their popularity, scatterplots suffer from occlusion,
especially when markers convey information, making it difficult for users to
estimate the sizes of item groups and, more importantly, potentially obscuring
items critical to the analysis at hand. Different strategies have been
devised to address this issue, either producing overlap-free layouts, which lack
the powerful capabilities of contemporary DR techniques for uncovering interesting
data patterns, or eliminating overlaps as a post-processing step. Despite
the good results of post-processing techniques, the best methods typically
expand or distort the scatterplot area, thus reducing markers' size, sometimes
to unreadable dimensions, defeating the purpose of removing overlaps. This
paper presents a novel post-processing strategy to remove overlaps from DR layouts
that faithfully preserves the original layout's characteristics and markers'
sizes. We show that the proposed strategy surpasses the state of the art in
overlap removal through an extensive comparative evaluation considering
multiple metrics, while being 2 or 3 orders of magnitude faster for
large datasets. Comment: 11 pages and 9 figures
Tridiagonalized GUE matrices are a matrix model for labeled mobiles
It is well known that the number of planar maps with prescribed vertex degree
distribution and suitable labeling can be represented as the leading
coefficient of the $\frac{1}{N}$-expansion of a joint cumulant of traces of
powers of an $N$-by-$N$ GUE matrix. Here we undertake the calculation of this
leading coefficient in a different way. Firstly, we tridiagonalize the GUE
matrix in the manner of Trotter and Dumitriu-Edelman and then alter it by
conjugation to make the subdiagonal identically equal to $1$. Secondly, we
apply the cluster expansion technique (specifically, the
Brydges-Kennedy-Abdesselam-Rivasseau formula) from rigorous statistical
mechanics. Thirdly, by sorting through the terms of the expansion thus
generated we arrive at an alternate interpretation for the leading coefficient
related to factorizations of the long cycle $(12\cdots n)\in S_n$. Finally, we
reconcile the group-theoretical objects emerging from our calculation with the
labeled mobiles of Bouttier-Di Francesco-Guitter. Comment: 42 pages, LaTeX, 17 figures. The present paper completely supersedes
arXiv:1203.3185 in terms of methods but addresses a different problem
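The conjugation step mentioned in this abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes one common normalization of the Dumitriu-Edelman tridiagonal model for the GUE (standard normal diagonal, chi-distributed off-diagonal entries divided by the square root of 2), assumes the target subdiagonal value is 1, and uses hypothetical variable names throughout. It conjugates the symmetric tridiagonal matrix by a diagonal matrix and checks that the spectrum is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Assumed normalization of the tridiagonal model for beta = 2:
# diagonal ~ N(0, 1), k-th off-diagonal entry ~ chi_{2(n-k)} / sqrt(2).
a = rng.normal(size=n)
b = np.sqrt(rng.chisquare(df=2 * np.arange(n - 1, 0, -1)) / 2)
T = np.diag(a) + np.diag(b, -1) + np.diag(b, 1)

# Choose D = diag(d) with d[0] = 1 and d[k+1] = b[k] * d[k].
# Then (D^{-1} T D)[k+1, k] = b[k] * d[k] / d[k+1] = 1.
d = np.concatenate(([1.0], np.cumprod(b)))
M = T * d[None, :] / d[:, None]  # entrywise form of D^{-1} T D

sub = np.diag(M, -1)   # all ones after conjugation
sup = np.diag(M, 1)    # the squares b[k]**2 move to the superdiagonal

# Conjugation is a similarity transform, so eigenvalues are preserved.
eig_T = np.linalg.eigvalsh(T)
eig_M = np.sort(np.linalg.eigvals(M).real)
print(np.allclose(sub, 1.0), np.allclose(sup, b**2), np.allclose(eig_T, eig_M))
```

The conjugated matrix is no longer symmetric, but all the randomness in the off-diagonal part now sits in the superdiagonal, which is the shape of matrix the abstract describes.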