Learning loopy graphical models with latent variables: Efficient methods and guarantees
The problem of structure estimation in graphical models with latent variables
is considered. We characterize conditions for tractable graph estimation and
develop efficient methods with provable guarantees. We consider models where
the underlying Markov graph is locally tree-like, and the model is in the
regime of correlation decay. For the special case of the Ising model, the
number of samples $n$ required for structural consistency of our method scales
as $n = \Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2} \log p)$, where $p$ is the
number of variables, $\theta_{\min}$ is the minimum edge potential, $\delta$ is
the depth (i.e., distance from a hidden node to the nearest observed nodes),
and $\eta$ is a parameter which depends on the bounds on node and edge
potentials in the Ising model. Necessary conditions for structural consistency
under any algorithm are derived and our method nearly matches the lower bound
on sample requirements. Further, the proposed method is practical to implement
and provides flexibility to control the number of latent variables and the
cycle lengths in the output graph.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1070 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Random Graph Generator for Bipartite Networks Modeling
The purpose of this article is to introduce a new iterative algorithm that
generates random graphs with properties resembling real-life bipartite graphs.
The algorithm enables us to generate a wide range of random bigraphs whose
features are determined by a set of parameters. We adapt the advances of the
last decade in unipartite complex network modeling to the bigraph setting.
Bipartite structure can be observed in several situations. However, only a few
datasets are freely available to test the algorithms (e.g. community detection,
influential node identification, information retrieval) which operate on such
data. Therefore, artificial datasets are needed to support the development and
testing of these algorithms. We are particularly interested in applying the
generator to the analysis of recommender systems. We therefore focus on two
characteristics that, besides simple statistics, are in our opinion responsible
for the performance of neighborhood-based collaborative filtering algorithms:
node degree distribution and local clustering coefficient.
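The two characteristics named in this abstract can be measured on a small bipartite edge list, as in the minimal sketch below. It is not the article's generator. The toy `(user, item)` pairs and all function names are hypothetical, and the clustering coefficient is computed on the one-mode projection onto users, which is only one of several conventions and may differ from the definition the article uses.

```python
from collections import Counter
from itertools import combinations


def degree_distribution(edges, side=0):
    """Map degree -> number of nodes with that degree, for one side of the bigraph.

    side=0 counts left nodes (users), side=1 counts right nodes (items).
    Assumes the edge list contains no duplicate pairs.
    """
    degrees = Counter(edge[side] for edge in edges)
    return dict(Counter(degrees.values()))


def project(edges):
    """One-mode projection onto left nodes: link two left nodes sharing a right node."""
    by_right = {}
    for left, right in edges:
        by_right.setdefault(right, set()).add(left)
    adj = {left: set() for left, _ in edges}
    for members in by_right.values():
        for a, b in combinations(sorted(members), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy data: (user, item) pairs.
edges = [("u1", "i1"), ("u2", "i1"), ("u2", "i2"), ("u3", "i2"),
         ("u1", "i3"), ("u3", "i3"), ("u4", "i3")]

user_degrees = degree_distribution(edges, side=0)
item_degrees = degree_distribution(edges, side=1)
projection = project(edges)
clustering = {u: local_clustering(projection, u) for u in sorted(projection)}
print(user_degrees, item_degrees, clustering)
```

Comparing these statistics between a generated bigraph and a real dataset is one plausible way to check how closely the generated graph resembles the real one.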
Topology Discovery of Sparse Random Graphs With Few Participants
We consider the task of topology discovery of sparse random graphs using
end-to-end random measurements (e.g., delay) between a subset of nodes,
referred to as the participants. The rest of the nodes are hidden, and do not
provide any information for topology discovery. We consider topology discovery
under two routing models: (a) the participants exchange messages along the
shortest paths and obtain end-to-end measurements, and (b) additionally, the
participants exchange messages along the second shortest path. For scenario
(a), our proposed algorithm results in a sub-linear edit-distance guarantee
using a sub-linear number of uniformly selected participants. For scenario (b),
we obtain a much stronger result, and show that we can achieve consistent
reconstruction when a sub-linear number of uniformly selected nodes
participate. This implies that accurate discovery of sparse random graphs is
tractable using an extremely small number of participants. We finally obtain a
lower bound on the number of participants required by any algorithm to
reconstruct the original random graph up to a given edit distance. We also
demonstrate that while consistent discovery is tractable for sparse random
graphs using a small number of participants, in general, there are graphs which
cannot be discovered by any algorithm even with a significant number of
participants, and with the availability of end-to-end information along all the
paths between the participants.
Comment: A shorter version appears in ACM SIGMETRICS 2011. This version is
scheduled to appear in J. on Random Structures and Algorithms.
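The measurement model of scenario (a) can be illustrated with a small sketch: only the participants report end-to-end values, and hidden nodes contribute nothing directly. The sketch assumes that hop count stands in for the delay measurement. The toy graph and the function names are hypothetical, and the code shows only the input such an algorithm would see, not the reconstruction algorithm itself.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: shortest-path hop count from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def participant_measurements(adj, participants):
    """End-to-end shortest-path lengths between every pair of participants."""
    out = {}
    for a in participants:
        dist = hop_distances(adj, a)
        for b in participants:
            if a < b and b in dist:
                out[(a, b)] = dist[b]
    return out


# Hypothetical toy graph: nodes 1 and 2 are hidden, nodes 0, 3 and 4 participate.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 4], 3: [1], 4: [2]}
participants = [0, 3, 4]
measurements = participant_measurements(adj, participants)
print(measurements)
```

A topology discovery method would receive only `measurements` and would have to infer the hidden nodes 1 and 2 and their edges from it.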
Spectral goodness of fit for network models
We introduce a new statistic, 'spectral goodness of fit' (SGOF) to measure
how well a network model explains the structure of an observed network. SGOF
provides an absolute measure of fit, analogous to the standard R-squared in
linear regression. Additionally, as it takes advantage of the properties of the
spectrum of the graph Laplacian, it is suitable for comparing network models of
diverse functional forms, including both fitted statistical models and
algorithmic generative models of networks. After introducing, defining, and
providing guidance for interpreting SGOF, we illustrate the properties of the
statistic with a number of examples and comparisons to existing techniques. We
show that such a spectral approach to assessing model fit fills gaps left by
earlier methods and can be widely applied.
Projective, Sparse, and Learnable Latent Position Network Models
When modeling network data using a latent position model, it is typical to
assume that the nodes' positions are independently and identically distributed.
However, this assumption implies the average node degree grows linearly with
the number of nodes, which is inappropriate when the graph is thought to be
sparse. We propose an alternative assumption---that the latent positions are
generated according to a Poisson point process---and show that it is compatible
with various levels of sparsity. Unlike other notions of sparse latent position
models in the literature, our framework also defines a projective sequence of
probability models, thus ensuring consistency of statistical inference across
networks of different sizes. We establish conditions for consistent estimation
of the latent positions, and compare our results to existing frameworks for
modeling sparse networks.
Comment: 51 pages, 2 figures
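The claim that i.i.d. latent positions force linear growth of the average degree can be checked with a short simulation. The sketch below is illustrative only: the uniform positions on [0, 1], the distance-threshold link rule, and the `radius` value are assumptions chosen for simplicity, not the model studied in the paper. Under these assumptions two nodes link with probability 2r - r^2, so the average degree should be close to 0.19 (n - 1) for r = 0.1.

```python
import random


def average_degree(n, radius=0.1, seed=0):
    """Average degree of a latent position graph with i.i.d. uniform positions.

    Assumed link rule: nodes i and j are connected when their latent
    positions are closer than `radius`.
    """
    rng = random.Random(seed)
    positions = [rng.random() for _ in range(n)]
    edges = 0
    for i in range(n):
        for j in range(i + 1, n):
            if abs(positions[i] - positions[j]) < radius:
                edges += 1
    return 2.0 * edges / n


# Doubling n should roughly double the average degree.
for n in (100, 200, 400):
    print(n, round(average_degree(n), 1))
```

A sparse graph sequence would instead keep the average degree bounded or growing sublinearly, which is the mismatch this abstract points to.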