16,159 research outputs found

    Learning loopy graphical models with latent variables: Efficient methods and guarantees

    Get PDF
    The problem of structure estimation in graphical models with latent variables is considered. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider models where the underlying Markov graph is locally tree-like, and the model is in the regime of correlation decay. For the special case of the Ising model, the number of samples nn required for structural consistency of our method scales as n=Ω(θminδη(η+1)2logp)n=\Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2}\log p), where p is the number of variables, θmin\theta_{\min} is the minimum edge potential, δ\delta is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η\eta is a parameter which depends on the bounds on node and edge potentials in the Ising model. Necessary conditions for structural consistency under any algorithm are derived and our method nearly matches the lower bound on sample requirements. Further, the proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph.Comment: Published in at http://dx.doi.org/10.1214/12-AOS1070 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Random Graph Generator for Bipartite Networks Modeling

    Full text link
    The purpose of this article is to introduce a new iterative algorithm with properties resembling real life bipartite graphs. The algorithm enables us to generate wide range of random bigraphs, which features are determined by a set of parameters.We adapt the advances of last decade in unipartite complex networks modeling to the bigraph setting. This data structure can be observed in several situations. However, only a few datasets are freely available to test the algorithms (e.g. community detection, influential nodes identification, information retrieval) which operate on such data. Therefore, artificial datasets are needed to enhance development and testing of the algorithms. We are particularly interested in applying the generator to the analysis of recommender systems. Therefore, we focus on two characteristics that, besides simple statistics, are in our opinion responsible for the performance of neighborhood based collaborative filtering algorithms. The features are node degree distribution and local clustering coeficient

    Topology Discovery of Sparse Random Graphs With Few Participants

    Get PDF
    We consider the task of topology discovery of sparse random graphs using end-to-end random measurements (e.g., delay) between a subset of nodes, referred to as the participants. The rest of the nodes are hidden, and do not provide any information for topology discovery. We consider topology discovery under two routing models: (a) the participants exchange messages along the shortest paths and obtain end-to-end measurements, and (b) additionally, the participants exchange messages along the second shortest path. For scenario (a), our proposed algorithm results in a sub-linear edit-distance guarantee using a sub-linear number of uniformly selected participants. For scenario (b), we obtain a much stronger result, and show that we can achieve consistent reconstruction when a sub-linear number of uniformly selected nodes participate. This implies that accurate discovery of sparse random graphs is tractable using an extremely small number of participants. We finally obtain a lower bound on the number of participants required by any algorithm to reconstruct the original random graph up to a given edit distance. We also demonstrate that while consistent discovery is tractable for sparse random graphs using a small number of participants, in general, there are graphs which cannot be discovered by any algorithm even with a significant number of participants, and with the availability of end-to-end information along all the paths between the participants.Comment: A shorter version appears in ACM SIGMETRICS 2011. This version is scheduled to appear in J. on Random Structures and Algorithm

    Spectral goodness of fit for network models

    Get PDF
    We introduce a new statistic, 'spectral goodness of fit' (SGOF) to measure how well a network model explains the structure of an observed network. SGOF provides an absolute measure of fit, analogous to the standard R-squared in linear regression. Additionally, as it takes advantage of the properties of the spectrum of the graph Laplacian, it is suitable for comparing network models of diverse functional forms, including both fitted statistical models and algorithmic generative models of networks. After introducing, defining, and providing guidance for interpreting SGOF, we illustrate the properties of the statistic with a number of examples and comparisons to existing techniques. We show that such a spectral approach to assessing model fit fills gaps left by earlier methods and can be widely applied

    Projective, Sparse, and Learnable Latent Position Network Models

    Full text link
    When modeling network data using a latent position model, it is typical to assume that the nodes' positions are independently and identically distributed. However, this assumption implies the average node degree grows linearly with the number of nodes, which is inappropriate when the graph is thought to be sparse. We propose an alternative assumption---that the latent positions are generated according to a Poisson point process---and show that it is compatible with various levels of sparsity. Unlike other notions of sparse latent position models in the literature, our framework also defines a projective sequence of probability models, thus ensuring consistency of statistical inference across networks of different sizes. We establish conditions for consistent estimation of the latent positions, and compare our results to existing frameworks for modeling sparse networks.Comment: 51 pages, 2 figure
    corecore