
    The mutual information between graphs

    The estimation of mutual information between graphs has been an elusive problem until the formulation of graph matching in terms of manifold alignment. Graphs are then mapped to multi-dimensional sets of points through structure-preserving embeddings, and point-wise alignment algorithms can be exploited in this context to re-cast graph matching in terms of point matching. Methods based on bypass entropy estimation must be deployed to render the estimation of mutual information computationally tractable. The novel contribution of this paper is to show how manifold alignment can be combined with copula-based entropy estimators to efficiently estimate the mutual information between graphs. We compare the empirical copula with an Archimedean copula (the independent one) in terms of retrieval/recall after graph comparison. Our experiments show that mutual information built on either choice significantly improves on state-of-the-art divergences.
    Funding: F. Escolano, M. A. Lozano: Project TIN2012-32839 (Spanish Gov.); M. Curado: BES-2013-064482 (Spanish Gov.); E. R. Hancock: Royal Society Wolfson Research Merit Award.
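
    As a rough illustration of the ingredients above (not the authors' estimator), the sketch below rank-transforms two already-aligned graph embeddings to their empirical copulas and plugs them into a standard Kozachenko-Leonenko k-nearest-neighbour entropy estimator to obtain MI(X;Y) = H(X) + H(Y) - H(X,Y); all function names and parameters are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln
from scipy.stats import rankdata


def knn_entropy(points, k=3):
    """Kozachenko-Leonenko k-NN differential entropy estimate (in nats)."""
    n, d = points.shape
    dists, _ = cKDTree(points).query(points, k=k + 1)   # column 0 is the point itself
    eps = dists[:, -1]                                   # distance to the k-th neighbour
    log_unit_ball = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    return digamma(n) - digamma(k) + log_unit_ball + d * np.mean(np.log(eps + 1e-12))


def copula_transform(points):
    """Rank-transform every coordinate to (0, 1) so the marginals become uniform."""
    n = points.shape[0]
    return np.column_stack([rankdata(points[:, j]) / (n + 1.0)
                            for j in range(points.shape[1])])


def graph_mutual_information(X, Y, k=3):
    """MI(X; Y) = H(X) + H(Y) - H(X, Y) on copula-transformed, aligned embeddings.

    X and Y are (n_points, dim) arrays: the embedded, already-aligned graphs."""
    Xc, Yc = copula_transform(X), copula_transform(Y)
    joint = np.hstack([Xc, Yc])
    return knn_entropy(Xc, k) + knn_entropy(Yc, k) - knn_entropy(joint, k)
```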

    Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets ranging in size from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics; for example, clustering algorithms can score 0.4 out of 1 on modularity but 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best-performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting previous work in which Infomap was superior to Louvain. Finally, we find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
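
    A small, hedged sketch of the kind of comparison described above, using off-the-shelf tools (networkx for Louvain, label propagation, modularity, and conductance; scikit-learn for ARI and NMI) on a planted-partition graph; the parameters and graph sizes are illustrative, not the paper's benchmark.

```python
import networkx as nx
from networkx.algorithms.cuts import conductance
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Synthetic benchmark: 10 planted communities of 100 nodes each (nodes are
# numbered block by block, so node v belongs to block v // 100).
G = nx.planted_partition_graph(10, 100, p_in=0.1, p_out=0.005, seed=0)
truth = [v // 100 for v in G]

def labels_of(communities):
    """Flatten a list of node sets into a per-node label vector."""
    label = {v: c for c, nodes in enumerate(communities) for v in nodes}
    return [label[v] for v in G]

def report(name, communities):
    pred = labels_of(communities)
    mod = nx.community.modularity(G, communities)
    conds = [conductance(G, S) for S in communities
             if 0 < len(S) < G.number_of_nodes()]      # conductance of each proper cluster
    print(f"{name:18s} modularity={mod:.3f} "
          f"mean-conductance={sum(conds) / max(len(conds), 1):.3f} "
          f"ARI={adjusted_rand_score(truth, pred):.3f} "
          f"NMI={normalized_mutual_info_score(truth, pred):.3f}")

report("Louvain", nx.community.louvain_communities(G, seed=0))
report("label propagation", list(nx.community.label_propagation_communities(G)))
```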

    Glauber Dynamics on Trees and Hyperbolic Graphs

    We study continuous-time Glauber dynamics for random configurations with local constraints (e.g. proper coloring, Ising and Potts models) on finite graphs with $n$ vertices and of bounded degree. We show that the relaxation time (defined as the reciprocal of the spectral gap $|\lambda_1-\lambda_2|$) for the dynamics on trees and on planar hyperbolic graphs is polynomial in $n$. For these hyperbolic graphs, this yields a general polynomial sampling algorithm for random configurations. We then show that if the relaxation time $\tau_2$ satisfies $\tau_2 = O(1)$, then the correlation coefficient, and the mutual information, between any local function (which depends only on the configuration in a fixed window) and the boundary conditions decays exponentially in the distance between the window and the boundary. For the Ising model on a regular tree, this condition is sharp.
    Comment: To appear in Probability Theory and Related Fields.
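
    A toy, discrete-time rendering of this setting (the paper analyses continuous-time dynamics and more general constraint models): heat-bath Glauber updates for the ferromagnetic Ising model on a complete binary tree with the leaf spins clamped, which lets one probe empirically how much the root spin depends on the boundary condition. Parameters are illustrative assumptions.

```python
import math
import random
import networkx as nx

def glauber_root_spin(depth=6, beta=0.5, boundary=+1, updates=2000, seed=0):
    """Run heat-bath (Glauber) updates on a binary tree with clamped leaves."""
    rng = random.Random(seed)
    T = nx.balanced_tree(2, depth)                     # complete binary tree, root = 0
    leaves = {v for v in T if T.degree(v) == 1}
    interior = [v for v in T if v not in leaves]
    spin = {v: rng.choice([-1, +1]) for v in T}
    for v in leaves:                                   # clamp the boundary condition
        spin[v] = boundary
    for _ in range(updates):
        v = rng.choice(interior)                       # pick a free site uniformly
        field = beta * sum(spin[u] for u in T[v])      # local field from the neighbours
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))  # heat-bath probability of +1
        spin[v] = +1 if rng.random() < p_plus else -1
    return spin[0]                                     # root spin after the run

# Averaging the root spin over independent runs, for +1 versus -1 boundaries,
# gives an empirical handle on how strongly the root feels the boundary.
runs = 200
m_plus = sum(glauber_root_spin(boundary=+1, seed=s) for s in range(runs)) / runs
m_minus = sum(glauber_root_spin(boundary=-1, seed=s) for s in range(runs)) / runs
print(f"root magnetisation: +boundary {m_plus:+.2f}, -boundary {m_minus:+.2f}")
```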

    Seeded Graph Matching: Efficient Algorithms and Theoretical Guarantees

    In this paper, a new information-theoretic framework for graph matching is introduced. Using this framework, the graph isomorphism and seeded graph matching problems are studied. The maximum degree algorithm for graph isomorphism is analyzed and sufficient conditions for successful matching are rederived using type analysis. Furthermore, a new seeded matching algorithm with polynomial time complexity is introduced. The algorithm uses 'typicality matching' and techniques from point-to-point communications for reliable matching. Assuming an Erdos-Renyi model on the correlated graph pair, it is shown that successful matching is guaranteed when the number of seeds grows logarithmically with the number of vertices in the graphs. The logarithmic coefficient is shown to be inversely proportional to the mutual information between the edge variables in the two graphs.
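
    For intuition only, the sketch below implements a naive seed-signature matcher on a correlated Erdos-Renyi pair: each unseeded vertex is matched to the candidate whose adjacency pattern towards the revealed seeds agrees best. This is not the paper's typicality-matching algorithm, and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s, n_seeds = 200, 0.2, 0.9, 30     # vertices, edge density, edge retention, seeds

def sym_mask(prob):
    """Symmetric boolean mask with an empty diagonal."""
    m = np.triu(rng.random((n, n)) < prob, 1)
    return m | m.T

parent = sym_mask(p)                     # common parent graph
A = parent & sym_mask(s)                 # correlated copy 1: keep each parent edge w.p. s
B0 = parent & sym_mask(s)                # correlated copy 2, before relabelling
perm = rng.permutation(n)                # hidden vertex correspondence
inv = np.argsort(perm)                   # inv[v] = label of original vertex v inside B
B = B0[np.ix_(perm, perm)]

seeds_A = np.arange(n_seeds)             # the revealed seed pairs
seeds_B = inv[seeds_A]
sig_A = A[:, seeds_A]                    # adjacency pattern of each A-vertex towards the seeds
sig_B = B[:, seeds_B]                    # same for B, columns in matching seed order

correct = 0
for v in range(n_seeds, n):              # match every unseeded vertex of A
    hamming = np.count_nonzero(sig_B != sig_A[v], axis=1)
    correct += int(np.argmin(hamming) == inv[v])
print(f"correctly matched {correct} of {n - n_seeds} unseeded vertices")
```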

    COIN: Co-Cluster Infomax for Bipartite Graphs

    Bipartite graphs are powerful data structures for modeling interactions between two types of nodes, and they have been used in a variety of applications, such as recommender systems, information retrieval, and drug discovery. A fundamental challenge for bipartite graphs is how to learn informative node embeddings. Despite the success of recent self-supervised learning methods on bipartite graphs, their objectives discriminate between instance-wise positive and negative node pairs, which can introduce cluster-level errors. In this paper, we introduce a novel co-cluster infomax (COIN) framework, which captures cluster-level information by maximizing the mutual information of co-clusters. Different from previous infomax methods, which estimate mutual information with neural networks, COIN can calculate mutual information easily. Besides, COIN is an end-to-end co-clustering method that can be trained jointly with other objective functions and optimized via back-propagation. Furthermore, we provide theoretical analysis for COIN: we prove that COIN is able to effectively increase the mutual information of node embeddings and that COIN is upper-bounded by the prior distributions of nodes. We extensively evaluate the proposed COIN framework on various benchmark datasets and tasks to demonstrate its effectiveness.
    Comment: NeurIPS 2022 GLFrontiers Workshop.
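
    A minimal sketch of a co-cluster mutual information computed in closed form, in the spirit of the objective described above (this is not the COIN implementation): soft cluster assignments for the two node types together with the biadjacency matrix induce a joint distribution over co-clusters, whose MI is evaluated directly. Shapes and names are assumptions.

```python
import numpy as np

def co_cluster_mutual_information(B, Pu, Pv, eps=1e-12):
    """MI of the co-cluster distribution induced by soft assignments.

    B:  (n_u, n_v) biadjacency matrix.
    Pu: (n_u, K) and Pv: (n_v, L) soft cluster assignments (rows sum to 1)."""
    joint = Pu.T @ B @ Pv                  # edge mass falling into each co-cluster
    joint = joint / joint.sum()            # normalise to a joint distribution P(k, l)
    pk = joint.sum(axis=1, keepdims=True)  # marginal over row clusters
    pl = joint.sum(axis=0, keepdims=True)  # marginal over column clusters
    return float(np.sum(joint * (np.log(joint + eps) - np.log(pk @ pl + eps))))

# Toy usage: a two-block bipartite graph with matching hard assignments.
rng = np.random.default_rng(0)
B = np.zeros((40, 60))
B[:20, :30] = rng.random((20, 30)) < 0.4
B[20:, 30:] = rng.random((20, 30)) < 0.4
Pu = np.repeat(np.eye(2), 20, axis=0)      # 40 row nodes -> 2 clusters
Pv = np.repeat(np.eye(2), 30, axis=0)      # 60 column nodes -> 2 clusters
print(f"co-cluster MI: {co_cluster_mutual_information(B, Pu, Pv):.3f} nats")
```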

    Inference and Mutual Information on Random Factor Graphs

    Random factor graphs provide a powerful framework for the study of inference problems such as decoding problems or the stochastic block model. Information-theoretically, the key quantity of interest is the mutual information between the observed factor graph and the underlying ground truth around which the factor graph was created; in the stochastic block model, this would be the planted partition. The mutual information gauges whether and how well the ground truth can be inferred from the observable data. For a very general model of random factor graphs we verify a formula for the mutual information predicted by physics techniques. As an application we prove a conjecture about low-density generator matrix codes from [Montanari: IEEE Transactions on Information Theory 2005]. Further applications include phase transitions of the stochastic block model and the mixed k-spin model from physics.
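
    As a toy complement (not the paper's general formula), one can compute exactly the mutual information carried by a single vertex pair in a symmetric two-block stochastic block model, i.e. between the edge indicator and the event that the two endpoints share a block; it vanishes when the within- and between-block edge densities coincide. The function below and its parameter names are illustrative.

```python
import math

def per_pair_mi(p_in, p_out, p_same=0.5):
    """I(A_ij ; 1[sigma_i = sigma_j]) in nats for a single vertex pair."""
    p_edge = p_same * p_in + (1.0 - p_same) * p_out      # marginal edge probability
    mi = 0.0
    for p_s, p_edge_given_s in ((p_same, p_in), (1.0 - p_same, p_out)):
        for p_a_given_s, p_a in ((p_edge_given_s, p_edge),
                                 (1.0 - p_edge_given_s, 1.0 - p_edge)):
            if p_a_given_s > 0.0:
                mi += p_s * p_a_given_s * math.log(p_a_given_s / p_a)
    return mi

# The pair-level information shrinks as p_in approaches p_out and vanishes when
# the two densities coincide, i.e. when the planted partition leaves no trace.
for p_in, p_out in ((0.10, 0.02), (0.06, 0.04), (0.05, 0.05)):
    print(f"p_in={p_in:.2f} p_out={p_out:.2f}  I = {per_pair_mi(p_in, p_out):.5f} nats")
```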