The mutual information between graphs
The estimation of mutual information between graphs was an elusive problem until graph matching was formulated in terms of manifold alignment. In this formulation, graphs are mapped to multi-dimensional point sets through structure-preserving embeddings, so point-wise alignment algorithms can be exploited to recast graph matching as point matching. Bypass entropy estimators must then be deployed to keep the estimation of mutual information computationally tractable. The novel contribution of this paper is to show how manifold alignment can be combined with copula-based entropy estimators to efficiently estimate the mutual information between graphs. We compare the empirical copula with an Archimedean copula (the independent one) in terms of retrieval/recall after graph comparison. Our experiments show that mutual information estimated with either choice significantly improves on state-of-the-art divergences.
Funding: F. Escolano, M. A. Lozano: Project TIN2012-32839 (Spanish Gov.). M. Curado: BES-2013-064482 (Spanish Gov.). E. R. Hancock: Royal Society Wolfson Research Merit Award.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
of network clustering are increasingly common, a precise understanding
of the relationship between different cluster quality metrics is still lacking. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters.
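Normalized mutual information, one of the information recovery metrics above, can be sketched in a few lines. This minimal version normalizes by the arithmetic mean of the two entropies, which is only one of several conventions; as the abstract notes, "variants" of NMI differ precisely in such choices:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    # Plug-in mutual information between two labelings, normalized by the
    # arithmetic mean of their entropies (one of several common conventions).
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * math.log((c / n) / ((ca[x] / n) * (cb[y] / n)))
             for (x, y), c in joint.items())
    mean_h = (entropy(a) + entropy(b)) / 2
    return mi / mean_h if mean_h > 0 else 1.0
```

Identical clusterings (up to relabeling) score 1, while clusterings that share no information score 0.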
Glauber Dynamics on Trees and Hyperbolic Graphs
We study continuous-time Glauber dynamics for random configurations with
local constraints (e.g. proper coloring, Ising and Potts models) on finite
graphs with n vertices and of bounded degree. We show that the relaxation
time (defined as the reciprocal of the spectral gap |1 - λ2|) for
the dynamics on trees and on planar hyperbolic graphs is polynomial in n.
For these hyperbolic graphs, this yields a general polynomial sampling
algorithm for random configurations. We then show that if the relaxation time
τ2 satisfies τ2 = O(1), then the correlation coefficient, and the
mutual information, between any local function (which depends only on the
configuration in a fixed window) and the boundary conditions decay
exponentially in the distance between the window and the boundary. For the
Ising model on a regular tree, this condition is sharp.
Comment: To appear in Probability Theory and Related Fields
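A discrete-time heat-bath sketch of Glauber dynamics for the Ising model conveys the basic mechanism (this is a simplification of the continuous-time dynamics studied in the paper; all names and parameters here are illustrative):

```python
import math
import random

def glauber_step(spins, neighbors, beta, rng):
    # Heat-bath update: pick a uniform random vertex and resample its spin
    # from the Ising conditional distribution given its neighborhood.
    v = rng.randrange(len(spins))
    field = sum(spins[u] for u in neighbors[v])
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
    spins[v] = 1 if rng.random() < p_plus else -1

def sample_ising(neighbors, beta, steps, rng):
    # Run many single-site updates from a uniform random start.
    spins = [rng.choice((-1, 1)) for _ in neighbors]
    for _ in range(steps):
        glauber_step(spins, neighbors, beta, rng)
    return spins

# Example graph: complete binary tree on 15 vertices (children 2i+1, 2i+2).
tree = {i: set() for i in range(15)}
for i in range(15):
    for c in (2 * i + 1, 2 * i + 2):
        if c < 15:
            tree[i].add(c)
            tree[c].add(i)
```

Running enough steps relative to the relaxation time yields an approximate sample from the Gibbs distribution, which is the content of the polynomial sampling guarantee.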
Seeded Graph Matching: Efficient Algorithms and Theoretical Guarantees
In this paper, a new information theoretic framework for graph matching is
introduced. Using this framework, the graph isomorphism and seeded graph
matching problems are studied. The maximum degree algorithm for graph
isomorphism is analyzed and sufficient conditions for successful matching are
rederived using type analysis. Furthermore, a new seeded matching algorithm
with polynomial time complexity is introduced. The algorithm uses 'typicality
matching' and techniques from point-to-point communications for reliable
matching. Assuming an Erdős–Rényi model on the correlated graph pair, it is
shown that successful matching is guaranteed when the number of seeds grows
logarithmically with the number of vertices in the graphs. The logarithmic
coefficient is shown to be inversely proportional to the mutual information
between the edge variables in the two graphs.
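The role of seeds can be illustrated with a far simpler scheme than the paper's typicality matching (this sketch is hypothetical, not the paper's algorithm): match each non-seed vertex whose adjacency pattern to the seed set is unique in both graphs.

```python
def seeded_match(adj_a, adj_b, seeds):
    # adj_a, adj_b: dict vertex -> set of neighbors; seeds: known (a, b) pairs.
    seed_a = [s for s, _ in seeds]
    seed_b = [t for _, t in seeds]

    def sig(adj, v, seed_list):
        # A vertex's signature is its adjacency pattern to the seed set.
        return tuple(s in adj[v] for s in seed_list)

    by_sig = {}
    matched_b = set(seed_b)
    for v in adj_b:
        if v not in matched_b:
            by_sig.setdefault(sig(adj_b, v, seed_b), []).append(v)
    match = dict(seeds)
    for v in adj_a:
        if v in match:
            continue
        candidates = by_sig.get(sig(adj_a, v, seed_a), [])
        if len(candidates) == 1:   # match only unambiguous signatures
            match[v] = candidates[0]
    return match
```

More seeds produce longer signatures and thus fewer ambiguous vertices, which is the intuition behind the logarithmic seed requirement.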
COIN: Co-Cluster Infomax for Bipartite Graphs
Bipartite graphs are powerful data structures to model interactions between
two types of nodes, which have been used in a variety of applications, such as
recommender systems, information retrieval, and drug discovery. A fundamental
challenge for bipartite graphs is how to learn informative node embeddings.
Despite the success of recent self-supervised learning methods on bipartite
graphs, their objectives discriminate between instance-wise positive and negative
node pairs, which can introduce cluster-level errors. In this paper, we
introduce a novel co-cluster infomax (COIN) framework, which captures the
cluster-level information by maximizing the mutual information of co-clusters.
Unlike previous infomax methods, which estimate mutual information with
neural networks, COIN can calculate mutual information easily. Besides, COIN
is an end-to-end co-clustering method which can be trained jointly with other
objective functions and optimized via back-propagation. Furthermore, we also
provide a theoretical analysis of COIN. We prove that COIN effectively
increases the mutual information of node embeddings and that COIN is
upper-bounded by the prior distributions of nodes. We extensively evaluate the
proposed COIN framework on various benchmark datasets and tasks to demonstrate
the effectiveness of COIN.
Comment: NeurIPS 2022 GLFrontiers Workshop
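The claim that mutual information between discrete co-cluster assignments is easy to calculate (in contrast to neural estimation) can be sketched as follows; the edge-sampling formulation below is an assumption for illustration, not COIN's exact objective:

```python
import math
from collections import Counter

def cocluster_mi(edges, row_cluster, col_cluster):
    # Sample an edge uniformly; the induced joint distribution over
    # (row-cluster, column-cluster) pairs gives the co-cluster mutual
    # information as a simple finite sum over observed cluster pairs.
    n = len(edges)
    joint = Counter((row_cluster[u], col_cluster[v]) for u, v in edges)
    p_row, p_col = Counter(), Counter()
    for (r, c), k in joint.items():
        p_row[r] += k
        p_col[c] += k
    return sum((k / n) * math.log((k / n) / ((p_row[r] / n) * (p_col[c] / n)))
               for (r, c), k in joint.items())
```

Because the distributions are over a handful of cluster pairs rather than continuous embeddings, no neural estimator is needed.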
Inference and Mutual Information on Random Factor Graphs
Random factor graphs provide a powerful framework for the study of inference problems such as decoding problems or the stochastic block model. Information-theoretically, the key quantity of interest is the mutual information between the observed factor graph and the underlying ground truth around which the factor graph was created; in the stochastic block model, this would be the planted partition. The mutual information gauges whether and how well the ground truth can be inferred from the observable data. For a very general model of random factor graphs we verify a formula for the mutual information predicted by physics techniques. As an application we prove a conjecture about low-density generator matrix codes from [Montanari: IEEE Transactions on Information Theory 2005]. Further applications include phase transitions of the stochastic block model and the mixed k-spin model from physics.
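A minimal sampler for the two-group stochastic block model illustrates the planted-partition setting referred to above (the function name and parameters are illustrative):

```python
import random

def sample_sbm(n, p_in, p_out, rng):
    # Planted partition: each vertex joins one of two groups uniformly at
    # random; each pair is connected with probability p_in inside a group
    # and p_out across groups. The inference task is to recover sigma
    # (the ground truth) from the observed edges.
    sigma = [rng.randrange(2) for _ in range(n)]
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < (p_in if sigma[u] == sigma[v] else p_out)]
    return sigma, edges
```

The mutual information between sigma and the edge set is exactly the quantity whose formula the paper verifies; it vanishes when p_in = p_out and grows as the two probabilities separate.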