Fast matrix computations for pair-wise and column-wise commute times and Katz scores
We first explore methods for approximating the commute time and Katz score
between a pair of nodes. These methods are based on the approach of matrices,
moments, and quadrature developed in the numerical linear algebra community.
They rely on the Lanczos process and provide upper and lower bounds on an
estimate of the pair-wise scores. We also explore methods to approximate the
commute times and Katz scores from a node to all other nodes in the graph.
Here, our approach for the commute times is based on a variation of the
conjugate gradient algorithm, and it provides an estimate of all the diagonals
of the inverse of a matrix. Our technique for the Katz scores is based on
exploiting an empirical localization property of the Katz matrix. We adapt
algorithms used for computing personalized PageRank to these Katz scores and
theoretically show that this approach is convergent. We evaluate these methods
on 17 real-world graphs ranging in size from 1,000 to 1,000,000 nodes. Our
results show that our pair-wise commute time method and column-wise Katz
algorithm both have attractive theoretical properties and empirical
performance.
Comment: 35 pages, journal version of
http://dx.doi.org/10.1007/978-3-642-18009-5_13 which has been submitted for
publication. Please see
http://www.cs.purdue.edu/homes/dgleich/publications/2011/codes/fast-katz/ for
supplemental code.
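Illustrative sketch (not the paper's method): the column-wise Katz scores from one node can be written as the solution of (I - alpha*A) x = e_i with the zero-length walk removed. The snippet below uses a plain fixed-point iteration rather than the Lanczos/quadrature or PageRank-style algorithms the abstract describes. The function name `katz_column`, the adjacency-list input, and the tolerance are assumptions made for the example, and the iteration only converges when alpha is below the reciprocal of the largest eigenvalue of A.

```python
def katz_column(adj, source, alpha, tol=1e-12, max_iter=10_000):
    """Approximate Katz scores from `source` to every node.

    adj maps each node index 0..n-1 to a list of neighbour indices
    (undirected graph). Hypothetical helper for illustration only.
    """
    n = len(adj)
    x = [0.0] * n
    for _ in range(max_iter):
        # One sweep of x <- e_source + alpha * A x.
        new = [alpha * sum(x[j] for j in adj[i]) for i in range(n)]
        new[source] += 1.0
        delta = max(abs(a - b) for a, b in zip(new, x))
        x = new
        if delta < tol:
            break
    # Drop the k = 0 term so that only walks of length >= 1 are counted.
    x[source] -= 1.0
    return x


# Path graph 0 - 1 - 2; the largest eigenvalue is sqrt(2), so alpha = 0.1 is safe.
path = {0: [1], 1: [0, 2], 2: [1]}
scores = katz_column(path, source=0, alpha=0.1)
```

On this three-node path the direct neighbour of node 0 receives a score roughly ten times larger than the node two hops away, which is the length-based downweighting that Katz scores encode.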
On the limiting behavior of parameter-dependent network centrality measures
We consider a broad class of walk-based, parameterized node centrality
measures for network analysis. These measures are expressed in terms of
functions of the adjacency matrix and generalize various well-known centrality
indices, including Katz and subgraph centrality. We show that the parameter can
be "tuned" to interpolate between degree and eigenvector centrality, which
appear as limiting cases. Our analysis helps explain certain correlations often
observed between the rankings obtained using different centrality measures, and
provides some guidance for the tuning of parameters. We also highlight the
roles played by the spectral gap of the adjacency matrix and by the number of
triangles in the network. Our analysis covers both undirected and directed
networks, including weighted ones. A brief discussion of PageRank is also
given.
Comment: The first 22 pages are the paper; pages 22-38 are the supplementary
material.
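Illustrative sketch: the two limiting cases named in the abstract can be checked numerically for Katz centrality, x = (I - alpha*A)^{-1} 1, on a small graph. As alpha tends to 0, (x - 1)/alpha approaches the degree vector; as alpha approaches 1/lambda_1, the normalized x approaches the dominant eigenvector. The four-node example graph, all helper names, and the simple iterations used here are assumptions for the example and are not taken from the paper.

```python
import math


def matvec(adj, x):
    """Multiply the adjacency matrix of an undirected graph by x."""
    return [sum(x[j] for j in adj[i]) for i in range(len(adj))]


def unit(x):
    """Scale x to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def eigenvector_centrality(adj, iters=1000):
    """Power iteration; returns (lambda_1 estimate, unit dominant eigenvector).

    Assumes a connected, non-bipartite graph so that the iteration converges.
    """
    x = unit([1.0] * len(adj))
    lam = 0.0
    for _ in range(iters):
        y = matvec(adj, x)
        lam = math.sqrt(sum(v * v for v in y))
        x = [v / lam for v in y]
    return lam, x


def katz_centrality(adj, alpha, rel_tol=1e-12, max_iter=200_000):
    """Solve (I - alpha*A) x = 1 by iterating x <- 1 + alpha * A x."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        new = [1.0 + alpha * v for v in matvec(adj, x)]
        delta = max(abs(a - b) for a, b in zip(new, x))
        x = new
        if delta < rel_tol * max(abs(v) for v in x):
            break
    return x


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
lam, eig = eigenvector_centrality(adj)

small_alpha = 1e-6
degree_like = [(v - 1.0) / small_alpha for v in katz_centrality(adj, small_alpha)]
near_limit = unit(katz_centrality(adj, 0.999 / lam))
```

Here `degree_like` is close to the degrees (2, 2, 3, 1) and `near_limit` is close to `eig`, so intermediate values of alpha interpolate between the two rankings.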
A matrix iteration for dynamic network summaries
We propose a new algorithm for summarizing properties of large-scale time-evolving networks. This type of data, recording connections that come and go over time, is being generated in many modern applications, including telecommunications and on-line human social behavior. The algorithm computes a dynamic measure of how well pairs of nodes can communicate by taking account of routes through the network that respect the arrow of time. We take the conventional approach of downweighting for length (messages become corrupted as they are passed along) and add the novel feature of downweighting for age (messages go out of date). This allows us to generalize widely used Katz-style centrality measures that have proved popular in network science to the case of dynamic networks sampled at non-uniform points in time. We illustrate the new approach on synthetic and real data.
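Illustrative sketch: one iteration consistent with the description above updates a summary matrix S at each sample time t_k as S <- (I + exp(-b*(t_k - t_{k-1})) * S) * (I - a*A_k)^{-1} - I, starting from S = 0. The abstract does not state the formula, so this exact form, the parameter names `a` (length downweighting) and `b` (age downweighting), and every helper below are assumptions made for the example. Each resolvent requires `a` to be below the reciprocal of the largest eigenvalue of the corresponding snapshot.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + identity(n)[i] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dynamic_summary(snapshots, times, a, b):
    """Run the assumed iteration over adjacency snapshots taken at `times`."""
    n = len(snapshots[0])
    eye = identity(n)
    s = [[0.0] * n for _ in range(n)]
    prev_t = times[0]
    for adj, t in zip(snapshots, times):
        decay = math.exp(-b * (t - prev_t))  # older walks lose weight
        left = [[eye[i][j] + decay * s[i][j] for j in range(n)] for i in range(n)]
        resolvent = inverse([[eye[i][j] - a * adj[i][j] for j in range(n)]
                             for i in range(n)])
        prod = matmul(left, resolvent)
        s = [[prod[i][j] - eye[i][j] for j in range(n)] for i in range(n)]
        prev_t = t
    return s


# Edge 0 -> 1 exists at time 0, edge 1 -> 2 at time 1.
a1 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
a2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
summary = dynamic_summary([a1, a2], [0.0, 1.0], a=0.5, b=0.1)
```

In this toy example node 0 can reach node 2 through a time-respecting route, so `summary[0][2]` is positive, while `summary[2][0]` stays zero because no route from 2 to 0 respects the order of the snapshots.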
Ranking hubs and authorities using matrix functions
The notions of subgraph centrality and communicability, based on the
exponential of the adjacency matrix of the underlying graph, have been
effectively used in the analysis of undirected networks. In this paper we
propose an extension of these measures to directed networks, and we apply them
to the problem of ranking hubs and authorities. The extension is achieved by
bipartization, i.e., the directed network is mapped onto a bipartite undirected
network with twice as many nodes in order to obtain a network with a symmetric
adjacency matrix. We explicitly determine the exponential of this adjacency
matrix in terms of the adjacency matrix of the original, directed network, and
we give an interpretation of centrality and communicability in this new
context, leading to a technique for ranking hubs and authorities. The matrix
exponential method for computing hubs and authorities is compared to the
well-known HITS algorithm, both on small artificial examples and on more realistic
real-world networks. A few other ranking algorithms are also discussed and
compared with our technique. The use of Gaussian quadrature rules for
calculating hub and authority scores is discussed.
Comment: 28 pages, 6 figures
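Illustrative sketch: the bipartization step can be reproduced on a toy directed graph by building the symmetric block matrix [[0, A], [A^T, 0]] and exponentiating it. Reading hub scores from the first n diagonal entries and authority scores from the last n is our reading of the abstract. The truncated Taylor series and all function names are assumptions for the example; the abstract itself mentions Gaussian quadrature rules for this computation.

```python
import math


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def expm_taylor(m, terms=40):
    """Matrix exponential by a truncated Taylor series (small matrices only)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def hub_authority_scores(a):
    """Hub and authority scores from the exponential of the bipartite matrix.

    a is the n x n adjacency matrix of a directed graph, a[i][j] = 1 for i -> j.
    """
    n = len(a)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if a[i][j]:
                big[i][n + j] = float(a[i][j])  # hub copy of i -- authority copy of j
                big[n + j][i] = float(a[i][j])
    e = expm_taylor(big)
    hubs = [e[i][i] for i in range(n)]
    auths = [e[n + i][n + i] for i in range(n)]
    return hubs, auths


# Directed star: node 0 points to nodes 1, 2 and 3.
star = [[0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
hubs, auths = hub_authority_scores(star)
```

Node 0 gets the largest hub score and the baseline authority score of 1, while nodes 1 to 3 share the same authority score above 1 and the baseline hub score.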
The Physics of Communicability in Complex Networks
A fundamental problem in the study of complex networks is to provide
quantitative measures of correlation and information flow between different
parts of a system. To this end, several notions of communicability have been
introduced and applied to a wide variety of real-world networks in recent
years. Several such communicability functions are reviewed in this paper. It is
emphasized that communication and correlation in networks can take place
through many more routes than the shortest paths, a fact that may not have been
sufficiently appreciated in previously proposed correlation measures. In
contrast to these, the communicability measures reviewed in this paper are
defined by taking into account all possible routes between two nodes, assigning
smaller weights to longer ones. This point of view naturally leads to the
definition of communicability in terms of matrix functions, such as the
exponential, resolvent, and hyperbolic functions, in which the matrix argument
is either the adjacency matrix or the graph Laplacian associated with the
network. Considerable insight into communicability can be gained by modeling a
network as a system of oscillators and deriving physical interpretations, both
classical and quantum-mechanical, of various communicability functions.
Applications of communicability measures to the analysis of complex systems are
illustrated on a variety of biological, physical and social networks. The last
part of the paper is devoted to a review of the notion of locality in complex
networks and to computational aspects: exploiting sparsity can greatly
reduce the computational effort of calculating communicability
functions for large networks.
Comment: Review Article. 90 pages, 14 figures. Contents: Introduction;
Communicability in Networks; Physical Analogies; Comparing Communicability
Functions; Communicability and the Analysis of Networks; Communicability and
Localization in Complex Networks; Computability of Communicability Functions;
Conclusions and Perspective
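Illustrative sketch: for the exponential communicability function, the score between nodes p and q is the (p, q) entry of exp(A), that is, the sum over k of (A^k)[p][q] / k!, so every walk contributes and longer walks count less. The snippet computes one column of exp(A) using only sparse matrix-vector products. The function name, the truncation at 40 terms, and the adjacency-list input are assumptions for the example.

```python
import math


def communicability_column(adj, q, terms=40):
    """Return column q of exp(A) for an undirected graph as a truncated walk sum."""
    n = len(adj)
    term = [1.0 if i == q else 0.0 for i in range(n)]  # A^0 e_q
    total = term[:]
    for k in range(1, terms):
        # term becomes A^k e_q / k!, using only the neighbours of each node.
        term = [sum(term[j] for j in adj[i]) / k for i in range(n)]
        total = [t + s for t, s in zip(total, term)]
    return total


# Path graph 0 - 1 - 2: nodes 0 and 2 are not adjacent but still communicate.
path = {0: [1], 1: [0, 2], 2: [1]}
col = communicability_column(path, 2)
```

Although nodes 0 and 2 share no edge, `col[0]` is positive because walks of length 2, 4, 6 and so on connect them. This is the sense in which communicability looks beyond shortest paths.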
Graph Deep Learning: Methods and Applications
The past few years have seen the growing prevalence of deep neural networks in various application domains, including image processing, computer vision, speech recognition, machine translation, self-driving cars, game playing, social networks, bioinformatics, and healthcare. Due to its broad applications and strong performance, deep learning, a subfield of machine learning and artificial intelligence, is changing everyone's life. Graph learning, which learns knowledge from graph-structured data, has been another hot field among the machine learning and data mining communities. Examples of graph learning range from social network analysis such as community detection and link prediction, to relational machine learning such as knowledge graph completion and recommender systems, to multi-graph tasks such as graph classification and graph generation. An emerging new field, graph deep learning, aims at applying deep learning to graphs. To deal with graph-structured data, graph neural networks (GNNs) have been invented in recent years, which directly take graphs as input and output graph/node representations. Although GNNs have shown performance superior to traditional methods in tasks such as semi-supervised node classification, there still exists a wide range of other important graph learning problems where either GNNs' applicability has not been explored or GNNs only have less satisfying performance. In this dissertation, we dive deeper into the field of graph deep learning. By developing new algorithms, architectures and theories, we push graph neural networks' boundaries to a much wider range of graph learning problems.
The problems we have explored include: 1) graph classification; 2) medical ontology embedding; 3) link prediction; 4) recommender systems; 5) graph generation; and 6) graph structure optimization. We first focus on two graph representation learning problems: graph classification and medical ontology embedding. For graph classification, we develop a novel deep GNN architecture which aggregates node features through a novel SortPooling layer that replaces the simple summing used in previous works. We demonstrate its state-of-the-art graph classification performance on benchmark datasets. For medical ontology embedding, we propose a novel hierarchical attention propagation model, which uses an attention mechanism to learn embeddings of medical concepts from hierarchically-structured medical ontologies such as ICD-9 and CCS. We validate the learned embeddings on sequential procedure/diagnosis prediction tasks with real patient data. Then we investigate GNNs' potential for predicting relations, specifically link prediction and recommender systems. For link prediction, we first develop a theory unifying various traditional link prediction heuristics, and then design a framework to automatically learn suitable heuristics from a given network based on GNNs. Our model shows unprecedentedly strong link prediction performance, significantly outperforming all traditional methods. For recommender systems, we propose a novel graph-based matrix completion model, which uses a GNN to learn graph structure features from the bipartite graph formed by user and item interactions. Our model not only outperforms various matrix completion baselines, but also demonstrates excellent transfer learning ability: a model trained on MovieLens can be directly used to predict Douban movie ratings with high performance. Finally, we explore GNNs' applicability to graph generation and graph structure optimization.
We focus on a specific type of graphs which usually carry computations on them, namely directed acyclic graphs (DAGs). We develop a variational autoencoder (VAE) for DAGs and prove that it can injectively map computations into a latent space. This injectivity allows us to perform optimization in the continuous latent space instead of the original discrete structure space. We then apply our VAE to two types of DAGs, neural network architectures and Bayesian networks. Experiments show that our model not only generates novel and valid DAGs, but also finds high-quality neural architectures and Bayesian networks through performing Bayesian optimization in its latent space
Graph diffusions and matrix functions: fast algorithms and localization results
Network analysis provides tools for addressing fundamental applications in graphs such as webpage ranking, protein-function prediction, and product categorization and recommendation. As real-world networks grow to have millions of nodes and billions of edges, the scalability of network analysis algorithms becomes increasingly important. Whereas many standard graph algorithms rely on matrix-vector operations that require exploring the entire graph, this thesis is concerned with graph algorithms that are local (that explore only the graph region near the nodes of interest) as well as the localized behavior of global algorithms. We prove that two well-studied matrix functions for graph analysis, PageRank and the matrix exponential, stay localized on networks that have a skewed degree sequence related to the power-law degree distribution common to many real-world networks. Our results give the first theoretical explanation of a localization phenomenon that has long been observed in real-world networks. We prove that our novel method for the matrix exponential converges in sublinear work on graphs with the specified degree sequence, and we adapt our method to produce the first deterministic algorithm for computing the related heat kernel diffusion in constant time. Finally, we generalize this framework to compute any graph diffusion in constant time.
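Illustrative sketch: a "local" algorithm in the sense described above touches only nodes near the seed. The snippet below is a generic push-style approximation of seeded PageRank, given here as a well-known example of the idea and not as the thesis's own method. The function name, the teleportation parameter `alpha`, the threshold `eps`, and the test graph are assumptions for the example, and the code assumes every node has at least one neighbour.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-3):
    """Push-style approximation of seeded PageRank.

    Returns (p, r): p holds the approximate scores of the nodes that were
    pushed, r holds the leftover residual mass. On exit every residual
    satisfies r[u] < eps * degree(u).
    """
    p = {}
    r = {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg_u:
            continue
        # Keep an alpha fraction at u, spread the rest evenly over neighbours.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = 0.0
        share = (1.0 - alpha) * ru / deg_u
        for v in adj[u]:
            before = r.get(v, 0.0)
            r[v] = before + share
            # Enqueue v only at the moment it crosses its threshold.
            if before < eps * len(adj[v]) <= r[v]:
                queue.append(v)
    return p, r


# Path graph with 200 nodes, seeded at one end.
n = 200
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
p, r = approximate_ppr(adj, seed=0)
```

Only nodes close to the seed end up in `p`; the far end of the path is never visited, so the work depends on `alpha` and `eps` and not on the size of the graph.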