Random walks on dynamic graphs: Mixing times, hitting times, and return probabilities
We establish and generalise several bounds for various random walk quantities, including the mixing time and the maximum hitting time. Unlike previous analyses, our derivations are based on rather intuitive notions of local expansion properties, which allow us to capture the progress the random walk makes through t-step probabilities.
We apply our framework to dynamically changing graphs, where the set of vertices is fixed while the set of edges changes in each round. For random walks on dynamic connected graphs whose stationary distribution does not change over time, we show that their behaviour is, in a certain sense, similar to that of random walks on static graphs.
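A minimal sketch of the t-step probabilities on a dynamic graph, under assumptions of our own: every round uses a connected 2-regular graph (a Hamiltonian cycle on a fresh random vertex order, so the vertex set is fixed while the edges change), and the walk is lazy (it stays put with probability 1/2). Because every graph is regular, the uniform distribution is stationary in every round, which is the setting described above. The function names (`random_cycle`, `lazy_step`, `dynamic_walk`) are hypothetical and not taken from the paper.

```python
import random


def random_cycle(n, rng):
    """Adjacency lists of a 2-regular graph: a Hamiltonian cycle on a random vertex order (assumes n >= 3)."""
    order = list(range(n))
    rng.shuffle(order)
    adj = [[] for _ in range(n)]
    for i in range(n):
        u, v = order[i], order[(i + 1) % n]
        adj[u].append(v)
        adj[v].append(u)
    return adj


def lazy_step(p, adj):
    """One lazy-walk step: keep half the mass, spread the other half evenly over neighbours."""
    q = [0.5 * x for x in p]
    for u, mass in enumerate(p):
        share = 0.5 * mass / len(adj[u])
        for v in adj[u]:
            q[v] += share
    return q


def tv_to_uniform(p):
    """Total variation distance between p and the uniform distribution."""
    n = len(p)
    return 0.5 * sum(abs(x - 1.0 / n) for x in p)


def dynamic_walk(n, rounds, seed=0):
    """Evolve the exact t-step distribution, drawing a new connected graph every round."""
    rng = random.Random(seed)
    p = [0.0] * n
    p[0] = 1.0  # the walk starts at vertex 0
    distances = []
    for _ in range(rounds):
        adj = random_cycle(n, rng)  # edges change, vertex set stays fixed
        p = lazy_step(p, adj)
        distances.append(tv_to_uniform(p))
    return p, distances
```

Calling `dynamic_walk(20, 400)` returns the final distribution and the distance to uniform after each round. Since every step here is a symmetric, doubly stochastic matrix, that distance can never increase from one round to the next.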
For example, we show that the mixing and hitting times of any sequence of d-regular connected graphs are O(n^2), generalising a well-known result for static graphs. We also provide refined bounds depending on the isoperimetric dimension of the graph, again matching known results for static graphs. Finally, we investigate properties of random walks on dynamic graphs that are not always connected: we relate their convergence to stationarity to the spectral properties of an average of transition matrices, and we provide some examples that demonstrate strong discrepancies between static and dynamic graphs.
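The averaged-matrix idea can be illustrated on a toy dynamic graph that is never connected in any single round. In the hypothetical example below (our own construction, not one of the paper's examples), rounds alternate between two matchings on 8 vertices. Each matching alone leaves the graph disconnected, so its lazy transition matrix has second eigenvalue 1, while the average of the two matrices is a lazy walk on a path with second eigenvalue below 1. The power-iteration routine assumes a symmetric, positive semidefinite matrix, which holds for these lazy matching matrices; it is a sketch, not the paper's analysis.

```python
import math
import random


def lazy_matrix(n, edges):
    """Lazy walk matrix for a matching: matched vertices stay w.p. 1/2 and cross w.p. 1/2; unmatched ones stay."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 1.0
    for u, v in edges:
        P[u][u] = P[v][v] = 0.5
        P[u][v] = P[v][u] = 0.5
    return P


def average(mats):
    """Entrywise average of equally sized square matrices."""
    k, n = len(mats), len(mats[0])
    return [[sum(M[i][j] for M in mats) / k for j in range(n)] for i in range(n)]


def matvec(P, x):
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]


def second_eigenvalue(P, iters=2000, seed=1):
    """Power iteration restricted to vectors orthogonal to the all-ones vector.

    Assumes P is symmetric and positive semidefinite, so the result is the
    largest eigenvalue once the stationary (all-ones) direction is removed.
    """
    n = len(P)
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the stationary direction
        y = matvec(P, x)
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    mean = sum(x) / n
    x = [xi - mean for xi in x]
    Px = matvec(P, x)
    return sum(a * b for a, b in zip(x, Px)) / sum(a * a for a in x)


if __name__ == "__main__":
    A = lazy_matrix(8, [(0, 1), (2, 3), (4, 5), (6, 7)])
    B = lazy_matrix(8, [(1, 2), (3, 4), (5, 6)])
    print(second_eigenvalue(A))                # about 1.0: A alone is disconnected
    print(second_eigenvalue(average([A, B])))  # about 0.962: the average is connected
```

Here the average works out to I - L/4 for the Laplacian L of the 8-vertex path, so its second eigenvalue is 1 - (1 - cos(pi/8))/2, a gap that neither matching has on its own.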
Solo versus collaborative writing: Discrepancies in the use of tables and graphs in academic articles
The number of authors collaborating to write scientific articles has been increasing steadily, and with this collaboration other factors have also changed, such as the length of articles and the number of citations. However, little is known about potential discrepancies in the use of tables and graphs between single and collaborating authors. In this paper we ask whether multi-author articles contain more tables and graphs than single-author articles, and we study 5,180 recent articles published in six science and social science journals. We found that both pairs and larger groups of authors used significantly more tables and graphs than single authors. These findings indicate a greater emphasis on tables and graphs in collaborative writing, and we discuss some of their possible causes and implications.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies of network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets ranging in size from 1,000 to 1,000,000 nodes.
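The three stand-alone metrics can be computed directly from an edge list and a cluster assignment. The sketch below uses the standard definitions for an undirected, unweighted graph without self-loops: coverage is the fraction of edges inside clusters, modularity is the sum over clusters of (intra-cluster edges / m) - (cluster volume / 2m)^2, and a cluster's conductance is its cut size divided by the smaller of its volume and the rest of the graph's volume. The abstract does not say how per-cluster conductance is aggregated into one score, so the unweighted mean used here is an assumption, and `cluster_metrics` is a hypothetical helper name.

```python
from collections import defaultdict


def cluster_metrics(edges, labels):
    """Return (modularity, coverage, mean conductance) for an undirected, unweighted simple graph.

    edges:  list of (u, v) pairs, each undirected edge listed once
    labels: dict mapping node -> cluster id
    """
    m = len(edges)
    intra = defaultdict(int)  # edges with both endpoints in the cluster
    cut = defaultdict(int)    # edges with exactly one endpoint in the cluster
    vol = defaultdict(int)    # sum of degrees of the cluster's nodes
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        vol[cu] += 1
        vol[cv] += 1
        if cu == cv:
            intra[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1
    total_vol = 2 * m
    coverage = sum(intra.values()) / m
    modularity = sum(intra[c] / m - (vol[c] / total_vol) ** 2 for c in vol)
    conductances = []
    for c in vol:
        denom = min(vol[c], total_vol - vol[c])
        conductances.append(cut[c] / denom if denom > 0 else 0.0)
    mean_conductance = sum(conductances) / len(conductances)
    return modularity, coverage, mean_conductance
```

For two triangles joined by a single edge and split into the two triangles, this gives modularity 5/14 (about 0.357), coverage 6/7, and mean conductance 1/7. Note that lower conductance is better, while higher modularity and coverage are better.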
We find significant differences among the results of the different cluster quality metrics. For example, a clustering algorithm can score 0.4 out of 1 on modularity but 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on information recovery metrics. Our study also shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best-performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performs better than Infomap in nearly all of our tests, contradicting previous work in which Infomap was superior to Louvain. Finally, although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
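Information recovery compares a found clustering against ground-truth labels rather than against the graph, which is why it can disagree sharply with modularity. Below is a minimal sketch of standard normalized mutual information with arithmetic-mean normalization, I(A;B) / ((H(A) + H(B)) / 2); other conventions divide by the geometric mean, maximum, or minimum of the two entropies. This is not the variant from previous work that the abstract contrasts it with, which is not specified here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a cluster assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two assignments of the same nodes."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((nab / n) * math.log(nab * n / (pa[x] * pb[y]))
               for (x, y), nab in joint.items())


def nmi(a, b):
    """NMI with arithmetic-mean normalisation."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 and hb == 0.0:
        return 1.0  # both are a single cluster: treat them as identical
    return mutual_information(a, b) / ((ha + hb) / 2)
```

With truth `[0, 0, 1, 1]`, the prediction `[0, 1, 0, 1]` scores 0.0 because it carries no information about the true clusters, regardless of how much modularity it achieves on the underlying graph, while the relabelled prediction `[1, 1, 0, 0]` scores 1.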
GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels
Evaluating the performance of graph neural networks (GNNs) is an essential task for practical GNN model deployment and serving, as deployed GNNs face significant performance uncertainty when inferring on unseen and unlabeled test graphs because of mismatched training and test graph distributions. In this paper, we study a new problem, GNN model evaluation, which aims to assess the performance of a specific GNN model trained on labeled and observed graphs by precisely estimating its performance (e.g., node classification accuracy) on unseen graphs without labels. Concretely, we propose a two-stage GNN model evaluation framework consisting of (1) DiscGraph set construction and (2) GNNEvaluator training and inference. The DiscGraph set captures a wide range of diverse graph data distribution discrepancies through a discrepancy measurement function, which exploits the GNN outputs related to latent node embeddings and node class predictions. Under effective training supervision from the DiscGraph set, GNNEvaluator learns to precisely estimate the node classification accuracy of the GNN model under evaluation and makes accurate inferences about its performance. Extensive experiments on real-world unseen and unlabeled test graphs demonstrate the effectiveness of our proposed method for GNN model evaluation.
Comment: Accepted by NeurIPS 202
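The abstract says the discrepancy measurement function draws on two GNN outputs, latent node embeddings and node class predictions, but does not give its form. The sketch below is a hypothetical stand-in, not GNNEvaluator's function: it adds the Euclidean distance between the mean node embeddings of two graphs to the total variation distance between their predicted-class histograms. Plain lists stand in for real GNN outputs, and all function names are illustrative.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a list of equal-length embedding vectors."""
    dim = len(rows[0])
    return [sum(r[k] for r in rows) / len(rows) for k in range(dim)]


def class_histogram(probs):
    """Fraction of nodes assigned to each class by the argmax of the predicted probabilities."""
    num_classes = len(probs[0])
    counts = [0] * num_classes
    for p in probs:
        counts[max(range(num_classes), key=p.__getitem__)] += 1
    return [c / len(probs) for c in counts]


def discrepancy(emb_a, probs_a, emb_b, probs_b):
    """HYPOTHETICAL score (not the paper's): embedding-mean shift plus prediction-histogram shift."""
    ma, mb = mean_vector(emb_a), mean_vector(emb_b)
    embedding_shift = math.sqrt(sum((x - y) ** 2 for x, y in zip(ma, mb)))
    ha, hb = class_histogram(probs_a), class_histogram(probs_b)
    prediction_shift = 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))
    return embedding_shift + prediction_shift
```

Identical inputs score 0, and the score grows as either the embeddings or the predicted label mix drift. In a real pipeline the embeddings and probabilities would come from the trained GNN applied to the observed graph and to each candidate graph.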