From random walks to distances on unweighted graphs
Large unweighted directed graphs are commonly used to capture relations
between entities. A fundamental problem in the analysis of such networks is to
properly define the similarity or dissimilarity between any two vertices.
Despite the significance of this problem, statistical characterization of the
proposed metrics has been limited. We introduce and develop a class of
techniques for analyzing random walks on graphs using stochastic calculus.
Using these techniques we generalize results on the degeneracy of hitting times
and analyze a metric based on the Laplace transformed hitting time (LTHT). The
metric serves as a natural, provably well-behaved alternative to the expected
hitting time. We establish a general correspondence between hitting times of
the Brownian motion and analogous hitting times on the graph. We show that the
LTHT is consistent with respect to the underlying metric of a geometric graph,
preserves clustering tendency, and remains robust against random addition of
non-geometric edges. Tests on simulated and real-world data show that the LTHT
matches theoretical predictions and outperforms alternatives.
Comment: To appear in NIPS 201
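As a rough illustration of the quantity discussed in this abstract, the sketch below computes a Laplace transformed hitting time on a tiny unweighted graph. It assumes the standard definition E[exp(-beta * T)], where T is the number of steps a simple random walk takes to reach the target vertex. The function name `ltht`, the fixed-point iteration, and the choice of -log as a dissimilarity are illustrative assumptions, not the authors' implementation or normalization.

```python
import math

def ltht(adj, target, beta, iters=1000, tol=1e-12):
    """Return {v: E[exp(-beta * T_v)]}, T_v = hitting time of `target` from v.

    Hypothetical helper: solves u(target) = 1 and
    u(v) = exp(-beta) * mean(u(w) for w in out-neighbours of v)
    by fixed-point iteration, which contracts because exp(-beta) < 1.
    """
    decay = math.exp(-beta)
    u = {v: 0.0 for v in adj}
    u[target] = 1.0
    for _ in range(iters):
        new_u = {}
        delta = 0.0
        for v, nbrs in adj.items():
            if v == target:
                new_u[v] = 1.0
            elif not nbrs:
                new_u[v] = 0.0  # a walk stuck at v never reaches the target
            else:
                new_u[v] = decay * sum(u[w] for w in nbrs) / len(nbrs)
            delta = max(delta, abs(new_u[v] - u[v]))
        u = new_u
        if delta < tol:
            break
    return u

# Path graph 0 - 1 - 2, stored as out-neighbour lists.
path = {0: [1], 1: [0, 2], 2: [1]}
values = ltht(path, target=2, beta=0.5)
# One possible dissimilarity (an assumption here): larger means "farther".
dissimilarity = {v: -math.log(x) for v, x in values.items()}
```

On this path graph the vertex adjacent to the target receives a smaller dissimilarity than the far endpoint, which is the qualitative behaviour one would expect of a distance-like quantity.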
Measuring Global Similarity between Texts
We propose a new similarity measure between texts which, contrary to the
current state-of-the-art approaches, takes a global view of the texts to be
compared. We have implemented a tool to compute our textual distance and
conducted experiments on several text corpora. The experiments show that
our methods can reliably identify different global types of texts.
Comment: Submitted to SLSP 201
Streaming Graph Challenge: Stochastic Block Partition
An important objective for analyzing real-world graphs is to achieve scalable
performance on large, streaming graphs. A challenging and relevant example is
the graph partition problem. As a combinatorial problem, graph partition is
NP-hard, but existing relaxation methods provide reasonable approximate
solutions that can be scaled for large graphs. Competitive benchmarks and
challenges have proven to be an effective means to advance state-of-the-art
performance and foster community collaboration. This paper describes a graph
partition challenge with a baseline partition algorithm of sub-quadratic
complexity. The algorithm employs rigorous Bayesian inferential methods based
on a statistical model that captures characteristics of the real-world graphs.
This strong foundation enables the algorithm to address limitations of
well-known graph partition approaches such as modularity maximization. This
paper describes various aspects of the challenge including: (1) the data sets
and streaming graph generator, (2) the baseline partition algorithm with
pseudocode, (3) an argument for the correctness of parallelizing the Bayesian
inference, (4) different parallel computation strategies such as node-based
parallelism and matrix-based parallelism, (5) evaluation metrics for partition
correctness and computational requirements, (6) preliminary timing of a
Python-based demonstration code and the open source C++ code, and (7)
considerations for partitioning the graph in streaming fashion. Data sets,
source code for the algorithm, and metrics, with detailed documentation,
are available at GraphChallenge.org.
Comment: To be published in 2017 IEEE High Performance Extreme Computing
Conference (HPEC
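The abstract mentions evaluation metrics for partition correctness without naming them here. One common way to score a partition against ground truth is pairwise precision and recall over vertex pairs; the sketch below illustrates that idea only. The function name and the convention of returning 1.0 when a denominator is zero are assumptions, and this should not be read as the challenge's official metric code.

```python
from itertools import combinations

def pairwise_precision_recall(truth, pred):
    """Score a predicted partition against a ground-truth partition.

    `truth[i]` and `pred[i]` are block labels of vertex i (hypothetical
    input format). A pair of vertices counts as positive when both
    vertices share a block.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_pred and same_truth:
            tp += 1   # pair grouped together in both partitions
        elif same_pred:
            fp += 1   # grouped by the prediction only
        elif same_truth:
            fn += 1   # grouped by the ground truth only
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

precision, recall = pairwise_precision_recall([0, 0, 1, 1], [0, 0, 0, 1])
```

Because the score depends only on which pairs are grouped together, it is unaffected by how the blocks happen to be numbered.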
The Power of Side-Information in Subgraph Detection
In this work, we tackle the problem of hidden community detection. We consider Belief Propagation (BP) applied to the problem of detecting a hidden Erdős-Rényi (ER) graph embedded in a larger and sparser ER graph, in the presence of side-information. We derive two related algorithms based on BP to perform subgraph detection in the presence of two kinds of side-information. The first variant of side-information consists of a set of nodes, called cues, known to be from the subgraph. The second variant of side-information consists of a set of nodes that are cues with a given probability. It was shown in past works that BP without side-information fails to detect the subgraph correctly when an effective signal-to-noise ratio (SNR) parameter falls below a threshold. In contrast, in the presence of non-trivial side-information, we show that the BP algorithm achieves asymptotically zero error for any value of the SNR parameter. We validate our results through simulations on synthetic datasets as well as on a few real-world networks.
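To make the problem setup concrete, the sketch below generates a small instance of the model described in this abstract: a denser ER subgraph hidden inside a sparser ER graph, together with the first kind of side-information (cues known to lie in the subgraph). The function name, the parameters `n`, `k`, `p`, `q`, `alpha`, and the rule that each subgraph node is revealed as a cue independently with probability `alpha` are all illustrative assumptions. The sketch covers only the data model, not the BP detection algorithm.

```python
import itertools
import random

def planted_er_with_cues(n, k, p, q, alpha, seed=0):
    """Sample a hidden ER subgraph inside a sparser ER graph, plus cues.

    n: total nodes, k: hidden subgraph size, p: edge probability inside
    the subgraph, q: edge probability for all other pairs, alpha:
    probability that a subgraph node is revealed as a cue (assumption).
    """
    rng = random.Random(seed)
    hidden = set(rng.sample(range(n), k))
    edges = set()
    for u, v in itertools.combinations(range(n), 2):
        prob = p if (u in hidden and v in hidden) else q
        if rng.random() < prob:
            edges.add((u, v))
    # Perfect cues: every revealed node truly belongs to the subgraph.
    cues = {v for v in hidden if rng.random() < alpha}
    return hidden, edges, cues

hidden, edges, cues = planted_er_with_cues(n=30, k=6, p=0.8, q=0.05, alpha=0.3)
```

A detector would be given only `edges` and `cues`, and would be scored on how well it recovers `hidden`.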
Contagion Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis and Network Algorithms
This monograph provides an overview of the mathematical theories and
computational algorithm design for contagion source detection in large
networks. By leveraging network centrality as a tool for statistical inference,
we can accurately identify the source of contagions, trace their spread, and
predict future trajectories. This approach provides fundamental insights into
surveillance capability and asymptotic behavior of contagion spreading in
networks. Mathematical theory and computational algorithms are vital to
understanding contagion dynamics, improving surveillance capabilities, and
developing effective strategies to prevent the spread of infectious diseases
and misinformation.
Comment: Suggested Citation: Chee Wei Tan and Pei-Duo Yu (2023), "Contagion
Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis
and Network Algorithms", Foundations and Trends in Networking: Vol. 13: No.
2-3, pp 107-251. http://dx.doi.org/10.1561/130000006
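As one example of using a network centrality for source inference, the sketch below computes rumor centrality on a tree-shaped infection graph and picks the vertex that maximizes it. The abstract does not name a specific centrality, so this choice is an assumption made for illustration. The sketch relies on the known identity that the number of infection orderings starting at a root equals n! divided by the product of all subtree sizes when the tree is rooted there. Function names and the adjacency-list input format are hypothetical.

```python
import math

def rumor_centrality(tree, root):
    """Count infection orderings of a connected tree that start at `root`.

    `tree` maps each vertex to a list of its neighbours (undirected tree).
    """
    parent = {root: None}
    order = [root]
    for v in order:  # breadth-first traversal; the list grows as we go
        for w in tree[v]:
            if w != parent[v]:
                parent[w] = v
                order.append(w)
    size = {v: 1 for v in tree}
    for v in reversed(order):  # children are processed before parents
        if parent[v] is not None:
            size[parent[v]] += size[v]
    denom = 1
    for v in tree:
        denom *= size[v]
    return math.factorial(len(tree)) // denom

def estimate_source(tree):
    """Return the vertex with the largest rumor centrality."""
    return max(tree, key=lambda v: rumor_centrality(tree, v))

path = {0: [1], 1: [0, 2], 2: [1]}
source = estimate_source(path)
```

Recomputing the centrality from scratch for every candidate root costs quadratic time in the number of vertices, which is acceptable for a small illustration but not for large networks.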