Finding Rumor Sources on Random Trees
We consider the problem of detecting the source of a rumor which has spread
in a network using only observations about which set of nodes are infected with
the rumor and with no information as to \emph{when} these nodes became
infected. In a recent work \citep{ref:rc} this rumor source detection problem
was introduced and studied. The authors proposed the graph score function {\em
rumor centrality} as an estimator for detecting the source. They establish it
to be the maximum likelihood estimator with respect to the popular Susceptible
Infected (SI) model with exponential spreading times for regular trees. They
showed that as the size of the infected graph increases, for a path graph
(2-regular tree), the probability of source detection goes to $0$, while for
$d$-regular trees with $d \geq 3$ the probability of detection, say $\alpha_d$,
remains bounded away from $0$ and is less than $1/2$. However, their results
stop short of providing insights for the performance of the rumor centrality
estimator in more general settings such as irregular trees or the SI model with
non-exponential spreading times.
This paper overcomes this limitation and establishes the effectiveness of
rumor centrality for source detection for generic random trees and the SI model
with a generic spreading time distribution. The key result is an interesting
connection between a continuous time branching process and the effectiveness of
rumor centrality. Through this, it is possible to quantify the detection
probability precisely. As a consequence, we recover all previous results as a
special case and obtain a variety of novel results including the {\em
universality} of rumor centrality in the context of tree-like graphs and the SI
model with a generic spreading time distribution.
Comment: 38 pages, 6 figures
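The SI model with exponential spreading times that the paper analyzes is straightforward to simulate. The sketch below (Python; the helper name and unit exponential rate are my own choices, not from the paper) grows an infected subtree of a d-regular tree node by node, which is exactly the kind of infected graph the rumor centrality estimator is evaluated on.

```python
import heapq
import random

def simulate_si_on_regular_tree(d=3, n=20, seed=0):
    """Spread a rumor from node 0 on an (infinite) d-regular tree under the
    SI model with i.i.d. exponential spreading times, until n nodes are
    infected. Nodes are generated lazily; returns the infected set and each
    node's parent in the spreading tree."""
    rng = random.Random(seed)
    parent = {0: None}
    infected = {0}
    next_id = 1
    heap = []  # (infection time, candidate node, its parent)

    def expose_children(v, t):
        nonlocal next_id
        k = d if parent[v] is None else d - 1  # root has d children, others d-1
        for _ in range(k):
            heapq.heappush(heap, (t + rng.expovariate(1.0), next_id, v))
            next_id += 1

    expose_children(0, 0.0)
    while len(infected) < n:
        t, v, p = heapq.heappop(heap)
        infected.add(v)
        parent[v] = p
        expose_children(v, t)
    return infected, parent

infected, parent = simulate_si_on_regular_tree()
print(len(infected))  # 20
```

Swapping `rng.expovariate` for any other positive spreading-time distribution gives the generic SI model the paper covers.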
Rumors in a Network: Who's the Culprit?
We provide a systematic study of the problem of finding the source of a rumor
in a network. We model rumor spreading in a network with a variant of the
popular SIR model and then construct an estimator for the rumor source. This
estimator is based upon a novel topological quantity which we term
\textbf{rumor centrality}. We establish that this is an ML estimator for a
class of graphs. We find the following surprising threshold phenomenon: on
trees which grow faster than a line, the estimator always has non-trivial
detection probability, whereas on trees that grow like a line, the detection
probability will go to 0 as the network grows. Simulations performed on
synthetic networks such as the popular small-world and scale-free networks, and
on real networks such as an internet AS network and the U.S. electric power
grid network, show that the estimator either finds the source exactly or within
a few hops of the true source across different network topologies. We compare
rumor centrality to another common network centrality notion known as distance
centrality. We prove that on trees, the rumor center and distance center are
equivalent, but on general networks, they may differ. Indeed, simulations show
that rumor centrality outperforms distance centrality in finding rumor sources
in networks which are not tree-like.
Comment: 43 pages, 13 figures
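On a tree, rumor centrality has a known closed form: n! divided by the product, over all nodes u, of the size of the subtree rooted at u and pointed away from the candidate source. A minimal Python sketch of that formula (function names are mine) recovers the threshold intuition above: on a path, the rumor center is simply the middle node.

```python
from math import factorial
from collections import defaultdict

def rumor_centrality(adj, nodes):
    """Rumor centrality R(v) = n! / (product of subtree sizes rooted toward v),
    computed for every node v of a tree given as an adjacency map."""
    n = len(nodes)
    scores = {}
    for v in nodes:
        sizes = {}

        def subtree_size(u, p):
            s = 1
            for w in adj[u]:
                if w != p:
                    s += subtree_size(w, u)
            sizes[u] = s
            return s

        subtree_size(v, None)
        prod = 1
        for s in sizes.values():
            prod *= s
        scores[v] = factorial(n) // prod  # product always divides n! on a tree
    return scores

# Path graph 1-2-3-4-5 (a "tree that grows like a line"):
adj = defaultdict(list)
for a, b in [(1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[a].append(b)
    adj[b].append(a)
scores = rumor_centrality(adj, [1, 2, 3, 4, 5])
print(max(scores, key=scores.get))  # 3, the middle of the path
```

This brute-force version is quadratic; the linear-time message-passing computation mentioned later in this listing avoids recomputing subtree sizes per candidate.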
Contagion Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis and Network Algorithms
This monograph provides an overview of the mathematical theories and
computational algorithm design for contagion source detection in large
networks. By leveraging network centrality as a tool for statistical inference,
we can accurately identify the source of contagions, trace their spread, and
predict future trajectories. This approach provides fundamental insights into
surveillance capability and asymptotic behavior of contagion spreading in
networks. Mathematical theory and computational algorithms are vital to
understanding contagion dynamics, improving surveillance capabilities, and
developing effective strategies to prevent the spread of infectious diseases
and misinformation.
Comment: Suggested Citation: Chee Wei Tan and Pei-Duo Yu (2023), "Contagion
Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis
and Network Algorithms", Foundations and Trends in Networking: Vol. 13: No.
2-3, pp 107-251. http://dx.doi.org/10.1561/130000006
Observer Placement for Source Localization: The Effect of Budgets and Transmission Variance
When an epidemic spreads in a network, a key question is where was its
source, i.e., the node that started the epidemic. If we know the time at which
various nodes were infected, we can attempt to use this information in order to
identify the source. However, maintaining observer nodes that can provide their
infection time may be costly, and we may have a budget on the number of
observer nodes we can maintain. Moreover, some nodes are more informative than
others due to their location in the network. Hence, a pertinent question
arises: Which nodes should we select as observers in order to maximize the
probability that we can accurately identify the source? Inspired by the simple
setting in which the node-to-node delays in the transmission of the epidemic
are deterministic, we develop a principled approach for addressing the problem
even when transmission delays are random. We show that the optimal
observer-placement differs depending on the variance of the transmission delays
and propose approaches in both low- and high-variance settings. We validate our
methods by comparing them against state-of-the-art observer-placements and show
that, in both settings, our approach identifies the source with higher
accuracy.
Comment: Accepted for presentation at the 54th Annual Allerton Conference on
Communication, Control, and Computing
Estimating Infection Sources in Networks Using Partial Timestamps
We study the problem of identifying infection sources in a network based on
the network topology, and a subset of infection timestamps. In the case of a
single infection source in a tree network, we derive the maximum likelihood
estimator of the source and the unknown diffusion parameters. We then introduce
a new heuristic involving an optimization over a parametrized family of Gromov
matrices to develop a single source estimation algorithm for general graphs.
Compared with the breadth-first search tree heuristic commonly adopted in the
literature, simulations demonstrate that our approach achieves better
estimation accuracy than several other benchmark algorithms, even though these
require more information like the diffusion parameters. We next develop a
multiple sources estimation algorithm for general graphs, which first
partitions the graph into source candidate clusters, and then applies our
single source estimation algorithm to each cluster. We show that if the graph
is a tree, then each source candidate cluster contains at least one source.
Simulations using synthetic and real networks, and experiments using real-world
data suggest that our proposed algorithms are able to estimate the true
infection source(s) to within a small number of hops with a small portion of
the infection timestamps being observed.
Comment: 15 pages, 15 figures, accepted by IEEE Transactions on Information
Forensics and Security
Information extraction with network centralities: finding rumor sources, measuring influence, and learning community structure
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 193-197).
Network centrality is a function that takes a network graph as input and assigns a score to each node. In this thesis, we investigate the potential of network centralities for addressing inference questions arising in the context of large-scale networked data. These questions are particularly challenging because they require algorithms which are extremely fast and simple so as to be scalable, while at the same time they must perform well. It is this tension between scalability and performance that this thesis aims to resolve by using appropriate network centralities. Specifically, we solve three important network inference problems using network centrality: finding rumor sources, measuring influence, and learning community structure. We develop a new network centrality called rumor centrality to find rumor sources in networks. We give a linear time algorithm for calculating rumor centrality, demonstrating its practicality for large networks. Rumor centrality is proven to be an exact maximum likelihood rumor source estimator for random regular graphs (under an appropriate probabilistic rumor spreading model). For a wide class of networks and rumor spreading models, we prove that it is an accurate estimator. To establish the universality of rumor centrality as a source estimator, we utilize techniques from the classical theory of generalized Polya's urns and branching processes. Next we use rumor centrality to measure influence in Twitter. We develop an influence score based on rumor centrality which can be calculated in linear time. To justify the use of rumor centrality as the influence score, we use it to develop a new network growth model called topological network growth.
We find that this model accurately reproduces two important features observed empirically in Twitter retweet networks: a power-law degree distribution and a superstar node with very high degree. Using these results, we argue that rumor centrality is correctly quantifying the influence of users on Twitter. These scores form the basis of a dynamic influence tracking engine called Trumor which allows one to measure the influence of users in Twitter or more generally in any networked data. Finally we investigate learning the community structure of a network. Using arguments based on social interactions, we determine that the network centrality known as degree centrality can be used to detect communities. We use this to develop the leader-follower algorithm (LFA) which can learn the overlapping community structure in networks. The LFA runtime is linear in the network size. It is also non-parametric, in the sense that it can learn both the number and size of communities naturally from the network structure without requiring any input parameters. We prove that it is very robust and learns accurate community structure for a broad class of networks. We find that the LFA does a better job of learning community structure on real social and biological networks than more common algorithms such as spectral clustering. by Tauhid R. Zaman. Ph.D.
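The degree-centrality idea behind the leader-follower approach can be sketched in a few lines. The following Python is a hypothetical simplification in the spirit of the LFA described above, not the thesis's exact algorithm: nodes whose degree is a local maximum become leaders, and every other node follows its highest-degree neighbor until it reaches a leader.

```python
def degree_based_communities(adj):
    """Hypothetical degree-based community sketch: local degree maxima become
    leaders (one community each); every other node is labeled by chasing its
    highest-degree neighbor, a chain of strictly increasing degree that must
    end at a leader. Runs in time roughly linear in |V| + |E|."""
    deg = {v: len(ns) for v, ns in adj.items()}
    leaders = [v for v in adj if all(deg[v] >= deg[u] for u in adj[v])]
    label = {v: v for v in leaders}

    def resolve(v):
        if v in label:
            return label[v]
        nxt = max(adj[v], key=deg.get)  # strictly higher degree than v
        label[v] = resolve(nxt)
        return label[v]

    for v in adj:
        resolve(v)
    return label

# Two triangles joined by the edge 2-3: nodes 2 and 3 are the local
# degree maxima, so each triangle becomes one community.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
label = degree_based_communities(adj)
print({v: label[v] for v in sorted(label)})  # {0: 2, 1: 2, 2: 2, 3: 3, 4: 3, 5: 3}
```

Unlike this sketch, the actual LFA learns overlapping communities; the point here is only how far plain degree centrality already gets.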
Belief Propagation approach to epidemics prediction on networks
In my thesis I study the problem of predicting the evolution of epidemic spreading on networks when incomplete information, in the form of a partial observation, is available. I focus on the irreversible process described by the discrete-time version of the Susceptible-Infected-Recovered (SIR) model on networks. Because of its intrinsic stochasticity, forecasting the SIR process is very difficult, even if the structure of individuals' contact patterns is known. In today's interconnected and interdependent society, infectious diseases pose the threat of a worldwide epidemic spreading, hence governments and public health systems maintain surveillance programs to report and control the emergence of new disease events, ranging from the seasonal influenza to the more severe HIV or Ebola. When new infection cases are discovered in the population, it is necessary to provide real-time forecasting of the epidemic evolution. However, the incompleteness of accessible data and the intrinsic stochasticity of the contagion pose a major challenge.
The idea behind the work of my thesis is that correctly inferring the contagion process before the detection of the disease makes it possible to use all the available information and, consequently, to obtain reliable predictions. I use the Belief Propagation approach for the prediction of SIR epidemics when a partial observation is available. In this case the reconstruction of the past dynamics can be efficiently performed by this method and exploited to analyze the evolution of the disease. Although Belief Propagation provides exact results on trees, it turns out that it is still a good approximation on general graphs. In these cases Belief Propagation may present convergence-related issues, especially on dense networks. Moreover, since this approach is based on a very general principle, it can be adapted to study a wide range of issues, some of which I analyze in the thesis.
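The discrete-time SIR dynamics that the thesis forecasts can be simulated in a few lines. The sketch below (Python; parameter values and the helper name are illustrative) produces exactly the kind of stochastic sample path from which a partial observation would be drawn.

```python
import random

def simulate_sir(adj, source, beta=0.5, mu=0.3, t_max=20, seed=1):
    """Discrete-time SIR on a network: at each step every infected node
    infects each susceptible neighbor with probability beta, then recovers
    with probability mu. Returns the full state trajectory."""
    rng = random.Random(seed)
    state = {v: "S" for v in adj}
    state[source] = "I"
    history = [dict(state)]
    for _ in range(t_max):
        new_state = dict(state)
        for v in adj:
            if state[v] == "I":
                for w in adj[v]:
                    if state[w] == "S" and rng.random() < beta:
                        new_state[w] = "I"
                if rng.random() < mu:
                    new_state[v] = "R"
        state = new_state
        history.append(dict(state))
    return history

adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}  # 6-node ring
hist = simulate_sir(adj, source=0)
print(sum(s != "S" for s in hist[-1].values()))  # nodes ever reached
```

A partial observation, in the thesis's sense, corresponds to revealing the states of only a few nodes at a few times from such a trajectory; Belief Propagation then reconstructs the rest.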
Probabilistic methods for distributed information dissemination
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 457-484).
The ever-increasing growth of modern networks comes with a paradigm shift in network operation. Networks can no longer be abstracted as deterministic, centrally controlled systems with static topologies but need to be understood as highly distributed, dynamic systems with inherent unreliabilities. This makes many communication, coordination and computation tasks challenging and in many scenarios communication becomes a crucial bottleneck. In this thesis, we develop new algorithms and techniques to address these challenges. In particular we concentrate on broadcast and information dissemination tasks and introduce novel ideas on how randomization can lead to powerful, simple and practical communication primitives suitable for these modern networks. In this endeavor we combine and further develop tools from different disciplines, trying to simultaneously address the distributed, information theoretic and algorithmic aspects of network communication. The two main probabilistic techniques developed to disseminate information in a network are gossip and random linear network coding. Gossip is an alternative to classical flooding approaches: Instead of nodes repeatedly forwarding information to all their neighbors, gossiping nodes forward information only to a small number of (random) neighbors. We show that, when done right, gossip disperses information almost as quickly as flooding, albeit with a drastically reduced communication overhead. Random linear network coding (RLNC) applies when a large amount of information or many messages are to be disseminated. Instead of routing messages through intermediate nodes, that is, following a classical store-and-forward approach, RLNC mixes messages together by forwarding random linear combinations of messages.
The simplicity and topology-obliviousness of this approach make RLNC particularly interesting for the distributed settings considered in this thesis. Unfortunately, the performance of RLNC was not well understood even for the simplest such settings. We introduce a simple yet powerful analysis technique that allows us to prove optimal performance guarantees for all settings considered in the literature and many more that were not analyzable so far. Specifically, we give many new results for RLNC gossip algorithms, RLNC algorithms for dynamic networks, and RLNC with correlated data. We also provide a novel highly efficient distributed implementation of RLNC that achieves these performance guarantees while buffering only a minimal amount of information at intermediate nodes. We then apply our techniques to improve communication primitives in multi-hop radio networks. While radio networks inherently support broadcast communications, e.g., from one node to all surrounding nodes, interference of simultaneous transmissions makes multihop broadcast communication an interesting challenge. We show that, again, randomization holds the key for obtaining simple, efficient and distributed information dissemination protocols. In particular, using random back-off strategies to coordinate access to the shared medium leads to optimal gossip-like communications and applying RLNC achieves the first throughput-optimal multi-message communication primitives. Lastly we apply our probabilistic approach for analyzing simple, distributed propagation protocols in a broader context by studying algorithms for the Lovász Local Lemma. These algorithms find solutions to certain local constraint satisfaction problems by randomly fixing and propagating violations locally.
Our two main results show that, firstly, there are also efficient deterministic propagation strategies achieving the same and, secondly, using the random fixing strategy has the advantage of producing not just an arbitrary solution but an approximately uniformly random one. Both results lead to simple constructions for many locally consistent structures of interest that were not known to be efficiently constructible before. by Bernhard Haeupler. Ph.D.
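The gossip primitive described above is easy to simulate. The sketch below (Python; the complete-graph "push" variant, with names of my choosing) counts the rounds until every node is informed, illustrating why gossip spreads almost as fast as flooding: the informed set roughly doubles each round, so completion takes on the order of log n rounds.

```python
import random

def push_gossip(n, seed=0):
    """Push gossip on the complete graph on n nodes: in each round, every
    informed node sends the message to one uniformly random node. Returns
    the number of rounds until all n nodes are informed."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        informed |= {rng.randrange(n) for _ in range(len(informed))}
        rounds += 1
    return rounds

print(push_gossip(1000))
```

Since each informed node sends only one message per round, the total number of messages is far smaller than flooding's, which is the communication-overhead saving the thesis quantifies.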