285 research outputs found

    Finding Rumor Sources on Random Trees

    We consider the problem of detecting the source of a rumor that has spread in a network, using only observations of which nodes are infected with the rumor and no information as to when these nodes became infected. This rumor source detection problem was introduced and studied in a recent work \citep{ref:rc}. The authors proposed the graph score function rumor centrality as an estimator for detecting the source. They established it to be the maximum likelihood estimator with respect to the popular Susceptible Infected (SI) model with exponential spreading times for regular trees. They showed that as the size of the infected graph increases, for a path graph (2-regular tree), the probability of source detection goes to $0$, while for $d$-regular trees with $d \geq 3$ the probability of detection, say $\alpha_d$, remains bounded away from $0$ and is less than $1/2$. However, their results stop short of providing insights into the performance of the rumor centrality estimator in more general settings such as irregular trees or the SI model with non-exponential spreading times. This paper overcomes this limitation and establishes the effectiveness of rumor centrality for source detection for generic random trees and the SI model with a generic spreading time distribution. The key result is an interesting connection between a continuous time branching process and the effectiveness of rumor centrality. Through this, it is possible to quantify the detection probability precisely. As a consequence, we recover all previous results as a special case and obtain a variety of novel results, including the universality of rumor centrality in the context of tree-like graphs and the SI model with a generic spreading time distribution. Comment: 38 pages, 6 figures
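
The SI model with exponential spreading times mentioned in this abstract can be illustrated with a short simulation. The sketch below is not taken from the paper: it assumes independent rate-1 exponential spreading times on every edge, in which case memorylessness implies that the next infection travels along an infected-susceptible edge chosen uniformly at random. The function name `simulate_si` and the adjacency-dictionary input format are hypothetical choices made for illustration.

```python
import random


def simulate_si(adj, source, n_infected, rng):
    """Return the order in which nodes become infected under an SI process.

    Assumption (illustrative): i.i.d. rate-1 exponential spreading times, so
    each step infects across a uniformly random infected-susceptible edge.
    adj maps each node to a list of its neighbours.
    """
    infected = {source}
    order = [source]
    # Candidate edges (infected endpoint, other endpoint).
    frontier = [(source, w) for w in adj[source]]
    while len(order) < n_infected and frontier:
        # Draw a uniformly random candidate edge and remove it in O(1).
        i = rng.randrange(len(frontier))
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        _, w = frontier.pop()
        if w in infected:
            # Stale edge (only possible on graphs with cycles); redraw.
            continue
        infected.add(w)
        order.append(w)
        frontier.extend((w, x) for x in adj[w] if x not in infected)
    return order


# Path graph on 7 nodes (a 2-regular tree away from its endpoints).
path = {v: [w for w in (v - 1, v + 1) if 0 <= w < 7] for v in range(7)}
infection_order = simulate_si(path, source=3, n_infected=4, rng=random.Random(0))
```

Only the set of infected nodes, not `infection_order` itself, would be available to a source estimator in the setting the abstract describes.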

    Rumors in a Network: Who's the Culprit?

    We provide a systematic study of the problem of finding the source of a rumor in a network. We model rumor spreading in a network with a variant of the popular SIR model and then construct an estimator for the rumor source. This estimator is based upon a novel topological quantity which we term rumor centrality. We establish that this is an ML estimator for a class of graphs. We find the following surprising threshold phenomenon: on trees which grow faster than a line, the estimator always has non-trivial detection probability, whereas on trees that grow like a line, the detection probability will go to 0 as the network grows. Simulations performed on synthetic networks such as the popular small-world and scale-free networks, and on real networks such as an internet AS network and the U.S. electric power grid network, show that the estimator either finds the source exactly or within a few hops of the true source across different network topologies. We compare rumor centrality to another common network centrality notion known as distance centrality. We prove that on trees, the rumor center and distance center are equivalent, but on general networks, they may differ. Indeed, simulations show that rumor centrality outperforms distance centrality in finding rumor sources in networks which are not tree-like. Comment: 43 pages, 13 figures
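
As a rough illustration of the quantity named in this abstract: on a tree with n nodes, the rumor centrality of a node v is commonly written as n! divided by the product, over all nodes u, of the size of the subtree rooted at u when the tree is rooted at v. It counts the infection orderings that start at v and respect the tree structure. The sketch below computes it for every node in linear time by rerooting. The function name `rumor_centrality` and the input format are assumptions made for this example, not the authors' code.

```python
from math import factorial


def rumor_centrality(adj):
    """Rumor centrality of every node of a tree.

    adj maps each node to a list of its neighbours and must describe a tree.
    Uses R(v) = n! / prod_u (subtree size of u when rooted at v), and the
    rerooting identity R(child) = R(parent) * t / (n - t), where t is the
    size of the child's subtree in the initial rooting.
    """
    nodes = list(adj)
    n = len(nodes)
    root = nodes[0]

    # Breadth-first order from an arbitrary root, recording parents.
    parent = {root: None}
    order = [root]
    for v in order:  # the list grows while we iterate over it
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)

    # Subtree sizes, accumulated from the leaves upward.
    size = {v: 1 for v in nodes}
    for v in reversed(order[1:]):
        size[parent[v]] += size[v]

    prod = 1
    for v in nodes:
        prod *= size[v]

    score = {root: factorial(n) // prod}
    for v in order[1:]:
        # The division is exact because every score is an integer count.
        score[v] = score[parent[v]] * size[v] // (n - size[v])
    return score


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = rumor_centrality(star)
rumor_center = max(scores, key=scores.get)
```

For the star, the centre can be followed by its three leaves in any order (6 orderings), while a leaf must infect the centre first (2 orderings), so the centre is the rumor center.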

    Contagion Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis and Network Algorithms

    This monograph provides an overview of the mathematical theories and computational algorithm design for contagion source detection in large networks. By leveraging network centrality as a tool for statistical inference, we can accurately identify the source of contagions, trace their spread, and predict future trajectories. This approach provides fundamental insights into surveillance capability and asymptotic behavior of contagion spreading in networks. Mathematical theory and computational algorithms are vital to understanding contagion dynamics, improving surveillance capabilities, and developing effective strategies to prevent the spread of infectious diseases and misinformation. Comment: Suggested Citation: Chee Wei Tan and Pei-Duo Yu (2023), "Contagion Source Detection in Epidemic and Infodemic Outbreaks: Mathematical Analysis and Network Algorithms", Foundations and Trends in Networking: Vol. 13: No. 2-3, pp 107-251. http://dx.doi.org/10.1561/130000006

    Observer Placement for Source Localization: The Effect of Budgets and Transmission Variance

    When an epidemic spreads in a network, a key question is where its source was, i.e., which node started the epidemic. If we know the time at which various nodes were infected, we can attempt to use this information in order to identify the source. However, maintaining observer nodes that can provide their infection time may be costly, and we may have a budget $k$ on the number of observer nodes we can maintain. Moreover, some nodes are more informative than others due to their location in the network. Hence, a pertinent question arises: Which nodes should we select as observers in order to maximize the probability that we can accurately identify the source? Inspired by the simple setting in which the node-to-node delays in the transmission of the epidemic are deterministic, we develop a principled approach for addressing the problem even when transmission delays are random. We show that the optimal observer placement differs depending on the variance of the transmission delays and propose approaches in both low- and high-variance settings. We validate our methods by comparing them against state-of-the-art observer placements and show that, in both settings, our approach identifies the source with higher accuracy. Comment: Accepted for presentation at the 54th Annual Allerton Conference on Communication, Control, and Computing

    Estimating Infection Sources in Networks Using Partial Timestamps

    We study the problem of identifying infection sources in a network based on the network topology, and a subset of infection timestamps. In the case of a single infection source in a tree network, we derive the maximum likelihood estimator of the source and the unknown diffusion parameters. We then introduce a new heuristic involving an optimization over a parametrized family of Gromov matrices to develop a single source estimation algorithm for general graphs. Compared with the breadth-first search tree heuristic commonly adopted in the literature, simulations demonstrate that our approach achieves better estimation accuracy than several other benchmark algorithms, even though these require more information like the diffusion parameters. We next develop a multiple sources estimation algorithm for general graphs, which first partitions the graph into source candidate clusters, and then applies our single source estimation algorithm to each cluster. We show that if the graph is a tree, then each source candidate cluster contains at least one source. Simulations using synthetic and real networks, and experiments using real-world data suggest that our proposed algorithms are able to estimate the true infection source(s) to within a small number of hops with a small portion of the infection timestamps being observed. Comment: 15 pages, 15 figures, accepted by IEEE Transactions on Information Forensics and Security

    Information extraction with network centralities : finding rumor sources, measuring influence, and learning community structure

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 193-197). Network centrality is a function that takes a network graph as input and assigns a score to each node. In this thesis, we investigate the potential of network centralities for addressing inference questions arising in the context of large-scale networked data. These questions are particularly challenging because they require algorithms which are extremely fast and simple so as to be scalable, while at the same time they must perform well. It is this tension between scalability and performance that this thesis aims to resolve by using appropriate network centralities. Specifically, we solve three important network inference problems using network centrality: finding rumor sources, measuring influence, and learning community structure. We develop a new network centrality called rumor centrality to find rumor sources in networks. We give a linear time algorithm for calculating rumor centrality, demonstrating its practicality for large networks. Rumor centrality is proven to be an exact maximum likelihood rumor source estimator for random regular graphs (under an appropriate probabilistic rumor spreading model). For a wide class of networks and rumor spreading models, we prove that it is an accurate estimator. To establish the universality of rumor centrality as a source estimator, we utilize techniques from the classical theory of generalized Polya's urns and branching processes. Next we use rumor centrality to measure influence in Twitter. We develop an influence score based on rumor centrality which can be calculated in linear time. To justify the use of rumor centrality as the influence score, we use it to develop a new network growth model called topological network growth. 
We find that this model accurately reproduces two important features observed empirically in Twitter retweet networks: a power-law degree distribution and a superstar node with very high degree. Using these results, we argue that rumor centrality is correctly quantifying the influence of users on Twitter. These scores form the basis of a dynamic influence tracking engine called Trumor which allows one to measure the influence of users in Twitter or more generally in any networked data. Finally we investigate learning the community structure of a network. Using arguments based on social interactions, we determine that the network centrality known as degree centrality can be used to detect communities. We use this to develop the leader-follower algorithm (LFA) which can learn the overlapping community structure in networks. The LFA runtime is linear in the network size. It is also non-parametric, in the sense that it can learn both the number and size of communities naturally from the network structure without requiring any input parameters. We prove that it is very robust and learns accurate community structure for a broad class of networks. We find that the LFA does a better job of learning community structure on real social and biological networks than more common algorithms such as spectral clustering. by Tauhid R. Zaman. Ph.D.

    Belief Propagation approach to epidemics prediction on networks

    In my thesis I study the problem of predicting the evolution of epidemic spreading on networks when incomplete information, in the form of a partial observation, is available. I focus on the irreversible process described by the discrete-time version of the Susceptible-Infected-Recovered (SIR) model on networks. Because of its intrinsic stochasticity, forecasting the SIR process is very difficult, even if the structure of the individuals' contact pattern is known. In today's interconnected and interdependent society, infectious diseases pose the threat of a worldwide epidemic, hence governments and public health systems maintain surveillance programs to report and control the emergence of new disease events, ranging from seasonal influenza to the more severe HIV or Ebola. When new infection cases are discovered in the population, it is necessary to provide real-time forecasting of the epidemic evolution. However, the incompleteness of accessible data and the intrinsic stochasticity of the contagion pose a major challenge. The idea behind the work of my thesis is that correct inference of the contagion process before the detection of the disease makes it possible to use all the available information and, consequently, to obtain reliable predictions. I use the Belief Propagation approach for the prediction of SIR epidemics when a partial observation is available. In this case the reconstruction of the past dynamics can be efficiently performed by this method and exploited to analyze the evolution of the disease. Although Belief Propagation provides exact results on trees, it turns out that it is still a good approximation on general graphs. In these cases Belief Propagation may present convergence-related issues, especially on dense networks. Moreover, since this approach is based on a very general principle, it can be adapted to study a wide range of issues, some of which I analyze in the thesis.

    Probabilistic methods for distributed information dissemination

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 457-484). The ever-increasing growth of modern networks comes with a paradigm shift in network operation. Networks can no longer be abstracted as deterministic, centrally controlled systems with static topologies but need to be understood as highly distributed, dynamic systems with inherent unreliabilities. This makes many communication, coordination and computation tasks challenging and in many scenarios communication becomes a crucial bottleneck. In this thesis, we develop new algorithms and techniques to address these challenges. In particular we concentrate on broadcast and information dissemination tasks and introduce novel ideas on how randomization can lead to powerful, simple and practical communication primitives suitable for these modern networks. In this endeavor we combine and further develop tools from different disciplines, trying to simultaneously address the distributed, information theoretic and algorithmic aspects of network communication. The two main probabilistic techniques developed to disseminate information in a network are gossip and random linear network coding. Gossip is an alternative to classical flooding approaches: Instead of nodes repeatedly forwarding information to all their neighbors, gossiping nodes forward information only to a small number of (random) neighbors. We show that, when done right, gossip disperses information almost as quickly as flooding, albeit with a drastically reduced communication overhead. Random linear network coding (RLNC) applies when a large amount of information or many messages are to be disseminated. Instead of routing messages through intermediate nodes, that is, following a classical store-and-forward approach, RLNC mixes messages together by forwarding random linear combinations of messages. 
The simplicity and topology-obliviousness of this approach make RLNC particularly interesting for the distributed settings considered in this thesis. Unfortunately the performance of RLNC was not well understood even for the simplest such settings. We introduce a simple yet powerful analysis technique that allows us to prove optimal performance guarantees for all settings considered in the literature and many more that were not analyzable so far. Specifically, we give many new results for RLNC gossip algorithms, RLNC algorithms for dynamic networks, and RLNC with correlated data. We also provide a novel highly efficient distributed implementation of RLNC that achieves these performance guarantees while buffering only a minimal amount of information at intermediate nodes. We then apply our techniques to improve communication primitives in multi-hop radio networks. While radio networks inherently support broadcast communications, e.g., from one node to all surrounding nodes, interference of simultaneous transmissions makes multi-hop broadcast communication an interesting challenge. We show that, again, randomization holds the key for obtaining simple, efficient and distributed information dissemination protocols. In particular, using random back-off strategies to coordinate access to the shared medium leads to optimal gossip-like communications and applying RLNC achieves the first throughput-optimal multi-message communication primitives. Lastly we apply our probabilistic approach for analyzing simple, distributed propagation protocols in a broader context by studying algorithms for the Lovász Local Lemma. These algorithms find solutions to certain local constraint satisfaction problems by randomly fixing and propagating violations locally. 
Our two main results show that, firstly, there are also efficient deterministic propagation strategies achieving the same and, secondly, using the random fixing strategy has the advantage of producing not just an arbitrary solution but an approximately uniformly random one. Both results lead to simple constructions for many locally consistent structures of interest that were not known to be efficiently constructible before. by Bernhard Haeupler. Ph.D.
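
The contrast between gossip and flooding drawn in the thesis abstract above can be made concrete with a toy simulation. The sketch below is not from the thesis: it assumes a synchronous "push" variant in which every informed node forwards the message to one uniformly random neighbour per round, and the name `push_gossip` and the adjacency-dictionary input are hypothetical.

```python
import random


def push_gossip(adj, start, rng):
    """Simulate synchronous push gossip on a connected graph.

    Assumption (illustrative): in each round every informed node sends the
    message to one uniformly random neighbour. adj maps each node to a
    non-empty list of neighbours.
    Returns (rounds, messages) needed until every node is informed.
    """
    informed = {start}
    rounds = 0
    messages = 0
    while len(informed) < len(adj):
        newly_informed = set()
        for v in informed:
            newly_informed.add(rng.choice(adj[v]))
            messages += 1
        informed |= newly_informed
        rounds += 1
    return rounds, messages


# Complete graph on 64 nodes.
n = 64
complete = {v: [w for w in range(n) if w != v] for v in range(n)}
rounds, messages = push_gossip(complete, start=0, rng=random.Random(1))
```

On the same graph, flooding would have every informed node send to all 63 neighbours in each round, whereas each gossiping node sends a single message per round. The informed set can at most double per round, so at least log2(64) = 6 rounds are needed here.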