Gossip vs. Markov Chains, and Randomness-Efficient Rumor Spreading
We study gossip algorithms for the rumor spreading problem which asks one
node to deliver a rumor to all nodes in an unknown network. We present the
first protocol for any expander graph with n nodes such that the
protocol informs every node in O(log n) rounds with high probability, and
uses O(log n · log log n) random bits in total. The runtime of our protocol is
tight, and the randomness requirement of O(log n · log log n) random bits almost
matches the lower bound of Ω(log n) random bits for dense graphs. We
further show that, for many graph families, O(polylog n) random
bits in total suffice to spread the rumor in O(log n) rounds.
These results together give us an almost complete understanding of the
randomness requirement of this fundamental gossip process.
Our analysis relies on unexpectedly tight connections among gossip processes,
Markov chains, and branching programs. First, we establish a connection between
rumor spreading processes and Markov chains, which is used to approximate the
rumor spreading time by the mixing time of Markov chains. Second, we show a
reduction from rumor spreading processes to branching programs, and this
reduction provides a general framework to derandomize gossip processes. In
addition to designing rumor spreading protocols, these novel techniques may
have applications in studying parallel and multiple random walks, and
randomness complexity of distributed algorithms.
Comment: 41 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1304.135
Gossip vs. Markov Chains, and Randomness-Efficient Rumor Spreading
We study gossip algorithms for the rumor spreading problem, which asks one node to deliver a rumor to all nodes in an unknown network, where every node is only allowed to call one neighbor in each round. In this work we introduce two fundamentally new techniques for studying the rumor spreading problem:
First, we establish a new connection between the rumor spreading process in an arbitrary graph and certain Markov chains. While most previous work analyzed the rumor spreading time in general graphs by studying the rate at which the number of (un-)informed nodes changes after every round, we show that the mixing time of a certain Markov chain suffices to bound the rumor spreading time in an arbitrary graph.
Second, we construct a reduction from rumor spreading processes to branching programs. This reduction gives us a general framework to derandomize rumor spreading and other gossip processes. In particular, we show that, for any n-vertex expander graph, there is a protocol which informs every node in O(log n) rounds with high probability, and uses O(log n · log log n) random bits in total. The runtime of our protocol is tight, and the randomness requirement of O(log n · log log n) random bits almost matches the lower bound of Ω(log n) random bits. We further show that, for many graph families (defined with respect to the expansion and the degree), O(poly log n) random bits in total suffice for fast rumor spreading. These results give us an almost complete understanding of the role of randomness in the rumor spreading process, which has been extensively studied over the past years.
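The following is a minimal simulation sketch of the synchronous push-pull rumor spreading process analyzed above; the ring-with-chords graph, the node count, and the seeded rng are illustrative assumptions, not the paper's construction.

```python
import random

def push_pull_rounds(adj, source, rng=random.Random(0)):
    """Synchronous push-pull rumor spreading; returns rounds until all nodes know."""
    n = len(adj)
    informed = [False] * n
    informed[source] = True
    rounds = 0
    while not all(informed):
        # Every node calls one uniformly random neighbour per round.
        calls = [(u, rng.choice(adj[u])) for u in range(n)]
        newly = [w for u, v in calls if informed[u] or informed[v] for w in (u, v)]
        for w in newly:          # apply updates only after all calls are made
            informed[w] = True
        rounds += 1
    return rounds

# Toy stand-in for an expander: a ring with long-range chords.
n = 64
adj = [[(i - 1) % n, (i + 1) % n, (i + n // 2) % n] for i in range(n)]
print(push_pull_rounds(adj, source=0))
```

Note that each node here draws fresh random bits every round, roughly O(n log n) bits per round in total; the point of the paper is that O(log n · log log n) shared random bits suffice on expanders, via the reduction to branching programs.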
On the Inherent Anonymity of Gossiping
Detecting the source of a gossip is a critical issue, related to identifying
patient zero in an epidemic, or the origin of a rumor in a social network.
Although it is widely acknowledged that random and local gossip communications
make source identification difficult, there exists no general quantification of
the level of anonymity provided to the source. This paper presents a principled
method based on differential privacy to analyze the inherent
source anonymity of gossiping for a large class of graphs. First, we quantify
the fundamental limit of source anonymity any gossip protocol can guarantee in
an arbitrary communication graph. In particular, our result indicates that when
the graph has poor connectivity, no gossip protocol can guarantee any
meaningful level of differential privacy. This prompted us to further analyze
graphs with controlled connectivity. We prove on these graphs that a large
class of gossip protocols, namely cobra walks, offers tangible differential
privacy guarantees to the source. In doing so, we introduce an original proof
technique based on the reduction of a gossip protocol to what we call a random
walk with probabilistic die out. This proof technique is of independent
interest to the gossip community and readily extends to other protocols
inherited from the security community, such as the Dandelion protocol.
Interestingly, our tight analysis precisely captures the trade-off between
dissemination time of a gossip protocol and its source anonymity.
Comment: Full version of DISC 2023 paper.
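As an illustration of the protocol class analyzed above, here is a minimal sketch of a cobra walk; the branching factor of 2, the torus topology, and the seeded rng are illustrative assumptions.

```python
import random

def cobra_walk_rounds(adj, source, branch=2, rng=random.Random(1)):
    """Cobra walk: every node currently holding the rumor forwards it to
    `branch` random neighbours; copies meeting at a node coalesce."""
    informed, active = {source}, {source}
    rounds = 0
    while len(informed) < len(adj):
        nxt = set()
        for u in active:
            for _ in range(branch):
                nxt.add(rng.choice(adj[u]))   # set semantics = coalescence
        active = nxt
        informed |= nxt
        rounds += 1
    return rounds

# Toy example: an 8x8 torus grid.
side = 8
def nbrs(i):
    r, c = divmod(i, side)
    return [((r - 1) % side) * side + c, ((r + 1) % side) * side + c,
            r * side + (c - 1) % side, r * side + (c + 1) % side]
adj = [nbrs(i) for i in range(side * side)]
print(cobra_walk_rounds(adj, source=0))
```

The reduction mentioned in the abstract couples such a process with a single random walk that may die out at each step; the sketch above only simulates the forward spreading dynamics.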
Consensus vs Broadcast, with and without Noise
Consensus and Broadcast are two fundamental problems in distributed computing, whose solutions have several applications. Intuitively, Consensus should be no harder than Broadcast, and this can be rigorously established in several models. Can Consensus be easier than Broadcast? In models that allow noiseless communication, we prove a reduction of (a suitable variant of) Broadcast to binary Consensus, that preserves the communication model and all complexity parameters such as randomness, number of rounds, communication per round, etc., while there is a loss in the success probability of the protocol. Using this reduction, we get, among other applications, the first logarithmic lower bound on the number of rounds needed to achieve Consensus in the uniform GOSSIP model on the complete graph. The lower bound is tight and, in this model, Consensus and Broadcast are equivalent. We then turn to distributed models with noisy communication channels that have been studied in the context of some bio-inspired systems. In such models, only one noisy bit is exchanged when a communication channel is established between two nodes, and so one cannot easily simulate a noiseless protocol by using error-correcting codes. An Ω(Δ⁻² n) lower bound on the number of rounds needed for Broadcast is proved by Boczkowski et al. [PLOS Comp. Bio. 2018] in one such model (noisy uniform PULL, where Δ is a parameter that measures the amount of noise). We prove an O(Δ⁻² log n) upper bound for binary Consensus in such a model, thus establishing an exponential gap between the number of rounds necessary for Consensus versus Broadcast. We also prove a new O(Δ⁻² n log n) upper bound for Broadcast in this model.
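A simplified sketch of binary Consensus in the noisy uniform PULL model may help fix the setting: each pulled bit is flipped with probability (1 − Δ)/2, and every node finally adopts the majority of the noisy bits it pulled over O(Δ⁻² log n) rounds. The constant 8, the flip model, and the input sizes are assumptions for illustration; the paper's actual protocol is more refined.

```python
import math
import random

def noisy_pull_consensus(bits, delta, rng=random.Random(2)):
    """Each node pulls one noisy bit per round from a uniformly random node,
    then adopts the majority of everything it heard."""
    n = len(bits)
    rounds = int(8 / delta ** 2 * math.log(n)) + 1   # O(delta^-2 log n) rounds
    tallies = [0] * n
    for _ in range(rounds):
        for u in range(n):
            b = bits[rng.randrange(n)]               # uniform PULL
            if rng.random() < (1 - delta) / 2:       # noisy channel flips bit
                b ^= 1
            tallies[u] += 1 if b else -1
    return [1 if t > 0 else 0 for t in tallies]

bits = [1] * 80 + [0] * 20    # initial opinions with a clear majority
print(sum(noisy_pull_consensus(bits, delta=0.3)))    # 100 w.h.p.
```

With Δ⁻² log n samples per node, a Chernoff bound drives each node's error probability polynomially small, which is what makes the exponential gap with Broadcast possible.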
The Impact of Randomisation in Load Balancing and Random Walks
The real world is full of uncertainties. Classical analyses usually favour deterministic cases, which in practice can be too restrictive. This motivates us to add randomness to make models closer to practical situations. In this thesis, we mainly study two network problems taken from the distributed computing world: iterative load balancing and random walks. An interesting observation is that the problems we study, though not closely related in their real-world applications, can be linked by the same mathematical toolkit: Markov chain theory. These problems have been heavily studied in the literature. However, their assumptions are mostly deterministic, which limits their flexibility and generality in real-world settings. The novelty of this thesis is that we add randomness to these problems in order to contrast worst cases vs. average cases (load balancing) and static cases vs. dynamic cases (random walks).
For iterative load balancing, the randomness is added on the number of tasks over the entire network. Previous works often assumed worst case initial loads, which may be wasteful sometimes. Hence we relax this condition and assume the loads are drawn from different probability distributions.
In particular, we no longer assume the initial loads are chosen by an adversary. Instead, we assume the initial loads on each processor are sampled from independent and identically distributed (i.i.d.) probability distributions. We then study the same problems as in classical settings, i.e., the time needed for the load balancing process to reach a sufficiently small discrepancy.
Our main result implies that under such a regime, the time required to balance a network can be much shorter. An insightful observation is that the load discrepancy is proportional to a term that decreases with the time used to run the protocol. This implies two main improvements compared with previous works: first, when the initial discrepancy is the same, our regime reaches a small discrepancy faster; second, we establish a connection between the running time and the discrepancy, which previous analyses did not provide.
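A minimal sketch of one such iterative (diffusion) load balancing process with i.i.d. initial loads appears below; the torus topology, the exponential load distribution, and the diffusion parameter are illustrative assumptions rather than the thesis's exact regime.

```python
import random

def diffusion_round(load, adj, alpha=0.2):
    """One synchronous diffusion step: every node sends an alpha-fraction of
    the load difference across each incident edge (divisible tokens)."""
    new = load[:]
    for u in range(len(adj)):
        for v in adj[u]:
            new[u] += alpha * (load[v] - load[u])
    return new

rng = random.Random(3)
side = 16
def nbrs(i):
    r, c = divmod(i, side)
    return [((r - 1) % side) * side + c, ((r + 1) % side) * side + c,
            r * side + (c - 1) % side, r * side + (c + 1) % side]
adj = [nbrs(i) for i in range(side * side)]

# i.i.d. initial loads instead of an adversarial worst case.
load = [rng.expovariate(0.01) for _ in range(side * side)]
for t in range(51):
    if t % 10 == 0:
        print(t, round(max(load) - min(load), 2))   # discrepancy shrinks in t
    load = diffusion_round(load, adj)
```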
For random walks, the randomness is added to the network topology. This means that at each (discrete) time step, the underlying network can change randomly. In particular, we want the graph to "evolve" instead of changing arbitrarily. To model the graph changing process, we adopt a model commonly used in the literature, the edge-Markovian model: if an edge does not exist between two nodes, then it appears in the next step with probability p, and if it does exist, then it disappears in the next step with probability q. This model can simulate real-world scenarios such as users becoming friends in social networks or a disruption between two remotely connected computers.
Our main contributions regarding random walks include the following results. First, we divide the edge-Markovian graph model into different regimes in a parameterised way. This provides an intuitive path to similar analyses of dynamic graph models. Dynamic models are often hard to analyse in the field because of their complicated nature. We present a possible strategy to reach feasible solutions by using the parameters (p and q above) to control the process. Second, we analyse the behaviour of random walks on such models. We find that under certain regimes, the random walk still behaves much as in the static setting, especially with respect to its mixing properties. For the other regimes, we show either weaker mixing or no mixing results.
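The edge-Markovian dynamics described above are easy to simulate; the sketch below runs a random walk on such a graph (the values of n, p, q, the step count, and the stay-put rule for isolated nodes are illustrative assumptions).

```python
import random

def evolve(present, all_pairs, p, q, rng):
    """One edge-Markovian step: absent edges appear w.p. p, present edges disappear w.p. q."""
    nxt = set()
    for e in all_pairs:
        if e in present:
            if rng.random() >= q:
                nxt.add(e)
        elif rng.random() < p:
            nxt.add(e)
    return nxt

def walk_step(u, n, edges, rng):
    nbrs = [v for v in range(n) if v != u and (min(u, v), max(u, v)) in edges]
    return rng.choice(nbrs) if nbrs else u   # stay put if currently isolated

rng = random.Random(4)
n, p, q = 30, 0.05, 0.5
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
edges = {e for e in pairs if rng.random() < p / (p + q)}  # stationary edge density
u, visits = 0, [0] * n
for _ in range(2000):
    edges = evolve(edges, pairs, p, q, rng)
    u = walk_step(u, n, edges, rng)
    visits[u] += 1
print(max(visits), min(visits))   # near-uniform occupancy suggests mixing
```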
Epidemic-Style Information Dissemination in Large-Scale Wireless Networks
Steen, M.R. van [Promotor]
SoS: self-organizing substrates
Large-scale networked systems often, whether by design or by chance, exhibit self-organizing properties. Understanding self-organization using tools from cybernetics, particularly by modeling systems as Markov processes, is a first step towards a formal framework which can be used in (decentralized) systems research and design. Interesting aspects to look for include the time evolution of a system, and whether and when a system converges to some absorbing states or stabilizes into a dynamic (and stable) equilibrium, and how it performs under such an equilibrium state. Such a formal framework brings objectivity into systems research, helping discern facts from artefacts, as well as providing tools for quantitative evaluation of such systems. This thesis introduces such formalism in analyzing and evaluating peer-to-peer (P2P) systems in order to better understand the dynamics of such systems, which in turn helps in better designs. In particular, this thesis develops and studies the fundamental building blocks for a P2P storage system. In the process, the design and evaluation methodology we pursue illustrates the typical methodological approaches in studying and designing self-organizing systems, and how the analysis methodology influences the design of the algorithms themselves to meet system design goals (preferably with quantifiable guarantees). These goals include efficiency, availability and durability, load balance, high fault-tolerance and self-maintenance even in adversarial conditions like arbitrarily skewed and dynamic load and high membership dynamics (churn), apart, of course, from the specific functionalities that the system is supposed to provide. The functionalities we study here are some of the fundamental building blocks for various P2P applications and systems, including P2P storage systems, and hence we call them substrates or base infrastructure. These elemental functionalities include: (i) reliable and efficient discovery of resources distributed over the network in a decentralized manner; (ii) communication among participants in an address-independent manner, i.e., even when peers change their physical addresses; (iii) availability and persistence of stored objects in the network, irrespective of availability or departure of individual participants from the system at any time; and (iv) freshness of the objects/resources (up-to-date replicas). Internet-scale distributed index structures (often termed structured overlays) are used for discovery and access of resources in a decentralized setting. We propose rapid construction from scratch and maintenance of the P-Grid overlay network in a self-organized manner, so as to provide efficient search of both individual keys and whole ranges of keys, while providing good load-balancing characteristics for diverse kinds of arbitrarily skewed loads: storage and replication, query forwarding and query answering loads. For fast overlay construction we employ recursive partitioning of the key-space so that the resulting partitions are balanced with respect to storage load and replication. The proper algorithmic parameters for such partitioning are derived from a transient analysis of the partitioning process, which has the Markov property. Preservation of ordering information in P-Grid, such that queries other than exact queries, like range queries, can be handled efficiently and rather trivially, makes P-Grid suitable for data-oriented applications.
Fast overlay construction is analogous to building an index on a new set of keys, making P-Grid suitable as the underlying indexing mechanism for peer-to-peer information retrieval applications, among other potential applications which may require frequent indexing of new attributes apart from regular updates to an existing index. In order to deal with membership dynamics, in particular peers' changing physical addresses across sessions, the overlay itself is used as a (self-referential) directory service for maintaining the participating peers' physical addresses across sessions. Exploiting this self-referential directory, a family of overlay maintenance schemes has been designed with lower communication overhead than other overlay maintenance strategies. The notion of a dynamic equilibrium study for overlays under continuous churn and repairs, modeled as a Markov process, was introduced in order to evaluate and compare the overlay maintenance schemes. While the self-referential directory was originally invented to realize overlay maintenance schemes with lower overheads than existing ones, it is generic in nature and can be used for various other purposes, e.g., as a decentralized public key infrastructure. Persistence of peer identity across sessions, in spite of changes in physical address, provides a logical independence of the overlay network from the underlying physical network. This has many other potential usages, for example, efficient maintenance mechanisms for P2P storage systems and P2P trust and reputation management. We specifically look into the dynamics of maintaining redundancy for storage systems and design a novel lazy maintenance strategy. This strategy is algorithmically a simple variant of existing maintenance strategies which adapts to the system dynamics. This randomized lazy maintenance strategy thus explores the cost-performance trade-offs of the storage maintenance operations in a self-organizing manner. We model the storage system (redundancy), under churn and maintenance, as a Markov process. We perform an equilibrium study to show that the system operates in a more stable dynamic equilibrium with our strategy than with the existing maintenance scheme, for comparable overheads. In particular, we show that our maintenance scheme provides substantial performance gains in terms of maintenance overhead and the system's resilience in the presence of churn and correlated failures. Finally, we propose a gossip mechanism which works with lower communication overhead than existing approaches for communication among a relatively large set of unreliable peers without assuming any specific structure for their mutual connectivity. We use such a communication primitive for propagating replica updates in P2P systems, facilitating management of mutable content in P2P systems. The peer population affected by a gossip can be modeled as a Markov process. Studying the transient spread of gossips helps in choosing proper algorithm parameters to reduce communication overhead while guaranteeing coverage of online peers. Each of these substrates was developed to find practical solutions for real problems. Put together, they can be used in other applications, including a P2P storage system with support for efficient lookups and inserts, membership dynamics, content mutation and updates, persistence and availability.
Many of the ideas have already been implemented in real systems and several others are on the way to being integrated into the implementations. This dissertation makes two principal contributions. First, it provides designs of P2P systems that are useful for end-users as well as for application developers who can build upon these existing systems. Second, it adapts and introduces a methodology for analyzing a system's time evolution (tools typically used in diverse domains including physics and cybernetics) to study the long-run behavior of P2P systems, and uses this methodology to (re-)design appropriate algorithms and evaluate them. We observed that studying P2P systems from the perspective of complex systems reveals their inner dynamics and hence ways to exploit such dynamics for suitable or better algorithms. In other words, the analysis methodology in itself strongly influences and inspires the way we design such systems. We believe that such an approach of orchestrating self-organization in internet-scale systems, where the algorithms and the analysis methodology have strong mutual influence, will significantly change the way such future systems are developed and evaluated. We envision that such an approach will particularly serve as an important tool for the nascent but fast-moving P2P systems research and development community.
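As a small illustration of the equilibrium-style analysis described above, the sketch below simulates replica redundancy under churn with a lazy repair rule, i.e., repairing only when the number of live replicas falls below a threshold. All parameter values and the repair rule itself are illustrative assumptions, not the thesis's actual scheme.

```python
import random
from collections import Counter

def replica_equilibrium(steps, r_max=8, threshold=4, fail_p=0.05,
                        rng=random.Random(5)):
    """Markov-chain view of storage redundancy: each step every live replica
    fails independently (churn); lazy repair restores r_max replicas only
    once fewer than `threshold` remain."""
    live, hist = r_max, Counter()
    for _ in range(steps):
        live = sum(rng.random() >= fail_p for _ in range(live))  # churn
        if live < threshold:
            live = r_max                                         # lazy repair
        hist[live] += 1
    return hist

# Empirical occupancy of each redundancy level in dynamic equilibrium.
print(sorted(replica_equilibrium(100_000).items()))
```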
Information dissemination via random walks
Information dissemination is a fundamental task in distributed computing:
How to deliver a piece of information from a node of a network to some or all other nodes?
In the face of large and still growing modern networks, it is imperative that dissemination algorithms are decentralised and can operate under unreliable conditions.
In the past decades, randomised rumour spreading algorithms
have addressed these challenges.
In these algorithms, a message is initially placed at a source node of a network, and, at regular intervals, each node contacts a randomly selected neighbour.
A message may be transmitted in one or both directions during each of these communications, depending on the exact protocol.
The main measure of performance for these algorithms is their broadcast time, which is the time until a message originating from a source node is disseminated to all nodes of the network.
Apart from being extremely simple and robust to failures, randomised rumour spreading achieves theoretically optimal broadcast time in many common network topologies.
In this thesis, we propose an agent-based information dissemination algorithm, called Visit-Exchange.
In our protocol, a number of agents perform independent random walks in the network.
An agent becomes informed when it visits a node that has a message, and later informs all future nodes it visits.
Visit-Exchange shares many of the properties of randomised rumour spreading, namely, it is very simple and uses the same amount of communication in a unit of time.
Moreover, the protocol can be used as a simple model of non-recoverable epidemic processes.
We investigate the broadcast time of Visit-Exchange on a variety of network topologies, and compare it to traditional rumour spreading.
On dense regular networks we show that the two types of protocols are equivalent, which means that in this setting the vast literature on randomised rumour spreading applies in our model as well.
Since many networks of interest, including real-world ones, are very sparse, we also study agent-based broadcast for sparse networks.
Our results include almost optimal or optimal bounds for sparse regular graphs, expanders, random regular graphs, balanced trees and grids.
We establish that depending on the network topology, Visit-Exchange may be either slower or faster than traditional rumour spreading.
In particular, in graphs consisting of hubs that are not well connected, broadcast using agents can be significantly faster.
Our conclusion is that a combined broadcasting protocol that simultaneously uses both traditional rumour spreading and agent-based dissemination can be fast on a larger range of topologies than each of its components separately.
Gates Cambridge Trust, St John's College Benefactors' Scholarship
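The following is a minimal sketch of the Visit-Exchange protocol described above; the ring-with-chords graph, the choice of n agents, and their uniformly random start positions are illustrative assumptions.

```python
import random

def visit_exchange_rounds(adj, source, num_agents, rng=random.Random(6)):
    """Agents perform independent random walks; an agent picks the message up
    when it visits an informed node and informs every node it visits later."""
    n = len(adj)
    informed = [False] * n
    informed[source] = True
    pos = [rng.randrange(n) for _ in range(num_agents)]
    carrying = [informed[p] for p in pos]
    rounds = 0
    while not all(informed):
        for a in range(num_agents):
            pos[a] = rng.choice(adj[pos[a]])      # one random-walk step
            if carrying[a]:
                informed[pos[a]] = True           # deliver to the visited node
            elif informed[pos[a]]:
                carrying[a] = True                # pick the message up
        rounds += 1
    return rounds

n = 64
adj = [[(i - 1) % n, (i + 1) % n, (i + n // 2) % n] for i in range(n)]
print(visit_exchange_rounds(adj, source=0, num_agents=n))
```

Running n agents means about n messages per round, matching the per-round communication of randomised rumour spreading, which is the comparison the thesis draws.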
Information extraction with network centralities: finding rumor sources, measuring influence, and learning community structure
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 193-197).
Network centrality is a function that takes a network graph as input and assigns a score to each node. In this thesis, we investigate the potential of network centralities for addressing inference questions arising in the context of large-scale networked data. These questions are particularly challenging because they require algorithms which are extremely fast and simple so as to be scalable, while at the same time they must perform well. It is this tension between scalability and performance that this thesis aims to resolve by using appropriate network centralities. Specifically, we solve three important network inference problems using network centrality: finding rumor sources, measuring influence, and learning community structure. We develop a new network centrality called rumor centrality to find rumor sources in networks. We give a linear time algorithm for calculating rumor centrality, demonstrating its practicality for large networks. Rumor centrality is proven to be an exact maximum likelihood rumor source estimator for random regular graphs (under an appropriate probabilistic rumor spreading model). For a wide class of networks and rumor spreading models, we prove that it is an accurate estimator. To establish the universality of rumor centrality as a source estimator, we utilize techniques from the classical theory of generalized Pólya urns and branching processes. Next we use rumor centrality to measure influence in Twitter. We develop an influence score based on rumor centrality which can be calculated in linear time. To justify the use of rumor centrality as the influence score, we use it to develop a new network growth model called topological network growth. We find that this model accurately reproduces two important features observed empirically in Twitter retweet networks: a power-law degree distribution and a superstar node with very high degree. Using these results, we argue that rumor centrality is correctly quantifying the influence of users on Twitter. These scores form the basis of a dynamic influence tracking engine called Trumor which allows one to measure the influence of users in Twitter or more generally in any networked data. Finally we investigate learning the community structure of a network. Using arguments based on social interactions, we determine that the network centrality known as degree centrality can be used to detect communities. We use this to develop the leader-follower algorithm (LFA) which can learn the overlapping community structure in networks. The LFA runtime is linear in the network size. It is also non-parametric, in the sense that it can learn both the number and size of communities naturally from the network structure without requiring any input parameters. We prove that it is very robust and learns accurate community structure for a broad class of networks. We find that the LFA does a better job of learning community structure on real social and biological networks than more common algorithms such as spectral clustering.
by Tauhid R. Zaman. Ph.D.
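On trees, rumor centrality has a closed form, R(v) = n! / ∏_u T_u^v, where T_u^v is the size of the subtree rooted at u when the tree is oriented away from v. Below is a small sketch computing it in log-space to avoid overflow; the example tree and the function name are illustrative, and the thesis's general linear-time message-passing algorithm is not reproduced here.

```python
import math

def log_rumor_centrality(adj, v):
    """Log of rumor centrality of v in a tree: log n! minus the sum of the
    logs of all subtree sizes, with subtrees oriented away from v."""
    n = len(adj)
    parent, order, seen = [-1] * n, [], {v}
    stack = [v]
    while stack:                        # iterative DFS from v
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                parent[w] = u
                stack.append(w)
    size = [1] * n
    for u in reversed(order):           # accumulate subtree sizes bottom-up
        if parent[u] != -1:
            size[parent[u]] += size[u]
    return math.lgamma(n + 1) - sum(math.log(s) for s in size)

# Path 0-1-2-3-4: the centre node maximizes rumor centrality.
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(max(range(5), key=lambda v: log_rumor_centrality(adj, v)))   # -> 2
```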