Being Fast Means Being Chatty: The Local Information Cost of Graph Spanners
We introduce a new measure for quantifying the amount of information that the
nodes in a network need to learn to jointly solve a graph problem. We show that
the local information cost presents a natural lower bound on
the communication complexity of distributed algorithms. For the synchronous
CONGEST-KT1 model, where each node has initial knowledge of its neighbors' IDs,
we prove that bits are
required for solving a graph problem with a -round algorithm that
errs with probability at most . Our result is the first lower bound
that yields a general trade-off between communication and time for graph
problems in the CONGEST-KT1 model.
We demonstrate how to apply the local information cost by deriving a lower
bound on the communication complexity of computing a -spanner that
consists of at most edges, where . Our main result is that any -time
algorithm must send at least bits in the
CONGEST model under the KT1 assumption. Previously, only a trivial lower bound
of bits was known for this problem.
A consequence of our lower bound is that achieving both time- and
communication-optimality is impossible when designing a distributed spanner
algorithm. In light of the work of King, Kutten, and Thorup (PODC 2015), this
shows that computing a minimum spanning tree can be done significantly faster
than finding a spanner when considering algorithms with
communication complexity. Our result also implies time complexity lower bounds
for constructing a spanner in the node-congested clique of Augustine et al.
(2019) and in the push-pull gossip model with limited bandwidth.
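For readers unfamiliar with spanners, the following is a minimal sketch of the classical sequential greedy construction. It is not the distributed setting studied in the abstract above, and it makes no claim about communication cost. The stretch parameter `k` and all function names are hypothetical, chosen only for illustration. An edge is kept only if the spanner built so far does not already connect its endpoints within `k` times the edge weight.

```python
import heapq


def shortest_path_length(adj, source, target, limit):
    """Dijkstra on the partial spanner; stops exploring beyond `limit`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")) or d > limit:
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def greedy_spanner(n, edges, k):
    """Return a subset of `edges` (tuples (u, v, weight)) that is a k-spanner."""
    adj = {u: [] for u in range(n)}
    kept = []
    # Process edges from lightest to heaviest.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # Keep the edge only if the current spanner stretches it by more than k.
        if shortest_path_length(adj, u, v, k * w) > k * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept


# Complete graph on 4 nodes with unit weights.
k4_edges = [(u, v, 1.0) for u in range(4) for v in range(u + 1, 4)]
spanner_edges = greedy_spanner(4, k4_edges, 3)
```

With stretch 3 the sketch keeps only a star around node 0, while stretch 1 forces it to keep every edge.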
Efficient concurrent data structure access parallelism techniques for increasing scalability
Multi-core processors have revolutionised the way data structures are designed by bringing parallelism to mainstream computing. Key to exploiting hardware parallelism available in multi-core processors are concurrent data structures. However, some concurrent data structure abstractions are inherently sequential and incapable of harnessing the parallelism of multi-core processors. Designing and implementing concurrent data structures to harness hardware parallelism is challenging due to the requirements of correctness, efficiency and practicability under various application constraints. In this thesis, our research contribution is towards improving concurrent data structure access parallelism to increase data structure performance. We propose new design frameworks that improve access parallelism of already existing concurrent data structure designs. Also, we propose new concurrent data structure designs with significant performance improvements. To give an insight into the interplay between hardware and concurrent data structure access parallelism, we give a detailed analysis and model the performance scalability with varying parallelism.
In the first part of the thesis, we focus on data structure semantic relaxation. Relaxing the semantics of a data structure unveils a bigger design space that allows weaker synchronization and more useful parallelism. Investigating new data structure designs, capable of trading semantics for achieving better performance in a monotonic way, is a major challenge in the area. We algorithmically address this challenge in this part of the thesis. We present an efficient, lock-free, concurrent data structure design framework for out-of-order semantic relaxation. We introduce a new two-dimensional algorithmic design that uses multiple instances of a given data structure to improve access parallelism.
In the second part of the thesis, we propose an efficient priority queue that improves access parallelism by reducing the number of synchronization points for each operation. Priority queues are fundamental abstract data types, often used to manage limited resources in parallel systems. Typical parallel priority queue implementations are based on heaps or skip lists. In recent literature, skip lists have been shown to be the most efficient design choice for implementing priority queues. Though numerous intricate implementations of skip-list-based queues have been proposed in the literature, their performance is constrained by the high number of global atomic updates per operation and the high memory consumption, which are proportional to the number of sub-lists in the queue. In this part of the thesis, we propose an alternative approach for designing lock-free linearizable priority queues that significantly improve memory efficiency and throughput performance by reducing the number of global atomic updates and memory consumption as compared to skip-list-based queues. To achieve this, our new design combines two structures: a search tree and a linked list, forming what we call a Tree Search List Queue (TSLQueue).
Subsequently, we analyse and introduce a model for lock-free concurrent data structure access parallelism. The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access points, which leads to thread serialisation and hinders parallelism. Aiming to address this challenge, a significant amount of work in the literature has proposed multi-access techniques that improve concurrent data structure parallelism. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures, especially in a shared memory setting.
In this part of the thesis, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study and analyse the behaviour of the two popular random access patterns: shared (Remote) and exclusive (Local) access, and the behaviour of the two most commonly used atomic primitives for designing lock-free data structures: Compare-and-Swap and Fetch-and-Add.
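The idea of relaxing semantics by spreading operations over multiple instances of a data structure can be illustrated with a small sketch. This is a sequential toy, not the thesis's lock-free two-dimensional framework: it places no bound on how far out of order a pop may be. The class name `RelaxedStack` and its parameters are hypothetical.

```python
import random


class RelaxedStack:
    """Stack spread over `width` sub-stacks; pops may be out of LIFO order."""

    def __init__(self, width, seed=None):
        self.subs = [[] for _ in range(width)]
        self.rng = random.Random(seed)

    def push(self, item):
        # Any sub-stack may receive the item, so pushes rarely collide.
        self.rng.choice(self.subs).append(item)

    def pop(self):
        # Start at a random sub-stack and scan until a non-empty one is found.
        start = self.rng.randrange(len(self.subs))
        for offset in range(len(self.subs)):
            sub = self.subs[(start + offset) % len(self.subs)]
            if sub:
                return sub.pop()
        return None  # every sub-stack is empty


def drain(stack):
    """Pop until the stack reports empty and return the items in pop order."""
    items = []
    while (item := stack.pop()) is not None:
        items.append(item)
    return items


strict = RelaxedStack(width=1)
relaxed = RelaxedStack(width=4, seed=7)
for value in range(1, 11):
    strict.push(value)
    relaxed.push(value)
strict_order = drain(strict)
relaxed_order = drain(relaxed)
```

With a single sub-stack the order is strict LIFO. With several sub-stacks the same items come back, possibly in a different order, which is the semantic price paid for fewer collisions on any one access point.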
Broadcast CONGEST Algorithms against Adversarial Edges
We consider the cornerstone broadcast task with an adaptive adversary that
controls a fixed number of edges in the input communication graph. In this
model, the adversary sees the entire communication in the network and the
random coins of the nodes, while maliciously manipulating the messages sent
through a set of edges (unknown to the nodes). Since the influential work
of [Pease, Shostak and Lamport, JACM'80], broadcast algorithms against
a wide range of adversarial models have been studied in both theory and practice for
more than four decades. Despite this extensive research, no round-efficient
broadcast algorithm was known for general graphs in the CONGEST model of
distributed computing. We provide the first round-efficient broadcast
algorithms against adaptive edge adversaries. Our two key results for -node
graphs of diameter are as follows:
1. For , there is a deterministic algorithm that solves the problem
within rounds, provided that the graph is 3
edge-connected. This round complexity beats the natural barrier of
rounds, the existential lower bound on the maximal length of edge-disjoint
paths between a given pair of nodes in . This algorithm can be extended to a
-round algorithm against adversarial edges in
edge-connected graphs.
2. For expander graphs with minimum degree of , there is
an improved broadcast algorithm with rounds against
adversarial edges. This algorithm exploits the connectivity and conductance
properties of G-subgraphs obtained by employing Karger's edge sampling
technique.
Our algorithms mark a new connection between the areas of fault-tolerant
network design and reliable distributed communication.
Comment: accepted to DISC2
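The role of edge connectivity in the abstract above can be illustrated with the textbook baseline: relay the message over edge-disjoint paths and take a majority vote at the receiver. This is a sketch of that baseline only, not the round-efficient algorithms of the paper, whose round complexity is stated to beat the path-length barrier. The function names and the `forged` placeholder value are hypothetical.

```python
from collections import Counter


def relay_along_paths(message, paths, corrupted_edges, forged="FORGED"):
    """Deliver `message` over each path; a path using a corrupted edge delivers `forged`."""
    received = []
    for path in paths:
        path_edges = {frozenset(edge) for edge in zip(path, path[1:])}
        received.append(forged if path_edges & corrupted_edges else message)
    return received


def decide(received):
    """Return the value held by a strict majority of the copies, or None."""
    value, count = Counter(received).most_common(1)[0]
    return value if count > len(received) // 2 else None


# Three edge-disjoint paths from node 0 to node 4.
paths = [[0, 1, 4], [0, 2, 4], [0, 3, 4]]
one_bad_edge = {frozenset((0, 2))}
two_bad_edges = {frozenset((0, 2)), frozenset((3, 4))}

with_one = decide(relay_along_paths("hello", paths, one_bad_edge))
with_two = decide(relay_along_paths("hello", paths, two_bad_edges))
```

One corrupted edge can spoil at most one of the three paths, so the majority is still correct. Two corrupted edges on different paths defeat the vote, which is why more adversarial edges call for higher connectivity.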
The Complexity of Symmetry Breaking in Massive Graphs
The goal of this paper is to understand the complexity of symmetry breaking problems, specifically maximal independent set (MIS) and the closely related beta-ruling set problem, in two computational models suited for large-scale graph processing, namely the k-machine model and the graph streaming model. We present a number of results. For MIS in the k-machine model, we improve the O~(m/k^2 + Delta/k)-round upper bound of Klauck et al. (SODA 2015) by presenting an O~(m/k^2)-round algorithm. We also present an Omega~(n/k^2) round lower bound for MIS, the first lower bound for a symmetry breaking problem in the k-machine model. For beta-ruling sets, we use hierarchical sampling to obtain more efficient algorithms in the k-machine model and also in the graph streaming model. More specifically, we obtain a k-machine algorithm that runs in O~(beta n Delta^{1/beta}/k^2) rounds and, by using a similar hierarchical sampling technique, we obtain one-pass algorithms for both insertion-only and insertion-deletion streams that use O(beta * n^{1+1/2^{beta-1}}) space. The latter result establishes a clear separation between MIS, which is known to require Omega(n^2) space (Cormode et al., ICALP 2019), and beta-ruling sets, even for beta = 2. Finally, we present an even faster 2-ruling set algorithm in the k-machine model, one that runs in O~(n/k^{2-epsilon} + k^{1-epsilon}) rounds for any epsilon, 0 <= epsilon <= 1. For a wide range of values of k this round complexity simplifies to O~(n/k^2) rounds, which we conjecture is optimal.
Our results use a variety of techniques. For our upper bounds, we prove and use simulation theorems for beeping algorithms, hierarchical sampling, and L_0-sampling, whereas for our lower bounds we use information-theoretic arguments and reductions to 2-party communication complexity problems.
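To make the two problems concrete, here is a small sequential sketch, assuming the usual definition in which a beta-ruling set is an independent set such that every node is within distance beta of some member (so an MIS is a 1-ruling set). It only checks and builds solutions on a single machine. It says nothing about the k-machine or streaming algorithms of the paper. All names are hypothetical.

```python
from collections import deque


def greedy_mis(adj):
    """Sequential greedy maximal independent set over nodes in sorted order."""
    mis, blocked = set(), set()
    for v in sorted(adj):
        if v not in blocked:
            mis.add(v)
            blocked.add(v)
            blocked.update(adj[v])  # neighbours can no longer join
    return mis


def is_beta_ruling_set(adj, candidate, beta):
    """Check independence and that every node is within `beta` hops of `candidate`."""
    if any(u in candidate for v in candidate for u in adj[v]):
        return False
    # Multi-source BFS from all members of the candidate set.
    dist = {v: 0 for v in candidate}
    queue = deque(candidate)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return all(v in dist and dist[v] <= beta for v in adj)


# Path graph 0 - 1 - 2 - 3 - 4.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
mis = greedy_mis(path_graph)
```

On the path graph, the middle node alone is a 2-ruling set but not a 1-ruling set, which shows how a larger beta permits much sparser solutions than an MIS.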
Performance Analysis and Modelling of Concurrent Multi-access Data Structures
The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access-points, which leads to thread serialisation and hinders parallelism. Aiming to address this challenge, a significant amount of work in the literature has proposed multi-access techniques that improve concurrent data structure parallelism. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures, especially in a shared memory setting. In this paper, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study and analyse the behaviour of the two popular random access patterns: shared (Remote) and exclusive (Local) access, and the behaviour of the two most commonly used atomic primitives for designing lock-free data structures: Compare-and-Swap and Fetch-and-Add. We model the concurrent multi-accesses by splitting the thread execution procedure into five logical sessions: i) side-work, ii) access-point search, iii) access-point acquisition, iv) access-point data acquisition and v) access-point data operation. We model the acquisition of an access-point as a system of closed queuing networks with parallel servers, and data acquisition in terms of where the data is located within the memory system. We evaluate our model on a set of concurrent data structure designs including a counter, a stack and a FIFO queue. The evaluation is carried out on two state-of-the-art multi-core processors: Intel Xeon Phi CPU 7290 with 72 physical cores and Intel Xeon E5-2695 with 14 physical cores. Our model is able to predict the throughput performance of the given concurrent data structures with 80% to 100% accuracy on both architectures.
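A multi-access counter, the simplest of the evaluated structures, can be sketched as follows. This is an illustrative stand-in, not the paper's implementation. Plain Python has no Compare-and-Swap or Fetch-and-Add, so a lock per access point is used instead. Mapping "exclusive (Local)" to a fixed slot per thread and "shared (Remote)" to a random slot is an assumption about how the two access patterns might look. All names are hypothetical.

```python
import random
import threading


class MultiAccessCounter:
    """Counter split over several access points (sub-counters)."""

    def __init__(self, num_access_points):
        self.num_access_points = num_access_points
        self._values = [0] * num_access_points
        # Locks stand in for atomic primitives, which plain Python lacks.
        self._locks = [threading.Lock() for _ in range(num_access_points)]

    def increment(self, slot=None):
        """Add one at the given access point, or at a random one if slot is None."""
        if slot is None:
            slot = random.randrange(self.num_access_points)
        with self._locks[slot]:
            self._values[slot] += 1

    def read(self):
        """Sum all access points; not an atomic snapshot while writers run."""
        total = 0
        for index, lock in enumerate(self._locks):
            with lock:
                total += self._values[index]
        return total


def run_workers(counter, num_threads, increments_each, exclusive):
    """Run the threads to completion and return the final counter value."""

    def work(thread_id):
        # Exclusive: each thread sticks to its own slot. Shared: random slot.
        slot = thread_id % counter.num_access_points if exclusive else None
        for _ in range(increments_each):
            counter.increment(slot)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.read()


local_total = run_workers(MultiAccessCounter(4), 4, 1000, exclusive=True)
remote_total = run_workers(MultiAccessCounter(4), 4, 1000, exclusive=False)
```

Both access patterns give the same final count. What differs in practice is how often threads contend for the same access point, which is the effect the abstract's model sets out to predict.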
A Simplicial Model for KB4: Epistemic Logic with Agents that May Die
The standard semantics of multi-agent epistemic logic S5 is based on Kripke
models whose accessibility relations are reflexive, symmetric and transitive.
This one-dimensional structure contains implicit higher-dimensional information
beyond pairwise interactions, that we formalized as pure simplicial models in a
previous work (Information and Computation, 2021). Here we extend the theory to
encompass simplicial models that are not necessarily pure. The corresponding
class of Kripke models are those where the accessibility relation is symmetric
and transitive, but might not be reflexive. Such models correspond to the
epistemic logic KB4. Impure simplicial models arise in situations where two
possible worlds may not have the same set of agents. We illustrate this with
distributed computing examples of synchronous systems where processes may
crash.
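The difference between the two classes of accessibility relations can be checked mechanically. The sketch below uses a hypothetical three-world example for a single agent `a`, under one possible reading in which a world without a self-loop is a world where the agent has crashed. In a symmetric and transitive relation, such a world has no outgoing edges at all, because any edge would force a self-loop.

```python
def is_reflexive(worlds, relation):
    """Every world is related to itself."""
    return all((w, w) in relation for w in worlds)


def is_symmetric(relation):
    """Every edge has its reverse."""
    return all((v, u) in relation for (u, v) in relation)


def is_transitive(relation):
    """Whenever u -> v and v -> x, also u -> x."""
    return all(
        (u, x) in relation
        for (u, v) in relation
        for (w, x) in relation
        if v == w
    )


worlds = {"w1", "w2", "w3"}

# Hypothetical example: agent a is alive in w1 and w2 and cannot tell them
# apart; in w3 the agent has crashed, so w3 has no edges for a.
relation_a = {("w1", "w1"), ("w1", "w2"), ("w2", "w1"), ("w2", "w2")}

# Adding the self-loop on w3 gives an equivalence relation, as in S5.
relation_s5 = relation_a | {("w3", "w3")}
```

Here `relation_a` is symmetric and transitive but not reflexive, matching the KB4 setting, whereas `relation_s5` satisfies all three properties.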
An Almost Singularly Optimal Asynchronous Distributed MST Algorithm
A singularly (near) optimal distributed algorithm is one that is (near)
optimal in two criteria, namely, its time and message complexities. For
synchronous CONGEST networks, such algorithms are known for fundamental
distributed computing problems such as leader election [Kutten et al., JACM
2015] and Minimum Spanning Tree (MST) construction [Pandurangan et al., STOC
2017, Elkin, PODC 2017]. However, it is open whether a singularly (near)
optimal bound can be obtained for the MST construction problem in general
asynchronous CONGEST networks.
We present a randomized distributed MST algorithm that, with high
probability, computes an MST in asynchronous CONGEST networks and takes
time and messages, where
is the number of nodes, the number of edges, is the diameter of the
network, and is an arbitrarily small constant (both time and
message bounds hold with high probability). Our algorithm is message optimal
(up to a polylog factor) and almost time optimal (except for a
factor). Our result answers an open question raised by Mashreghi
and King [DISC 2019] by giving the first known asynchronous MST algorithm that
has sublinear time (for all ) and uses
messages. Using a result of Mashreghi and King [DISC 2019], this also yields the
first asynchronous MST algorithm that is sublinear in both time and messages in
the CONGEST model.
A key tool in our algorithm is the construction of a low diameter rooted
spanning tree in asynchronous CONGEST that has depth
(for an arbitrarily small constant )
in time and messages. To the best of
our knowledge, this is the first such construction that is almost singularly
optimal in the asynchronous setting.
Comment: 27 pages, accepted to DISC 202