Leader Election for Anonymous Asynchronous Agents in Arbitrary Networks
We study the problem of leader election among mobile agents operating in an
arbitrary network modeled as an undirected graph. Nodes of the network are
unlabeled and all agents are identical. Hence the only way to elect a leader
among agents is by exploiting asymmetries in their initial positions in the
graph. Agents do not know the graph or their positions in it, hence they must
gain this knowledge by navigating in the graph and share it with other agents
to accomplish leader election. This can be done using meetings of agents, which
is difficult because of their asynchronous nature: an adversary has total
control over the speed of agents. When can a leader be elected in this
adversarial scenario and how to do it? We give a complete answer to this
question by characterizing all initial configurations for which leader election
is possible and by constructing an algorithm that accomplishes leader election
for all configurations for which this can be done.
Move-optimal partial gathering of mobile agents in asynchronous trees
In this paper, we consider the partial gathering problem of mobile agents in asynchronous tree networks. The partial gathering problem is a generalization of the classical gathering problem, which requires that all the agents meet at the same node. The partial gathering problem requires, for a given positive integer g, that each agent should move to a node and terminate so that at least g agents should meet at each of the nodes they terminate at. The requirement for the partial gathering problem is weaker than that for the (well-investigated) classical gathering problem, and thus, we clarify the difference on the move complexity between them. We consider two multiplicity detection models: weak multiplicity detection and strong multiplicity detection models. In the weak multiplicity detection model, each agent can detect whether another agent exists at the current node or not but cannot count the exact number of the agents. In the strong multiplicity detection model, each agent can count the number of agents at the current node. In addition, we consider two token models: non-token model and removable token model. In the non-token model, agents cannot mark the nodes or the edges in any way. In the removable-token model, each agent initially leaves a token on its initial node, and agents can remove the tokens. Our contribution is as follows. First, we show that for the non-token model agents require Ω(kn) total moves to solve the partial gathering problem, where n is the number of nodes and k is the number of agents. Second, we consider the weak multiplicity detection and non-token model. In this model, for asymmetric trees, by a previous result agents can achieve the partial gathering in O(kn) total moves, which is asymptotically optimal in terms of total moves. In addition, for symmetric trees we show that there exist no algorithms to solve the partial gathering problem. Third, we consider the strong multiplicity detection and non-token model. 
In this model, for any tree, we propose an algorithm to achieve the partial gathering in O(kn) total moves, which is asymptotically optimal in terms of total moves. Finally, we consider the weak multiplicity detection and removable-token model. In this model, we propose an algorithm to achieve the partial gathering in O(gn) total moves. Note that in this model, agents require Ω(gn) total moves to solve the partial gathering problem. Hence, the second proposed algorithm is also asymptotically optimal in terms of total moves.
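The termination condition of partial gathering described above can be illustrated with a short check. This is only a sketch of the problem's success criterion, not of any algorithm from the paper; the function names and the example node labels are hypothetical.

```python
from collections import Counter


def is_partial_gathering(final_nodes, g):
    """Return True if every node where some agent terminated hosts at least g agents.

    final_nodes: one entry per agent, naming the node where that agent terminated
                 (hypothetical representation, chosen for illustration).
    g:           the positive integer parameter of the partial gathering problem.
    """
    if g < 1:
        raise ValueError("g must be a positive integer")
    agents_per_node = Counter(final_nodes)
    return all(count >= g for count in agents_per_node.values())


def is_classical_gathering(final_nodes):
    """Classical gathering: all agents terminate at one and the same node."""
    return len(set(final_nodes)) == 1


# Seven agents terminate on two nodes: three on "a", four on "d".
positions = ["a", "a", "a", "d", "d", "d", "d"]
print(is_partial_gathering(positions, 3))   # True: both nodes host at least 3 agents
print(is_partial_gathering(positions, 4))   # False: node "a" hosts only 3 agents
print(is_classical_gathering(positions))    # False: agents are split over two nodes
```

The example shows why the requirement is weaker than classical gathering: a configuration can satisfy partial gathering for g = 3 without all agents meeting at a single node.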
Technical Report: Using Static Analysis to Compute Benefit of Tolerating Consistency
Synchronization is the Achilles heel of concurrent programs. Synchronization
requirements are often used to ensure that the execution of a concurrent
program can be serialized. Without such requirements, a program
suffers from consistency violations. Recently, it was shown that if programs
are designed to tolerate such consistency violation faults (\cvf{s}) then one
can obtain substantial performance gain. Previous efforts to analyze the effect
of \cvf-tolerance are limited to run-time analysis of the program to determine
if tolerating \cvf{s} can improve the performance. Such run-time analysis is
very expensive and provides limited insight.
In this work, we consider the question, `Can static analysis of the program
predict the benefit of \cvf-tolerance?' We find that the answer to this
question is affirmative. Specifically, we use static analysis to evaluate the
cost of a \cvf and demonstrate that it can be used to predict the benefit of
\cvf-tolerance. We also find that when faced with a large state space, partial
analysis of the state space (via sampling) also provides the required
information to predict the benefit of \cvf-tolerance. Furthermore, we observe
that the \cvf-cost distribution is exponential in nature, i.e., the probability
that a \cvf has a cost of $c$ is $A \cdot B^{-c}$, where $A$ and $B$ are constants;
that is, most \cvf{s} cause no/low perturbation whereas a small number of \cvf{s}
cause a large perturbation. This opens up new avenues to evaluate the benefit
of \cvf-tolerance.
Automated Synthesis of Timed and Distributed Fault-Tolerant Systems
This dissertation concentrates on the problem of automated synthesis and repair of fault-tolerant systems. In particular, given the required specification of the system, our goal is to synthesize a fault-tolerant system, or repair an existing one. We study this problem for two classes of timed and distributed systems.
In the context of timed systems, we focus on efficient synthesis of fault-tolerant timed models from their fault-intolerant version. Although the complexity of the synthesis problem is known to be polynomial time in the size of the time-abstract bisimulation of the input model, the state of the art lacked synthesis
algorithms that can be efficiently implemented. This is in part due to the fact that synthesis is in general a
challenging problem and its complexity is significantly magnified in the context of timed systems. We
propose an algorithm that takes a timed automaton, a set of fault actions, and a set of safety and bounded-time response properties as input, and utilizes a space-efficient symbolic representation of the timed
automaton (called the zone graph) to synthesize a fault-tolerant timed automaton as output. The output
automaton satisfies strict phased recovery, where it is guaranteed that the output model behaves similarly
to the input model in the absence of faults and in the presence of faults, fault recovery is achieved in two
phases, each satisfying certain safety and timing constraints.
In the context of distributed systems, we study the problem of synthesizing fault-tolerant systems from their
intolerant versions, when the number of processes is unknown. To synthesize a distributed fault-tolerant
protocol that works for systems with any number of processes, we use counter abstraction. Using this
abstraction, we deal with a finite-state abstract model to do the synthesis. Applying our proposed algorithm,
we successfully synthesized a fault-tolerant distributed agreement protocol in the presence of Byzantine faults. Although the synthesis problem is known to be NP-complete in the state space of the input
protocol (due to partial observability of processes) in the non-parameterized setting, our parameterized
algorithm manages to synthesize a solution for a complex problem such as Byzantine agreement within less than two minutes.
A system may reach a bad state due to wrong initialization or fault occurrence. One well-known
type of distributed fault-tolerant system is the self-stabilizing system. Such systems converge
to their legitimate states starting from any state and, if no fault occurs, stay in legitimate states thereafter.
We propose an automated sound and complete method to synthesize self-stabilizing systems starting from
the desired topology and type of the system. Our proposed method is based on SMT-solving, where the
desired specification of the system is formulated as SMT constraints. We used the Alloy solver to
implement our method, and successfully synthesized some of the well-known self-stabilizing algorithms.
We extend our method to support a type of stabilizing algorithm called ideal-stabilization, and also the case
when the set of legitimate states is not explicitly known.
Quantitative metrics such as recovery time are crucial in self-stabilizing systems when used in practice
(such as in networking applications). One of these metrics is the average recovery time. Our automated
method for synthesizing self-stabilizing systems generates a solution that respects the desired system
specification, but it does not take into account any quantitative metrics. We study the problem of repairing
self-stabilizing systems (where only removal of transitions is allowed) to satisfy quantitative limitations.
The metric under study is average recovery time, which characterizes the performance of stabilizing
programs. We show that the repair problem is NP-complete in the state space of the given system.
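The notions of self-stabilization and average recovery time used above can be made concrete on a toy example. The sketch below is not the dissertation's synthesis or repair method; it assumes a deterministic transition system given as a dictionary, and it assumes the average is taken uniformly over all states. All names and the example system are hypothetical.

```python
def recovery_steps(step, legit, states):
    """Map each state to the number of transitions needed to reach a legitimate
    state, or to None if the run from that state never reaches one."""
    result = {}
    for start in states:
        seen, current, count = set(), start, 0
        # Follow the unique successor until we hit a legitimate state or a cycle.
        while current not in legit and current not in seen:
            seen.add(current)
            current = step[current]
            count += 1
        result[start] = count if current in legit else None
    return result


def is_self_stabilizing(step, legit, states):
    """Closure: legitimate states only lead to legitimate states.
    Convergence: every state eventually reaches a legitimate state."""
    closed = all(step[s] in legit for s in legit)
    converges = all(v is not None for v in recovery_steps(step, legit, states).values())
    return closed and converges


def average_recovery_time(step, legit, states):
    """Mean number of steps to reach a legitimate state, averaged uniformly
    over all states (an assumption made for this illustration)."""
    steps = recovery_steps(step, legit, states)
    if any(v is None for v in steps.values()):
        raise ValueError("system does not converge from every state")
    return sum(steps.values()) / len(states)


# Toy system: states 0 and 1 are legitimate; 4 -> 3 -> 2 -> 0 recovers step by step.
states = [0, 1, 2, 3, 4]
legit = {0, 1}
step = {0: 1, 1: 0, 2: 0, 3: 2, 4: 3}

print(is_self_stabilizing(step, legit, states))
print(average_recovery_time(step, legit, states))
```

In this toy system the recovery times are 0, 0, 1, 2 and 3 steps, so the uniform average is 6/5. A nondeterministic or probabilistic model, as would be needed for realistic distributed protocols, requires a different formulation of the metric.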
Notes on Theory of Distributed Systems
Notes for the Yale course CPSC 465/565 Theory of Distributed Systems.
Mesh-Mon: a Monitoring and Management System for Wireless Mesh Networks
A mesh network is a network of wireless routers that employ multi-hop routing and can be used to provide network access for mobile clients. Mobile mesh networks can be deployed rapidly to provide an alternate communication infrastructure for emergency response operations in areas with limited or damaged infrastructure. In this dissertation, we present Dart-Mesh: a Linux-based layer-3 dual-radio two-tiered mesh network that provides complete 802.11b coverage in the Sudikoff Lab for Computer Science at Dartmouth College. We faced several challenges in building, testing, monitoring and managing this network. These challenges motivated us to design and implement Mesh-Mon, a network monitoring system to aid system administrators in the management of a mobile mesh network. Mesh-Mon is a scalable, distributed and decentralized management system in which mesh nodes cooperate in a proactive manner to help detect, diagnose and resolve network problems automatically. Mesh-Mon is independent of the routing protocol used by the mesh routing layer and can function even if the routing protocol fails. We demonstrate this feature by running Mesh-Mon on two versions of Dart-Mesh, one running on AODV (a reactive mesh routing protocol) and the second running on OLSR (a proactive mesh routing protocol) in separate experiments. Mobility can cause links to break, leading to disconnected partitions. We identify critical nodes in the network, whose failure may cause a partition. We introduce two new metrics based on social-network analysis: the Localized Bridging Centrality (LBC) metric and the Localized Load-aware Bridging Centrality (LLBC) metric, that can identify critical nodes efficiently and in a fully distributed manner. We run a monitoring component on client nodes, called Mesh-Mon-Ami, which also assists Mesh-Mon nodes in the dissemination of management information between physically disconnected partitions, by acting as carriers for management data. 
We conclude, from our experimental evaluation on our 16-node Dart-Mesh testbed, that our system solves several management challenges in a scalable manner, and is a useful and effective tool for monitoring and managing real-world mesh networks.