11,513 research outputs found
Ancestral dynamic voting algorithm for mutual exclusion in partitioned distributed systems
Data replication is a known redundancy used in fault-tolerant distributed system. However, it has the problem of mutual exclusion of replicated data. Mutual exclusion becomes difficult when a distributed system is partitioned into two or more isolated groups of sites. In this study, a new dynamic algorithm is presented as a solution for mutual exclusion in partitioned distributed systems. The correctness of the algorithm is proven, and simulation is utilized for availability analysis. Simulations show that the new algorithm, ancestral dynamic voting algorithm, improves the availability and lifetime of service in faulty environments, regardless of the number of sites and topology of the system. This algorithm also prolongs the lifetime of service to mutual exclusion for full and partial topologies especially for the situations where there is no majority. Furthermore, it needs less number of messages transmitted. Finally, it is simple and easy to implement
Dynamic sharing of a multiple access channel
In this paper we consider the mutual exclusion problem on a multiple access
channel. Mutual exclusion is one of the fundamental problems in distributed
computing. In the classic version of this problem, n processes perform a
concurrent program which occasionally triggers some of them to use shared
resources, such as memory, communication channel, device, etc. The goal is to
design a distributed algorithm to control entries and exits to/from the shared
resource in such a way that in any time there is at most one process accessing
it. We consider both the classic and a slightly weaker version of mutual
exclusion, called ep-mutual-exclusion, where for each period of a process
staying in the critical section the probability that there is some other
process in the critical section is at most ep. We show that there are channel
settings, where the classic mutual exclusion is not feasible even for
randomized algorithms, while ep-mutual-exclusion is. In more relaxed channel
settings, we prove an exponential gap between the makespan complexity of the
classic mutual exclusion problem and its weaker ep-exclusion version. We also
show how to guarantee fairness of mutual exclusion algorithms, i.e., that each
process that wants to enter the critical section will eventually succeed
The Raincore Distributed Session Service for Networking Elements
Motivated by the explosive growth of the Internet, we study efficient and fault-tolerant distributed session layer
protocols for networking elements. These protocols are
designed to enable a network cluster to share the state
information necessary for balancing network traffic and
computation load among a group of networking elements.
In addition, in the presence of failures, they allow
network traffic to fail-over from failed networking
elements to healthy ones. To maximize the overall
network throughput of the networking cluster, we assume a unicast communication medium for these protocols. The Raincore Distributed Session Service is based on a fault-tolerant token protocol, and provides group membership, reliable multicast and mutual exclusion services in a networking environment. We show that this service provides atomic reliable multicast with consistent ordering. We also show that Raincore token protocol consumes less overhead than a broadcast-based protocol in this environment in terms of CPU task-switching. The Raincore technology was transferred to Rainfinity, a startup company that is focusing on software for Internet reliability and performance. Rainwall, Rainfinity’s first product, was developed using the Raincore Distributed Session Service. We present initial performance results of the Rainwall product that validates our design assumptions and goals
The Raincore API for clusters of networking elements
Clustering technology offers a way to increase overall reliability and performance of Internet information flow by strengthening one link in the chain without adding others. We have implemented this technology in a distributed computing architecture for network elements. The architecture, called Raincore, originated in the Reliable Array of Independent Nodes, or RAIN, research collaboration between the California Institute of Technology and the US National Aeronautics and Space Agency's Jet Propulsion Laboratory. The RAIN project focused on developing high-performance, fault-tolerant, portable clustering technology for spaceborne computing . The technology that emerged from this project became the basis for a spinoff company, Rainfinity, which has the exclusive intellectual property rights to the RAIN technology. The authors describe the Raincore conceptual architecture and distributed services, which are designed to make it easy for developers to port their applications to run on top of a cluster of networking elements. We include two applications: a Web server prototype that was part of the original RAIN research project and a commercial firewall cluster product from Rainfinity
A Modeling Framework for Schedulability Analysis of Distributed Avionics Systems
This paper presents a modeling framework for schedulability analysis of
distributed integrated modular avionics (DIMA) systems that consist of
spatially distributed ARINC-653 modules connected by a unified AFDX network. We
model a DIMA system as a set of stopwatch automata (SWA) in UPPAAL to analyze
its schedulability by classical model checking (MC) and statistical model
checking (SMC). The framework has been designed to enable three types of
analysis: global SMC, global MC, and compositional MC. This allows an effective
methodology including (1) quick schedulability falsification using global SMC
analysis, (2) direct schedulability proofs using global MC analysis in simple
cases, and (3) strict schedulability proofs using compositional MC analysis for
larger state space. The framework is applied to the analysis of a concrete DIMA
system.Comment: In Proceedings MARS/VPT 2018, arXiv:1803.0866
Self-stabilizing tree algorithms
Designers of distributed algorithms have to contend with the problem of making the algorithms tolerant to several forms of coordination loss, primarily faulty initialization. The processes in a distributed system do not share a global memory and can only get a partial view of the global state. Transient failures in one part of the system may go unnoticed in other parts and thus cause the system to go into an illegal state. If the system were self-stabilizing, however, it is guaranteed that it will return to a legal state after a finite number of state transitions. This thesis presents and proves self-stabilizing algorithms for calculating tree metrics and for achieving mutual exclusion on a tree structured distributed system
- …