8,884 research outputs found
Counter Attack on Byzantine Generals: Parameterized Model Checking of Fault-tolerant Distributed Algorithms
We introduce an automated parameterized verification method for
fault-tolerant distributed algorithms (FTDA). FTDAs are parameterized by both
the number of processes and the assumed maximum number of Byzantine faulty
processes. At the center of our technique is a parametric interval abstraction
(PIA) where the interval boundaries are arithmetic expressions over parameters.
Using PIA for both data abstraction and a new form of counter abstraction, we
reduce the parameterized problem to finite-state model checking. We demonstrate
the practical feasibility of our method by verifying several variants of the
well-known distributed algorithm by Srikanth and Toueg. Our semi-decision
procedures are complemented and motivated by an undecidability proof for FTDA
verification which holds even in the absence of interprocess communication. To
the best of our knowledge, this is the first paper to achieve parameterized
automated verification of Byzantine FTDA
A computer scientist looks at game theory
I consider issues in distributed computation that should be of relevance to
game theory. In particular, I focus on (a) representing knowledge and
uncertainty, (b) dealing with failures, and (c) specification of mechanisms.Comment: To appear, Games and Economic Behavior. JEL classification numbers:
D80, D8
Replication and fault-tolerance in real-time systems
PhD ThesisThe increased availability of sophisticated computer hardware and the corresponding
decrease in its cost has led to a widespread growth in the use of computer systems for realtime
plant and process control applications. Such applications typically place very high
demands upon computer control systems and the development of appropriate control
software for these application areas can present a number of problems not normally
encountered in other applications.
First of all, real-time applications must be correct in the time domain as well as the value
domain: returning results which are not only correct but also delivered on time. Further,
since the potential for catastrophic failures can be high in a process or plant control
environment, many real-time applications also have to meet high reliability requirements.
These requirements will typically be met by means of a combination of fault avoidance and
fault tolerance techniques.
This thesis is intended to address some of the problems encountered in the provision of fault
tolerance in real-time applications programs. Specifically,it considers the use of replication
to ensure the availability of services in real-time systems. In a real-time environment,
providing support for replicated services can introduce a number of problems. In particular,
the scope for non-deterministic behaviour in real-time applications can be quite large and
this can lead to difficultiesin maintainingconsistent internal states across the members of a
replica group. To tackle this problem, a model is proposed for fault tolerant real-time
objects which not only allows such objects to perform application specific recovery
operations and real-time processing activities such as event handling, but which also allows
objects to be replicated. The architectural support required for such replicated objects is
also discussed and, to conclude, the run-time overheads associated with the use of such
replicated services are considered.The Science and Engineering Research Council
Application Agreement and Integration Services
Application agreement and integration services are required by distributed, fault-tolerant, safety critical systems to assure required performance. An analysis of distributed and hierarchical agreement strategies are developed against the backdrop of observed agreement failures in fielded systems. The documented work was performed under NASA Task Order NNL10AB32T, Validation And Verification of Safety-Critical Integrated Distributed Systems Area 2. This document is intended to satisfy the requirements for deliverable 5.2.11 under Task 4.2.2.3. This report discusses the challenges of maintaining application agreement and integration services. A literature search is presented that documents previous work in the area of replica determinism. Sources of non-deterministic behavior are identified and examples are presented where system level agreement failed to be achieved. We then explore how TTEthernet services can be extended to supply some interesting application agreement frameworks. This document assumes that the reader is familiar with the TTEthernet protocol. The reader is advised to read the TTEthernet protocol standard [1] before reading this document. This document does not re-iterate the content of the standard
Recommended from our members
Replicating multithreaded services
textFor the last 40 years, the systems community has invested a lot of effort in designing techniques for building fault tolerant distributed systems and services. This effort has produced a massive list of results: the literature describes how to design replication protocols that tolerate a wide range of failures (from simple crashes to malicious "Byzantine" failures) in a wide range of settings (e.g. synchronous or asynchronous communication, with or without stable storage), optimizing various metrics (e.g. number of messages, latency, throughput). These techniques have their roots in ideas, such as the abstraction of State Machine Replication and the Paxos protocol, that were conceived when computing was very different than it is today: computers had a single core; all processing was done using a single thread of control, handling requests sequentially; and a collection of 20 nodes was considered a large distributed system. In the last decade, however, computing has gone through some major paradigm shifts, with the advent of multicore architectures and large cloud infrastructures. This dissertation explains how these profound changes impact the practical usefulness of traditional fault tolerant techniques and proposes new ways to architect these solutions to fit the new paradigms.Computer Science
Design and development of algorithms for fault tolerant distributed systems
PhD ThesisThis thesis describes the design and development of algorithms for fault
tolerant distributed systems. The development of such algorithms requires
making assumptions about the types of component faults for which toler-
ance is to be provided. Such assumptions must be specified accurately. To
this end, this thesis develops a classification of faults in systems. This fault
classification identifies a range of fault types from the most restricted to the
least restricted. For each fault type, an algorithm for reaching distributed
agreement in the presence of a bounded number of faulty processors is
developed, and thus a family of agreement algorithms is presented. The
influence of the various fault types on the complexities of these algorithms
is discussed. Early stopping algorithms are also developed for selected fault
types and the influence of fault types on the early stopping conditions of the
respective algorithms is analysed. The problem of evaluating the perfor-
mance of distributed replicated systems which will require agreement algo-
rithms is considered next. As a first step in the direction of meeting this
challenging task, a pipeline triple modular redundant system is considered
and analytical methods are derived to evaluate the performance of such a
system. Finally, the accuracy of these methods is examined using computer
simulations.UK Science and Engineering Research Council (SERC),
DELTA-4 consortium of ESPIRI
Services for safety-critical applications on dual-scheduled TDMA networks
Tese de doutoramento. Engenharia Electrotécnica e de Computadores. Faculdade de Engenharia. Universidade do Porto. 200
- …