285 research outputs found
Counter Attack on Byzantine Generals: Parameterized Model Checking of Fault-tolerant Distributed Algorithms
We introduce an automated parameterized verification method for
fault-tolerant distributed algorithms (FTDA). FTDAs are parameterized by both
the number of processes and the assumed maximum number of Byzantine faulty
processes. At the center of our technique is a parametric interval abstraction
(PIA) where the interval boundaries are arithmetic expressions over parameters.
Using PIA for both data abstraction and a new form of counter abstraction, we
reduce the parameterized problem to finite-state model checking. We demonstrate
the practical feasibility of our method by verifying several variants of the
well-known distributed algorithm by Srikanth and Toueg. Our semi-decision
procedures are complemented and motivated by an undecidability proof for FTDA
verification which holds even in the absence of interprocess communication. To
the best of our knowledge, this is the first paper to achieve parameterized
automated verification of Byzantine FTDA
Deep Space Network information system architecture study
The purpose of this article is to describe an architecture for the Deep Space Network (DSN) information system in the years 2000-2010 and to provide guidelines for its evolution during the 1990s. The study scope is defined to be from the front-end areas at the antennas to the end users (spacecraft teams, principal investigators, archival storage systems, and non-NASA partners). The architectural vision provides guidance for major DSN implementation efforts during the next decade. A strong motivation for the study is an expected dramatic improvement in information-systems technologies, such as the following: computer processing, automation technology (including knowledge-based systems), networking and data transport, software and hardware engineering, and human-interface technology. The proposed Ground Information System has the following major features: unified architecture from the front-end area to the end user; open-systems standards to achieve interoperability; DSN production of level 0 data; delivery of level 0 data from the Deep Space Communications Complex, if desired; dedicated telemetry processors for each receiver; security against unauthorized access and errors; and highly automated monitor and control
Programming Languages for Distributed Computing Systems
When distributed systems first appeared, they were programmed in traditional sequential languages, usually with the addition of a few library procedures for sending and receiving messages. As distributed applications became more commonplace and more sophisticated, this ad hoc approach became less satisfactory. Researchers all over the world began designing new programming languages specifically for implementing distributed applications. These languages and their history, their underlying principles, their design, and their use are the subject of this paper. We begin by giving our view of what a distributed system is, illustrating with examples to avoid confusion on this important and controversial point. We then describe the three main characteristics that distinguish distributed programming languages from traditional sequential languages, namely, how they deal with parallelism, communication, and partial failures. Finally, we discuss 15 representative distributed languages to give the flavor of each. These examples include languages based on message passing, rendezvous, remote procedure call, objects, and atomic transactions, as well as functional languages, logic languages, and distributed data structure languages. The paper concludes with a comprehensive bibliography listing over 200 papers on nearly 100 distributed programming languages
Interprocess communication in highly distributed systems
Issued as Final technical report, Project no. G-36-632Final technical report has title: Interprocess communication in highly distributed system
State of the art survey of network operating systems development
The results of the State-of-the-Art Survey of Network Operating Systems (NOS) performed for Goddard Space Flight Center are presented. NOS functional characteristics are presented in terms of user communication data migration, job migration, network control, and common functional categories. Products (current or future) as well as research and prototyping efforts are summarized. The NOS products which are revelant to the space station and its activities are evaluated
Redundancy management for efficient fault recovery in NASA's distributed computing system
The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance
Space Complexity of Fault-Tolerant Register Emulations
Driven by the rising popularity of cloud storage, the costs associated with
implementing reliable storage services from a collection of fault-prone servers
have recently become an actively studied question. The well-known ABD result
shows that an f-tolerant register can be emulated using a collection of 2f + 1
fault-prone servers each storing a single read-modify-write object type, which
is known to be optimal. In this paper we generalize this bound: we investigate
the inherent space complexity of emulating reliable multi-writer registers as a
fucntion of the type of the base objects exposed by the underlying servers, the
number of writers to the emulated register, the number of available servers,
and the failure threshold. We establish a sharp separation between registers,
and both max-registers (the base object types assumed by ABD) and CAS in terms
of the resources (i.e., the number of base objects of the respective types)
required to support the emulation; we show that no such separation exists
between max-registers and CAS. Our main technical contribution is lower and
upper bounds on the resources required in case the underlying base objects are
fault-prone read/write registers. We show that the number of required registers
is directly proportional to the number of writers and inversely proportional to
the number of servers.Comment: Conference version appears in Proceedings of PODC '1
- …