Detection of global state predicates
The problem addressed here arises in the context of Meta: how can a set of processes monitor the state of a distributed application in a consistent manner? For example, consider the simple distributed application shown here. Each of the three processes in the application has a light, and the control processes would each like to take an action when some specified subset of the lights are on. The application processes are instrumented with stubs that determine when a process turns its light on or off. This information is disseminated to the control processes, each of which then determines when its condition of interest is met. Meta is built on top of the ISIS toolkit, and so we first built the sensor dissemination mechanism using atomic broadcast. Atomic broadcast guarantees that all recipients receive the messages in the same order and that this order is consistent with causality. Unfortunately, the control processes are somewhat limited in what they can deduce when they find that their condition of interest holds.
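The abstract's key point can be made concrete with a small sketch (not Meta's actual API; the event format and function names are invented for illustration): because atomic broadcast delivers the same totally ordered sequence of light on/off events to every monitor, independent monitors necessarily agree on when a predicate first holds.

```python
# Hypothetical sketch: atomic broadcast is modelled as a single shared
# total order of events; each monitor replays it and tests a predicate.

def atomic_broadcast(events):
    """Model atomic broadcast: every recipient gets this exact sequence."""
    return list(events)

def monitor(delivery, predicate, n_procs):
    """Replay deliveries; return index of first event where predicate holds."""
    lights = [False] * n_procs
    for i, (pid, on) in enumerate(delivery):
        lights[pid] = on
        if predicate(lights):
            return i
    return None

# Three processes toggle their lights; predicate: lights 0 and 1 both on.
events = [(0, True), (2, True), (1, True), (0, False)]
order = atomic_broadcast(events)
pred = lambda ls: ls[0] and ls[1]
# Two independent monitors reach the same answer because they share the order.
a = monitor(order, pred, 3)
b = monitor(order, pred, 3)
assert a == b == 2
```

What this consistency does *not* give the monitors (the limitation the abstract alludes to) is any guarantee about what the application's actual state is by the time the predicate is observed to hold.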
Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory
We propose an axiomatic generic framework for modelling weak memory. We show how to instantiate this framework for SC, TSO, C++ restricted to release-acquire atomics, and Power. For Power, we compare our model to a preceding operational model in which we found a flaw. To do so, we define an operational model that we show equivalent to our axiomatic model. We also propose a model for ARM. Our testing on this architecture revealed a behaviour later acknowledged as a bug by ARM, and more recently 31 additional anomalies. We offer a new simulation tool, called herd, which allows the user to specify the model of his choice in a concise way. Given a specification of a model, the tool becomes a simulator for that model. The tool relies on an axiomatic description; this choice allows us to outperform all previous simulation tools. Additionally, we confirm that verification time is vastly improved in the case of bounded model checking. Finally, we put our models in perspective, in the light of empirical data obtained by analysing the C and C++ code of a Debian Linux distribution. We present our new analysis tool, called mole, which explores a piece of code to find the weak memory idioms that it uses.
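The kind of question herd answers can be illustrated with the classic store-buffering (SB) litmus test. The sketch below (a brute-force enumeration, not herd's axiomatic machinery) shows that under sequential consistency the outcome r0 = r1 = 0 is forbidden, whereas TSO's store buffers famously allow it:

```python
from itertools import permutations

# SB litmus test.  Thread 0: x = 1; r0 = y    Thread 1: y = 1; r1 = x
T0 = [("w", "x", None), ("r", "y", "r0")]
T1 = [("w", "y", None), ("r", "x", "r1")]

def sc_outcomes():
    """Enumerate register outcomes over all SC interleavings of SB."""
    outcomes = set()
    for order in set(permutations([0, 0, 1, 1])):  # which thread acts when
        mem = {"x": 0, "y": 0}
        regs = {}
        idx = [0, 0]  # program counter per thread (program order preserved)
        for t in order:
            kind, loc, reg = (T0, T1)[t][idx[t]]
            idx[t] += 1
            if kind == "w":
                mem[loc] = 1
            else:
                regs[reg] = mem[loc]
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes

# SC never yields r0 == r1 == 0; a TSO machine can exhibit it.
assert (0, 0) not in sc_outcomes()
assert sc_outcomes() == {(0, 1), (1, 0), (1, 1)}
```

herd decides the same question not by enumerating interleavings but by checking candidate executions against the axioms of the user-supplied model, which is what makes it fast.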
Distributed Consensus Revisited
Distributed Consensus is a classical problem in distributed computing.
It requires the correct processors in a distributed system to agree on
a common value despite the failure of other processors. This problem
is closely related to other problems, such as Byzantine Generals,
Approximate Agreement, and k-Set Agreement. This paper examines a
variant of Distributed Consensus that considers agreement on a value
that is more than a single bit and requires that the agreed upon value
be one of the correct processors' input values. It shows that, for
this problem to be solved in a system with arbitrary failures, it is
necessary that more processors remain correct than for solutions to
Distributed Consensus and for cases where agreement is only a single
bit. Specifically, the number of processors that must be correct is a
function of the size of the domain of values used. Two existing
consensus algorithms are modified to solve this stronger variant.
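For orientation, the base problem this variant strengthens can be sketched with textbook flooding consensus for the crash-failure setting (this is an illustration of ordinary consensus, not of the paper's stronger validity condition for arbitrary failures, and the function name is invented): each processor relays every value it has seen for t+1 synchronous rounds, then decides deterministically, e.g. on the minimum.

```python
# Minimal sketch of flooding consensus under crash failures: t+1 rounds
# of relaying known values, then a deterministic decision rule (min).

def flood_consensus(inputs, t, crashed=None, crash_round=None):
    n = len(inputs)
    known = [{v} for v in inputs]
    for r in range(t + 1):
        msgs = [set() for _ in range(n)]
        for p in range(n):
            if p == crashed and crash_round is not None and r >= crash_round:
                continue  # a crashed processor sends nothing from then on
            for q in range(n):
                msgs[q] |= known[p]
        for q in range(n):
            known[q] |= msgs[q]
    return [min(k) for k in known]

decisions = flood_consensus([3, 1, 2], t=1)
assert len(set(decisions)) == 1       # agreement
assert decisions[0] in [3, 1, 2]      # the decision is some input value
```

Note that under crash failures the decided value may be a *crashed* processor's input; requiring it to be a *correct* processor's input, with arbitrary failures and a multi-valued domain, is exactly what drives up the number of correct processors the paper shows is needed.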
Space-Efficient Atomic Snapshots in Synchronous Systems
We consider the problem of implementing an atomic snapshot memory in
synchronous distributed systems. An atomic snapshot memory is an array
of memory locations, one per processor. Each processor may update its
own location or scan all locations atomically. We are interested in
implementations that are space-efficient in the sense that they are
honest. This means that the implementation may use no more shared
memory than that of the array being implemented and that the memory
truly reflects the contents of that array. If n is the number of
processors involved, then the worst-case scanning time must be at least
n. We show that the sum of the worst-case update and scanning times
must be greater than floor(3n/2). We exhibit two honest
implementations. One has scans and updates with worst-case times of
n+1 for both operations; for scans, this is near the lower bound. The
other requires longer scans (with worst-case time ceiling(3n/2)+1) but
shorter updates (with worst-case time ceiling(n/2)+1). Thus, both
implementations have the sum of the worst-case times at 2n + O(1),
which is within n/2 of the lower bound. Closing the gap between these
algorithms and the combined lower bound remains an open problem.
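The sketch below (an illustration of the problem, not of the paper's algorithms) shows why scans are the expensive operation: a naive "collect" that reads the n locations one at a time is honest about space but is not atomic, because updates interleaved inside the scan can yield a view the array never held.

```python
# Naive collect over a 2-location array: the returned view (0, 1) never
# existed in memory, so it is not an atomic snapshot.

mem = [0, 0]                 # the shared array: one location per processor
history = [tuple(mem)]       # every state the array actually passed through

def update(i, v):
    mem[i] = v
    history.append(tuple(mem))

def naive_scan_steps():
    """Read locations one per step, yielding control between reads."""
    view = []
    for i in range(len(mem)):
        view.append(mem[i])
        yield
    yield tuple(view)

scan = naive_scan_steps()
next(scan)                   # scanner reads mem[0] == 0
update(0, 1)                 # processor 0 writes 1
update(1, 1)                 # processor 1 writes 1
next(scan)                   # scanner reads mem[1] == 1
view = next(scan)
assert view == (0, 1)
assert view not in history   # (0, 1) was never the array's contents
```

Honest implementations must repair such views using only the array itself, which is why the paper's scans cost on the order of n rounds rather than the n single reads of a collect.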
Automatically increasing the fault-tolerance of distributed algorithms
The design of fault-tolerant distributed systems is a costly and difficult task. Its cost and difficulty increase dramatically with the severity of failures that a system must tolerate. We seek to simplify this task by developing methods to automatically translate protocols tolerant of "benign" failures to ones tolerant of more "severe" failures. This paper describes two new translation mechanisms for synchronous systems; one translates programs tolerant of crash failures into programs tolerant of general omission failures, and the other translates from general omission failures to arbitrary failures. Together these can be used to translate any program tolerant of the most benign failures to a program tolerant of the most severe.
Simplifying Fault-Tolerance: Providing the Abstraction of Crash Failures
The difficulty of designing fault-tolerant distributed algorithms
increases with the severity of failures that an algorithm must
tolerate. This paper considers methods that automatically translate
algorithms tolerant of simple crash failures into ones tolerant of more
severe failures. These translations simplify the design task by
allowing algorithm designers to assume that processors fail only by
stopping. Such translations can be quantified by two measures:
fault-tolerance, which is a measure of how many processors must remain
nonfaulty for the translation to be correct, and round-complexity,
which is a measure of how the translation increases the running time of
an algorithm. Understanding these translations and their limitations
with respect to these measures can provide insight into the relative
impact of different models of faulty behavior on the ability to provide
fault-tolerant applications.
This paper considers two classes of translations from crash failures to
each of the following types of more severe failures: omission to send
messages; omission to send and receive messages; and totally arbitrary
behavior. It shows that previously developed translations to
send-omission failures are optimal with respect to both fault-tolerance
and round-complexity. It exhibits a hierarchy of translations to
general (send/receive) omissions that improves upon the fault-tolerance
of previously developed translations. It also gives a series of
translations to arbitrary failures that improves upon the
round-complexity of previously developed translations. All
translations developed in this paper are shown to be optimal in that
they cannot be improved with respect to one measure without negatively
affecting the other; that is, both hierarchies of translations are
matched by corresponding hierarchies of impossibility results.
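One intuition behind such translations can be sketched as follows (a heavily simplified illustration in the spirit of this line of work, not its exact construction; the function name and round format are invented): a processor that detects it has suffered a receive omission halts itself, so that from the underlying algorithm's point of view the omission failure looks like the crash failure it already tolerates.

```python
# Simplified sketch: convert a detected receive omission into a
# self-crash, so a crash-tolerant algorithm can run on top.

def translated_round(pid, inbox, n, halted):
    """Filter one round's messages for the simulated crash-model protocol.

    inbox: messages actually received this round (omissions may drop some).
    Returns the messages to hand upward, or None if this processor halts.
    """
    if pid in halted:
        return None
    if len(inbox) < n:      # a receive omission was detected
        halted.add(pid)     # turn the omission into a self-crash
        return None
    return inbox

halted = set()
# processor 0 received a full round; processor 1 got 2 of 3 messages
assert translated_round(0, ["a", "b", "c"], 3, halted) == ["a", "b", "c"]
assert translated_round(1, ["a", "b"], 3, halted) is None
assert 1 in halted
```

The real translations must do considerably more, e.g. distinguish senders that crashed from messages that were omitted, which is where the fault-tolerance and round-complexity costs analyzed in this paper come from.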
Using Knowledge to Optimally Achieve Coordination in Distributed Systems
The problem of coordinating the actions of individual processors is
fundamental in distributed computing. Researchers have long endeavored
to find efficient solutions to a variety of problems involving
coordination. Recently, processor knowledge has been used to
characterize such solutions and to derive more efficient ones. Most of
this work has concentrated on the relationship between common knowledge
and simultaneous coordination. This paper takes an alternative
approach, considering problems in which coordinated actions need not be
performed simultaneously. This approach permits better understanding
of the relationship between knowledge and the different requirements of
coordination problems. This paper defines the ideas of optimal and
optimum solutions to a coordination problem and precisely characterizes
the problems for which optimum solutions exist. This characterization
is based on combinations of eventual common knowledge and continual
common knowledge. The paper then considers more general problems, for
which optimal, but no optimum, solutions exist. It defines a new form
of knowledge, called extended common knowledge, which combines eventual
and continual knowledge, and shows how extended common knowledge can be
used to both characterize and construct optimal protocols for coordination.
The Complexity of Almost-Optimal Coordination
The problem of fault-tolerant coordination is fundamental in distributed
computing. In the past, researchers have considered the complexity of
achieving optimal simultaneous coordination under various failure assumptions.
This paper studies the complexity of achieving simultaneous coordination in
synchronous systems in the presence of send/receive omission failures. It had
been shown earlier that achieving optimal simultaneous coordination in these
systems requires NP-hard local computation. In this paper, we study
almost-optimal coordination, which requires processors to coordinate within a
constant additive or multiplicative number of rounds of the coordination time
of an optimal protocol. We show that achieving almost-optimal coordination
also requires NP-hard computation.