Search CORE

5,647 research outputs found

A group membership algorithm with a practical specification

Author: Bruck Jehoshua
Franceschetti Martin
Publication venue
Publication date: 01/11/2001
Field of study

Presents a solvable specification and gives an algorithm for the group membership problem in asynchronous systems with crash failures. Our specification requires processes to maintain a consistent history in their sequences of views. This allows processes to order failures and recoveries in time and simplifies the programming of high level applications. Previous work has proven that the group membership problem cannot be solved in asynchronous systems with crash failures. We circumvent this impossibility result building a weaker, yet nontrivial specification. We show that our solution is an improvement upon previous attempts to solve this problem using a weaker specification. We also relate our solution to other methods and give a classification of progress properties that can be achieved under different models

Caltech Authors

Process membership in asynchronous environments

Author: Birman Kenneth P.
Ricciardi Aleta M.
Publication venue
Publication date: 01/02/1993
Field of study

The development of reliable distributed software is simplified by the ability to assume a fail-stop failure model. The emulation of such a model in an asynchronous distributed environment is discussed. The solution proposed, called Strong-GMP, can be supported through a highly efficient protocol, and was implemented as part of a distributed systems software project at Cornell University. The precise definition of the problem, the protocol, correctness proofs, and an analysis of costs are addressed

NASA Technical Reports Server

eCommons@Cornell

The WorkPlace distributed processing environment

Author: Ames Troy
Henderson Scott
Publication venue
Publication date
Field of study

Real time control problems require robust, high performance solutions. Distributed computing can offer high performance through parallelism and robustness through redundancy. Unfortunately, implementing distributed systems with these characteristics places a significant burden on the applications programmers. Goddard Code 522 has developed WorkPlace to alleviate this burden. WorkPlace is a small, portable, embeddable network interface which automates message routing, failure detection, and re-configuration in response to failures in distributed systems. This paper describes the design and use of WorkPlace, and its application in the construction of a distributed blackboard system

NASA Technical Reports Server

The Raincore Distributed Session Service for Networking Elements

Author: Bruck Jehoshua
Fan Chenggong Charles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2001
Field of study

Motivated by the explosive growth of the Internet, we study efficient and fault-tolerant distributed session layer protocols for networking elements. These protocols are designed to enable a network cluster to share the state information necessary for balancing network traffic and computation load among a group of networking elements. In addition, in the presence of failures, they allow network traffic to fail-over from failed networking elements to healthy ones. To maximize the overall network throughput of the networking cluster, we assume a unicast communication medium for these protocols. The Raincore Distributed Session Service is based on a fault-tolerant token protocol, and provides group membership, reliable multicast and mutual exclusion services in a networking environment. We show that this service provides atomic reliable multicast with consistent ordering. We also show that Raincore token protocol consumes less overhead than a broadcast-based protocol in this environment in terms of CPU task-switching. The Raincore technology was transferred to Rainfinity, a startup company that is focusing on software for Internet reliability and performance. Rainwall, Rainfinity’s first product, was developed using the Raincore Distributed Session Service. We present initial performance results of the Rainwall product that validates our design assumptions and goals

Caltech Authors

Maintaining consistency in distributed systems

Author: Birman Kenneth P.
Publication venue
Publication date: 01/01/1991
Field of study

In systems designed as assemblies of independently developed components, concurrent access to data or data structures normally arises within individual programs, and is controlled using mutual exclusion constructs, such as semaphores and monitors. Where data is persistent and/or sets of operation are related to one another, transactions or linearizability may be more appropriate. Systems that incorporate cooperative styles of distributed execution often replicate or distribute data within groups of components. In these cases, group oriented consistency properties must be maintained, and tools based on the virtual synchrony execution model greatly simplify the task confronting an application developer. All three styles of distributed computing are likely to be seen in future systems - often, within the same application. This leads us to propose an integrated approach that permits applications that use virtual synchrony with concurrent objects that respect a linearizability constraint, and vice versa. Transactional subsystems are treated as a special case of linearizability

CiteSeerX

NASA Technical Reports Server

eCommons@Cornell

Doing-it-All with Bounded Work and Communication

Author: Alistarh
Alistarh
Alon
Birman
Birman
Bridgland
Cachin
Censor-Hillel
Chlebus
Chlebus
Chlebus
Chlebus
Chlebus
Chlebus
Chlebus
Chlebus
Chlebus
Chung
Clementi
Davidoff
Davtyan
Davtyan
Davtyan
De Prisco
Diks
Drucker
Dwork
Dwork
Fernández
Galil
Georgiou
Georgiou
Georgiou
Georgiou
Georgiou
Georgiou
Goldberg
Kanellakis
Kentros
Kentros
Kentros
Kontogiannis
Kowalski
Kowalski
Lamport
Lubotzky
Margulis
Mitzenmacher
Pippenger
Saks
Tanner
Upfal
Publication venue: 'Elsevier BV'
Publication date: 01/06/2017
Field of study

We consider the Do-All problem, where

p

cooperating processors need to complete

t

similar and independent tasks in an adversarial setting. Here we deal with a synchronous message passing system with processors that are subject to crash failures. Efficiency of algorithms in this setting is measured in terms of work complexity (also known as total available processor steps) and communication complexity (total number of point-to-point messages). When work and communication are considered to be comparable resources, then the overall efficiency is meaningfully expressed in terms of effort defined as work + communication. We develop and analyze a constructive algorithm that has work