The Raincore Distributed Session Service for Networking Elements

Author: Bruck Jehoshua
Fan Chenggong Charles
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2001

Motivated by the explosive growth of the Internet, we study efficient and fault-tolerant distributed session layer protocols for networking elements. These protocols are designed to enable a network cluster to share the state information necessary for balancing network traffic and computation load among a group of networking elements. In addition, in the presence of failures, they allow network traffic to fail over from failed networking elements to healthy ones. To maximize the overall network throughput of the networking cluster, we assume a unicast communication medium for these protocols. The Raincore Distributed Session Service is based on a fault-tolerant token protocol, and provides group membership, reliable multicast and mutual exclusion services in a networking environment. We show that this service provides atomic reliable multicast with consistent ordering. We also show that the Raincore token protocol consumes less overhead than a broadcast-based protocol in this environment in terms of CPU task-switching. The Raincore technology was transferred to Rainfinity, a startup company that is focusing on software for Internet reliability and performance. Rainwall, Rainfinity’s first product, was developed using the Raincore Distributed Session Service. We present initial performance results of the Rainwall product that validate our design assumptions and goals.

Caltech Authors
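The Raincore abstract attributes consistently ordered reliable multicast to a token protocol over unicast links. The sketch below illustrates only the general idea of token-based total ordering, not the Raincore protocol itself: it assumes a fixed, failure-free ring, and the names `Token`, `Node` and `rotate` are hypothetical. Only the token holder may multicast; it stamps each message with the next global sequence number, and every node delivers strictly in sequence order, buffering anything that arrives early.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    # Hypothetical token: carries the next global sequence number to assign.
    next_seq: int = 0


@dataclass
class Node:
    name: str
    outbox: list = field(default_factory=list)     # messages waiting for the token
    pending: dict = field(default_factory=dict)    # seq -> message not yet deliverable
    delivered: list = field(default_factory=list)  # messages handed to the application
    next_deliver: int = 0                          # next sequence number to deliver

    def on_token(self, token, cluster):
        # Only the token holder multicasts, so every sequence number is unique.
        while self.outbox:
            msg = self.outbox.pop(0)
            seq = token.next_seq
            token.next_seq += 1
            for peer in cluster:  # one unicast send per member, including self
                peer.receive(seq, (self.name, msg))

    def receive(self, seq, payload):
        self.pending[seq] = payload
        # Deliver strictly in sequence order; later messages wait for gaps to fill.
        while self.next_deliver in self.pending:
            self.delivered.append(self.pending.pop(self.next_deliver))
            self.next_deliver += 1


def rotate(cluster, token, rounds=1):
    # Assumption: the token visits nodes in list order (the logical ring).
    for _ in range(rounds):
        for node in cluster:
            node.on_token(token, cluster)
```

Because a single token serializes all sends, every node ends up with the same delivery order regardless of which node queued its message first; handling token loss and membership changes, which the fault-tolerant protocol must do, is deliberately left out.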

Adaptive and Scalable High Availability for Infrastructure Clouds

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

On the Cost of Modularity in Atomic Broadcast

Author: Ekwall Richard
Mena Sergio
Rütti Olivier
Schiper André
Publication date: 01/05/2007

Modularity is a desirable property of complex software systems, since it simplifies code reuse, verification, maintenance, etc. However, the use of loosely coupled modules introduces a performance overhead. This overhead is often considered negligible, but this is not always the case. This paper aims to shed some light on the cost, in terms of performance, that is incurred when designing a relevant group communication protocol, atomic broadcast, with modularity in mind. We conduct our experiments using two versions of atomic broadcast: a modular version and a monolithic one. We then measure the performance of both implementations under different system loads. Our results show that the overhead introduced by modularity is strongly related to the level of stress to which the system is subjected, and in the worst cases reaches approximately 50%.

Infoscience - École polytechnique fédérale de Lausanne

Relying on Safe Distance to Achieve Strong Partitionable Group Membership in Ad Hoc Networks

Author: Huang Qingfeng
Julien Christine
Roman Gruia-Catalin
Publication venue: Washington University Open Scholarship
Publication date: 20/09/2002

The design of ad hoc mobile applications often requires the availability of a consistent view of the application state among the participating hosts. Such views are important because they simplify both the programming and verification tasks. We argue that preventing the occurrence of unannounced disconnection is essential to constructing and maintaining a consistent view in the ad hoc mobile environment. In this light, we provide the specification for a partitionable group membership service supporting ad hoc mobile applications and propose a protocol for implementing the service. A unique property of this partitionable group membership is that messages sent between group members are guaranteed to be delivered successfully, given appropriate system assumptions. This property is preserved over time despite movement and frequent disconnections. The protocol splits and merges groups and maintains a logical connectivity graph based on a notion of safe distance. An implementation of the protocol in Java is available for testing. The implementation is used to implement Lime, a middleware for mobility that supports transparent sharing of data in both wired and ad hoc wireless environments.

Washington University St. Louis: Open Scholarship
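The safe-distance abstract says groups are formed from a logical connectivity graph. As a rough illustration only, the sketch below assumes that two hosts are "safely" connected when their distance is at most the communication range minus a safety margin, and that groups are the connected components of that graph. The function `safe_groups` and its `comm_range` and `margin` parameters are hypothetical; the paper's actual safe-distance definition, host motion, and the split/merge messaging are not modeled.

```python
import math


def safe_groups(positions, comm_range, margin):
    """Partition hosts into groups: the connected components of the graph whose
    edges join hosts no farther apart than comm_range - margin.

    Assumption: 'safe distance' is modeled as a fixed shrink of the radio range.
    """
    safe = comm_range - margin
    names = sorted(positions)
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Link every pair of hosts that lies within the safe distance.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(positions[a], positions[b]) <= safe:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

Shrinking the range by a margin is what lets a group split before a link actually breaks, which loosely mirrors the abstract's goal of avoiding unannounced disconnection.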

End-To-End Latency of a Fault-Tolerant CORBA Infrastructure

Author: Melliar-Smith P. Michael
Moser Louise E.
Zhao Wenbing
Publication venue: EngagedScholarship@CSU
Publication date: 01/05/2006

This paper presents an evaluation of the end-to-end latency of a fault-tolerant CORBA infrastructure that we have implemented. The fault-tolerant infrastructure replicates the server applications using active, passive and semi-active replication, and maintains strong replica consistency of the server replicas. Through analysis and through measurements of the running fault-tolerant infrastructure, we characterize the end-to-end latency under fault-free conditions. The main factor determining the run-time performance of the fault-tolerant infrastructure is the Totem group communication protocol, which contributes to the end-to-end latency primarily in two ways: the delay in sending messages and the processing cost of the rotating token. To reduce the delay in sending messages for passive and semi-active replication, the position of the primary server replica on the Totem ring, the token rotation time, the processing time at the client, and the processing time at the server must be considered. For active replication, the presence of duplicate messages adversely affects the performance. However, if an effective sending-side duplicate suppression mechanism is implemented, active replication is more advantageous than both passive and semi-active replication because of the automatic selection of the most favorable position of the server replica that sends the first non-duplicate reply.

Cleveland-Marshall College of Law
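The CORBA abstract credits sending-side duplicate suppression for making active replication competitive: whichever replica reaches the token first sends the reply, and the others drop their identical copies. The sketch below is a simplified, hypothetical model of that effect, not the authors' infrastructure. `ReplicaGroup`, `reply` and `ring_start` are illustrative names, and the shared `sent` set stands in for each replica observing, through Totem's total order, that a reply with the same request id has already been multicast.

```python
class ReplicaGroup:
    """Hypothetical model of sending-side duplicate suppression on a token ring."""

    def __init__(self, replicas):
        self.replicas = replicas  # replica names in token (ring) order
        self.sent = set()         # request ids whose reply is already on the ring
        self.wire = []            # (request_id, replica) pairs actually sent

    def reply(self, request_id, ring_start):
        """Each active replica computes the same reply; replicas get the token in
        ring order starting at index ring_start. The first one sends, and the rest
        suppress their duplicates. Returns the replicas that suppressed."""
        n = len(self.replicas)
        suppressed = []
        for k in range(n):
            replica = self.replicas[(ring_start + k) % n]
            if request_id in self.sent:
                suppressed.append(replica)
            else:
                self.sent.add(request_id)
                self.wire.append((request_id, replica))
        return suppressed
```

The sender is simply whichever replica the token reaches first, which is the "automatic selection of the most favorable position" the abstract describes; passive replication, by contrast, fixes the sender at the primary's ring position.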

A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version)

Author: Poke Marius
Glass Colin W.
Publication date: 05/02/2019

Many distributed systems work on a common shared state; in such systems, distributed agreement is necessary for consistency. With an increasing number of servers, these systems become more susceptible to single-server failures, increasing the relevance of fault tolerance. Atomic broadcast enables fault-tolerant distributed agreement, yet it is costly to solve. Most practical algorithms entail linear work per broadcast message. AllConcur, a leaderless approach, reduces the work by connecting the servers via a sparse resilient overlay network; yet this resiliency entails redundancy, limiting the reduction of work. In this paper, we propose AllConcur+, an atomic broadcast algorithm that lifts this limitation: during intervals with no failures, it achieves minimal work by using a redundancy-free overlay network. When failures do occur, it automatically recovers by switching to a resilient overlay network. In our performance evaluation of non-failure scenarios, AllConcur+ achieves throughput comparable to AllGather, a non-fault-tolerant distributed agreement algorithm, and outperforms AllConcur, LCR and Libpaxos in terms of both throughput and latency. Furthermore, our evaluation of failure scenarios shows that AllConcur+'s expected performance is robust with regard to occasional failures. Thus, for realistic use cases, leveraging redundancy-free distributed agreement during intervals with no failures improves performance significantly.

Comment: Overview: 24 pages, 6 sections, 3 appendices, 8 figures, 3 tables. Modifications from previous version: extended the evaluation of AllConcur+ with a simulation of a multiple-datacenter deployment.

arXiv.org e-Print Archive

FigShare