Search CORE

169 research outputs found

Fault-Tolerant Distributed Services in Message-Passing Systems

Author: Kumar Saptaparni
Publication venue
Publication date: 20/11/2019
Field of study

Distributed systems ranging from small local area networks to large wide area networks like the Internet composed of static and/or mobile users have become increasingly popular. A desirable property for any distributed service is fault-tolerance, which means the service remains uninterrupted even if some components in the network fail. This dissertation considers weak distributed models to find either algorithms to solve certain problems or impossibility proofs to show that a problem is unsolvable. These are the main contributions of this dissertation: • Failure detectors are used as a service to solve consensus (agreement among nodes) which is otherwise impossible in failure-prone asynchronous systems. We find an algorithm for crash-failure detection that uses bounded size messages in an arbitrary, partitionable network composed of badly- behaved channels that can lose and reorder messages. • Registers are a fundamental building block for shared memory emulations on top of message passing systems. The problem has been extensively studied in static systems. However, register emulation in dynamic systems with faulty nodes is still quite hard and there are impossibility proofs that point out scenarios where change in the system composition due to nodes entering and leaving (also called churn) makes the problem unsolvable. We propose the first emulation of a crash-fault tolerant register in a system with continuous churn where consensus is unsolvable, the size of the system can grow without bound and at most a constant fraction of the number of nodes in the system can fail by crashing. We prove a lower bound that states that fault-tolerance for dynamic systems with churn is inherently lower than in static systems. • We then extend the results in the crash-fault tolerant case to a dynamic system with continuous churn and nodes that can be Byzantine faulty. It is the first emulation of an atomic register in a system that can withstand nodes continually entering and leaving, imposes no upper bound on the system size and can tolerate Byzantine nodes. However, the number of Byzantine faulty nodes that can be tolerated is upper bounded by a constant number. Although the algorithm requires that there be a constant known upper bound on the number of Byzantine nodes, this restriction is unavoidable, as we show that it is impossible to emulate an atomic register if the system size and maximum number of servers that can be Byzantine in the system is unknown

Distributed Algorithmic Foundations of Dynamic Networks

Author: Augustine John
Pandurangan Gopal
Robinson Peter
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/03/2016
Field of study

Crossref

Royal Holloway - Pure

Automatic Reconfiguration for Large-Scale Reliable Storage Systems

Author: Barbara Liskov
Citable Link
David Schultz
Kathryn Chen
Moses Liskov
Rodrigo Rodrigues
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Byzantine-fault-tolerant replication enhances the availability and reliability of Internet services that store critical state and preserve it despite attacks or software errors. However, existing Byzantine-fault-tolerant storage systems either assume a static set of replicas, or have limitations in how they handle reconfigurations (e.g., in terms of the scalability of the solutions or the consistency levels they provide). This can be problematic in long-lived, large-scale systems where system membership is likely to change during the system lifetime. In this paper, we present a complete solution for dynamically changing system membership in a large-scale Byzantine-fault-tolerant system. We present a service that tracks system membership and periodically notifies other system nodes of membership changes. The membership service runs mostly automatically, to avoid human configuration errors; is itself Byzantine-fault-tolerant and reconfigurable; and provides applications with a sequence of consistent views of the system membership. We demonstrate the utility of this membership service by using it in a novel distributed hash table called dBQS that provides atomic semantics even across changes in replica sets. dBQS is interesting in its own right because its storage algorithms extend existing Byzantine quorum protocols to handle changes in the replica set, and because it differs from previous DHTs by providing Byzantine fault tolerance and offering strong semantics. We implemented the membership service and dBQS. Our results show that the approach works well, in practice: the membership service is able to manage a large system and the cost to change the system membership is low

Asynchronous Reconfiguration with Byzantine Failures

Author: Kuznetsov Petr
Tonkikh Andrei
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th International Symposium on Distributed Computing (DISC 2020)
Publication date: 01/01/2020
Field of study

Replicated services are inherently vulnerable to failures and security breaches. In a long-running system, it is, therefore, indispensable to maintain a reconfiguration mechanism that would replace faulty replicas with correct ones. An important challenge is to enable reconfiguration without affecting the availability and consistency of the replicated data: the clients should be able to get correct service even when the set of service replicas is being updated. In this paper, we address the problem of reconfiguration in the presence of Byzantine failures: faulty replicas or clients may arbitrarily deviate from their expected behavior. We describe a generic technique for building asynchronous and Byzantine fault-tolerant reconfigurable objects: clients can manipulate the object data and issue reconfiguration calls without reaching consensus on the current configuration. With the help of forward-secure digital signatures, our solution makes sure that superseded and possibly compromised configurations are harmless, that slow clients cannot be fooled into reading stale data, and that Byzantine clients cannot cause a denial of service by flooding the system with reconfiguration requests. Our approach is modular and based on dynamic lattice agreement abstraction, and we discuss how to extend it to enable Byzantine fault-tolerant implementations of a large class of reconfigurable replicated services

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

On the design of a moving target defense framework for the resiliency of critical services in large distributed networks

Author: Amarnath Athith
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2018
Field of study

2018 Fall.Includes bibliographical references.Security is a very serious concern in this era of digital world. Protecting and controlling access to secured data and services has given more emphasis to access control enforcement and management. Where, access control enforcement with strong policies ensures the data conﬁdentiality, availability and integrity, protecting the access control service itself is equally important. When these services are hosted on a single server for a lengthy period of time, the attackers have potentially unlimited time to periodically explore and enumerate the vulnerabilities with respect to the conﬁguration of the server and launch targeted attacks on the service. Constant proliferation of cloud usage and distributed systems over the last decade have materialized the possibilities of distributing data or hosting services over a group of servers located in different geographical locations. Existing election algorithms used to provide service continuity hosted in the distributed setup work well in a benign environment. However, these algorithms are not secure against skillful attackers who intends to manipulate or bring down the data or service. In this thesis, we design and implement the protection of critical services, such as access-control reference monitors, using the concept of moving target defense. This concept increases the level of difﬁculty faced by the attacker to compromise the point of service by periodically moving the critical service among a group of heterogeneous servers, thereby changing the attacker surface and increasing uncertainty and randomness in the point of service chosen. We describe an efﬁcient Byzantine fault-tolerant leader election protocol for small networks that achieves the security and performance goals described in the problem statement. We then extend this solution to large enterprise networks by introducing random walk protocol that randomly chooses a subset of servers taking part in the election protocol

Mountain Scholar (Digital Collections of Colorado and Wyoming)

PeerCube: an Hypercube-based P2P Overlay Robust against Collusion and Churn

Author: Anceaume Emmanuelle
Brasiliero Francisco
Ludinard Romaric
Ravoaja Aina
Publication venue: HAL CCSD
Publication date: 01/01/2008
Field of study

International audienceIn this paper we present PeerCube, a DHT-based system that aims at minimizing performance penalties caused by high churn while preventing malicious peers from subverting the system through collusion. This is achieved by i) applying a clustering strategy to support quorum-based operations; ii) using a randomised insertion algorithm to reduce the probability with which colluding Byzantine peers corrupt clusters, and; iii) leveraging on the properties of PeerCube's hypercube structure to allow operations to be successfully handled despite the corruption of some clusters. Despite a powerful adversary that can inspect the whole system and issue malicious join requests as often as it wishes, PeerCube guarantees robust operations in O(logN) messages, with N the number of peers in the system. Extended simulations validate PeerCube robustness

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

Fault-Tolerant Distributed Services in Message-Passing Systems

Author: Kumar Saptaparni
Publication venue
Publication date: 20/11/2019
Field of study

Texas A&M Repository

Atum: Scalable Group Communication Using Volatile Groups

Author: Awerbuch B.
Cowling J.
Cully B.
Dolev D.
Fetzer C.
Geng H.
Hiltunen M. A.
Lamport L.
Lesniewski-Lass C.
Li H. C.
Ongaro D.
Oppenheimer D.
Rhea S.
Rhea S. C.
Roverso R.
Singla A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/12/2016
Field of study

This paper presents Atum, a group communication middleware for a large, dynamic, and hostile environment. At the heart of Atum lies the novel concept of volatile groups: small, dynamic groups of nodes, each executing a state machine replication protocol, organized in a flexible overlay. Using volatile groups, Atum scatters faulty nodes evenly among groups, and then masks each individual fault inside its group. To broadcast messages among volatile groups, Atum runs a gossip protocol across the overlay. We report on our synchronous and asynchronous (eventually synchronous) implementations of Atum, as well as on three representative applications that we build on top of it: A publish/subscribe platform, a file sharing service, and a data streaming system. We show that (a) Atum can grow at an exponential rate beyond 1000 nodes and disseminate messages in polylogarithmic time (conveying good scalability); (b) it smoothly copes with 18% of nodes churning every minute; and (c) it is impervious to arbitrary faults, suffering no performance decay despite 5.8% Byzantine nodes in a system of 850 nodes

Infoscience - École polytechnique fédérale de Lausanne

Crossref