Recoverable and Detectable Fetch&Add

Author: Attiya Hagit
Ben-Baruch Ohad
Hendler Danny
Nahum Liad
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th International Conference on Principles of Distributed Systems (OPODIS 2021)
Publication date: 01/01/2022

The emergence of systems with non-volatile main memory (NVRAM) increases the need for persistent concurrent objects. Of specific interest are recoverable implementations that, in addition to being robust to crash-failures, are also detectable. Detectability ensures that upon recovery, it is possible to infer whether the failed operation took effect or not and, in the former case, obtain its response. This work presents two recoverable detectable Fetch&Add (FAA) algorithms that are self-implementations, i.e., use only a fetch&add base object, in addition to read/write registers. The algorithms target two different models for recovery: the global-crash model and the individual-crash model. In both algorithms, operations are wait-free when there are no crashes, but the recovery code may block if there are repeated failures. We also prove that in the individual-crash model, there is no implementation of recoverable and detectable FAA using only read, write and fetch&add primitives in which all operations, including recovery, are lock-free.

Dagstuhl Research Online Publication Server
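
The detectability property named in the abstract above (after a crash, learn whether the interrupted operation took effect and, if so, its response) can be illustrated with a toy model. This is a sketch of the contract only, not the paper's algorithm: the class `DetectableCounter`, its `recover` method, and the per-process sequence numbers are hypothetical, and a lock stands in for a single atomic persistent step, which is strictly stronger than the fetch&add and read/write primitives the paper restricts itself to.

```python
import threading


class DetectableCounter:
    """Toy model of a detectable fetch&add (hypothetical, for illustration).

    Fields of this object play the role of persistent memory that
    survives a simulated crash of the calling process.
    """

    def __init__(self, n_procs):
        # ASSUMPTION: the lock models one atomic persistent step that
        # updates the counter and the caller's log entry together.
        self._lock = threading.Lock()
        self.value = 0
        # Per process: (sequence number, response) of the last applied
        # operation. Sequence numbers are assumed to start at 1.
        self.last = [(0, None)] * n_procs

    def fetch_add(self, pid, seq, delta):
        """Add delta and return the previous value, logging the response."""
        with self._lock:
            old = self.value
            self.value = old + delta
            self.last[pid] = (seq, old)
        return old

    def recover(self, pid, seq):
        """Return (took_effect, response) for operation seq of process pid."""
        done_seq, response = self.last[pid]
        if done_seq == seq:
            return True, response
        return False, None


counter = DetectableCounter(n_procs=2)
counter.fetch_add(pid=0, seq=1, delta=5)
# After a simulated crash, process 0 asks about operations 1 and 2.
applied = counter.recover(pid=0, seq=1)
not_applied = counter.recover(pid=0, seq=2)
```

Because the counter update and the log write happen in one step here, recovery is trivial. The difficulty the paper addresses arises when that step must be built from fetch&add and registers alone.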

Robust Shared Objects for Non-Volatile Main Memory

Author: Berryhill Ryan
Golab Wojciech
Tripunitara Mahesh
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Conference on Principles of Distributed Systems (OPODIS 2015)
Publication date: 01/01/2016

Research in concurrent in-memory data structures has focused almost exclusively on models where processes are either reliable, or may fail by crashing permanently. The case where processes may recover from failures has received little attention because recovery from conventional volatile memory is impossible in the event of a system crash, during which both the state of main memory and the private states of processes are lost. Future hardware architectures are likely to include various forms of non-volatile random access memory (NVRAM), creating new opportunities to design robust main memory data structures that can recover from system crashes. In this paper we advance the theoretical foundations of such data structures in two ways. First, we review several known variations of Herlihy and Wing's linearizability property that were proposed in the context of message passing systems but also apply in our NVRAM-based model, we discuss the limitations of these properties with respect to our specific goals, and we propose an alternative correctness condition called recoverable linearizability. Second, we discuss techniques for implementing shared objects that satisfy such properties with a focus on wait-free implementations. Specifically, we demonstrate how to achieve different variations of linearizability in our model by transforming two classic wait-free constructions.

Dagstuhl Research Online Publication Server

Detectable Sequential Specifications for Recoverable Shared Objects

Author: Golab Wojciech
Li Nan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th International Symposium on Distributed Computing (DISC 2021)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

Recovering Shared Objects Without Stable Storage

Author: Michael Ellis
Ports Dan R. K.
Sharma Naveen Kr.
Szekeres Adriana
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st International Symposium on Distributed Computing (DISC 2017)
Publication date: 01/01/2017

This paper considers the problem of building fault-tolerant shared objects when processes can crash and recover but lose their persistent state on recovery. This Diskless Crash-Recovery (DCR) model matches the way many long-lived systems are built. We show that it presents new challenges, as operations that are recorded at a quorum may not persist after some of the processes in that quorum crash and then recover. To address this problem, we introduce the notion of crash-consistent quorums, where no recoveries happen during the quorum responses. We show that relying on crash-consistent quorums enables a recovery procedure that can recover all operations that successfully finished. Crash-consistent quorums can be easily identified using a mechanism we term the crash vector, which tracks the causal relationship between crashes, recoveries, and other operations. We apply crash-consistent quorums and crash vectors to build two storage primitives. We give a new algorithm for multi-writer, multi-reader atomic registers in the DCR model that guarantees safety under all conditions and termination under a natural condition. It improves on the best prior protocol for this problem by requiring fewer rounds, fewer nodes to participate in the quorum, and a less restrictive liveness condition. We also present a more efficient single-writer, single-reader atomic set - a virtual stable storage abstraction. It can be used to lift any existing algorithm from the traditional Crash-Recovery model to the DCR model. We examine a specific application, state machine replication, and show that existing diskless protocols can violate their correctness guarantees, while ours offers a general and correct solution.

Dagstuhl Research Online Publication Server
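
The crash-consistent quorum idea in the abstract above (a quorum in which no recoveries happen during the responses) can be sketched as a check over crash vectors. This is one possible reading, not the paper's protocol: it assumes each reply carries a vector whose entry `i` is the number of recoveries of node `i` known to the sender, and the function name `is_crash_consistent` and the dictionary layout are hypothetical.

```python
def is_crash_consistent(responses):
    """Check a set of quorum replies for crash consistency.

    responses maps a node id to the crash vector attached to that
    node's reply. ASSUMPTION: entry i of a vector is the number of
    recoveries of node i that the sender knew about when it replied.

    The replies are rejected if any sender knows of a later recovery
    of node i than node i itself reported, which would mean node i
    crashed and recovered after replying.
    """
    for i, own_vector in responses.items():
        for other_vector in responses.values():
            if other_vector[i] > own_vector[i]:
                return False
    return True


# Both repliers agree that node 0 has recovered once: consistent.
stable = is_crash_consistent({0: [1, 0, 0], 1: [1, 0, 0]})

# Node 1 knows of a recovery of node 0 that node 0's own reply
# predates, so node 0 may have lost the state it acknowledged.
stale = is_crash_consistent({0: [0, 0, 0], 1: [1, 0, 0]})
```

In this sketch a caller would discard an inconsistent set of replies and keep collecting until a consistent quorum is found.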

Consensual Resilient Control: Stateless Recovery of Stateful Controllers

Author: Graczyk Rafal
Lucchetti Federico
Matovic Aleksandar
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th Euromicro Conference on Real-Time Systems (ECRTS 2023)
Publication date: 01/01/2023

Safety-critical systems have to absorb accidental and malicious faults to obtain high mean-times-to-failures (MTTFs). Traditionally, this is achieved through re-execution or replication. However, both techniques come with significant overheads, in particular when cold-start effects are considered. Such effects occur after replicas resume from checkpoints or from their initial state. This work aims at improving on the performance of control-task replication by leveraging an inherent stability of many plants to tolerate occasional control-task deadline misses and suggests masking faults just with a detection quorum. To make this possible, we have to eliminate cold-start effects to allow replicas to rejuvenate during each control cycle. We do so by systematically turning stateful controllers into instants that can be recovered in a stateless manner. We highlight the mechanisms behind this transformation, how it achieves consensual resilient control, and demonstrate on the example of an inverted pendulum how accidental and maliciously-induced faults can be absorbed, even if control tasks run in less predictable environments.

Dagstuhl Research Online Publication Server