
    Recoverable FCFS Mutual Exclusion with Wait-Free Recovery

    Traditional mutual exclusion locks are not resilient to failures: if there is a power outage, memory is wiped out, so when the system comes back up the lock has to be restored to its initial state, i.e., all processes are rolled back to the Remainder section and all variables are reset to their initial values. Recently, Golab and Ramaraju showed that this state of the art can be improved by exploiting Non-Volatile RAM (NVRAM). They designed algorithms that, by maintaining shared variables in NVRAM, allow processes to recover from crashes on their own without the need for a global reset, even though a crash can wipe out the local memory of a process. We present a Recoverable Mutual Exclusion algorithm that uses the commonly supported CAS primitive. The main features of our algorithm are that it satisfies FCFS, it ensures that each process recovers in a wait-free manner, and, in the absence of failures, it guarantees a worst-case Remote Memory Reference (RMR) complexity of O(lg n) on both Cache Coherent (CC) and Distributed Shared Memory (DSM) machines, where n is the number of processes for which the algorithm is designed. This bound matches the Ω(lg n) RMR lower bound of Attiya, Hendler, and Woelfel for mutual exclusion algorithms that use comparison primitives.
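    To make the recoverable-lock interface concrete, here is a minimal C11 sketch, not the paper's O(lg n) FCFS algorithm: a simple CAS-based lock whose owner field is assumed to reside in NVRAM, so that a process re-entering after a crash can call a recovery routine to learn whether it already held the lock. All names (nv_lock_t, nv_lock_recover, UNOWNED) are illustrative and do not come from the paper.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define UNOWNED (-1)

    typedef struct {
        _Atomic int owner;   /* assumed to reside in non-volatile memory */
    } nv_lock_t;

    void nv_lock_init(nv_lock_t *l) {
        atomic_init(&l->owner, UNOWNED);
    }

    /* Acquire: spin until a CAS from UNOWNED to this process's id succeeds. */
    void nv_lock_acquire(nv_lock_t *l, int me) {
        int expected = UNOWNED;
        while (!atomic_compare_exchange_weak(&l->owner, &expected, me))
            expected = UNOWNED;   /* a failed CAS overwrites expected; reset it */
    }

    /* Release: give the lock up only if this process actually holds it. */
    void nv_lock_release(nv_lock_t *l, int me) {
        int expected = me;
        atomic_compare_exchange_strong(&l->owner, &expected, UNOWNED);
    }

    /* Recovery after a crash: the owner field survives in NVRAM, so the
     * recovering process can tell whether it crashed inside its critical
     * section (true: finish the CS and release; false: start over). */
    bool nv_lock_recover(nv_lock_t *l, int me) {
        return atomic_load(&l->owner) == me;
    }

    Note that this test-and-set-style sketch is neither FCFS nor bounded in RMRs; achieving FCFS, wait-free recovery, and O(lg n) RMR complexity simultaneously is precisely the paper's contribution.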

    Recoverable, Abortable, and Adaptive Mutual Exclusion with Sublogarithmic RMR Complexity

    We present the first recoverable mutual exclusion (RME) algorithm that is simultaneously abortable, adaptive to point contention, and sublogarithmic in RMR complexity. Our algorithm has O(min(K, log_W N)) RMR passage complexity and O(F + min(K, log_W N)) RMR super-passage complexity, where K is the number of concurrent processes (point contention), W is the size (in bits) of registers, and F is the number of crashes in a super-passage. Under the standard assumption that W = Θ(log N), these bounds translate to a worst-case O((log N)/(log log N)) passage complexity and O(F + (log N)/(log log N)) super-passage complexity. Our key building blocks are:
    - A D-process abortable RME algorithm, for D ≤ W, with O(1) passage complexity and O(1 + F) super-passage complexity. We obtain this algorithm by using the Fetch-And-Add (FAA) primitive, unlike prior work on RME that uses Fetch-And-Store (FAS/SWAP).
    - A generic transformation that turns any abortable RME algorithm with passage complexity B < W into an abortable RME lock with passage complexity O(min(K, B)).
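    For readers unfamiliar with the Fetch-And-Add primitive that the first building block relies on, the sketch below shows a plain FAA-based ticket lock in C11. It is only an illustration of the primitive: it is neither abortable nor recoverable, it is not the paper's O(1)-passage D-process algorithm, and the names are invented for this example.

    #include <stdatomic.h>

    typedef struct {
        _Atomic unsigned next;      /* next ticket to hand out */
        _Atomic unsigned serving;   /* ticket currently admitted to the CS */
    } ticket_lock_t;

    void ticket_lock_init(ticket_lock_t *l) {
        atomic_init(&l->next, 0);
        atomic_init(&l->serving, 0);
    }

    void ticket_lock_acquire(ticket_lock_t *l) {
        /* One FAA both enqueues the caller and fixes its FIFO position. */
        unsigned my_ticket = atomic_fetch_add(&l->next, 1);
        while (atomic_load(&l->serving) != my_ticket)
            ;   /* spin on a shared counter; not RMR-efficient on DSM machines */
    }

    void ticket_lock_release(ticket_lock_t *l) {
        atomic_fetch_add(&l->serving, 1);
    }

    The appeal of FAA in this setting is that a single atomic instruction both admits the caller and fixes its position in the FIFO order, whereas FAS-based queue locks such as MCS must perform an additional step to link the caller's node to its predecessor.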

    RGLock: Recoverable Mutual Exclusion for Non-Volatile Main Memory Systems

    Mutex locks have traditionally been the most popular mechanism for inter-process synchronization in concurrent computing systems that support high-performance applications. However, the recoverability of these algorithms in the event of a crash failure has not been studied thoroughly. Techniques such as transaction rollback are widely used to provide fault tolerance in modern Database Management Systems, but in the context of mutual exclusion in shared-memory systems, none of the prominent lock algorithms (e.g., Lamport's Bakery algorithm, the MCS lock) is designed to tolerate crash failures, especially failures that occur inside the critical section. Each of these algorithms may fail to maintain mutual exclusion, or may sacrifice some of its liveness guarantees, in the presence of crash failures. Storing application data and recovery information in conventional volatile main memory limits the development of efficient crash-recovery mechanisms, since a failure of any component in the system causes a loss of program data. With the advent of non-volatile main memory technologies, opportunities have opened up to redefine the problem of mutual exclusion in a crash-recovery model where processes may recover from crash failures and resume execution. When main memory is non-volatile, an application's entire state can be recovered from a crash near-instantaneously using the in-memory state, making a process's failure appear as a suspend/resume event. This thesis envisions a solution to the problem of mutual exclusion in such systems. The goal is to provide a first-of-its-kind mutex lock that guarantees mutual exclusion and starvation freedom in emerging shared-memory architectures that incorporate non-volatile main memory (NVMM).

    Causal synchrony in the design of distributed programs

    The outcome of any computation is determined by the order of the events in the computation and the state of the component variables of the computation at those events. The level of knowledge that can be obtained about event order and process state influences protocol design and operation. In a centralized system, the presence of a physical clock makes it easy to determine event order. It is a more difficult task in a distributed system because there is normally no global time, and hence no common time reference for ordering events. As a consequence, distributed protocols are often designed without explicit reference to event order; instead they are based on some approximation of global state. Because global state is also difficult to identify in a distributed system, the resulting protocols are not as efficient or clear as they could be.

    We subscribe to Lamport's proposition that the relevant temporal ordering of any two events is determined by their causal relationship, and that knowledge of the causal order can be a powerful tool in protocol design. Mattern's vector time can be used to identify the causal order, thereby providing the common frame of reference needed to order events in a distributed computation. In this dissertation we present a consistent methodology for the analysis and design of distributed protocols based on the causal order and vector time. Using it we can specify conditions that must be met for a protocol to be correct, define axiomatic protocol specifications, and structure reasoning about the correctness of the specified protocol. Employing causality as a unifying concept clarifies protocol specifications and correctness arguments because it enables them to be defined purely in terms of local states and local events.

    We have successfully applied this methodology to the problems of distributed termination detection, distributed deadlock detection and resolution, and optimistic recovery. In all cases, the causally synchronous protocols we present are efficient and demonstrably correct.
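    As a concrete illustration of the vector-time machinery the dissertation builds on, the following C sketch implements Mattern-style vector clocks for a fixed set of N processes: each process increments its own component at every event, piggybacks its clock on outgoing messages, and takes a componentwise maximum on receipt; event a causally precedes event b exactly when a's vector is componentwise less than or equal to b's and differs in at least one component. The function names and the fixed N are choices made for this sketch, not part of the dissertation.

    #include <stdbool.h>
    #include <string.h>

    #define N 4   /* number of processes, fixed for illustration */

    typedef struct { unsigned v[N]; } vclock_t;

    void vc_init(vclock_t *c)                { memset(c->v, 0, sizeof c->v); }

    /* Internal event at process `me`: tick the local component. */
    void vc_local_event(vclock_t *c, int me) { c->v[me]++; }

    /* Send event: tick, then copy the clock into the outgoing message. */
    void vc_send(vclock_t *c, int me, vclock_t *msg) {
        c->v[me]++;
        *msg = *c;
    }

    /* Receive event: componentwise max with the message's timestamp, then tick. */
    void vc_receive(vclock_t *c, int me, const vclock_t *msg) {
        for (int i = 0; i < N; i++)
            if (msg->v[i] > c->v[i]) c->v[i] = msg->v[i];
        c->v[me]++;
    }

    /* a happened-before b iff a <= b componentwise and a != b. */
    bool vc_happened_before(const vclock_t *a, const vclock_t *b) {
        bool strictly_less = false;
        for (int i = 0; i < N; i++) {
            if (a->v[i] > b->v[i]) return false;
            if (a->v[i] < b->v[i]) strictly_less = true;
        }
        return strictly_less;
    }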