10,652 research outputs found

    The Parallel Persistent Memory Model

    Full text link
    We consider a parallel computational model that consists of PP processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded probability, and possibly restart. On faulting all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for developing locality efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do this in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solve these challenges based on breaking computations into what we call capsules, which have certain properties, and developing a work-stealing scheduler that functions properly within the context of failures. The scheduler guarantees a time bound of O(W/PA+D(P/PA)log1/fW)O(W/P_A + D(P/P_A) \lceil\log_{1/f} W\rceil) in expectation, where WW and DD are the work and depth of the computation (in the absence of failures), PAP_A is the average number of processors available during the computation, and f1/2f \le 1/2 is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives.Comment: This paper is the full version of a paper at SPAA 2018 with the same nam

    Extending and Implementing the Self-adaptive Virtual Processor for Distributed Memory Architectures

    Get PDF
    Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current implementations assume a single address space shared memory. We investigate and extend SVP to handle distributed environments, and discuss a prototype SVP implementation which transparently supports execution on heterogeneous distributed memory clusters over TCP/IP connections, while retaining the original SVP programming model

    A Flexible Framework For Implementing Multi-Nested Software Transaction Memory

    Get PDF
    Programming with locks is very difficult in multi-threaded programmes. Concurrency control of access to shared data limits scalable locking strategies otherwise provided for in software transaction memory. This work addresses the subject of creating dependable software in the face of eminent failures. In the past, programmers who used lock-based synchronization to implement concurrent access to shared data had to grapple with problems with conventional locking techniques such as deadlocks, convoying, and priority inversion. This paper proposes another advanced feature for Dynamic Software Transactional Memory intended to extend the concepts of transaction processing to provide a nesting mechanism and efficient lock-free synchronization, recoverability and restorability. In addition, the code for implementation has also been researched, coded, tested, and implemented to achieve the desired objectives

    Coded Caching for Delay-Sensitive Content

    Full text link
    Coded caching is a recently proposed technique that achieves significant performance gains for cache networks compared to uncoded caching schemes. However, this substantial coding gain is attained at the cost of large delivery delay, which is not tolerable in delay-sensitive applications such as video streaming. In this paper, we identify and investigate the tradeoff between the performance gain of coded caching and the delivery delay. We propose a computationally efficient caching algorithm that provides the gains of coding and respects delay constraints. The proposed algorithm achieves the optimum performance for large delay, but still offers major gains for small delay. These gains are demonstrated in a practical setting with a video-streaming prototype.Comment: 9 page
    corecore