2,641 research outputs found

    A Peer-to-Peer Middleware Framework for Resilient Persistent Programming

    Get PDF
    The persistent programming systems of the 1980s offered a programming model that integrated computation and long-term storage. In these systems, reliable applications could be engineered without requiring the programmer to write translation code to manage the transfer of data to and from non-volatile storage. More importantly, it simplified the programmer's conceptual model of an application, and avoided the many coherency problems that result from multiple cached copies of the same information. Although technically innovative, persistent languages were not widely adopted, perhaps due in part to their closed-world model. Each persistent store was located on a single host, and there were no flexible mechanisms for communication or transfer of data between separate stores. Here we re-open the work on persistence and combine it with modern peer-to-peer techniques in order to provide support for orthogonal persistence in resilient and potentially long-running distributed applications. Our vision is of an infrastructure within which an application can be developed and distributed with minimal modification, whereupon the application becomes resilient to certain failure modes. If a node, or the connection to it, fails during execution of the application, the objects are re-instantiated from distributed replicas, without their reference holders being aware of the failure. Furthermore, we believe that this can be achieved within a spectrum of application programmer intervention, ranging from minimal to totally prescriptive, as desired. The same mechanisms encompass an orthogonally persistent programming model. We outline our approach to implementing this vision, and describe current progress.Comment: Submitted to EuroSys 200

    Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC

    Full text link
    Fault tolerance is one of the major design goals for HPC. The emergence of non-volatile memories (NVM) provides a solution to build fault tolerant HPC. Data in NVM-based main memory are not lost when the system crashes because of the non-volatility nature of NVM. However, because of volatile caches, data must be logged and explicitly flushed from caches into NVM to ensure consistence and correctness before crashes, which can cause large runtime overhead. In this paper, we introduce an algorithm-based method to establish crash consistence in NVM for HPC applications. We slightly extend application data structures or sparsely flush cache blocks, which introduce ignorable runtime overhead. Such extension or cache flushing allows us to use algorithm knowledge to \textit{reason} data consistence or correct inconsistent data when the application crashes. We demonstrate the effectiveness of our method for three algorithms, including an iterative solver, dense matrix multiplication, and Monte-Carlo simulation. Based on comprehensive performance evaluation on a variety of test environments, we demonstrate that our approach has very small runtime overhead (at most 8.2\% and less than 3\% in most cases), much smaller than that of traditional checkpoint, while having the same or less recomputation cost.Comment: 12 page

    Brief Announcement: A Persistent Lock-Free Queue for Non-Volatile Memory

    Get PDF
    Non-volatile memory is expected to coexist with (or even displace) volatile DRAM for main memory in upcoming architectures. As a result, there is increasing interest in the problem of designing and specifying durable data structures that can recover from system crashes. Data-structures may be designed to satisfy stricter or weaker durability guarantees to provide a balance between the strength of the provided guarantees and performance overhead. This paper proposes three novel implementations of a concurrent lock-free queue. These implementations illustrate the algorithmic challenges in building persistent lock-free data structures with different levels of durability guarantees. We believe that by presenting these challenges, along with the proposed algorithmic designs, and the possible levels of durability guarantees, we can shed light on avenues for building a wide variety of durable data structures. We implemented the various designs and evaluate their performance overhead compared to a simple queue design for standard (volatile) memory

    The Case for Non-Volatile RAM in Cloud HPCaaS

    Full text link
    HPC as a service (HPCaaS) is a new way to expose HPC resources via cloud services. However, continued effort to port large-scale tightly coupled applications with high interprocessor communication to multiple (and many) nodes synchronously, as in on-premise supercomputers, is still far from satisfactory due to network latencies. As a consequence, in said cases, HPCaaS is recommended to be used with one or few instances. In this paper we take the claim that new piece of memory hardware, namely Non-Volatile RAM (NVRAM), can allow such computations to scale up to an order of magnitude with marginalized penalty in comparison to RAM. Moreover, we suggest that the introduction of NVRAM to HPCaaS can be cost-effective to the users and the suppliers in numerous forms.Comment: 4 page
    corecore