A Peer-to-Peer Middleware Framework for Resilient Persistent Programming
The persistent programming systems of the 1980s offered a programming model
that integrated computation and long-term storage. In these systems, reliable
applications could be engineered without requiring the programmer to write
translation code to manage the transfer of data to and from non-volatile
storage. More importantly, it simplified the programmer's conceptual model of
an application, and avoided the many coherency problems that result from
multiple cached copies of the same information. Although technically
innovative, persistent languages were not widely adopted, perhaps due in part
to their closed-world model. Each persistent store was located on a single
host, and there were no flexible mechanisms for communication or transfer of
data between separate stores. Here we re-open the work on persistence and
combine it with modern peer-to-peer techniques in order to provide support for
orthogonal persistence in resilient and potentially long-running distributed
applications. Our vision is of an infrastructure within which an application
can be developed and distributed with minimal modification, whereupon the
application becomes resilient to certain failure modes. If a node, or the
connection to it, fails during execution of the application, the objects are
re-instantiated from distributed replicas, without their reference holders
being aware of the failure. Furthermore, we believe that this can be achieved
within a spectrum of application programmer intervention, ranging from minimal
to totally prescriptive, as desired. The same mechanisms encompass an
orthogonally persistent programming model. We outline our approach to
implementing this vision, and describe current progress.
Comment: Submitted to EuroSys 200
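As a rough illustration of the failure transparency described above, here is a minimal sketch, not the authors' middleware: a handle that re-instantiates its target from another replica when the node holding it becomes unreachable, so the reference holder never observes the failure. Node, ObjectId and the fetch callback are hypothetical placeholders.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Node     { std::string address; };   // a peer holding a replica
struct ObjectId { unsigned long value; };

template <typename T>
class ResilientRef {
public:
    using Fetch = std::function<std::unique_ptr<T>(const Node&, ObjectId)>;

    ResilientRef(ObjectId oid, std::vector<Node> replicas, Fetch fetch)
        : oid_(oid), replicas_(std::move(replicas)), fetch_(std::move(fetch)) {}

    // Dereference: the holder never learns which replica served the object.
    T& operator*() {
        if (!cached_) cached_ = materialise();
        return *cached_;
    }

private:
    std::unique_ptr<T> materialise() {
        for (const Node& n : replicas_) {
            try {
                return fetch_(n, oid_);          // throws if the node or link failed
            } catch (const std::runtime_error&) {
                // try the next replica instead of surfacing the failure
            }
        }
        throw std::runtime_error("all replicas unreachable");
    }

    ObjectId oid_;
    std::vector<Node> replicas_;
    Fetch fetch_;
    std::unique_ptr<T> cached_;
};
```

A real middleware would also write back updates and keep the replica set itself replicated; the sketch only shows the read path.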
Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC
Fault tolerance is one of the major design goals for HPC. The emergence of
non-volatile memories (NVM) offers a way to build fault-tolerant HPC systems.
Data in NVM-based main memory are not lost when the system crashes, thanks to
the non-volatile nature of NVM. However, because processor caches are volatile,
data must be logged and explicitly flushed from the caches into NVM to ensure
consistency and correctness across crashes, which can cause large runtime
overhead.
In this paper, we introduce an algorithm-based method to establish crash
consistency in NVM for HPC applications. We slightly extend application data
structures or sparsely flush cache blocks, which introduces negligible runtime
overhead. Such extension or cache flushing allows us to use algorithm knowledge
to reason about data consistency, or to correct inconsistent data, when the
application crashes. We demonstrate the effectiveness of our method for three
algorithms: an iterative solver, dense matrix multiplication, and Monte Carlo
simulation. Based on a comprehensive performance evaluation across a variety of
test environments, we show that our approach has very small runtime overhead
(at most 8.2%, and less than 3% in most cases), much smaller than that of
traditional checkpointing, while incurring the same or lower recomputation cost.
Comment: 12 page
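To make the idea concrete, here is a hedged sketch, not the paper's implementation, of algorithm-directed consistency for an iterative solver: the solver state is extended with a checksum and an iteration counter, and only those few bytes are explicitly flushed each iteration. On restart, recovery code can recompute the checksum of the solution vector in NVM and compare it with the flushed value to decide whether the vector is consistent with the recorded iteration. The placement of the vector in NVM and the x86 flush intrinsics are assumptions of the sketch.

```cpp
#include <immintrin.h>   // _mm_clflush, _mm_sfence (x86)
#include <cstddef>
#include <cstdint>

struct SolverState {
    double*  x;          // solution vector, assumed to live in NVM
    double   checksum;   // algorithm-level summary of x
    uint64_t iteration;  // last iteration whose checksum was made durable
};

// Flush a small range of cache lines and order the flushes.
static void flush_range(const void* p, std::size_t bytes) {
    const char* c = static_cast<const char*>(p);
    for (std::size_t off = 0; off < bytes; off += 64)
        _mm_clflush(c + off);
    _mm_sfence();
}

void jacobi_step(SolverState& s, std::size_t n) {
    // ... compute the next iterate into s.x (omitted) ...

    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += s.x[i];
    s.checksum  = sum;
    s.iteration += 1;

    // Sparse flushing: only the metadata needed to reason about the
    // consistency of x after a crash is forced out of the caches.
    flush_range(&s.checksum, sizeof(s.checksum) + sizeof(s.iteration));
}
```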
Brief Announcement: A Persistent Lock-Free Queue for Non-Volatile Memory
Non-volatile memory is expected to coexist with (or even displace) volatile DRAM for main memory in upcoming architectures. As a result, there is increasing interest in the problem of designing and specifying durable data structures that can recover from system crashes. Data structures may be designed to satisfy stricter or weaker durability guarantees, balancing the strength of the provided guarantees against performance overhead.
This paper proposes three novel implementations of a concurrent lock-free queue. These implementations illustrate the algorithmic challenges in building persistent lock-free data structures with different levels of durability guarantees. We believe that by presenting these challenges, along with the proposed algorithmic designs and the possible levels of durability guarantees, we can shed light on avenues for building a wide variety of durable data structures. We implemented the various designs and evaluated their performance overhead compared to a simple queue design for standard (volatile) memory.
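The durability ordering at the heart of such designs can be sketched as follows; this is a minimal illustration, not one of the paper's three algorithms. In a Michael-Scott-style lock-free enqueue on NVM, the node must be persisted before it is linked into the queue, and the link must be persisted before the enqueue counts as durable. The x86 flush intrinsics and the simplified queue layout are assumptions of the sketch.

```cpp
#include <immintrin.h>   // _mm_clflush, _mm_sfence (x86)
#include <atomic>

struct Node {
    int value;
    std::atomic<Node*> next{nullptr};
};

static void persist(const void* p) { _mm_clflush(p); _mm_sfence(); }

struct DurableQueue {
    std::atomic<Node*> tail;

    explicit DurableQueue(Node* sentinel) : tail(sentinel) {}

    void enqueue(Node* n) {
        persist(n);                                    // node durable before it is visible
        while (true) {
            Node* last = tail.load();
            Node* next = last->next.load();
            if (next != nullptr) {                     // help a lagging enqueuer
                tail.compare_exchange_weak(last, next);
                continue;
            }
            Node* expected = nullptr;
            if (last->next.compare_exchange_weak(expected, n)) {
                persist(&last->next);                  // link durable: enqueue has happened
                tail.compare_exchange_weak(last, n);   // tail is recoverable from the links
                return;
            }
        }
    }
};
```

Different durability levels differ mainly in when these flushes must happen relative to an operation's visibility and completion; deferring or batching them is the performance trade-off the abstract refers to.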
The Case for Non-Volatile RAM in Cloud HPCaaS
HPC as a service (HPCaaS) is a new way to expose HPC resources via cloud
services. However, efforts to port large-scale, tightly coupled applications
with high interprocessor communication across many nodes running synchronously,
as on on-premise supercomputers, remain far from satisfactory because of
network latencies. As a consequence, in such cases HPCaaS is recommended to be
used with one or only a few instances. In this paper we argue that a new piece
of memory hardware, Non-Volatile RAM (NVRAM), can allow such computations to
scale up by an order of magnitude with only a marginal penalty compared to RAM.
Moreover, we suggest that introducing NVRAM into HPCaaS can be cost-effective
for both users and providers in numerous ways.
Comment: 4 page
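For concreteness, here is a hedged sketch of one common way NVRAM is exposed to an application so a large working set can stay on a single instance: memory-mapping a file on a DAX-mounted persistent-memory filesystem and using it as ordinary byte-addressable memory. The mount point and file name are hypothetical, not taken from the paper.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t bytes = 64ULL << 30;       // e.g. a 64 GiB working set
    int fd = open("/mnt/pmem0/workset", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, bytes) != 0) { std::perror("open/ftruncate"); return 1; }

    // MAP_SHARED on a DAX file gives direct load/store access to NVRAM,
    // bypassing the page cache; the pointer is then used like heap memory.
    void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

    double* data = static_cast<double*>(mem);
    data[0] = 42.0;                              // plain stores target NVRAM

    munmap(mem, bytes);
    close(fd);
    return 0;
}
```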
Magneto-Electric Approximate Computational Framework for Bayesian Inference
Probabilistic graphical models such as Bayesian Networks (BNs) are powerful artificial-intelligence formalisms, with similarities to cognition and higher-order reasoning in the human brain. These models have been applied, with great success, to several challenging real-world applications. Extending these formalisms to a broader set of applications is impeded by the limitations of current software-based implementations. New emerging-technology circuit paradigms that leverage physical equivalence, i.e., operating directly on probabilities rather than introducing layers of abstraction, promise orders-of-magnitude increases in the performance and efficiency of BN implementations, enabling networks with millions of random variables. While the majority of applications with small networks (hundreds of nodes) require only single-digit precision for accurate results, applications with larger networks (thousands to millions of nodes) require higher-precision computation. We introduce a new BN integrated-circuit fabric based on mixed-signal magneto-electric circuits that perform probabilistic computations on the principle of approximate computation. Precision scaling in this fabric is logarithmic in area, versus linear in prior directions. Results show a 33x area benefit at 0.001 precision compared to the prior direction, while maintaining three orders of magnitude of performance benefit over 100-core processor implementations.
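The precision argument can be illustrated with a toy calculation of our own, not the paper's circuit model: quantising each conditional-probability entry to a fixed step introduces a small relative error per factor, and those errors compound across the many factors multiplied together in a large network's joint probability, so bigger networks need finer precision. The CPT entries below are made-up values.

```cpp
#include <cmath>
#include <cstdio>

// Product of n CPT entries, optionally rounded to a probability grid of width `step`.
double joint(int n, double step) {
    const double entries[3] = {0.837, 0.614, 0.271};   // toy CPT entries
    double prod = 1.0;
    for (int i = 0; i < n; ++i) {
        double p = entries[i % 3];
        if (step > 0) p = std::round(p / step) * step; // quantise to the grid
        prod *= p;
    }
    return prod;
}

int main() {
    const int sizes[] = {10, 100, 1000};               // number of factors in the joint
    for (int n : sizes) {
        double exact  = joint(n, 0.0);
        double coarse = joint(n, 0.01);                // two-decimal-digit precision
        std::printf("factors=%4d  relative error = %.1f%%\n",
                    n, 100.0 * std::fabs(coarse / exact - 1.0));
    }
    return 0;
}
```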