305 research outputs found

Efficient Logging in Non-Volatile Memory by Exploiting Coherency Protocols

Author: Cohen Nachshon
Friedman Michal
Larus James R.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 08/09/2017

Non-volatile memory (NVM) technologies such as PCM, ReRAM and STT-RAM allow processors to directly write values to persistent storage at speeds that are significantly faster than previous durable media such as hard drives or SSDs. Many applications of NVM are constructed on a logging subsystem, which enables operations to appear to execute atomically and facilitates recovery from failures. Writes to NVM, however, pass through a processor's memory system, which can delay and reorder them and can impair the correctness and cost of logging algorithms. Reordering arises because of out-of-order execution in a CPU and the inter-processor cache coherence protocol. By carefully considering the properties of these reorderings, this paper develops a logging protocol that requires only one round trip to non-volatile memory while avoiding expensive computations. We show how to extend the logging protocol to build a persistent set (hash map) that also requires only a single round trip to non-volatile memory for insertion, updating, or deletion.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Fine-Grain Checkpointing with In-Cache-Line Logging

Author: Aksun David T.
Avni Hillel
Cohen Nachshon
Larus James R.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 02/02/2019

Non-Volatile Memory offers the possibility of implementing high-performance, durable data structures. However, achieving performance comparable to well-designed data structures in non-persistent (transient) memory is difficult, primarily because of the cost of ensuring the order in which memory writes reach NVM. Often, this requires flushing data to NVM and waiting a full memory round-trip time. In this paper, we introduce two new techniques: Fine-Grained Checkpointing, which ensures a consistent, quickly recoverable data structure in NVM after a system failure, and In-Cache-Line Logging, an undo-logging technique that enables recovery of earlier state without requiring cache-line flushes in the normal case. We implemented these techniques in the Masstree data structure, making it persistent and demonstrating the ease of applying them to a highly optimized system and their low (5.9-15.4%) runtime overhead cost. Comment: In 2019 Architectural Support for Programming Languages and Operating Systems (ASPLOS 19), April 13, 2019, Providence, RI, US

arXiv.org e-Print Archive

Crossref

Technical Perspective Programming Multicore Computers

Author: Larus James
Publication venue: Association for Computing Machinery (ACM)
Publication date: 29/05/2015

Infoscience - École polytechnique fédérale de Lausanne

Taiwan's Reaction to the Global Financial Crisis

Author: Larus Elizabeth Freund
Wu James
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 01/01/2010

Yale University

Object-Oriented Recovery for Non-volatile Memory

Author: Aksun David Teksen
Cohen Nachshon
Larus James
Publication venue: Association for Computing Machinery (ACM)
Publication date: 11/09/2018

New non-volatile memory (NVM) technologies enable direct, durable storage of data in an application's heap. Durable, randomly accessible memory facilitates the construction of applications that do not lose data at system shutdown or power failure. Existing NVM programming frameworks provide mechanisms to consistently capture a running application's state. They do not, however, fully support object-oriented languages or ensure that the persistent heap is consistent with the environment when the application is restarted. In this paper, we propose a new NVM language extension and runtime system that supports object-oriented NVM programming and avoids the pitfalls of prior approaches. At the heart of our technique is object reconstruction, which transparently restores and reconstructs a persistent object's state during program restart. It is implemented in NVMReconstruction, a Clang/LLVM extension and runtime library that provides: (i) transient fields in persistent objects, (ii) support for virtual functions and function pointers, (iii) direct representation of persistent pointers as virtual addresses, and (iv) type-specific reconstruction of a persistent object during program restart. In addition, NVMReconstruction supports updating an application's code, even if this causes objects to expand, by providing object migration. NVMReconstruction can also compact the persistent heap to reduce fragmentation. In experiments, we demonstrate the versatility and usability of object reconstruction and its low runtime performance cost.

Infoscience - École polytechnique fédérale de Lausanne

Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism

Author: Emami Mahyar
Kamahori Keisuke
Kashani Sahand
Larus James R.
Pourghannad Mohammad Sepehr
Raj Ritik
Publication date: 23/01/2023

The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware heavily uses cycle-accurate simulation of register-transfer-level (RTL) designs. The best software RTL simulators can simulate designs at 1-1000 kHz, i.e., more than three orders of magnitude slower than hardware. Faster simulation can increase productivity by speeding design iterations and permitting more exhaustive exploration. One possibility is to use parallelism as RTL exposes considerable fine-grain concurrency. However, state-of-the-art RTL simulators generally perform best when single-threaded since modern processors cannot effectively exploit fine-grain parallelism. This work presents Manticore: a parallel computer designed to accelerate RTL simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution model to eliminate runtime synchronization barriers among many simple processors. Manticore relies entirely on its compiler to schedule resources and communication. Because RTL code is practically free of long divergent execution paths, static scheduling is feasible. Communication and synchronization no longer incur runtime overhead, enabling efficient fine-grain parallelism. Moreover, static scheduling dramatically simplifies the physical implementation, significantly increasing the potential parallelism on a chip. Our 225-core FPGA prototype running at 475 MHz outperforms a state-of-the-art RTL simulator on an Intel Xeon processor running at ≈3.3 GHz by up to 27.9× (geomean 5.3×) in nine Verilog benchmarks.

arXiv.org e-Print Archive

Jiagu: Optimizing Serverless Computing Resource Utilization with Harmonized Efficiency and Practicability

Author: Chen Haibo
Du Dong
Feng Jia
Larus James
Liu Qingyuan
Xia Yubin
Yang Yanning
Zhang Ping
Publication date: 01/03/2024

Current serverless platforms struggle to optimize resource utilization due to their dynamic and fine-grained nature. Conventional techniques like overcommitment and autoscaling fall short, often sacrificing utilization for practicability or incurring performance trade-offs. Overcommitment requires predicting performance to prevent QoS violation, introducing a trade-off between prediction accuracy and overheads. Autoscaling requires scaling instances in response to load fluctuations quickly to reduce resource wastage, but more frequent scaling also leads to more cold start overheads. This paper introduces Jiagu, which harmonizes efficiency with practicability through two novel techniques. First, pre-decision scheduling achieves accurate prediction while eliminating overheads by decoupling prediction and scheduling. Second, dual-staged scaling achieves frequent adjustment of instances with minimum overhead. We have implemented a prototype and evaluated it using real-world applications and traces from the public cloud platform. Our evaluation shows a 54.8% improvement in deployment density over commercial clouds (with Kubernetes) while maintaining QoS, and 81.0%-93.7% lower scheduling costs and a 57.4%-69.3% reduction in cold start latency compared to existing QoS-aware schedulers in research work. Comment: 17 pages, 17 figures

arXiv.org e-Print Archive

Typing Copyless Message Passing

Author: Bruno Courcelle
Dario Colazzo and Giorgio Ghelli
Frank Piessens
Galen C. Hunt and James R. Larus
Luca Cardelli Simone Martini, John C. M
Luca Padovani
Simon Gay
Simon Gay and Malcolm Hole
Simon Gay and Vasco T. Vasconcelos
Viviana Bono
Publication venue: Logical Methods in Computer Science e.V.
Publication date: 01/01/2011

We present a calculus that models a form of process interaction based on copyless message passing, in the style of Singularity OS. The calculus is equipped with a type system ensuring that well-typed processes are free from memory faults, memory leaks, and communication errors. The type system is essentially linear, but we show that linearity alone is inadequate, because it leaves room for scenarios where well-typed processes leak significant amounts of memory. We address these problems by basing the type system upon an original variant of session types. Comment: 50 pages

arXiv.org e-Print Archive

CiteSeerX

Crossref

Episciences.org

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Camerino

Institutional Research Information System University of Turin