Search CORE

21 research outputs found

Write-rationing garbage collection for hybrid memories

Author: Akram Shoaib
Blackburn Stephen M.
Blackburn Stephen M.
Bois Kristof Du
Burr Geoffrey W.
Frampton Daniel
Ha Jungwoo
Hicks Michael
Huang Xianglong
Jantz Michael R.
Jones Richard
Kyrola Aapo
Lee Benjamin C.
Li Sheng
Lim Kevin
Nguyen Khanh
Paleczny Michael
Qureshi Moinuddin K.
Qureshi Moinuddin K.
Sartor Jennifer B.
Shahriyar Rifat
Stephen
Wang Chenxi
Yang Xi
Zhang Lunkai
Zhao Yi
Zhou Yuanyuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

Emerging Non-Volatile Memory (NVM) technologies offer high capacity and energy efficiency compared to DRAM, but suffer from limited write endurance and longer latencies. Prior work seeks the best of both technologies by combining DRAM and NVM in hybrid memories to attain low latency, high capacity, energy efficiency, and durability. Coarse-grained hardware and OS optimizations then spread writes out (wear-leveling) and place highly mutated pages in DRAM to extend NVM lifetimes. Unfortunately even with these coarse-grained methods, popular Java applications exact impractical NVM lifetimes of 4 years or less. This paper shows how to make hybrid memories practical, without changing the programming model, by enhancing garbage collection in managed language runtimes. We find object write behaviors offer two opportunities: (1) 70% of writes occur to newly allocated objects, and (2) 2% of objects capture 81% of writes to mature objects. We introduce writerationing garbage collectors that exploit these fine-grained behaviors. They extend NVM lifetimes by placing highly mutated objects in DRAM and read-mostly objects in NVM. We implement two such systems. (1) Kingsguard-nursery places new allocation in DRAM and survivors in NVM, reducing NVM writes by 5x versus NVM only with wear-leveling. (2) Kingsguard-writers (KG-W) places nursery objects in DRAM and survivors in a DRAM observer space. It monitors all mature object writes and moves unwritten mature objects from DRAM to NVM. Because most mature objects are unwritten, KG-W exploits NVM capacity while increasing NVM lifetimes by 11x. It reduces the energy-delay product by 32% over DRAM-only and 29% over NVM-only. This work opens up new avenues for making hybrid memories practical

Crossref

Ghent University Academic Bibliography

Detailed Heap Profiling

Author: Leverich Jacob
LLVM.
Serebryany Konstantin
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 04/06/2018
Field of study

Modern software systems heavily use the memory heap. As systems grow more complex and compute with increasing amounts of data, it can be difficult for developers to understand how their programs actually use the bytes that they allocate on the heap and whether improvements are possible. To answer this question of heap usage efficiency, we have built a new, detailed heap profiler called Memoro. Memoro uses a combination of static instrumentation, subroutine interception, and runtime data collection to build a clear picture of exactly when and where a program performs heap allocation, and crucially how it actually uses that memory. Memoro also introduces a new visualization application that can distill collected data into scores and visual cues that allow developers to quickly pinpoint and eliminate inefficient heap usage in their software. Our evaluation and experience with several applications demonstrates that Memoro can reduce heap usage and produce runtime improvements of 10%

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Subheap-Augmented Garbage Collection

Author: Karel Benjamin
Publication venue: ScholarlyCommons
Publication date: 01/01/2019
Field of study

Automated memory management avoids the tedium and danger of manual techniques. However, as no programmer input is required, no widely available interface exists to permit principled control over sometimes unacceptable performance costs. This dissertation explores the idea that performance-oriented languages should give programmers greater control over where and when the garbage collector (GC) expends effort. We describe an interface and implementation to expose heap partitioning and collection decisions without compromising type safety. We show that our interface allows the programmer to encode a form of reference counting using Hayes\u27 notion of key objects. Preliminary experimental data suggests that our proposed mechanism can avoid high overheads suffered by tracing collectors in some scenarios, especially with tight heaps. However, for other applications, the costs of applying subheaps---in human effort and runtime overheads---remain daunting

ScholarlyCommons@Penn

Learned Garbage Collection

Author: Alam M.
Dinella E.
Ellis K.
Jacek N.
Jangda A.
Marcus R.
Marcus R.
Ng A. Y.
Nunez D.
Ortiz J.
Publication venue: ScholarlyCommons
Publication date: 01/01/2020
Field of study

Several programming languages use garbage collectors (GCs) to automatically manage memory for the programmer. Such collectors must decide when to look for unreachable objects to free, which can have a large performance impact on some applications. In this preliminary work, we propose a design for a learned garbage collector that autonomously learns over time when to perform collections. By using reinforcement learning, our design can incorporate user-defined reward functions, allowing an autonomous garbage collector to learn to optimize the exact metric the user desires (e.g., request latency or queries per second). We conduct an initial experimental study on a prototype, demonstrating that an approach based on tabular Q learning may be promising

arXiv.org e-Print Archive

Crossref

DSpace@MIT

ScholarlyCommons@Penn

Scaling Up Performance of Managed Applications on NUMA Systems

Author: Andronikakis Andreas
Foutris Nikolaos
Kotselidis Christos-Efthymios
Papadakis Orion
Papadimitriou Michail
Stratikopoulos Athanasios
Xekalakis Polychronis
Zakkak Foivos
Publication venue
Publication date: 24/04/2023
Field of study

The University of Manchester - Institutional Repository

Retrofitting parallelism onto OCaml.

Author: Dhiman Atul
Dolan Stephen
Jaffer Sadiq
Kelly Tom
Madhavapeddy Anil
Parimala Sudha
Sahoo Anmol
Sivaramakrishnan KC
White Leo
Publication venue: Proceedings of the ACM on Programming Languages
Publication date: 02/07/2020
Field of study

OCaml is an industrial-strength, multi-paradigm programming language, widely used in industry and academia. OCaml is also one of the few modern managed system programming languages to lack support for shared memory parallel programming. This paper describes the design, a full-fledged implementation and evaluation of a mostly-concurrent garbage collector (GC) for the multicore extension of the OCaml programming language. Given that we propose to add parallelism to a widely used programming language with millions of lines of existing code, we face the challenge of maintaining backwards compatibility--not just in terms of the language features but also the performance of single-threaded code running with the new GC. To this end, the paper presents a series of novel techniques and demonstrates that the new GC strikes a balance between performance and feature backwards compatibility for sequential programs and scales admirably on modern multicore processors

arXiv.org e-Print Archive

Apollo (Cambridge)

Cautiously Optimistic Program Analyses for Secure and Reliable Software

Author: Banerjee Subarno
Publication venue
Publication date: 01/01/2021
Field of study

Modern computer systems still have various security and reliability vulnerabilities. Well-known dynamic analyses solutions can mitigate them using runtime monitors that serve as lifeguards. But the additional work in enforcing these security and safety properties incurs exorbitant performance costs, and such tools are rarely used in practice. Our work addresses this problem by constructing a novel technique- Cautiously Optimistic Program Analysis (COPA). COPA is optimistic- it infers likely program invariants from dynamic observations, and assumes them in its static reasoning to precisely identify and elide wasteful runtime monitors. The resulting system is fast, but also ensures soundness by recovering to a conservatively optimized analysis when a likely invariant rarely fails at runtime. COPA is also cautious- by carefully restricting optimizations to only safe elisions, the recovery is greatly simplified. It avoids unbounded rollbacks upon recovery, thereby enabling analysis for live production software. We demonstrate the effectiveness of Cautiously Optimistic Program Analyses in three areas: Information-Flow Tracking (IFT) can help prevent security breaches and information leaks. But they are rarely used in practice due to their high performance overhead (>500% for web/email servers). COPA dramatically reduces this cost by eliding wasteful IFT monitors to make it practical (9% overhead, 4x speedup). Automatic Garbage Collection (GC) in managed languages (e.g. Java) simplifies programming tasks while ensuring memory safety. However, there is no correct GC for weakly-typed languages (e.g. C/C++), and manual memory management is prone to errors that have been exploited in high profile attacks. We develop the first sound GC for C/C++, and use COPA to optimize its performance (16% overhead). Sequential Consistency (SC) provides intuitive semantics to concurrent programs that simplifies reasoning for their correctness. However, ensuring SC behavior on commodity hardware remains expensive. We use COPA to ensure SC for Java at the language-level efficiently, and significantly reduce its cost (from 24% down to 5% on x86). COPA provides a way to realize strong software security, reliability and semantic guarantees at practical costs.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/170027/1/subarno_1.pd

Deep Blue Documents at the University of Michigan

NUMASK: High Performance Scalable Skip List for NUMA

Author: Daly Henry
Hassan Ahmed
Palmieri Roberto
Spear Michael F.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 32nd International Symposium on Distributed Computing (DISC 2018)
Publication date: 01/01/2018
Field of study

This paper presents NUMASK, a skip list data structure specifically designed to exploit the characteristics of Non-Uniform Memory Access (NUMA) architectures to improve performance. NUMASK deploys an architecture around a concurrent skip list so that all metadata accesses (e.g., traversals of the skip list index levels) read and write memory blocks allocated in the NUMA zone where the thread is executing. To the best of our knowledge, NUMASK is the first NUMA-aware skip list design that goes beyond merely limiting the performance penalties introduced by NUMA, and leverages the NUMA architecture to outperform state-of-the-art concurrent high-performance implementations. We tested NUMASK on a four-socket server. Its performance scales for both read-intensive and write-intensive workloads (tested up to 160 threads). In write-intensive workload, NUMASK shows speedups over competitors in the range of 2x to 16x

Dagstuhl Research Online Publication Server

Functional Programming for Embedded Systems

Author: Sarkar Abhiroop
Publication venue
Publication date: 01/01/2022
Field of study

Embedded Systems application development has traditionally been carried out in low-level machine-oriented programming languages like C or Assembler that can result in unsafe, error-prone and difficult-to-maintain code. Functional programming with features such as higher-order functions, algebraic data types, polymorphism, strong static typing and automatic memory management appears to be an ideal candidate to address the issues with low-level languages plaguing embedded systems. However, embedded systems usually run on heavily memory-constrained devices with memory in the order of hundreds of kilobytes and applications running on such devices embody the general characteristics of being (i) I/O- bound, (ii) concurrent and (iii) timing-aware. Popular functional language compilers and runtimes either do not fare well with such scarce memory resources or do not provide high-level abstractions that address all the three listed characteristics. This work attempts to address this gap by investigating and proposing high-level abstractions specialised for I/O-bound, concurrent and timing-aware embedded-systems programs. We implement the proposed abstractions on eagerly-evaluated, statically-typed functional languages running natively on microcontrollers. Our contributions are divided into two parts - Part 1 presents a functional reactive programming language - Hailstorm - that tracks side effects like I/O in its type system using a feature called resource types. Hailstorm’s programming model is illustrated on the GRiSP microcontroller board.Part 2 comprises two papers that describe the design and implementation of Synchron, a runtime API that provides a uniform message-passing framework for the handling of software messages as well as hardware interrupts. Additionally, the Synchron API supports a novel timing operator to capture the notion of time, common in embedded applications. The Synchron API is implemented as a virtual machine - SynchronVM - that is run on the NRF52 and STM32 microcontroller boards. We present programming examples that illustrate the concurrency, I/O and timing capabilities of the VM and provide various benchmarks on the response time, memory and power usage of SynchronVM

Chalmers Research

Fast Nonblocking Persistence for Concurrent Data Structures

Author: Abdallah Shreif
Cai Wentao
Du Mingzhe
Maksimovski Vladimir
Sanna Rafaello
Scott Michael L.
Wen Haosen
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th International Symposium on Distributed Computing (DISC 2021)
Publication date: 01/01/2021
Field of study

We present a fully lock-free variant of our recent Montage system for persistent data structures. The variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is buffered durably linearizable: it guarantees that the state recovered in the wake of a crash will represent a consistent prefix of pre-crash execution. Unlike its predecessor, nbMontage ensures wait-free progress of the persistence frontier, thereby bounding the number of recent updates that may be lost on a crash, and allowing a thread to force an update of the frontier (i.e., to perform a sync operation) without the risk of blocking. As an extra benefit, the helping mechanism employed by our wait-free sync significantly reduces its latency. Performance results for nonblocking queues, skip lists, trees, and hash tables rival custom data structures in the literature - dramatically faster than achieved with prior general-purpose systems, and generally within 50% of equivalent non-persistent structures placed in DRAM

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server