22 research outputs found

    Memory Mangement in the PoSSo Solver

    Get PDF
    AbstractA uniform general purpose garbage collector may not always provide optimal performance. Sometimes an algorithm exhibits a predictable pattern of memory usage that could be exploited, delaying as much as possible the intervention of the collector. This requires a collector whose strategy can be customized to the need of an algorithm. We present a dynamic memory management framework which allows such customization, while preserving the convenience of automatic collection in the normal case. The Customizable Memory Management (CMM) organizes memory in multiple heaps, each one encapsulating a particular storage discipline. The default heap for collectable objects uses the technique of mostly copying garbage collection, providing good performance and memory compaction. Customization of the collector is achieved through object orientation by specialising the collector methods for each heap class. We describe how the CMM has been exploited in the implementation of the Buchberger algorithm, by using a special heap for temporary objects created during polynomial reduction. The solution drastically reduces the overall cost of memory allocation in the algorithm

    Distilling the Real Cost of Production Garbage Collectors

    Get PDF
    Abridged abstract: despite the long history of garbage collection (GC) and its prevalence in modern programming languages, there is surprisingly little clarity about its true cost. Without understanding their cost, crucial tradeoffs made by garbage collectors (GCs) go unnoticed. This can lead to misguided design constraints and evaluation criteria used by GC researchers and users, hindering the development of high-performance, low-cost GCs. In this paper, we develop a methodology that allows us to empirically estimate the cost of GC for any given set of metrics. By distilling out the explicitly identifiable GC cost, we estimate the intrinsic application execution cost using different GCs. The minimum distilled cost forms a baseline. Subtracting this baseline from the total execution costs, we can then place an empirical lower bound on the absolute costs of different GCs. Using this methodology, we study five production GCs in OpenJDK 17, a high-performance Java runtime. We measure the cost of these collectors, and expose their respective key performance tradeoffs. We find that with a modestly sized heap, production GCs incur substantial overheads across a diverse suite of modern benchmarks, spending at least 7-82% more wall-clock time and 6-92% more CPU cycles relative to the baseline cost. We show that these costs can be masked by concurrency and generous provisioning of memory/compute. In addition, we find that newer low-pause GCs are significantly more expensive than older GCs, and, surprisingly, sometimes deliver worse application latency than stop-the-world GCs. Our findings reaffirm that GC is by no means a solved problem and that a low-cost, low-latency GC remains elusive. We recommend adopting the distillation methodology together with a wider range of cost metrics for future GC evaluations.Comment: Camera-ready versio

    Local Variable Allocation For Accurate Garbage Collection of C++

    Get PDF
    Accurate garbage collection of C++ requires that every memory location and every register be known to contain either a pointer or a non-pointer. In order to minimize the run-time overhead of tagging memory locations and registers, techniques for partitioning memory and registers into separate classes dedicated independently to the representation of pointers and non-pointers respectively have been developed. This paper describes the implementation and performance of a specially designed activation frame targeted to the SPARC architecture, as implemented in a customized version of the GNU g++ compiler

    Register Allocation for Accurate Garbage Collection of C++

    Get PDF
    Register Allocation for Accurate Garbage Collection of C++ S. Satishkumar* M.S. Creative Component Accurate garbage collection of C++ requires that every memory location and every register be known to contain either a pointer or a non-pointer. In order to minimize the run-time overhead of tagging memory locations and registers, techniques for partitioning memory and registers into separate classes dedicated independently to the representation of pointers and non-pointers respectively have been developed. This paper describes the implementation and performance of a specially designed register allocator for the GNU g++ compiler. * Portions of this paper were excerpted from Code Generation to Support Efficient Accurate Garbage Collection of C++ on Stock Hardware , a paper currently being prepared for publication by Kelvin Nilsen, Ravichandran Ganesan, Satish Guggilla, Satish Kumar, and Kannan Narasimha

    Simulation of High-Performance Memory Allocators

    Get PDF
    This study presents a single-core and a multi-core processor architecture for health monitoring systems where slow biosignal events and highly parallel computations exist. The single-core architecture is composed of a processing core (PC), an instruction memory (IM) and a data memory (DM), while the multi-core architecture consists of PCs, individual IMs for each core, a shared DM and an interconnection crossbar between the cores and the DM. These architectures are compared with respect to power vs. performance trade-offs for a multi-lead electrocardiogram signal conditioning application exploiting near threshold computing. The results show that the multi-core solution consumes 66%less power for high computation requirements (50.1 MOps/s), whereas 10.4% more power for low computation needs (681 kOps/s)
    corecore