22 research outputs found
Memory Mangement in the PoSSo Solver
AbstractA uniform general purpose garbage collector may not always provide optimal performance. Sometimes an algorithm exhibits a predictable pattern of memory usage that could be exploited, delaying as much as possible the intervention of the collector. This requires a collector whose strategy can be customized to the need of an algorithm. We present a dynamic memory management framework which allows such customization, while preserving the convenience of automatic collection in the normal case. The Customizable Memory Management (CMM) organizes memory in multiple heaps, each one encapsulating a particular storage discipline. The default heap for collectable objects uses the technique of mostly copying garbage collection, providing good performance and memory compaction. Customization of the collector is achieved through object orientation by specialising the collector methods for each heap class. We describe how the CMM has been exploited in the implementation of the Buchberger algorithm, by using a special heap for temporary objects created during polynomial reduction. The solution drastically reduces the overall cost of memory allocation in the algorithm
Distilling the Real Cost of Production Garbage Collectors
Abridged abstract: despite the long history of garbage collection (GC) and
its prevalence in modern programming languages, there is surprisingly little
clarity about its true cost. Without understanding their cost, crucial
tradeoffs made by garbage collectors (GCs) go unnoticed. This can lead to
misguided design constraints and evaluation criteria used by GC researchers and
users, hindering the development of high-performance, low-cost GCs. In this
paper, we develop a methodology that allows us to empirically estimate the cost
of GC for any given set of metrics. By distilling out the explicitly
identifiable GC cost, we estimate the intrinsic application execution cost
using different GCs. The minimum distilled cost forms a baseline. Subtracting
this baseline from the total execution costs, we can then place an empirical
lower bound on the absolute costs of different GCs. Using this methodology, we
study five production GCs in OpenJDK 17, a high-performance Java runtime. We
measure the cost of these collectors, and expose their respective key
performance tradeoffs. We find that with a modestly sized heap, production GCs
incur substantial overheads across a diverse suite of modern benchmarks,
spending at least 7-82% more wall-clock time and 6-92% more CPU cycles relative
to the baseline cost. We show that these costs can be masked by concurrency and
generous provisioning of memory/compute. In addition, we find that newer
low-pause GCs are significantly more expensive than older GCs, and,
surprisingly, sometimes deliver worse application latency than stop-the-world
GCs. Our findings reaffirm that GC is by no means a solved problem and that a
low-cost, low-latency GC remains elusive. We recommend adopting the
distillation methodology together with a wider range of cost metrics for future
GC evaluations.Comment: Camera-ready versio
Local Variable Allocation For Accurate Garbage Collection of C++
Accurate garbage collection of C++ requires that every memory location and every register be known to contain either a pointer or a non-pointer. In order to minimize the run-time overhead of tagging memory locations and registers, techniques for partitioning memory and registers into separate classes dedicated independently to the representation of pointers and non-pointers respectively have been developed. This paper describes the implementation and performance of a specially designed activation frame targeted to the SPARC architecture, as implemented in a customized version of the GNU g++ compiler
Register Allocation for Accurate Garbage Collection of C++
Register Allocation for Accurate Garbage Collection of C++ S. Satishkumar* M.S. Creative Component Accurate garbage collection of C++ requires that every memory location and every register be known to contain either a pointer or a non-pointer. In order to minimize the run-time overhead of tagging memory locations and registers, techniques for partitioning memory and registers into separate classes dedicated independently to the representation of pointers and non-pointers respectively have been developed. This paper describes the implementation and performance of a specially designed register allocator for the GNU g++ compiler. * Portions of this paper were excerpted from Code Generation to Support Efficient Accurate Garbage Collection of C++ on Stock Hardware , a paper currently being prepared for publication by Kelvin Nilsen, Ravichandran Ganesan, Satish Guggilla, Satish Kumar, and Kannan Narasimha
Simulation of High-Performance Memory Allocators
This study presents a single-core and a multi-core processor architecture for health monitoring systems where slow biosignal events and highly parallel computations exist. The single-core architecture is composed of a processing core (PC), an instruction memory (IM) and a data memory (DM), while the multi-core architecture consists of PCs, individual IMs for each core, a shared DM and an interconnection crossbar between the cores and the DM. These architectures are compared with respect to power vs. performance trade-offs for a multi-lead electrocardiogram signal conditioning application exploiting near threshold computing. The results show that the multi-core solution consumes 66%less power for high computation requirements (50.1 MOps/s), whereas 10.4% more power for low computation needs (681 kOps/s)
Recommended from our members