779 research outputs found
Oil and Water? High Performance Garbage Collection in Java with MMTk
Increasingly popular languages such as Java and C# require efficient garbage collection. This paper presents the design, implementation, and evaluation of MMTk, a Memory Management Toolkit for and in Java. MMTk is an efficient, composable, extensible, and portable framework for building garbage collectors. MMTk uses design patterns and compiler cooperation to combine modularity and efficiency. The resulting system is more robust, easier to maintain, and has fewer defects than monolithic collectors. Experimental comparisons with monolithic Java and C implementations reveal MMTk has significant performance advantages as well. Performance critical system software typically uses monolithic C at the expense of flexibility. Our results refute common wisdom that only this approach attains efficiency, and suggest that performance critical software can embrace modular design and high-level languages.
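The composability the abstract describes can be illustrated with a minimal sketch. This is not MMTk's actual API; the names and the simplified address model are invented here to show the idea that different collectors are assembled from shared, reusable components rather than rewritten from scratch.

```java
// Minimal sketch (invented names, not MMTk's API) of toolkit-style composition:
// one bump-pointer allocator component is reused unchanged by two different
// collector "plans", so a new collector is a new composition of parts.
interface Allocator {
    int alloc(int bytes); // returns an offset into the space, or -1 if full
}

class BumpPointer implements Allocator {
    private int cursor = 0;
    private final int limit;
    BumpPointer(int limit) { this.limit = limit; }
    public int alloc(int bytes) {
        if (cursor + bytes > limit) return -1; // out of space: would trigger GC
        int addr = cursor;
        cursor += bytes;                       // bump the cursor past the object
        return addr;
    }
}

// Two plans composed from the same allocator component.
class SemiSpacePlan {
    final Allocator toSpace = new BumpPointer(1 << 20);
}

class GenerationalPlan {
    final Allocator nursery = new BumpPointer(1 << 16); // small nursery
    final Allocator mature  = new BumpPointer(1 << 20); // mature space
}
```

The design-pattern point is that policy (when and what to collect) and mechanism (how to allocate and copy) are separated behind small interfaces, which is what lets a high-level language retain efficiency without monolithic code.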
Beltway: Getting Around Garbage Collection Gridlock
We present the design and implementation of a new garbage collection framework that significantly generalizes existing copying collectors. The Beltway framework exploits and separates object age and incrementality. It groups objects in one or more increments on queues called belts, collects belts independently, and collects increments on a belt in first-in-first-out order. We show that Beltway configurations, selected by command line options, act and perform the same as semi-space, generational, and older-first collectors, and encompass all previous copying collectors of which we are aware. The increasing reliance on garbage collected languages such as Java requires that the collector perform well. We show that the generality of Beltway enables us to design and implement new collectors that are robust to variations in heap size and improve total execution time over the best generational copying collectors of which we are aware by up to 40%, and on average by 5 to 10%, for small to moderate heap sizes. New garbage collection algorithms are rare, and yet we define not just one, but a new family of collectors that subsumes previous work. This generality enables us to explore a larger design space and build better collectors.
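The belt/increment structure described above can be sketched as a FIFO queue of increments. This is an illustrative model only, with invented class names: with one belt this degenerates toward a semi-space configuration, and a two-belt arrangement resembles a generational collector.

```java
import java.util.ArrayDeque;
import java.util.List;

// Illustrative sketch (invented names) of the Beltway idea: objects live in
// increments, increments are queued FIFO on a belt, and the collector always
// takes the oldest increment of the belt it has chosen to collect.
class Belt {
    private final ArrayDeque<List<String>> increments = new ArrayDeque<>();

    // New increments are appended at the young end of the belt.
    void addIncrement(List<String> objects) { increments.addLast(objects); }

    // Collection removes the oldest increment first (FIFO order).
    List<String> collectOldest() { return increments.pollFirst(); }

    boolean isEmpty() { return increments.isEmpty(); }
}
```

A configuration then amounts to choosing how many belts exist and which belt each survivor is copied to, which is what lets command-line options reproduce semi-space, generational, and older-first behavior from one framework.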
Distilling the Real Cost of Production Garbage Collectors
Abridged abstract: despite the long history of garbage collection (GC) and
its prevalence in modern programming languages, there is surprisingly little
clarity about its true cost. Without understanding their cost, crucial
tradeoffs made by garbage collectors (GCs) go unnoticed. This can lead to
misguided design constraints and evaluation criteria used by GC researchers and
users, hindering the development of high-performance, low-cost GCs. In this
paper, we develop a methodology that allows us to empirically estimate the cost
of GC for any given set of metrics. By distilling out the explicitly
identifiable GC cost, we estimate the intrinsic application execution cost
using different GCs. The minimum distilled cost forms a baseline. Subtracting
this baseline from the total execution costs, we can then place an empirical
lower bound on the absolute costs of different GCs. Using this methodology, we
study five production GCs in OpenJDK 17, a high-performance Java runtime. We
measure the cost of these collectors, and expose their respective key
performance tradeoffs. We find that with a modestly sized heap, production GCs
incur substantial overheads across a diverse suite of modern benchmarks,
spending at least 7-82% more wall-clock time and 6-92% more CPU cycles relative
to the baseline cost. We show that these costs can be masked by concurrency and
generous provisioning of memory/compute. In addition, we find that newer
low-pause GCs are significantly more expensive than older GCs, and,
surprisingly, sometimes deliver worse application latency than stop-the-world
GCs. Our findings reaffirm that GC is by no means a solved problem and that a
low-cost, low-latency GC remains elusive. We recommend adopting the
distillation methodology together with a wider range of cost metrics for future
GC evaluations.
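The distillation arithmetic described in the abstract can be made concrete with a small sketch using invented numbers. For each GC, the explicitly identifiable GC cost is subtracted from the total cost to give a distilled application cost; the minimum distilled cost across all GCs is the baseline, and total minus baseline is the empirical lower bound on each GC's absolute cost.

```java
// Minimal arithmetic sketch of the distillation methodology (numbers invented).
class Distill {
    // Baseline: the minimum distilled (total minus explicit-GC) cost across GCs.
    static double baseline(double[] totals, double[] explicitGc) {
        double min = Double.MAX_VALUE;
        for (int i = 0; i < totals.length; i++) {
            min = Math.min(min, totals[i] - explicitGc[i]); // distilled cost of GC i
        }
        return min;
    }

    // Empirical lower bound on a GC's absolute cost: total minus the baseline.
    static double lowerBound(double total, double baseline) {
        return total - baseline;
    }
}
```

The key point the methodology captures is that the distilled cost still includes implicit GC overheads (e.g., barriers and cache pollution), so subtracting the baseline yields a lower bound, not an exact cost.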
Draining the Swamp: Micro Virtual Machines as Solid Foundation for Language Development
Many of today's programming languages are broken. Poor performance, lack of features and hard-to-reason-about semantics can cost dearly in software maintenance and inefficient execution. The problem is only getting worse with programming languages proliferating and hardware becoming more complicated. An important reason for this brokenness is that much of language design is implementation-driven. The difficulties in implementation and insufficient understanding of concepts bake bad designs into the language itself. Concurrency, architectural details and garbage collection are three fundamental concerns that contribute much to the complexities of implementing managed languages.
We propose the micro virtual machine, a thin abstraction designed specifically to relieve implementers of managed languages of the most fundamental implementation challenges that currently impede good design. The micro virtual machine targets abstractions over memory (garbage collection), architecture (compiler backend), and concurrency. We motivate the micro virtual machine and give an account of the design and initial experience of a concrete instance, which we call Mu, built over a two year period. Our goal is to remove an important barrier to performant and semantically sound managed language design and implementation.
Cooperative cache scrubbing
Managing the limited resources of power and memory bandwidth while improving performance on multicore hardware is challenging. In particular, more cores demand more memory bandwidth, and multi-threaded applications increasingly stress memory systems, leading to more energy consumption. However, we demonstrate that not all memory traffic is necessary. For modern Java programs, 10 to 60% of DRAM writes are useless, because the data on these lines are dead: the program is guaranteed to never read them again. Furthermore, reading memory only to immediately zero initialize it wastes bandwidth. We propose a software/hardware cooperative solution: the memory manager communicates dead and zero lines with cache scrubbing instructions. We show how scrubbing instructions satisfy MESI cache coherence protocol invariants and demonstrate them in a Java Virtual Machine and multicore simulator. Scrubbing reduces average DRAM traffic by 59%, total DRAM energy by 14%, and dynamic DRAM energy by 57% on a range of configurations. Cooperative software/hardware cache scrubbing reduces memory bandwidth and improves energy efficiency, two critical problems in modern systems.
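The software side of the cooperation can be sketched as follows. The paper proposes hardware scrub instructions; this hypothetical model simply computes which cache-line-aligned addresses inside a known-dead region the memory manager would hand to such instructions, so their write-backs can be skipped. Names and the line size are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: after a collection, the memory manager knows a region
// [start, end) is dead. Only fully covered cache lines may be scrubbed; partial
// lines at the edges may still hold live data and must be left alone.
class Scrubber {
    static final int LINE = 64; // assumed cache line size in bytes

    static List<Long> deadLines(long start, long end) {
        List<Long> lines = new ArrayList<>();
        long first = (start + LINE - 1) / LINE * LINE; // round up to a line boundary
        for (long a = first; a + LINE <= end; a += LINE) {
            // Hardware would mark this line invalid/clean, eliding its DRAM write-back.
            lines.add(a);
        }
        return lines;
    }
}
```

Zero lines get the dual treatment in the paper's scheme: rather than fetching a line from DRAM only to overwrite it with zeroes, the cache can install a zeroed line directly, saving the read.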
Using remote substituents to control solution structure and anion binding in lanthanide complexes.
A study of the anion-binding properties of three structurally related lanthanide complexes, which all contain chemically identical anion-binding motifs, has revealed dramatic differences in their anion affinity. These arise as a consequence of changes in the substitution pattern on the periphery of the molecule, at a substantial distance from the binding pocket. Herein, we explore these remote substituent effects and explain the observed behaviour through discussion of the way in which remote substituents can influence and control the global structure of a molecule through their demands upon conformational space. Peripheral modifications to a binuclear lanthanide motif derived from α,α′-bis(DO3A-yl)-m-xylene are shown to result in dramatic changes to the binding constant for isophthalate. In this system, the parent compound displays considerable conformational flexibility, yet can be assumed to bind to isophthalate through a well-defined conformer. Addition of steric bulk remote from the binding site restricts conformational mobility, giving rise to an increase in binding constant on entropic grounds as long as the ideal binding conformation is not excluded from the available range of conformers.
Pretenuring for Java
Pretenuring is a technique for reducing copying costs in garbage collectors. When pretenuring, the allocator places long-lived objects into regions that the garbage collector will rarely, if ever, collect. We extend previous work on profiling-driven pretenuring as follows. (1) We develop a collector-neutral approach to obtaining object lifetime profile information. We show that our collection of Java programs exhibits a very high degree of homogeneity of object lifetimes at each allocation site. This result is robust with respect to different inputs, and is similar to previous work on ML, but is in contrast to C programs, which require dynamic call chain context information to extract homogeneous lifetimes. Call-site homogeneity considerably simplifies the implementation of pretenuring and makes it more efficient. (2) Our pretenuring advice is neutral with respect to the collector algorithm, and we use it to improve two quite different garbage collectors: a traditional generational collector and an older-first collector. The system is also novel because it classifies and allocates objects into three categories: we allocate immortal objects into a permanent region that the collector will never consider, long-lived objects into a region in which the collector placed survivors of the most recent collection, and short-lived objects into the nursery, i.e., the default region. (3) We evaluate pretenuring on Java programs. Our simulation results show that pretenuring significantly reduces collector copying for generational and older-first collectors.
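Because lifetimes are homogeneous per allocation site, a simple per-site advice table suffices, with no call-chain context. The sketch below uses invented names and region labels to show how an allocator might consult such advice, following the abstract's three-way classification with the nursery as the default.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (invented names) of profile-driven pretenuring: a prior profiling run
// classifies each allocation site, and the allocator picks a region from that
// advice. Unprofiled sites fall back to the nursery, the default region.
enum Advice { IMMORTAL, LONG_LIVED, SHORT_LIVED }

class Pretenuring {
    private final Map<String, Advice> siteAdvice = new HashMap<>();

    void learn(String site, Advice a) { siteAdvice.put(site, a); }

    String regionFor(String site) {
        switch (siteAdvice.getOrDefault(site, Advice.SHORT_LIVED)) {
            case IMMORTAL:   return "permanent"; // never considered by the collector
            case LONG_LIVED: return "mature";    // region holding recent survivors
            default:         return "nursery";   // default allocation region
        }
    }
}
```

The collector-neutrality claim corresponds to the fact that only the region names differ between collectors; the same advice table drives both a generational and an older-first configuration.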