265 research outputs found
Subheap-Augmented Garbage Collection
Automated memory management avoids the tedium and danger of manual techniques. However, as no programmer input is required, no widely available interface exists to permit principled control over sometimes unacceptable performance costs. This dissertation explores the idea that performance-oriented languages should give programmers greater control over where and when the garbage collector (GC) expends effort. We describe an interface and implementation to expose heap partitioning and collection decisions without compromising type safety. We show that our interface allows the programmer to encode a form of reference counting using Hayes\u27 notion of key objects. Preliminary experimental data suggests that our proposed mechanism can avoid high overheads suffered by tracing collectors in some scenarios, especially with tight heaps. However, for other applications, the costs of applying subheaps---in human effort and runtime overheads---remain daunting
Kindergarten Cop : dynamic nursery resizing for GHC
Generational garbage collectors are among the most popular garbage collectors used in programming language runtime systems. Their performance is known to depend heavily on choosing the appropriate size of the area where new objects are allocated (the nursery). In imperative languages, it is usual to make the nursery as large as possible, within the limits imposed by the heap size. Functional languages, however, have quite different memory behaviour. In this paper, we study the effect that the nursery size has on the performance of lazy functional programs, through the interplay between cache locality and the frequency of collections. We demonstrate that, in contrast with imperative programs, having large nurseries is not always the best solution. Based on these results, we propose two novel algorithms for dynamic nursery resizing that aim to achieve a compromise between good cache locality and the frequency of garbage collections. We present an implementation of these algorithms in the state-of-the-art GHC compiler for the functional language Haskell, and evaluate them using an extensive benchmark suite. In the best case, we demonstrate a reduction in total execution times of up to 88.5%, or an 8.7 overall speedup, compared to using the production GHC garbage collector. On average, our technique gives an improvement of 9.3% in overall performance across a standard suite of 63 benchmarks for the production GHC compiler.Postprin
Ramasse-miettes générationnel et incémental gérant les cycles et les gros objets en utilisant des frames délimités
Ces derniĂšres annĂ©es, des recherches ont Ă©tĂ© menĂ©es sur plusieurs techniques reliĂ©es Ă la collection des dĂ©chets. Plusieurs dĂ©couvertes centrales pour le ramassage de miettes par copie ont Ă©tĂ© rĂ©alisĂ©es. Cependant, des amĂ©liorations sont encore possibles. Dans ce mĂ©moire, nous introduisons des nouvelles techniques et de nouveaux algorithmes pour amĂ©liorer le ramassage de miettes. En particulier, nous introduisons une technique utilisant des cadres dĂ©limitĂ©s pour marquer et retracer les pointeurs racines. Cette technique permet un calcul efficace de l'ensemble des racines. Elle rĂ©utilise des concepts de deux techniques existantes, card marking et remembered sets, et utilise une configuration bidirectionelle des objets pour amĂ©liorer ces concepts en stabilisant le surplus de mĂ©moire utilisĂ©e et en rĂ©duisant la charge de travail lors du parcours des pointeurs. Nous prĂ©sentons aussi un algorithme pour marquer rĂ©cursivement les objets rejoignables sans utiliser de pile (Ă©liminant le gaspillage de mĂ©moire habituel). Nous adaptons cet algorithme pour implĂ©menter un ramasse-miettes copiant en profondeur et amĂ©liorer la localitĂ© du heap. Nous amĂ©liorons l'algorithme de collection des miettes older-first et sa version gĂ©nĂ©rationnelle en ajoutant une phase de marquage garantissant la collection de toutes les miettes, incluant les structures cycliques rĂ©parties sur plusieurs fenĂȘtres. Finalement, nous introduisons une technique pour gĂ©rer les gros objets. Pour tester nos idĂ©es, nous avons conçu et implĂ©mentĂ©, dans la machine virtuelle libre Java SableVM, un cadre de dĂ©veloppement portable et extensible pour la collection des miettes. Dans ce cadre, nous avons implĂ©mentĂ© des algorithmes de collection semi-space, older-first et generational. Nos expĂ©rimentations montrent que la technique du cadre dĂ©limitĂ© procure des performances compĂ©titives pour plusieurs benchmarks. Elles montrent aussi que, pour la plupart des benchmarks, notre algorithme de parcours en profondeur amĂ©liore la localitĂ© et augmente ainsi la performance. Nos mesures de la performance gĂ©nĂ©rale montrent que, utilisant nos techniques, un ramasse-miettes peut dĂ©livrer une performance compĂ©titive et surpasser celle des ramasses-miettes existants pour plusieurs benchmarks. ______________________________________________________________________________ MOTS-CLĂS DE LâAUTEUR : Ramasse-Miettes, Machine Virtuelle, Java, SableVM
Eliminating read barriers through procrastination and cleanliness
Managed languages use read barriers to interpret forwarding pointers introduced to keep track of copied objects. For example, in a split-heap managed runtime for a multicore environment, an object initially allocated on a local heap may be copied to a shared heap if it becomes the source of a store operation whose target location resides on the shared heap. As part of the copy operation, a forwarding pointer may be established to allow existing references to the local object to reference the copied version. In this paper, we consider the design of a managed runtime that avoids the need for read barriers. Our design is premised on the availability of a sufficient degree of concurrency to stall operations that would otherwise necessitate the copy. Stalled actions are deferred until the next local collection, avoiding exposing forwarding pointers to the mutator. In certain important cases, procrastination is unnecessary- lightweight runtime techniques can sometimes be used to allow objects to be eagerly copied when their set of incoming references is known, or when it can be determined that having multiple copies would not violate program semantics. Experimental results over a range of parallel benchmarks on a number of different architectural platforms including an 864 core Azul Vega 3, and a 48 core Intel SCC, indicate that our approach leads to notable performance gains (20- 32 % on average) without incurring any additional complexity
Coupling Memory and Computation for Locality Management
We articulate the need for managing (data) locality automatically rather than leaving it to the programmer, especially in parallel programming systems. To this end, we propose techniques for coupling tightly the computation (including the thread scheduler) and the memory manager so that data and computation can be positioned closely in hardware. Such tight coupling of computation and memory management is in sharp contrast with the prevailing practice of considering each in isolation. For example, memory-management techniques usually abstract the computation as an unknown "mutator", which is treated as a "black box". As an example of the approach, in this paper we consider a specific class of parallel computations, nested-parallel computations. Such computations dynamically create a nesting of parallel tasks. We propose a method for organizing memory as a tree of heaps reflecting the structure of the nesting. More specifically, our approach creates a heap for a task if it is separately scheduled on a processor. This allows us to couple garbage collection with the structure of the computation and the way in which it is dynamically scheduled on the processors. This coupling enables taking advantage of locality in the program by mapping it to the locality of the hardware. For example for improved locality a heap can be garbage collected immediately after its task finishes when the heap contents is likely in cache
Exploration of Dynamic Memory
Since the advent of the Java programming language and the development of real-time garbage collection, Java has become an option for implementing real-time applications. The memory management choices provided by real-time garbage collection allow for real-time eJava developers to spend more of their time implementing real-time solutions. Unfortunately, the real-time community is not convinced that real-time garbage collection works in managing memory for Java applications deployed in a real-time context. Consequently, the Real-Time for Java Expert Group formulated the Real-Time SpeciïŹcation for Java (RTSJ) standards to make Java a real-time programming language. In lieu of garbage collection, the RTSJ proposed a new memory model called scopes, and a new type of thread called NoHeapRealTimeThread (NHRT), which takes advantage of scopes. While scopes and NHRTs promise predictable allocation and deallocation behaviors, no asymptotic studies have been conducted to investigate the costs associated with these technologies. To understand the costs associated with using these technologies to manage memory, computations and analyses of time and space overheads associated with scopes and NHRTs are presented. These results provide a framework for comparing the RTSJâs memory management model with real-time garbage collection. Another facet of this research concerns the optimization of novel approaches to garbage collection on multiprocessor systems. Such approaches yield features that are suitable for real-time systems. Although multiprocessor, concurrent garbage collection is not the same as real-time garbage collection, advancements in multiprocessor concurrent garbage collection have demonstrated the feasibility of building low latency multiprocessor real-time garbage collectors. In the nineteen-sixties, only three garbage collection schemes were available, namely reference counting garbage collection, mark-sweep garbage collection, and copying garbage collection. These classical approaches gave new insight into the discipline of memory management and inspired researchers to develop new, more elaborate memory-management techniques. Those insights resulted in a plethora of automatic memory management algorithms and techniques, and a lack of uniformity in the language used to reason about garbage collection. To bring a sense of uniformity to the language used to reason about garbage collection technologies, a taxonomy for comparing garbage collection technologies is presented
Recommended from our members
Optimization and Training of Generational Garbage Collectors
Garbage collectors are nearly ubiquitous in modern programming languages, and we want to minimize the cost they impose in terms of time and space. Generally, a collector waits until its space is full and then performs a collection to reclaim needed memory. However, this is not the only option; a collection could be performed early when some free space remains. For copying collectors, which are what we consider here, the system must traverse the graph of live objects and copy them, so the cost of a collection is proportional to the volume of objects that are live. Since this value fluctuates during a program\u27s execution, a collector can minimize its cost by carefully choosing the points at which it collects.
We help to realize this goal in two ways. First, by developing an algorithm that analyzes after-the-fact traces and computes optimal collection points, we can explore the theoretical limits of garbage collectors. This gives insight into what performance gains are possible, and can guide future collector development into areas that could be most fruitful.
Second, we use techniques from machine learning to find improved garbage collection policies that could be implemented in real systems. The optimal collection points provide ground truth from which a model can learn
High Performance Reference Counting and Conservative Garbage Collection
Garbage collection is an integral part of modern programming languages. It automatically
reclaims memory occupied by objects that are no longer in use. Garbage
collection began in 1960 with two algorithmic branches â tracing and reference counting.
Tracing identifies live objects by performing a transitive closure over the object
graph starting with the stacks, registers, and global variables as roots. Objects not
reached by the trace are implicitly dead, so the collector reclaims them. In contrast,
reference counting explicitly identifies dead objects by counting the number of incoming
references to each object. When an objectâs count goes to zero, it is unreachable
and the collector may reclaim it.
Garbage collectors require knowledge of every reference to each object, whether
the reference is from another object or from within the runtime. The runtime provides
this knowledge either by continuously keeping track of every change to each reference
or by periodically enumerating all references. The collector implementation faces two
broad choices â exact and conservative. In exact garbage collection, the compiler and
runtime system precisely identify all references held within the runtime including
those held within stacks, registers, and objects. To exactly identify references, the
runtime must introspect these references during execution, which requires support
from the compiler and significant engineering effort. On the contrary, conservative
garbage collection does not require introspection of these references, but instead
treats each value ambiguously as a potential reference.
Highly engineered, high performance systems conventionally use tracing and
exact garbage collection. However, other well-established but less performant systems
use either reference counting or conservative garbage collection. Reference counting has
some advantages over tracing such as: a) it is easier implement, b) it reclaims memory
immediately, and c) it has a local scope of operation. Conservative garbage collection
is easier to implement compared to exact garbage collection because it does not
require compiler cooperation. Because of these advantages, both reference counting
and conservative garbage collection are widely used in practice. Because both suffer
significant performance overheads, they are generally not used in performance critical
settings. This dissertation carefully examines reference counting and conservative
garbage collection to understand their behavior and improve their performance.
My thesis is that reference counting and conservative garbage collection can perform
as well or better than the best performing garbage collectors.
The key contributions of my thesis are: 1) An in-depth analysis of the key design
choices for reference counting. 2) Novel optimizations guided by that analysis that
significantly improve reference counting performance and make it competitive with
a well tuned tracing garbage collector. 3) A new collector, RCImmix, that replaces
the traditional free-list heap organization of reference counting with a line and block heap structure, which improves locality, and adds copying to mitigate fragmentation.
The result is a collector that outperforms a highly tuned production generational
collector. 4) A conservative garbage collector based on RCImmix that matches the
performance of a highly tuned production generational collector.
Reference counting and conservative garbage collection have lived under the
shadow of tracing and exact garbage collection for a long time. My thesis focuses
on bringing these somewhat neglected branches of garbage collection back to life
in a high performance setting and leads to two very surprising results: 1) a new
garbage collector based on reference counting that outperforms a highly tuned production
generational tracing collector, and 2) a variant that delivers high performance
conservative garbage collection
- âŠ