Search CORE

323 research outputs found

Ramasse-miettes générationnel et incémental gérant les cycles et les gros objets en utilisant des frames délimités

Author: Adam Sébastien
Publication venue
Publication date: 01/01/2008
Field of study

Ces dernières années, des recherches ont été menées sur plusieurs techniques reliées à la collection des déchets. Plusieurs découvertes centrales pour le ramassage de miettes par copie ont été réalisées. Cependant, des améliorations sont encore possibles. Dans ce mémoire, nous introduisons des nouvelles techniques et de nouveaux algorithmes pour améliorer le ramassage de miettes. En particulier, nous introduisons une technique utilisant des cadres délimités pour marquer et retracer les pointeurs racines. Cette technique permet un calcul efficace de l'ensemble des racines. Elle réutilise des concepts de deux techniques existantes, card marking et remembered sets, et utilise une configuration bidirectionelle des objets pour améliorer ces concepts en stabilisant le surplus de mémoire utilisée et en réduisant la charge de travail lors du parcours des pointeurs. Nous présentons aussi un algorithme pour marquer récursivement les objets rejoignables sans utiliser de pile (éliminant le gaspillage de mémoire habituel). Nous adaptons cet algorithme pour implémenter un ramasse-miettes copiant en profondeur et améliorer la localité du heap. Nous améliorons l'algorithme de collection des miettes older-first et sa version générationnelle en ajoutant une phase de marquage garantissant la collection de toutes les miettes, incluant les structures cycliques réparties sur plusieurs fenêtres. Finalement, nous introduisons une technique pour gérer les gros objets. Pour tester nos idées, nous avons conçu et implémenté, dans la machine virtuelle libre Java SableVM, un cadre de développement portable et extensible pour la collection des miettes. Dans ce cadre, nous avons implémenté des algorithmes de collection semi-space, older-first et generational. Nos expérimentations montrent que la technique du cadre délimité procure des performances compétitives pour plusieurs benchmarks. Elles montrent aussi que, pour la plupart des benchmarks, notre algorithme de parcours en profondeur améliore la localité et augmente ainsi la performance. Nos mesures de la performance générale montrent que, utilisant nos techniques, un ramasse-miettes peut délivrer une performance compétitive et surpasser celle des ramasses-miettes existants pour plusieurs benchmarks. ______________________________________________________________________________ MOTS-CLÉS DE L’AUTEUR : Ramasse-Miettes, Machine Virtuelle, Java, SableVM

Archipel - Université du Québec à Montréal

Pretenuring for Java

Author: Asjad Khan
B. Moss
J. Eliot
John Cavazos
Kathryn S. Mckinley
Sara Smolensky
Sharad Singhai
Stephen M. Blackburn
Publication venue: ACM Press
Publication date: 01/01/2001
Field of study

Pretenuring is a technique for reducing copying costs in garbage collectors. When pretenuring, the allocator places long-lived objects into regions that the garbage collector will rarely, if ever, collect. We extend previous work on profiling-driven pretenuring as follows. (1) We develop a collector-neutral approach to obtaining object lifetime profile information. We show that our collection of Java programs exhibits a very high degree of homogeneity of object lifetimes at each allocation site. This result is robust with respect to different inputs, and is similar to previous work on ML, but is in contrast to C programs, which require dynamic call chain context information to extract homogeneous lifetimes. Call-site homogeneity considerably simplifies the implementation of pretenuring and makes it more efficient. (2) Our pretenuring advice is neutral with respect to the collector algorithm, and we use it to improve two quite different garbage collectors: a traditional generational collector and an older-first collector. The system is also novel because it classifies and allocates objects into 3 categories: we allocate immortal objects into a permanent region that the collector will never consider, long-lived objects into a region in which the collector placed survivors of the most recent collection, and shortlived objects into the nursery, i.e., the default region. (3) We evaluate pretenuring on Java programs. Our simulation results show that pretenuring significantly reduces collector copying for generational and older-first collectors. 1

CiteSeerX

Subheap-Augmented Garbage Collection

Author: Karel Benjamin
Publication venue: ScholarlyCommons
Publication date: 01/01/2019
Field of study

Automated memory management avoids the tedium and danger of manual techniques. However, as no programmer input is required, no widely available interface exists to permit principled control over sometimes unacceptable performance costs. This dissertation explores the idea that performance-oriented languages should give programmers greater control over where and when the garbage collector (GC) expends effort. We describe an interface and implementation to expose heap partitioning and collection decisions without compromising type safety. We show that our interface allows the programmer to encode a form of reference counting using Hayes\u27 notion of key objects. Preliminary experimental data suggests that our proposed mechanism can avoid high overheads suffered by tracing collectors in some scenarios, especially with tight heaps. However, for other applications, the costs of applying subheaps---in human effort and runtime overheads---remain daunting

ScholarlyCommons@Penn

HALO: Post-Link Heap-Layout Optimisation

Author: Berger Emery D.
Berger Emery D.
Calder Brad
Chilimbi Trishul M.
Chilimbi Trishul M.
David
Evans Jason
Evans Jason
Leijen Daan
Matthew
Nevill-Manning C. G.
Newman M. E. J.
Powers Bobby
Standard Performance Evaluation Corporation
Standard Performance Evaluation Corporation
Trishul
Truong D. N.
Publication venue: CGO 2020: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization
Publication date: 01/02/2020
Field of study

Today, general-purpose memory allocators dominate the landscape of dynamic memory management. While these so- lutions can provide reasonably good behaviour across a wide range of workloads, it is an unfortunate reality that their behaviour for any particular workload can be highly suboptimal. By catering primarily to average and worst-case usage patterns, these allocators deny programs the advantages of domain-specific optimisations, and thus may inadvertently place data in a manner that hinders performance, generating unnecessary cache misses and load stalls. To help alleviate these issues, we propose HALO: a post-link profile-guided optimisation tool that can improve the layout of heap data to reduce cache misses automatically. Profiling the target binary to understand how allocations made in different contexts are related, we specialise memory-management routines to allocate groups of related objects from separate pools to increase their spatial locality. Unlike other solutions of its kind, HALO employs novel grouping and identification algorithms which allow it to create tight-knit allocation groups using the entire call stack and to identify these efficiently at runtime. Evaluation of HALO on contemporary out-of-order hardware demonstrates speedups of up to 28% over jemalloc, out-performing a state-of-the-art data placement technique from the literature

Crossref

Apollo (Cambridge)

Visualizing time-dependent software artifacts

Author: Moreta S.
Publication venue
Publication date: 31/08/2011
Field of study

Pure OAI Repository

Using Class-Level Static Properties to Predict Object Lifetimes

Author: Nicolas Marion Sebastien
Publication venue
Publication date: 12/07/2023
Field of study

Today, most modern programming languages such as C # or Java use an automatic memory management system also known as a Garbage Collector (GC). Over the course of program execution, new objects are allocated in memory, and some older objects become unreachable (die). In order for the program to keep running, it becomes necessary to free the memory of dead objects; this task is performed periodically by the GC. Research has shown that most objects die young and as a result, generational collectors have become very popular over the years. Yet, these algorithms are not good at handling long-lived objects. Typically, long-lived objects would first be allocated in the nursery space and be promoted (copied) to an older generation after surviving a garbage collection, hence wasting precious time. By allocating long-lived and immortal objects directly into infrequently or never collected regions, pretenuring can reduce garbage collection costs significantly. Current state of the art methodology to predict object lifetime involves off-line profiling combined with a simple, heuristic classification. Profiling is slow (can take days), requires gathering gigabytes of data that need to be analysed (can take hours), and needs to be repeated for every previously unseen program. This thesis explores the space of lifetime predictions and shows how object lifetimes can be predicted accurately and quickly using simple program characteristics gathered within minutes. Following an innovative methodology introduced in this thesis, object lifetime predictions are fed into a specifically modified Java virtual machine. Performance tests show gains in GC times of as much as 77% for the “SPEC jvm98” benchmarks, against a generational copying collector

Kent Academic Repository

Characterization and reduction of memory usage in 64-bit Java Virtual Machines

Author: Venstermans Kris
Publication venue: Ghent University. Faculty of Engineering
Publication date: 01/01/2007
Field of study

Ghent University Academic Bibliography

USING HARDWARE MONITORS TO AUTOMATICALLY IMPROVE MEMORY PERFORMANCE

Author: Tikir Mustafa Murat
Publication venue
Publication date: 19/09/2005
Field of study

In this thesis, we propose and evaluate several techniques to dynamically increase the memory access locality of scientific and Java server applications running on cache-coherent non-uniform memory access(cc-NUMA) servers. We first introduce a user-level online page migration scheme where applications are profiled using hardware monitors to determine the preferred locations of the memory pages. The pages are then migrated to memory units via system calls. In our approach, both profiling and page migrations are conducted online while the application runs. We also investigate the use of several potential sources of profiles gathered from hardware monitors in dynamic page migration and compare their effectiveness to using profiles from centralized hardware monitors. In particular, we evaluate using profiles from on-chip CPU monitors, valid TLB content and a hypothetical hardware feature. We also introduce a set of techniques to both measure and optimize the memory access locality in Java server applications running on cc-NUMA servers. In particular, we propose the use of several NUMA-aware Java heap layouts for initial object allocation and use of dynamic object migration during garbage collection to move objects local to the processors accessing them most. To evaluate these techniques, we also introduce a new hybrid simulation approach to simulate memory behavior of parallel applications based on gathering a partial trace of memory accesses from hardware monitors during an actual run of an application and extrapolating it to a representative full trace. Our dynamic page migration approach achieved reductions up to 90% in the number of non-local accesses, which resulted in up to a 16% performance improvement. Our results demonstrated that the combinations of inexpensive hardware monitors and a simple migration policy can be effectively used to improve the performance of real scientific applications. Our simulation study demonstrated that cache miss profiles gathered from on-chip hardware monitors, which are typically available in current micro-processors, can be effectively used to guide dynamic page migrations in an application. Our NUMA-aware heap layouts reduced the total number of non-local object accesses in SPECjbb2000 up to 41%, which resulted in up to a 40% reduction in the memory wait time of the workload

Digital Repository at the University of Maryland