
    On Benchmarking Embedded Linux Flash File Systems

    Due to its attractive characteristics in terms of performance, weight, and power consumption, NAND flash memory has become the main non-volatile memory (NVM) in embedded systems. These NVMs also present some specific characteristics and constraints: good but asymmetric I/O performance, limited lifetime, write/erase granularity asymmetry, etc. These peculiarities are managed either in hardware for flash disks (SSDs, SD cards, USB sticks, etc.) or in software for raw embedded flash chips. When managed in software, flash algorithms and structures are implemented in a dedicated flash file system (FFS). In this paper, we present a performance study of the most widely used FFSs in embedded Linux: JFFS2, UBIFS, and YAFFS. We show some very particular behaviors and large performance disparities for the tested FFS operations, such as mounting, copying, and searching file trees, compression, etc.
    Comment: Embed With Linux, Lorient, France (2012)
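
    The paper contains no code, but the flavor of one benchmarked operation, copying a file tree, can be sketched. The Java fragment below is a hypothetical microbenchmark harness (class name and paths invented); an actual FFS study would point it at JFFS2/UBIFS/YAFFS mount points and would also time mounting, searching, and compression.

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;

    public class TreeCopyBench {
        // Recursively copy src into dst: the "copy a file tree" operation.
        static void copyTree(Path src, Path dst) throws IOException {
            Files.walkFileTree(src, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                        throws IOException {
                    Files.createDirectories(dst.resolve(src.relativize(dir)));
                    return FileVisitResult.CONTINUE;
                }
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    Files.copy(file, dst.resolve(src.relativize(file)),
                               StandardCopyOption.REPLACE_EXISTING);
                    return FileVisitResult.CONTINUE;
                }
            });
        }

        public static void main(String[] args) throws IOException {
            Path src = Paths.get(args[0]);  // e.g. a tree on a JFFS2 mount
            Path dst = Paths.get(args[1]);  // e.g. a tree on a UBIFS mount
            long t0 = System.nanoTime();
            copyTree(src, dst);
            System.out.printf("copy took %.1f ms%n", (System.nanoTime() - t0) / 1e6);
        }
    }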

    Decrypting The Java Gene Pool: Predicting Objects' Lifetimes with Micro-patterns

    Pretenuring long-lived and immortal objects into infrequently or never collected regions reduces garbage collection costs significantly. However, extant approaches either require computationally expensive, application-specific, off-line profiling, or consider only allocation sites common to all programs, i.e., those invoked by the virtual machine rather than by application programs. In contrast, we show how a simple program analysis, combined with an object lifetime knowledge bank, can be exploited to match both runtime system and application program structure with object lifetimes. The complexity of the analysis is linear in the size of the program, so it need not be run ahead of time. We obtain performance gains of 6-77% in GC time against a generational copying collector for several SPECjvm98 programs.
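
    As a rough illustration of the decision this analysis enables, here is a minimal Java sketch in which a hypothetical knowledge bank maps a class's micro-pattern to a lifetime bucket and an allocation region; the pattern names, buckets, and region labels are invented, not the paper's actual classifier.

    import java.util.Map;

    public class PretenuringSketch {
        enum Lifetime { SHORT, LONG, IMMORTAL }

        // Knowledge bank: micro-pattern of the allocated class -> lifetime
        // bucket observed for similar allocation sites in other programs.
        static final Map<String, Lifetime> BANK = Map.of(
                "Immutable",    Lifetime.LONG,
                "Pool",         Lifetime.IMMORTAL,
                "PlainOldData", Lifetime.SHORT);

        // Pick an allocation region; default to the nursery when unknown.
        static String regionFor(String microPattern) {
            switch (BANK.getOrDefault(microPattern, Lifetime.SHORT)) {
                case IMMORTAL: return "immortal-space"; // never collected
                case LONG:     return "mature-space";   // rarely collected
                default:       return "nursery";        // collected often
            }
        }

        public static void main(String[] args) {
            System.out.println(regionFor("Immutable")); // mature-space
            System.out.println(regionFor("Unknown"));   // nursery
        }
    }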

    Prioritized Garbage Collection: Explicit GC Support for Software Caches

    Programmers routinely trade space for time to increase performance, often in the form of caching or memoization. In managed languages like Java or JavaScript, however, this space-time tradeoff is complex. Using more space translates into higher garbage collection costs, especially at the limit of available memory. Existing runtime systems provide limited support for space-sensitive algorithms, forcing programmers into difficult and often brittle choices about provisioning. This paper presents prioritized garbage collection, a cooperative programming language and runtime solution to this problem. Prioritized GC provides an interface similar to soft references, called priority references, which identify objects that the collector can reclaim eagerly if necessary. The key difference is an API for defining the policy that governs when priority references are cleared and in what order. Application code specifies a priority value for each reference and a target memory bound. The collector reclaims references, lowest priority first, until the total memory footprint of the cache fits within the bound. We use this API to implement a space-aware least-recently-used (LRU) cache, called a Sache, that is a drop-in replacement for existing caches, such as Google's Guava library. The garbage collector automatically grows and shrinks the Sache in response to available memory and workload with minimal provisioning information from the programmer. Using a Sache, it is almost impossible for an application to experience a memory leak, memory pressure, or an out-of-memory crash caused by software caching.
    Comment: to appear in OOPSLA 201
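
    The clearing policy is easy to model at user level. The Java sketch below is a toy rendering of the priority-reference idea, with invented names and sizes; in the paper the reclamation is performed inside the garbage collector rather than synchronously at insertion time.

    import java.util.Comparator;
    import java.util.PriorityQueue;

    public class PriorityRefSketch<V> {
        static final class PriorityRef<T> {
            final double priority;   // set by application code
            final long sizeBytes;
            T referent;              // nulled when the "collector" reclaims it
            PriorityRef(T v, double priority, long sizeBytes) {
                this.referent = v; this.priority = priority; this.sizeBytes = sizeBytes;
            }
        }

        private final long boundBytes;   // application-specified memory bound
        private long footprint;
        private final PriorityQueue<PriorityRef<V>> refs =
                new PriorityQueue<>(Comparator.comparingDouble((PriorityRef<V> r) -> r.priority));

        PriorityRefSketch(long boundBytes) { this.boundBytes = boundBytes; }

        PriorityRef<V> add(V value, double priority, long sizeBytes) {
            PriorityRef<V> r = new PriorityRef<>(value, priority, sizeBytes);
            refs.add(r);
            footprint += sizeBytes;
            shrinkToBound();  // in the paper, this runs inside the collector
            return r;
        }

        // Clear lowest-priority referents until the footprint fits the bound.
        private void shrinkToBound() {
            while (footprint > boundBytes && !refs.isEmpty()) {
                PriorityRef<V> victim = refs.poll();
                victim.referent = null;
                footprint -= victim.sizeBytes;
            }
        }
    }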

    Threads and Or-Parallelism Unified

    One of the main advantages of Logic Programming (LP) is that it provides an excellent framework for the parallel execution of programs. In this work we investigate novel techniques to efficiently exploit parallelism from real-world applications on low-cost multi-core architectures. To achieve these goals, we revive and redesign the YapOr system to exploit or-parallelism based on a multi-threaded implementation. Our new approach takes full advantage of the state-of-the-art, fast, and optimized YAP Prolog engine and shares the underlying execution environment, scheduler, and most of the data structures used to support YapOr's model. Initial experiments with our new approach consistently achieve almost linear speedups for most of the applications, showing it to be a good alternative for exploiting implicit parallelism on currently available low-cost multi-core architectures.
    Comment: 17 pages, 21 figures, International Conference on Logic Programming (ICLP 2010)
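
    To see the idea in miniature: alternative clauses of a goal are independent, so workers can explore them concurrently. The Java sketch below races placeholder "alternatives" on a thread pool and takes the first success; it models only the or-parallel principle, not YapOr's or-frames or scheduler.

    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class OrParallelSketch {
        // Each branch stands in for one clause alternative of the current goal;
        // a failing alternative throws, a succeeding one returns its answer.
        static int solve(int alternative) {
            if (alternative != 2) throw new IllegalStateException("branch fails");
            return alternative;
        }

        public static void main(String[] args) throws Exception {
            List<Callable<Integer>> branches =
                    List.of(() -> solve(1), () -> solve(2), () -> solve(3));
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            try {
                // invokeAny returns the first alternative that completes
                // without throwing: a first-solution or-parallel race.
                System.out.println("solution: " + pool.invokeAny(branches));
            } finally {
                pool.shutdownNow();
            }
        }
    }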

    Generational and incremental garbage collection handling cycles and large objects using delimited frames

    In recent years, research has been conducted on several techniques related to garbage collection, and several discoveries central to copying garbage collection have been made. However, improvements are still possible. In this thesis, we introduce new techniques and new algorithms to improve garbage collection. In particular, we introduce a technique that uses delimited frames to mark and trace root pointers, allowing an efficient computation of the root set. It reuses concepts from two existing techniques, card marking and remembered sets, and uses a bidirectional object layout to improve on them, stabilizing the memory overhead and reducing the work done when tracing pointers. We also present an algorithm for recursively marking reachable objects without using a stack (eliminating the usual waste of memory), and we adapt this algorithm to implement a depth-first copying collector that improves heap locality. We improve the older-first garbage collection algorithm and its generational version by adding a marking phase that guarantees the collection of all garbage, including cyclic structures spanning several windows. Finally, we introduce a technique for managing large objects. To test our ideas, we designed and implemented a portable and extensible garbage collection framework in the free Java virtual machine SableVM, within which we implemented semi-space, older-first, and generational collection algorithms. Our experiments show that the delimited-frame technique delivers competitive performance on several benchmarks, and that, for most benchmarks, our depth-first traversal algorithm improves locality and thus performance. Our overall measurements show that, using our techniques, a garbage collector can deliver competitive performance and outperform existing collectors on several benchmarks. ______________________________________________________________________________ AUTHOR KEYWORDS: garbage collection, virtual machine, Java, SableVM
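
    One classic way to mark reachable objects without an auxiliary stack is Deutsch-Schorr-Waite pointer reversal, sketched below in Java on a toy binary object graph; the thesis's own stackless algorithm and its depth-first copying variant may differ in detail.

    public class StacklessMark {
        static final class Node {
            Node left, right;
            boolean marked;
            int state;  // 0: left pending, 1: right pending, 2: both done
        }

        // Mark every node reachable from root using no stack: the traversal
        // temporarily reverses the child pointer it descended through and
        // restores it on the way back up (Deutsch-Schorr-Waite).
        static void mark(Node root) {
            Node curr = root, prev = null;
            while (curr != null) {
                if (!curr.marked) { curr.marked = true; curr.state = 0; }
                if (curr.state == 0) {              // try the left child
                    curr.state = 1;
                    Node next = curr.left;
                    if (next != null && !next.marked) {
                        curr.left = prev;           // reverse the pointer
                        prev = curr;
                        curr = next;
                    }
                } else if (curr.state == 1) {       // try the right child
                    curr.state = 2;
                    Node next = curr.right;
                    if (next != null && !next.marked) {
                        curr.right = prev;
                        prev = curr;
                        curr = next;
                    }
                } else {                            // retreat, restoring pointers
                    Node parent = prev;
                    if (parent == null) return;     // back at the root: done
                    if (parent.state == 1) {        // we came through parent.left
                        prev = parent.left;
                        parent.left = curr;
                    } else {                        // we came through parent.right
                        prev = parent.right;
                        parent.right = curr;
                    }
                    curr = parent;
                }
            }
        }
    }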

    Distilling the Real Cost of Production Garbage Collectors

    Abridged abstract: Despite the long history of garbage collection (GC) and its prevalence in modern programming languages, there is surprisingly little clarity about its true cost. Without understanding their cost, crucial tradeoffs made by garbage collectors (GCs) go unnoticed. This can lead to misguided design constraints and evaluation criteria used by GC researchers and users, hindering the development of high-performance, low-cost GCs. In this paper, we develop a methodology that allows us to empirically estimate the cost of GC for any given set of metrics. By distilling out the explicitly identifiable GC cost, we estimate the intrinsic application execution cost using different GCs. The minimum distilled cost forms a baseline. Subtracting this baseline from the total execution costs, we can then place an empirical lower bound on the absolute costs of different GCs. Using this methodology, we study five production GCs in OpenJDK 17, a high-performance Java runtime. We measure the cost of these collectors and expose their respective key performance tradeoffs. We find that with a modestly sized heap, production GCs incur substantial overheads across a diverse suite of modern benchmarks, spending at least 7-82% more wall-clock time and 6-92% more CPU cycles relative to the baseline cost. We show that these costs can be masked by concurrency and by generous provisioning of memory and compute. In addition, we find that newer low-pause GCs are significantly more expensive than older GCs and, surprisingly, sometimes deliver worse application latency than stop-the-world GCs. Our findings reaffirm that GC is by no means a solved problem and that a low-cost, low-latency GC remains elusive. We recommend adopting the distillation methodology together with a wider range of cost metrics for future GC evaluations.
    Comment: Camera-ready version
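
    The arithmetic of the methodology is simple to sketch: distill each collector's cost by subtracting its explicitly identified GC cost from its total cost, take the minimum distilled cost as the baseline, and bound each collector's absolute GC cost from below by its total cost minus that baseline. The Java sketch below uses invented numbers, not the paper's measurements.

    import java.util.Map;

    public class DistilledCost {
        public static void main(String[] args) {
            // { total cost, explicitly identified GC cost } per collector,
            // in arbitrary time units (invented values, not the paper's).
            Map<String, double[]> runs = Map.of(
                    "Serial", new double[]{120.0, 22.0},
                    "G1",     new double[]{110.0, 14.0},
                    "ZGC",    new double[]{125.0,  9.0});

            // Distilled cost = total - explicit GC; the minimum is the baseline.
            double baseline = runs.values().stream()
                    .mapToDouble(r -> r[0] - r[1])
                    .min().getAsDouble();

            // Lower bound on each collector's absolute GC cost: total - baseline.
            runs.forEach((gc, r) -> System.out.printf(
                    "%-6s costs at least %.0f%% more than the baseline%n",
                    gc, 100.0 * (r[0] - baseline) / baseline));
        }
    }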

    Region-based memory management for Mercury programs

    Region-based memory management (RBMM) is a form of compile-time memory management, well known from the functional programming world. In this paper we describe our work on implementing RBMM for the logic programming language Mercury. One interesting point about Mercury is that it is designed with strong type, mode, and determinism systems. These systems not only provide Mercury programmers with several direct software engineering benefits, such as self-documenting code and clear program logic, but also give language implementors a large amount of information that is useful for program analyses. In this work, we make use of this information to develop program analyses that determine the distribution of data into regions and transform Mercury programs by inserting the necessary region operations into them. We prove the correctness of our program analyses and transformation. To execute the annotated programs, we have implemented runtime support that tackles the two main challenges posed by backtracking. First, backtracking can require regions removed during forward execution to be "resurrected"; and second, any memory allocated during a computation that has been backtracked over must be recovered promptly, without waiting for the regions involved to reach the end of their life. We describe our solution to both of these problems in detail. We study in detail how our RBMM system performs on a selection of benchmark programs, including some well-known difficult cases for RBMM. Even with these difficult cases, our RBMM-enabled Mercury system obtains clearly faster runtimes for 15 out of 18 benchmarks compared to the base Mercury system with its Boehm runtime garbage collector, with an average runtime speedup of 24% and an average reduction in memory requirements of 95%. In fact, our system achieves optimal memory consumption on some programs.
    Comment: 74 pages, 23 figures, 11 tables. A shorter version of this paper, without proofs, is to appear in the journal Theory and Practice of Logic Programming (TPLP)
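
    The shape of a transformed program can be illustrated with a toy region model; the Java sketch below uses invented names and omits what the paper actually solves, namely region resurrection and prompt recovery under backtracking.

    import java.util.ArrayList;
    import java.util.List;

    public class RegionSketch {
        static final class Region {
            private final List<Object> objects = new ArrayList<>();
            <T> T alloc(T obj) { objects.add(obj); return obj; }  // allocate into region
            void remove() { objects.clear(); }  // frees the whole region at once
        }

        public static void main(String[] args) {
            // What a program might look like after the analyses have inserted
            // the region operations:
            Region r = new Region();               // inserted: create region
            int[] tmp = r.alloc(new int[1024]);    // allocation placed in r
            tmp[0] = 42;
            System.out.println(tmp[0]);
            r.remove();                            // inserted: r is dead here
        }
    }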

    Control theory for principled heap sizing

    We propose a new, principled approach to adaptive heap sizing based on control theory. We review current state-of-the-art heap sizing mechanisms, as deployed in Jikes RVM and HotSpot. We then formulate heap sizing as a control problem, apply and tune a standard controller algorithm, and evaluate its performance on a set of well-known benchmarks. We find that our controller adapts the heap size more responsively than existing mechanisms. This responsiveness allows tighter virtual machine memory footprints while preserving target application throughput, which is ideal for both embedded and utility computing domains. In short, we argue that as the discipline matures, formal, systematic approaches to memory management should replace ad-hoc heuristics. Control-theoretic heap sizing is one such systematic approach.
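
    As one concrete rendering of the idea, the Java sketch below implements a small PI controller whose setpoint is a target GC overhead and whose actuator is the heap size; the gains, signals, and clamping are invented for illustration and are not the paper's tuned controller.

    public class HeapSizeController {
        private final double kp, ki;   // proportional and integral gains
        private final double target;   // desired GC overhead, e.g. 0.05
        private double integral;

        HeapSizeController(double kp, double ki, double targetGcOverhead) {
            this.kp = kp; this.ki = ki; this.target = targetGcOverhead;
        }

        // Called after each collection with the GC overhead (fraction of
        // time spent in GC) observed since the previous collection.
        long nextHeapSize(long currentHeapBytes, double observedGcOverhead) {
            double error = observedGcOverhead - target;  // > 0: GCing too much
            integral += error;
            double adjustment = kp * error + ki * integral;
            // A positive error grows the heap to amortize collections;
            // a negative one shrinks it to tighten the footprint.
            long next = (long) (currentHeapBytes * (1.0 + adjustment));
            return Math.max(next, 16L << 20);  // clamp to a 16 MiB floor
        }

        public static void main(String[] args) {
            HeapSizeController c = new HeapSizeController(0.5, 0.1, 0.05);
            long heap = c.nextHeapSize(64L << 20, 0.12);  // above target: grow
            System.out.println((heap >> 20) + " MiB");
        }
    }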