181 research outputs found

    Eliminating read barriers through procrastination and cleanliness

    Get PDF
    Managed languages use read barriers to interpret forwarding pointers introduced to keep track of copied objects. For example, in a split-heap managed runtime for a multicore environment, an object initially allocated on a local heap may be copied to a shared heap if it becomes the source of a store operation whose target location resides on the shared heap. As part of the copy operation, a forwarding pointer may be established to allow existing references to the local object to reference the copied version. In this paper, we consider the design of a managed runtime that avoids the need for read barriers. Our design is premised on the availability of a sufficient degree of concurrency to stall operations that would otherwise necessitate the copy. Stalled actions are deferred until the next local collection, avoiding exposing forwarding pointers to the mutator. In certain important cases, procrastination is unnecessary- lightweight runtime techniques can sometimes be used to allow objects to be eagerly copied when their set of incoming references is known, or when it can be determined that having multiple copies would not violate program semantics. Experimental results over a range of parallel benchmarks on a number of different architectural platforms including an 864 core Azul Vega 3, and a 48 core Intel SCC, indicate that our approach leads to notable performance gains (20- 32 % on average) without incurring any additional complexity

    High-level real-time programming in Java

    Full text link
    Real-time systems have reached a level of complexity beyond the scaling capability of the low-level or restricted languages traditionally used for real-time programming. While Metronome garbage collection has made it practical to use Java to implement real-time systems, many challenges remain for the construction of complex real-time systems, some specic to the use of Java and others simply due to the change in scale of such systems. The goal of our research is the creation of a comprehensive Java-based programming environment and methodology for the creation of complex real-time systems. Our goals include construction of a provably correct real-time garbage collec-tor capable of providing worst case latencies of 100 s, capa-ble of scaling from sensor nodes up to large multiprocessors; specialized programming constructs that retain the safety and simplicity of Java, and yet provide sub-microsecond la-tencies; the extension of Java's \write once, run anywhere" principle from functional correctness to timing behavior; on-line analysis and visualization that aids in the understanding of complex behaviors; and a principled probabilistic analy-sis methodology for bounding the behavior of the resulting systems. While much remains to be done, this paper describes the progress we have made towards these goals

    A concurrent, generational garbage collector for a multithreaded implementation of ML

    Get PDF
    International audienceThis paper presents the design and implementation of a "quasi real-time" garbage collector for Concurrent Caml Light, an implementation of ML with threads. This two-generation system combines a fast, asynchronous copying collector on the young generation with a non-disruptive concurrent marking collector on the old generation. This design crucially relies on the ML compile-time distinction between mutable and immutable objects

    Orca: GC and type system co-design for actor languages

    No full text
    ORCA is a concurrent and parallel garbage collector for actor programs, which does not require any stop-the-world steps, or synchronisation mechanisms, and which has been designed to support zero-copy message passing and sharing of mutable data. \ORCA is part of the runtime of the actor-based language Pony. Pony's runtime was co-designed with the Pony language. This co-design allowed us to exploit certain language properties in order to optimise performance of garbage collection. Namely, ORCA relies on the absence of race conditions in order to avoid read/write barriers, and it leverages actor message passing for synchronisation among actors. This paper describes Pony, its type system, and the ORCA garbage collection algorithm. An evaluation of the performance of ORCA suggests that it is fast and scalable for idiomatic workloads

    Hierarchical Memory Management for Parallel Programs

    Get PDF
    International audienceAn important feature of functional programs is that they are parallel by default. Implementing an efficient parallel functional language, however, is a major challenge, in part because the high rate of allocation and freeing associated with functional programs requires an efficient and scalable memory manager. In this paper, we present a technique for parallel memory management for strict functional languages with nested parallelism. At the highest level of abstraction, the approach consists of a technique to organize memory as a hierarchy of heaps, and an algorithm for performing automatic memory reclamation by taking advantage of a disentanglement property of parallel functional programs. More specifically, the idea is to assign to each parallel task its own heap in memory and organize the heaps in a hierarchy/tree that mirrors the hierarchy of tasks. We present a nested-parallel calculus that specifies hierarchical heaps and prove in this calculus a disentanglement property, which prohibits a task from accessing objects allocated by another task that might execute in parallel. Leveraging the disentanglement property, we present a garbage collection technique that can operate on any subtree in the memory hierarchy concurrently as other tasks (and/or other collections) proceed in parallel. We prove the safety of this collector by formalizing it in the context of our parallel calculus. In addition, we describe how the proposed techniques can be implemented on modern shared-memory machines and present a prototype implementation as an extension to MLton, a high-performance compiler for the Standard ML language. Finally, we evaluate the performance of this implementation on a number of parallel benchmarks

    Effective information sharing using update logs

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996.Includes bibliographical references (leaves 80-88).by James William O'Toole, Jr.Ph.D
    corecore