58 research outputs found

    Prefetching in functional languages

    Get PDF
    Functional programming languages contain a number of runtime and language features, such as garbage collection, indirect memory accesses, linked data structures and immutability, that interact with a processor’s memory system. These conspire to cause a variety of unintuitive memory performance effects. For example, it is slower to traverse through linked lists and arrays of data that have been sorted than to traverse the same data accessed in the order it was allocated. We seek to understand these issues and mitigate them in a manner consistent with functional languages, taking advantage of the features themselves where possible. For example, immutability and garbage collection force linked lists to be allocated roughly sequentially in memory, even when the data pointed to within each node is not. We add language primitives for software-prefetching to the OCaml language to exploit this, and observe significant performance improvements a variety of micro- and macro-benchmarks, resulting in speedups of up to 2× on the out-of-order superscalar Intel Haswell and Xeon Phi Knights Landing systems, and up to 3× on the in-order Arm Cortex-A53.Arm Limite

    The design and construction of high performance garbage collectors

    Get PDF
    Garbage collection is a performance-critical component of modern language implementations. The performance of a garbage collector depends in part on major algorithmic decisions, but also significantly on implementation details and techniques which are often incidental in the literature. In this dissertation I look in detail at the performance characteristics of garbage collection on modern architectures. My thesis is that a thorough understanding of the characteristics of the heap to be collected, coupled with measured performance of various design alternatives on a range of modern architectures provides insights that can be used to improve the performance of any garbage collection algorithm. The key contributions of this work are: 1) A new analysis technique (replay collection) for measuring the performance of garbage collection algorithms; 2) a novel technique for applying software prefetch to non-moving garbage collectors that achieves significant performance gains; and 3) a comprehensive analysis of object scanning techniques, cataloguing and comparing the performance of the known methods, and leading to a new technique that optimizes performance without significant cost to the runtime environment. These contributions are applicable to a wide range of garbage collectors, and can provide significant measurable speedups to a design point where each implementer in the past has had to trust intuition or their own benchmarking. The methodologies and implementation techniques contributed in this dissertation have the potential to make a significant improvement to the performance of every garbage collector

    Compiler and Runtime Optimizations for Fine-Grained Distributed Shared Memory Systems

    Get PDF
    Bal, H.E. [Promotor

    Performance analysis methods for understanding scaling bottlenecks in multi-threaded applications

    Get PDF
    In dit proefschrift stellen we drie nieuwe methodes voor om de prestatie van meerdradige programma's te analyseren. Onze eerste methode, criticality stacks, is bruikbaar voor het analyseren van onevenwicht tussen draden. Om deze stacks te construeren stellen we een nieuwe criticaliteitsmetriek voor, die de uitvoeringstijd van een applicatie opsplitst in een deel voor iedere draad. Hoe groter dit deel is voor een draad, hoe kritischer deze draad is voor de applicatie. De tweede methode, bottle graphs, stelt iedere draad van een meerdradig programma voor als een rechthoek in een grafiek. De hoogte van de rechthoek wordt berekend door middel van onze criticaliteitsmetriek, en de breedte stelt het parallellisme van een draad voor. Rechthoeken die bovenaan in de grafiek zitten, als het ware in de hals van de fles, hebben een beperkt parallellisme, waardoor we ze beschouwen als “bottlenecks” voor de applicatie. Onze derde methode, speedup stacks, toont de bereikte speedup van een applicatie en de verschillende componenten die speedup beperken in een gestapelde grafiek. De intuïtie achter dit concept is dat door het reduceren van de invloed van een bepaalde component, de speedup van een applicatie proportioneel toeneemt met de grootte van die component in de stapel

    Effective Compile-Time Analysis for Data Prefetching In Java

    Get PDF
    The memory hierarchy in modern architectures continues to be a major performance bottleneck. Many existing techniques for improving memory performance focus on Fortran and C programs, but memory latency is also a barrier to achieving high performance in object-oriented languages. Existing software techniques are inadequate for exposing optimization opportunities in object-oriented programs. One key problem is the use of high-level programming abstractions which make analysis difficult. Another challenge is that programmers use a variety of data structures, including arrays and linked structures, so optimizations must work on a broad range of programs. We develop a new unified data-flow analysis for identifying accesses to arrays and linked structures called recurrence analysis. Prior approaches that identify these access patterns are ad hoc, or treat arrays and linked structures independently. The data-flow analysis is intra- and inter-procedural, which is important in Java programs that use encapsulation to hide implementation details. We sho

    Client cache management in a distributed object database

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.Includes bibliographical references (leaves 137-143).by Mark Stuart Day.Ph.D
    • …
    corecore