4 research outputs found

    HW-SW co-design techniques for modern programming languages

    Get PDF
    Modern programming languages raise the level of abstraction, hide the details of computer systems from programmers, and provide many convenient features. Such strong abstraction from the details of computer systems with runtime support of many convenient features increases the productivity of programmers. Such benefits, however, come with performance overheads. First, many of modern programming languages use a dynamic type system which incurs overheads of profiling program execution and generating specialized codes in the middle of execution. Second, such specialized codes constantly add overheads of dynamic type checks. Third, most of modern programming languages use automatic memory management which incurs memory overheads due to metadata and delayed reclamation as well as execution time overheads due to garbage collection operations. This thesis makes three contributions to address the overheads of modern programming languages. First, it describes the enhancements to the compiler of dynamic scripting languages necessary to enable sharing of compilation results across executions. These compilers have been developed with little consideration for reusing optimization efforts across executions since it is considered difficult due to dynamic nature of the languages. As a first step toward enabling the reuse of compilation results of dynamic scripting languages, it focuses on inline caching (IC) which is one of the fundamental optimization techniques for dynamic type systems. Second, it describes a HW-SW co-design technique to further improve IC operations. While the first proposal focuses on expensive IC miss handling during JavaScript initialization, the second proposal accelerates IC hit operations to improve the overall performance. Lastly, it describes how to exploit common sharing patterns of programs to reduce overheads of reference counting for garbage collection. It minimizes atomic operations in reference counting by biasing each object to a specific thread

    Profilage continu et efficient de verrous pour Java pour les architectures multicœurs

    Get PDF
    Today, the processing of large dataset is generally parallelised and performed on computers with many cores. However, locks can serialize the execution of these cores and hurt the latency and the processing throughput. Spotting theses lock contention issues in-vitro (i.e. during the development phase) is complex because it is difficult to reproduce a production environment, to create a realistic workload representative of the context of use of the software and to test every possible configuration of deployment where will be executed the software. This thesis introduces Free Lunch, a lock profiler that diagnoses phases of high lock contention due to locks in-vivo (i.e. during the operational phase). Free Lunch is designed around a new metric, the Critical Section Pressure (CSP), which aims to evaluate the impact of lock contention on overall thread progress. Free Lunch is integrated in Hotpost in order to minimize the overhead and regularly reports the CSP during the execution in order to detect temporary issues due to locks. Free Lunch is evaluated over 31 benchmarks from Dacapo 9.12, SpecJVM08 and SpecJBB2005, and over the Cassandra database. We were able to pinpoint the phases of lock contention in 6 applications for which some of these were not detected by existing profilers. With this information, we have improved the performance of Xalan by 15% just by rewriting one line of code and identified a phase of high lock contention in Cassandra during the replay of transactions after a crash of a node. Free Lunch has never degraded performance by more than 6%, which makes it suitable to be deployed continuously in an operational environment.Aujourd’hui, le traitement de grands jeux de données est généralement parallélisé et effectué sur des machines multi-cœurs. Cependant, les verrous peuvent sérialiser l'exécution de ces coeurs et dégrader la latence et le débit du traitement. Détecter ces problèmes de contention de verrous in-vitro (i.e. pendant le développement du logiciel) est complexe car il est difficile de reproduire un environnement de production, de créer une charge de travail réaliste représentative du contexte d’utilisation du logiciel et de tester toutes les configurations de déploiement possibles où s'exécutera le logiciel. Cette thèse présente Free Lunch, un profiler permettant d'identifier les phases de contention dues aux verrous in-vivo (i.e. en production). Free Lunch intègre une nouvelle métrique appelée Critical Section Pressure (CSP) évaluant avec précision l'impact de la synchronisation sur le progrès des threads. Free Lunch est directement intégré dans la JVM Hotspot pour minimiser le surcoût d'exécution et reporte régulièrement la CSP afin de pouvoir détecter les problèmes transitoires dus aux verrous. Free Lunch est évalué sur 31 benchmarks issus de Dacapo 9.12, SpecJVM08 et SpecJBB2005, ainsi que sur la base de données Cassandra. Nous avons identifié des phases de contention dans 6 applications dont certaines n'étaient pas détectées par les profilers actuels. Grâce à ces informations, nous avons amélioré la performance de Xalan de 15% en modifiant une seule ligne de code et identifié une phase de haute contention dans Cassandra. Free Lunch n’a jamais dégradé les performances de plus de 6% ce qui le rend approprié pour être déployé continuellement dans un environnement de production

    Profilage continu et efficient de verrous pour Java pour les architectures multicœurs

    Get PDF
    Today, the processing of large dataset is generally parallelised and performed on computers with many cores. However, locks can serialize the execution of these cores and hurt the latency and the processing throughput. Spotting theses lock contention issues in-vitro (i.e. during the development phase) is complex because it is difficult to reproduce a production environment, to create a realistic workload representative of the context of use of the software and to test every possible configuration of deployment where will be executed the software. This thesis introduces Free Lunch, a lock profiler that diagnoses phases of high lock contention due to locks in-vivo (i.e. during the operational phase). Free Lunch is designed around a new metric, the Critical Section Pressure (CSP), which aims to evaluate the impact of lock contention on overall thread progress. Free Lunch is integrated in Hotpost in order to minimize the overhead and regularly reports the CSP during the execution in order to detect temporary issues due to locks. Free Lunch is evaluated over 31 benchmarks from Dacapo 9.12, SpecJVM08 and SpecJBB2005, and over the Cassandra database. We were able to pinpoint the phases of lock contention in 6 applications for which some of these were not detected by existing profilers. With this information, we have improved the performance of Xalan by 15% just by rewriting one line of code and identified a phase of high lock contention in Cassandra during the replay of transactions after a crash of a node. Free Lunch has never degraded performance by more than 6%, which makes it suitable to be deployed continuously in an operational environment.Aujourd’hui, le traitement de grands jeux de données est généralement parallélisé et effectué sur des machines multi-cœurs. Cependant, les verrous peuvent sérialiser l'exécution de ces coeurs et dégrader la latence et le débit du traitement. Détecter ces problèmes de contention de verrous in-vitro (i.e. pendant le développement du logiciel) est complexe car il est difficile de reproduire un environnement de production, de créer une charge de travail réaliste représentative du contexte d’utilisation du logiciel et de tester toutes les configurations de déploiement possibles où s'exécutera le logiciel. Cette thèse présente Free Lunch, un profiler permettant d'identifier les phases de contention dues aux verrous in-vivo (i.e. en production). Free Lunch intègre une nouvelle métrique appelée Critical Section Pressure (CSP) évaluant avec précision l'impact de la synchronisation sur le progrès des threads. Free Lunch est directement intégré dans la JVM Hotspot pour minimiser le surcoût d'exécution et reporte régulièrement la CSP afin de pouvoir détecter les problèmes transitoires dus aux verrous. Free Lunch est évalué sur 31 benchmarks issus de Dacapo 9.12, SpecJVM08 et SpecJBB2005, ainsi que sur la base de données Cassandra. Nous avons identifié des phases de contention dans 6 applications dont certaines n'étaient pas détectées par les profilers actuels. Grâce à ces informations, nous avons amélioré la performance de Xalan de 15% en modifiant une seule ligne de code et identifié une phase de haute contention dans Cassandra. Free Lunch n’a jamais dégradé les performances de plus de 6% ce qui le rend approprié pour être déployé continuellement dans un environnement de production

    Memory abstractions for parallel programming

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 156-163).A memory abstraction is an abstraction layer between the program execution and the memory that provides a different "view" of a memory location depending on the execution context in which the memory access is made. Properly designed memory abstractions help ease the task of parallel programming by mitigating the complexity of synchronization or admitting more efficient use of resources. This dissertation describes five memory abstractions for parallel programming: (i) cactus stacks that interoperate with linear stacks, (ii) efficient reducers, (iii) reducer arrays, (iv) ownershipaware transactions, and (v) location-based memory fences. To demonstrate the utility of memory abstractions, my collaborators and I developed Cilk-M, a dynamically multithreaded concurrency platform which embodies the first three memory abstractions. Many dynamic multithreaded concurrency platforms incorporate cactus stacks to support multiple stack views for all the active children simultaneously. The use of cactus stacks, albeit essential, forces concurrency platforms to trade off between performance, memory consumption, and interoperability with serial code due to its incompatibility with linear stacks. This dissertation proposes a new strategy to build a cactus stack using thread-local memory mapping (or TLMM), which enables Cilk-M to satisfy all three criteria simultaneously. A reducer hyperobject allows different branches of a dynamic multithreaded program to maintain coordinated local views of the same nonlocal variable. With reducers, one can use nonlocal variables in a parallel computation without restructuring the code or introducing races. This dissertation introduces memory-mapped reducers, which admits a much more efficient access compared to existing implementations. When used in large quantity, reducers incur unnecessarily high overhead in execution time and space consumption. This dissertation describes support for reducer arrays, which offers the same functionality as an array of reducers with significantly less overhead. Transactional memory is a high-level synchronization mechanism, designed to be easier to use and more composable than fine-grain locking. This dissertation presents ownership-aware transactions, the first transactional memory design that provides provable safety guarantees for "opennested" transactions. On architectures that implement memory models weaker than sequential consistency, programs communicating via shared memory must employ memory-fences to ensure correct execution. This dissertation examines the concept of location-based memoryfences, which unlike traditional memory fences, incurs latency only when synchronization is necessary.by I-Ting Angelina Lee.Ph.D
    corecore