7 research outputs found

    Dynamic deadlock verification for general barrier synchronisation

    Get PDF
    We present Armus, a dynamic verification tool for deadlock detection and avoidance specialised in barrier synchronisation. Barriers are used to coordinate the execution of groups of tasks, and serve as a building block of parallel computing. Our tool verifies more barrier synchronisation patterns than current state-of-the-art. To improve the scalability of verification, we introduce a novel eventbased representation of concurrency constraints, and a graph-based technique for deadlock analysis. The implementation is distributed and fault-tolerant, and can verify X10 and Java programs. To formalise the notion of barrier deadlock, we introduce a core language expressive enough to represent the three most widespread barrier synchronisation patterns: group, split-phase, and dynamic membership. We propose a graph analysis technique that selects from two alternative graph representations: the Wait-For Graph, that favours programs with more tasks than barriers; and the State Graph, optimised for programs with more barriers than tasks. We prove that finding a deadlock in either representation is equivalent, and that the verification algorithm is sound and complete with respect to the notion of deadlock in our core language. Armus is evaluated with three benchmark suites in local and distributed scenarios. The benchmarks show that graph analysis with automatic graph-representation selection can record a 7-fold execution increase versus the traditional fixed graph representation. The performance measurements for distributed deadlock detection between 64 processes show negligible overheads

    Specification and Verification of Shared-Memory Concurrent Programs

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Efficient optimization of memory accesses in parallel programs

    Get PDF
    The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low power cores. As multi-core processors are becoming ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges. The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits several scalar replacement transformation opportunities compared to many existing memory models. The low-level optimizations include a novel approach to register allocation that retains the compile time and space efficiency of Linear Scan, while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible. Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core, and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. When compared to Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3% on a POWER5 processor. With the experimental evaluations combined with the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures

    Supporting Concurrency Abstractions in High-level Language Virtual Machines

    Get PDF
    During the past decade, software developers widely adopted JVM and CLI as multi-language virtual machines (VMs). At the same time, the multicore revolution burdened developers with increasing complexity. Language implementers devised a wide range of concurrent and parallel programming concepts to address this complexity but struggle to build these concepts on top of common multi-language VMs. Missing support in these VMs leads to tradeoffs between implementation simplicity, correctly implemented language semantics, and performance guarantees. Departing from the traditional distinction between concurrency and parallelism, this dissertation finds that parallel programming concepts benefit from performance-related VM support, while concurrent programming concepts benefit from VM support that guarantees correct semantics in the presence of reflection, mutable state, and interaction with other languages and libraries. Focusing on these concurrent programming concepts, this dissertation finds that a VM needs to provide mechanisms for managed state, managed execution, ownership, and controlled enforcement. Based on these requirements, this dissertation proposes an ownership-based metaobject protocol (OMOP) to build novel multi-language VMs with proper concurrent programming support. This dissertation demonstrates the OMOP's benefits by building concurrent programming concepts such as agents, software transactional memory, actors, active objects, and communicating sequential processes on top of the OMOP. The performance evaluation shows that OMOP-based implementations of concurrent programming concepts can reach performance on par with that of their conventionally implemented counterparts if the OMOP is supported by the VM. To conclude, the OMOP proposed in this dissertation provides a unifying and minimal substrate to support concurrent programming on top of multi-language VMs. The OMOP enables language implementers to correctly implement language semantics, while simultaneously enabling VMs to provide efficient implementations

    Software for Exascale Computing - SPPEXA 2016-2019

    Get PDF
    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's sumpercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest

    Evaluating technologies and techniques for transitioning hydrodynamics applications to future generations of supercomputers

    Get PDF
    Current supercomputer development trends present severe challenges for scientific codebases. Moore’s law continues to hold, however, power constraints have brought an end to Dennard scaling, forcing significant increases in overall concurrency. The performance imbalance between the processor and memory sub-systems is also increasing and architectures are becoming significantly more complex. Scientific computing centres need to harness more computational resources in order to facilitate new scientific insights and maintaining their codebases requires significant investments. Centres therefore have to decide how best to develop their applications to take advantage of future architectures. To prevent vendor "lock-in" and maximise investments, achieving portableperformance across multiple architectures is also a significant concern. Efficiently scaling applications will be essential for achieving improvements in science and the MPI (Message Passing Interface) only model is reaching its scalability limits. Hybrid approaches which utilise shared memory programming models are a promising approach for improving scalability. Additionally PGAS (Partitioned Global Address Space) models have the potential to address productivity and scalability concerns. Furthermore, OpenCL has been developed with the aim of enabling applications to achieve portable-performance across a range of heterogeneous architectures. This research examines approaches for achieving greater levels of performance for hydrodynamics applications on future supercomputer architectures. The development of a Lagrangian-Eulerian hydrodynamics application is presented together with its utility for conducting such research. Strategies for improving application performance, including PGAS- and hybrid-based approaches are evaluated at large node-counts on several state-of-the-art architectures. Techniques to maximise the performance and scalability of OpenMP-based hybrid implementations are presented together with an assessment of how these constructs should be combined with existing approaches. OpenCL is evaluated as an additional technology for implementing a hybrid programming model and improving performance-portability. To enhance productivity several tools for automatically hybridising applications and improving process-to-topology mappings are evaluated. Power constraints are starting to limit supercomputer deployments, potentially necessitating the use of more energy efficient technologies. Advanced processor architectures are therefore evaluated as future candidate technologies, together with several application optimisations which will likely be necessary. An FPGA-based solution is examined, including an analysis of how effectively it can be utilised via a high-level programming model, as an alternative to the specialist approaches which currently limit the applicability of this technology
    corecore