4 research outputs found

    Costing JIT Traces

    Get PDF
    Tracing JIT compilation generates units of compilation that are easy to analyse and are known to execute frequently. The AJITPar project aims to investigate whether the information in JIT traces can be used to make better scheduling decisions or perform code transformations to adapt the code for a specific parallel architecture. To achieve this goal, a cost model must be developed to estimate the execution time of an individual trace. This paper presents the design and implementation of a system for extracting JIT trace information from the Pycket JIT compiler. We define three increasingly parametric cost models for Pycket traces. We perform a search of the cost model parameter space using genetic algorithms to identify the best weightings for those parameters. We test the accuracy of these cost models for predicting the cost of individual traces on a set of loop-based micro-benchmarks. We also compare the accuracy of the cost models for predicting whole program execution time over the Pycket benchmark suite. Our results show that the weighted cost model using the weightings found from the genetic algorithm search has the best accuracy

    Towards an Adaptive Skeleton Framework for Performance Portability

    Get PDF
    The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregularly-parallel problems, where the number and size of tasks is unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregularly parallel programs. The approach combines declarative parallelism with JIT technology, dynamic scheduling, and dynamic transformation. We present the design of an adaptive skeleton library, with a task graph implementation, JIT trace costing, and adaptive transformations. We outline the architecture of the protoype adaptive skeleton execution framework in Pycket, describing tasks, serialisation, and the current scheduler.We report a preliminary evaluation of the prototype framework using 4 micro-benchmarks and a small case study on two NUMA servers (24 and 96 cores) and a small cluster (17 hosts, 272 cores). Key results include Pycket delivering good sequential performance e.g. almost as fast as C for some benchmarks; good absolute speedups on all architectures (up to 120 on 128 cores for sumEuler); and that the adaptive transformations do improve performance

    In search of a map : using program slicing to discover potential parallelism in recursive functions

    Get PDF
    Funding: EU FP7 grant “Parallel Patterns for Adaptive Heterogeneous Multicore Systems” (ICT-288570), by the EU H2020 grant “RePhrase: Refactoring Parallel Het- erogeneous Resource-Aware Applications – a Software Engineering Approach” (ICT-644235), by COST Action IC1202 (“Timing Analysis on Code-Level”), by the EPSRC grant “Discovery: Pattern Discovery and Program Shaping for Manycore Systems” (EP/P020631/1), and by Scottish Enterprise grant PS7305CA44.Recursion schemes, such as the well-known map, can be used as loci of potential parallelism, where schemes are replaced with an equivalent parallel implementation. This paper formalises a novel technique, using program slicing, that automatically and statically identifies computations in recursive functions that can be lifted out of the function and then potentially performed in parallel. We define a new program slicing algorithm, build a prototype implementation, and demonstrate its use on 12 Haskell examples, including benchmarks from the NoFib suite and functions from the standard Haskell Prelude. In all cases, we obtain the expected results in terms of finding potential parallelism. Moreover, we have tested our prototype against synthetic benchmarks, and found that our prototype has quadratic time complexity. For the NoFib benchmark examples we demonstrate that relative parallel speedups can be obtained (up to 32.93x the sequential performance on 56 hyperthreaded cores).Postprin

    A parallel SML compiler based on algorithmic skeletons

    No full text
    corecore