    Towards an Adaptive Skeleton Framework for Performance Portability

    The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregularly-parallel problems, where the number and size of tasks is unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregularly parallel programs. The approach combines declarative parallelism with JIT technology, dynamic scheduling, and dynamic transformation. We present the design of an adaptive skeleton library, with a task graph implementation, JIT trace costing, and adaptive transformations. We outline the architecture of the prototype adaptive skeleton execution framework in Pycket, describing tasks, serialisation, and the current scheduler. We report a preliminary evaluation of the prototype framework using 4 micro-benchmarks and a small case study on two NUMA servers (24 and 96 cores) and a small cluster (17 hosts, 272 cores). Key results include Pycket delivering good sequential performance, e.g. almost as fast as C for some benchmarks; good absolute speedups on all architectures (up to 120 on 128 cores for sumEuler); and that the adaptive transformations do improve performance.
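
The sumEuler micro-benchmark sums Euler's totient function phi(k) over 1..n. Because the cost of each phi(k) grows with k, it is a small example of irregular parallelism. The sketch below is written in Python rather than Racket/Pycket. It shows one way a map-and-reduce skeleton could be unfolded into independent, chunked tasks. The function names, the `chunk_size` parameter, and the use of a thread pool as a stand-in scheduler are illustrative assumptions, not the framework's API.

```python
from concurrent.futures import ThreadPoolExecutor
from math import gcd


def totient(k: int) -> int:
    """Naive Euler totient: count 1 <= j <= k with gcd(j, k) == 1 (cost grows with k)."""
    return sum(1 for j in range(1, k + 1) if gcd(j, k) == 1)


def chunks(lo: int, hi: int, size: int):
    """Split the closed interval [lo, hi] into consecutive ranges of at most `size` values."""
    return [range(s, min(s + size, hi + 1)) for s in range(lo, hi + 1, size)]


def sum_euler_task(rng: range) -> int:
    """One task: sequentially sum the totients over its sub-range."""
    return sum(totient(k) for k in rng)


def sum_euler(n: int, chunk_size: int, workers: int = 4) -> int:
    """Hypothetical map-reduce skeleton: one task per chunk, results combined with +.
    The thread pool only models task dispatch; CPython threads do not run this
    CPU-bound code in parallel."""
    tasks = chunks(1, n, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_euler_task, tasks))


if __name__ == "__main__":
    for size in (1, 10, 100):
        print(size, len(chunks(1, 100, size)), sum_euler(100, size))
```

Changing `chunk_size` changes the number and granularity of the tasks (100, 10, or 1 task for n = 100) without changing the result. Choosing that granularity well for a given architecture is the decision the adaptive framework aims to make automatically.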

    Costing JIT Traces

    Tracing JIT compilation generates units of compilation that are easy to analyse and are known to execute frequently. The AJITPar project aims to investigate whether the information in JIT traces can be used to make better scheduling decisions or perform code transformations to adapt the code for a specific parallel architecture. To achieve this goal, a cost model must be developed to estimate the execution time of an individual trace. This paper presents the design and implementation of a system for extracting JIT trace information from the Pycket JIT compiler. We define three increasingly parametric cost models for Pycket traces. We perform a search of the cost model parameter space using genetic algorithms to identify the best weightings for those parameters. We test the accuracy of these cost models for predicting the cost of individual traces on a set of loop-based micro-benchmarks. We also compare the accuracy of the cost models for predicting whole-program execution time over the Pycket benchmark suite. Our results show that the weighted cost model using the weightings found from the genetic algorithm search has the best accuracy.
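
A weighted cost model can be sketched as a linear function of instruction counts. Each operation in a trace is assigned to a class, and the trace's predicted cost is the sum over classes of a weight times the number of operations in that class. In the sketch below, the operation names are PyPy-style JIT operation names. The class mapping, the class names, and the weights are hypothetical placeholders, not the paper's classes or the weightings found by its genetic algorithm search. Weighting each trace by its execution count to obtain a whole-program estimate is also an assumption. An unweighted instruction count is the special case in which every class has weight 1.

```python
from collections import Counter

# Hypothetical mapping from PyPy-style trace operation names to instruction classes.
OP_CLASS = {
    "guard_true": "guard", "guard_false": "guard", "guard_class": "guard",
    "int_add": "numeric", "int_lt": "numeric", "float_mul": "numeric",
    "new_with_vtable": "alloc", "new_array": "alloc",
    "getfield_gc": "memory", "setfield_gc": "memory",
    "label": "control", "jump": "control",
}


def class_counts(trace_ops):
    """Count the operations of each class in one trace; unknown ops fall into 'other'."""
    return Counter(OP_CLASS.get(op, "other") for op in trace_ops)


def trace_cost(trace_ops, weights):
    """Weighted cost model: sum over classes of weight * count (missing weights count as 0)."""
    return sum(weights.get(cls, 0.0) * n for cls, n in class_counts(trace_ops).items())


def program_cost(traces, weights):
    """Assumed aggregation: scale each trace's cost by how often it was executed."""
    return sum(freq * trace_cost(ops, weights) for ops, freq in traces)


if __name__ == "__main__":
    loop = ["label", "int_lt", "guard_true", "int_add", "jump"]
    weights = {"guard": 2.0, "numeric": 1.0, "alloc": 5.0, "memory": 1.5, "control": 0.0}
    print(trace_cost(loop, weights))            # 4.0 (one guard, two numeric ops)
    print(program_cost([(loop, 1000)], weights))  # 4000.0
```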

    JIT costing adaptive skeletons for performance portability

    The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregular parallel problems, where the number and size of tasks is unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregular parallel programs. The approach combines JIT compiler technology with dynamic scheduling and dynamic transformation of declarative parallelism. We specify families of algorithmic skeletons plus equations for rewriting skeleton expressions. We present the design of a framework that unfolds skeletons into task graphs, dynamically schedules tasks, and dynamically rewrites skeletons, guided by a lightweight JIT trace-based cost model, to adapt the number and granularity of tasks for the architecture. We outline the system architecture and prototype implementation in Racket/Pycket. As the current prototype does not yet automatically perform dynamic rewriting, we present results based on manual offline rewriting, demonstrating that (i) the system scales to hundreds of cores given enough parallelism of suitable granularity, and (ii) the JIT trace cost model predicts granularity accurately enough to guide rewriting towards a good adaptive transformation.
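
A standard example of an equation for rewriting skeleton expressions is map f = concat . map (map f) . chunk k. It preserves the result while reducing the number of tasks from n to ceil(n / k), each task being k times coarser. Whether this exact rule is among the paper's equations is an assumption. The Python sketch below uses hypothetical `par_map` and `par_map_chunked` functions that evaluate sequentially and report how many tasks a parallel implementation would spawn.

```python
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def chunk(k: int, xs: List[A]) -> List[List[A]]:
    """Split xs into consecutive sublists of length k; the last one may be shorter."""
    if k < 1:
        raise ValueError("chunk size must be positive")
    return [xs[i:i + k] for i in range(0, len(xs), k)]


def concat(xss: List[List[B]]) -> List[B]:
    """Flatten one level of nesting."""
    return [y for ys in xss for y in ys]


def par_map(f: Callable[[A], B], xs: List[A]) -> Tuple[List[B], int]:
    """Stand-in for a parallel map skeleton that spawns one task per element.
    Returns the results and the number of tasks it would spawn."""
    return [f(x) for x in xs], len(xs)


def par_map_chunked(f: Callable[[A], B], xs: List[A], k: int) -> Tuple[List[B], int]:
    """Right-hand side of  map f = concat . map (map f) . chunk k :
    one coarser task per chunk, each mapping f sequentially over its chunk."""
    results, n_tasks = par_map(lambda c: [f(x) for x in c], chunk(k, xs))
    return concat(results), n_tasks


if __name__ == "__main__":
    xs = list(range(10))
    print(par_map(lambda x: x * x, xs))             # 10 fine-grained tasks
    print(par_map_chunked(lambda x: x * x, xs, 3))  # same results, 4 coarser tasks
```

A cost model only has to decide the chunk size k, because the equation guarantees that every choice of k computes the same result.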

    Collection Skeletons - Declarative Abstractions for Data Collections

    JIT-based cost models for adaptive parallelism

    Parallel programming is extremely challenging. Worse yet, parallel architectures evolve quickly, and parallel programs must often be refactored for each new architecture. It is highly desirable to provide performance portability, so that programs developed on one architecture can deliver good performance on other architectures. This thesis is part of the AJITPar project, which investigates a novel approach to achieving performance portability: developing suitable cost models that inform scheduling decisions with dynamic information about computational and communication costs on the target architecture. The main artifact of the AJITPar project is the Adaptive Skeleton Library (ASL), which provides a distributed-memory master-worker implementation of a set of Algorithmic Skeletons, i.e. programming patterns that abstract away the low-level intricacies of parallelism. After JIT warm-up, ASL transforms the skeletons using a computational cost model applied to JIT trace information from the Pycket compiler, a tracing JIT implementation of the Racket language. The execution time of an ASL task is primarily determined by computation and communication costs. The Pycket compiler is extended to enable runtime access to JIT traces, both the sequences of instructions and their frequency of execution. Crucially for dynamic adaptation, these are obtained with minimal overhead. A low-cost, dynamic computation cost model for estimating the runtime of JIT-compiled Pycket programs, Γ, is developed and validated. This is believed to be the first such model. The design explores the challenges of estimating execution time from JIT trace instructions and presents three increasingly sophisticated cost models. The cost model predicts execution time based on the PyPy JIT instructions present in compiled JIT traces.
The final abstract cost model applies weightings to 5 different classes of trace instructions and also proposes a method for aggregating the cost models for single traces into a cost model for an entire program. Execution time is measured, and the generated traces are recorded, for a suite of 41 benchmarks. Linear regression is used to determine the weightings for the abstract cost model from this data. The final cost model reveals that allocation operations count most for execution time, followed by guards and numeric operations. The suitability of Γ for predicting the effect of ASL program transformations is investigated. The real utility of Γ is not in absolute predictions of execution times for different programs, but in predicting the effects of applying program transformations to parallel programs. A linear relationship between the actual computational cost of a task and that predicted by Γ is demonstrated for five benchmarks on two architectures. A series of increasingly accurate low-cost, dynamic cost models for estimating the communication costs of ASL programs, K, are developed and validated. Predicting the optimum task size in ASL relies not only on computational cost predictions, but also on predictions of the overhead of communicating tasks to worker nodes and results back to the master. The design and iterative development of a cost model that predicts the serialisation, deserialisation, and network send times of spawning a task in ASL is presented. Linear regression of communication timings is used to determine the appropriate weighting parameters for each. K is shown to be valid for predicting other, arbitrary data structures by demonstrating an additive property of the model. The model K is validated by showing a linear relationship between the combined predicted costs of the simple types in instances of aggregated data structures and the measured communication time. This validation is performed on five benchmarks on two platforms.
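
The additive property of K can be illustrated with a small sketch. Each simple type has its own predicted serialisation, deserialisation, and send cost, and an aggregate value is costed as the sum over its simple components. The coefficients below are placeholders, not fitted values. The Python types stand in for Racket values, and the recursive traversal is an assumption about how such a model could be applied, not the thesis's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeCost:
    """Hypothetical per-value costs (seconds) for one simple type."""
    serialise: float
    deserialise: float
    send: float

    def total(self) -> float:
        return self.serialise + self.deserialise + self.send


# Placeholder coefficients; in the thesis such weights are fitted by linear regression.
COSTS = {
    int: TypeCost(1e-7, 1e-7, 5e-8),
    float: TypeCost(1.5e-7, 1.2e-7, 8e-8),
    str: TypeCost(3e-7, 2e-7, 1e-7),
}


def comm_cost(value) -> float:
    """Additive model: the cost of an aggregate is the sum of the costs of its simple parts."""
    if isinstance(value, (list, tuple)):
        return sum(comm_cost(v) for v in value)
    if isinstance(value, dict):
        return sum(comm_cost(k) + comm_cost(v) for k, v in value.items())
    for ty, cost in COSTS.items():  # bool is a subclass of int, so it is costed as int
        if isinstance(value, ty):
            return cost.total()
    raise TypeError(f"no cost coefficients for {type(value).__name__}")


if __name__ == "__main__":
    task = [1, 2, (3.0, "label"), {"n": 4}]
    print(comm_cost(task))
```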
Finally, a low-cost dynamic cost model, T, that predicts a good ASL task size by combining information from the computation and communication cost models (Γ and K) is developed and validated. The key insight in the design of this model is to balance the communication cost on the master node with the computation and communication costs on the worker nodes. The predictive power of T is tested using six benchmarks, and T is shown to predict the optimal task size more accurately, reducing total program runtimes when compared with the default ASL prototype.
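
The balance described above can be made concrete under strong simplifying assumptions:

- costs are linear in the task size s (number of items);
- gamma is a per-item computation cost (as Γ would predict);
- kappa_master and kappa_worker are per-item communication costs on the master and worker (as K would predict);
- task_overhead is a fixed per-task cost on the master;
- the master dispatches tasks one at a time to W workers.

The rule below picks the smallest s for which the master can keep every worker busy, on the reasoning that smaller tasks balance load better. It is a hypothetical illustration of the idea, not the model T from the thesis.

```python
import math


def choose_task_size(n_items, workers, gamma, kappa_master, kappa_worker, task_overhead):
    """Hypothetical balance rule.

    Master time to dispatch one task to every worker:
        workers * (task_overhead + kappa_master * s)
    must not exceed one worker's time to process a task of s items:
        (gamma + kappa_worker) * s
    Returns the smallest such s, clamped to [1, n_items].
    """
    per_item_slack = gamma + kappa_worker - workers * kappa_master
    if per_item_slack <= 0:
        # The master is the bottleneck at every task size: fall back to one task per worker.
        return max(1, math.ceil(n_items / workers))
    s = math.ceil(workers * task_overhead / per_item_slack)
    return min(max(1, s), n_items)


if __name__ == "__main__":
    print(choose_task_size(10_000, 8, gamma=1e-3, kappa_master=1e-5,
                           kappa_worker=1e-5, task_overhead=2e-3))
```

With these made-up costs, the rule yields tasks of 18 items. Raising the per-task overhead or the number of workers pushes the task size up, which matches the intuition that a busier master needs coarser tasks.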

    JIT-Based cost analysis for dynamic program transformations

    Tracing JIT compilation generates units of compilation that are easy to analyse and are known to execute frequently. The AJITPar project investigates whether the information in JIT traces can be used to dynamically transform programs for a specific parallel architecture. Hence a lightweight cost model is required for JIT traces. This paper presents the design and implementation of a system for extracting JIT trace information from the Pycket JIT compiler. We define three increasingly parametric cost models for Pycket traces. We determine the best weights for the cost model parameters using linear regression. We evaluate the effectiveness of the cost models for predicting the relative costs of transformed programs.
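
Fitting cost-model weights by linear regression amounts to ordinary least squares. Each benchmark contributes one row of instruction-class counts and one measured execution time, and the weights minimise the squared prediction error. The dependency-free sketch below solves the normal equations directly. The no-intercept form and the row layout are assumptions. In practice, a library routine such as `numpy.linalg.lstsq` is numerically preferable to forming the normal equations.

```python
from typing import List, Sequence


def fit_weights(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares without intercept: solve (X^T X) w = X^T y
    by Gaussian elimination with partial pivoting."""
    m = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("feature columns are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (A[i][m] - sum(A[i][j] * w[j] for j in range(i + 1, m))) / A[i][i]
    return w


if __name__ == "__main__":
    # Rows: counts of three hypothetical instruction classes per benchmark.
    counts = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 1, 0]]
    times = [0.5, 2.0, 1.0, 3.5, 3.0]
    print(fit_weights(counts, times))
```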

    Exact Analytic Solution for the Rotation of a Rigid Body having Spherical Ellipsoid of Inertia and Subjected to a Constant Torque

    The exact analytic solution is introduced for the rotational motion of a rigid body having three equal principal moments of inertia and subjected to an external torque vector which is constant for an observer fixed with the body, and to arbitrary initial angular velocity. In the paper, a parametrization of the rotation by three complex numbers is used. In particular, the rows of the rotation matrix are seen as elements of the unit sphere and projected, by stereographic projection, onto points on the complex plane. In this representation, the kinematic differential equation reduces to an equation of Riccati type, which is solved through appropriate choices of substitutions, thereby yielding an analytic solution in terms of confluent hypergeometric functions. The rotation matrix is recovered from the three complex rotation variables by the inverse stereographic map. The results of a numerical experiment confirming the exactness of the analytic solution are reported. The newly found analytic solution is valid for any motion time length and rotation amplitude. The present paper adds a further element to the small set of special cases for which an exact solution of the rotational motion of a rigid body exists. Comment: "Errata Corrige Postprint". In particular: typos present in Eq. 28 of the Journal version are HERE corrected.
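
The parametrization step, projecting each row of the rotation matrix (a unit vector) onto the complex plane and mapping it back, can be sketched as follows. The sketch projects from the north pole (0, 0, 1), using zeta = (x + iy) / (1 - z), with inverse x = 2 Re(zeta) / (1 + |zeta|^2), y = 2 Im(zeta) / (1 + |zeta|^2), z = (|zeta|^2 - 1) / (|zeta|^2 + 1). The choice of pole and sign conventions is an assumption and may differ from the paper's. The Riccati equation and its solution are not reproduced.

```python
import math


def stereo(v):
    """Project a unit vector (x, y, z) from the north pole (0, 0, 1) onto the complex plane."""
    x, y, z = v
    if abs(1.0 - z) < 1e-15:
        raise ValueError("the north pole maps to the point at infinity")
    return complex(x, y) / (1.0 - z)


def inverse_stereo(zeta):
    """Map a complex number back to a point on the unit sphere."""
    r2 = abs(zeta) ** 2
    d = 1.0 + r2
    return (2.0 * zeta.real / d, 2.0 * zeta.imag / d, (r2 - 1.0) / d)


def rotation_to_complex(R):
    """Three complex rotation variables, one per row of the rotation matrix."""
    return [stereo(row) for row in R]


def complex_to_rotation(zetas):
    """Recover the rotation matrix rows by the inverse stereographic map."""
    return [list(inverse_stereo(z)) for z in zetas]


def rotation_x(theta):
    """Rotation matrix about the x axis (used only to generate example data)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


if __name__ == "__main__":
    R = rotation_x(0.7)
    zetas = rotation_to_complex(R)
    print(zetas)
    print(complex_to_rotation(zetas))  # reproduces R up to rounding
```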

    Exact Analytic Solutions for the Rotation of an Axially Symmetric Rigid Body Subjected to a Constant Torque

    New exact analytic solutions are introduced for the rotational motion of a rigid body having two equal principal moments of inertia and subjected to an external torque which is constant in magnitude. In particular, the solutions are obtained for the following cases: (1) Torque parallel to the symmetry axis and arbitrary initial angular velocity; (2) Torque perpendicular to the symmetry axis and such that the torque is rotating at a constant rate about the symmetry axis, and arbitrary initial angular velocity; (3) Torque and initial angular velocity perpendicular to the symmetry axis, with the torque being fixed with the body. In addition to the solutions for these three forced cases, an original solution is introduced for the case of torque-free motion, which is simpler than the classical solution as regards its derivation and uses the rotation matrix in order to describe the body orientation. This paper builds upon the recently discovered exact solution for the motion of a rigid body with a spherical ellipsoid of inertia. In particular, by following Hestenes' theory, the rotational motion of an axially symmetric rigid body is seen at any instant in time as the combination of the motion of a "virtual" spherical body with respect to the inertial frame and the motion of the axially symmetric body with respect to this "virtual" body. The kinematic solutions are presented in terms of the rotation matrix. The newly found exact analytic solutions are valid for any motion time length and rotation amplitude. The present paper adds further elements to the small set of special cases for which an exact solution of the rotational motion of a rigid body exists. Comment: "Errata Corrige Postprint" version of the journal paper. The following typos present in the Journal version are HERE corrected: 1) Definition of \beta, before Eq. 18; 2) sign in the statement of Theorem 3; 3) Sign in Eq. 53; 4) Item r_0 in Eq. 58; 5) Item R_{SN}(0) in Eq. 6
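
For case (1), the angular velocity in body axes follows directly from Euler's equations. The worked equations below show only this standard first step. They do not reproduce the paper's contribution, which is the kinematic solution for the rotation matrix. The notation is ours: I_t is the transverse moment of inertia (I_1 = I_2 = I_t), I_3 is the moment about the symmetry axis, M is the constant axial torque, lambda is the inertia ratio defined below, and Omega = omega_1 + i omega_2.

```latex
% Euler's equations in principal body axes with I_1 = I_2 = I_t:
\begin{aligned}
I_t\,\dot\omega_1 + (I_3 - I_t)\,\omega_2\,\omega_3 &= M_1,\\
I_t\,\dot\omega_2 - (I_3 - I_t)\,\omega_3\,\omega_1 &= M_2,\\
I_3\,\dot\omega_3 &= M_3.
\end{aligned}
% Case (1): M_1 = M_2 = 0, M_3 = M constant.
\omega_3(t) = \omega_3(0) + \frac{M}{I_3}\,t,
\qquad
\lambda = \frac{I_3 - I_t}{I_t},
\qquad
\dot\Omega = i\,\lambda\,\omega_3\,\Omega
\;\Rightarrow\;
\Omega(t) = \Omega(0)\,\exp\!\Big[i\,\lambda\Big(\omega_3(0)\,t + \frac{M\,t^2}{2 I_3}\Big)\Big].
```

The transverse angular velocity therefore keeps a constant magnitude and rotates in the body frame at a rate that grows linearly in time.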

    Quantum Knitting

    We analyze the connections between the mathematical theory of knots and quantum physics by addressing a number of algorithmic questions related to both knots and braid groups. Knots can be distinguished by means of 'knot invariants', among which the Jones polynomial plays a prominent role, since it can be associated with observables in topological quantum field theory. Although the problem of computing the Jones polynomial is intractable in the framework of classical complexity theory, it has recently been recognized that a quantum computer is capable of approximating it efficiently. The quantum algorithms discussed here represent a breakthrough for quantum computation, since approximating the Jones polynomial is actually a 'universal problem', namely the hardest problem that a quantum computer can efficiently handle. Comment: 29 pages, 5 figures; to appear in Laser Journa
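
Classically, the Jones polynomial can be computed through Kauffman's bracket, a state sum over all 2^n smoothings of an n-crossing diagram. This exponential state count illustrates why exact classical evaluation scales badly. The sketch below evaluates the bracket from a planar-diagram (PD) code. The PD convention is an assumption that fixes chirality: each crossing X[i, j, k, l] is listed counterclockwise starting from the incoming under-strand, and the A-smoothing joins i with j and k with l. The sketch also omits the writhe normalisation V(t) = (-A^3)^(-w) <D>, evaluated at A = t^(-1/4), that turns the bracket into the Jones polynomial.

```python
from collections import defaultdict
from itertools import product

# Laurent polynomials in A are stored as {exponent: coefficient}.


def poly_mul(p, q):
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] += c1 * c2
    return {e: c for e, c in out.items() if c}


def poly_add(p, q):
    out = defaultdict(int, p)
    for e, c in q.items():
        out[e] += c
    return {e: c for e, c in out.items() if c}


LOOP = {2: -1, -2: -1}  # value of each extra closed loop: d = -A^2 - A^-2


def count_loops(pairs):
    """Count closed loops formed by arcs joined by the chosen smoothings (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    return len({find(x) for x in list(parent)})


def kauffman_bracket(pd):
    """State sum <D> = sum over states of A^(#A - #B) * d^(loops - 1)."""
    total = {}
    for state in product("AB", repeat=len(pd)):
        pairs, exponent = [], 0
        for (i, j, k, l), choice in zip(pd, state):
            if choice == "A":
                pairs += [(i, j), (k, l)]
                exponent += 1
            else:
                pairs += [(i, l), (j, k)]
                exponent -= 1
        term = {exponent: 1}
        for _ in range(count_loops(pairs) - 1):
            term = poly_mul(term, LOOP)
        total = poly_add(total, term)
    return total


if __name__ == "__main__":
    trefoil = [(1, 5, 2, 4), (3, 1, 4, 6), (5, 3, 6, 2)]
    print(sorted(kauffman_bracket(trefoil).items()))
    # [(-7, 1), (-3, -1), (5, -1)], i.e. A^-7 - A^-3 - A^5
```

The mirror image of a diagram has the same bracket with A replaced by A^-1. This is why the convention above fixes which trefoil the code describes.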