2,917 research outputs found
Towards Automatic Learning of Heuristics for Mechanical Transformations of Procedural Code
The current trend in next-generation exascale systems goes towards
integrating a wide range of specialized (co-)processors into traditional
supercomputers. However, the integration of different specialized devices
increases the degree of heterogeneity and the complexity in programming such
type of systems. Due to the efficiency of heterogeneous systems in terms of
Watt and FLOPS per surface unit, opening the access of heterogeneous platforms
to a wider range of users is an important problem to be tackled. In order to
bridge the gap between heterogeneous systems and programmers, in this paper we
propose a machine learning-based approach to learn heuristics for defining
transformation strategies of a program transformation system. Our approach
proposes a novel combination of reinforcement learning and classification
methods to efficiently tackle the problems inherent to this type of systems.
Preliminary results demonstrate the suitability of the approach for easing the
programmability of heterogeneous systems.Comment: Part of the Program Transformation for Programmability in
Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March
2016, 9 pages, LaTe
Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs
In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages and the cost model and demonstrate the effectiveness of our approach by comparison with a commercial toolchain
Batch solution of small PDEs with the OPS DSL
In this paper we discuss the challenges and optimisations opportunities when solving a large number of small, equally sized discretised PDEs on regular grids. We present an extension of the OPS (Oxford Parallel library for Structured meshes) embedded Domain Specific Language, and show how support can be added for solving multiple systems, and how OPS makes it easy to deploy a variety of transformations and optimisations. The new capabilities in OPS allow to automatically apply data structure transformations, as well as execution schedule transformations to deliver high performance on a variety of hardware platforms. We evaluate our work on an industrially representative finance simulation on Intel CPUs, as well as NVIDIA GPUs
Relay: A New IR for Machine Learning Frameworks
Machine learning powers diverse services in industry including search,
translation, recommendation systems, and security. The scale and importance of
these models require that they be efficient, expressive, and portable across an
array of heterogeneous hardware devices. These constraints are often at odds;
in order to better accommodate them we propose a new high-level intermediate
representation (IR) called Relay. Relay is being designed as a
purely-functional, statically-typed language with the goal of balancing
efficient compilation, expressiveness, and portability. We discuss the goals of
Relay and highlight its important design constraints. Our prototype is part of
the open source NNVM compiler framework, which powers Amazon's deep learning
framework MxNet
- …