Search CORE

43 research outputs found

Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs

Author: Nabi Syed Waqar
Urlea Cristian
Vanderbauwhede Wim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/04/2018
Field of study

In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages and the cost model and demonstrate the effectiveness of our approach by comparison with a commercial toolchain

Crossref

Enlighten

A Skeleton Based Programming Paradigm for Mobile Multi-Agents on Distributed Systems and Its Realization within the MAGDA Mobile Agents Platform

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2008
Field of study

Crossref

FastFlow: Efficient Parallel Streaming Applications on Multi-core

Author: Aldinucci Marco
Meneghin Massimiliano
Torquati Massimo
Publication venue
Publication date: 02/09/2009
Field of study

Shared memory multiprocessors come back to popularity thanks to rapid spreading of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers with optimising tools and programming frameworks is a nowadays challenge. Few efforts have been done to support effective streaming applications on these architectures. In this paper we introduce FastFlow, a low-level programming framework based on lock-free queues explicitly designed to support high-level languages for streaming applications. We compare FastFlow with state-of-the-art programming frameworks such as Cilk, OpenMP, and Intel TBB. We experimentally demonstrate that FastFlow is always more efficient than all of them in a set of micro-benchmarks and on a real world application; the speedup edge of FastFlow over other solutions might be bold for fine grain tasks, as an example +35% on OpenMP, +226% on Cilk, +96% on TBB for the alignment of protein P01111 against UniProt DB using Smith-Waterman algorithm.Comment: 23 pages + cove

arXiv.org e-Print Archive

UnipiEprints

Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code

Author: Cole M. I.
Darlington J.
Harris M.
Holk E.
Jones S. P.
Karrenberg R.
Püschel M.
Svensson J.
Thies W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

A Survey on Thread-Level Speculation Techniques

Author: Estébanez López Álvaro
González Escribano Arturo
Llanos Ferraris Diego Rafael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

Producción CientíficaThread-Level Speculation (TLS) is a promising technique that allows the parallel execution of sequential code without relying on a prior, compile-time-dependence analysis. In this work, we introduce the technique, present a taxonomy of TLS solutions, and summarize and put into perspective the most relevant advances in this field.MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS)

Repositorio Documental de la Universidad de Valladolid

Generating Performance Portable Code using Rewrite Rules: From High-Level Functional Expressions to High-Performance OpenCL Code

Author: Cole M. I.
Darlington J.
Harris M.
Holk E.
Jones S. P.
Karrenberg R.
Püschel M.
Svensson J.
Thies W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/08/2015
Field of study

Computers have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort resulting in a tension between performance and code portability. Typically, code is either tuned in a low-level imperative language using hardware-specific optimizations to achieve maximum performance or is written in a high-level, possibly functional, language to achieve portability at the expense of performance. We propose a novel approach aiming to combine high-level programming, code portability, and high-performance. Starting from a high-level functional expression we apply a simple set of rewrite rules to transform it into a low-level functional representation, close to the OpenCL programming model, from which OpenCL code is generated. Our rewrite rules define a space of possible implementations which we automatically explore to generate hardware-specific OpenCL implementations. We formalize our system with a core dependently-typed λ-calculus along with a denotational semantics which we use to prove the correctness of the rewrite rules. We test our design in practice by implementing a compiler which generates high performance imperative OpenCL code. Our experiments show that we can automatically derive hardware-specific implementations from simple functional high-level algorithmic expressions offering performance on a par with highly tuned code for multicore CPUs and GPUs written by experts

CiteSeerX

Heriot Watt Pure

Crossref

Edinburgh Research Explorer

Enlighten

Multi-GPU support on the marrow algorithmic skeleton framework

Author: Alexandre Fernando Jorge Marques
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2013
Field of study

Dissertação para obtenção do Grau de Mestre em Engenharia InformáticaWith the proliferation of general purpose GPUs, workload parallelization and datatransfer optimization became an increasing concern. The natural evolution from using a single GPU, is multiplying the amount of available processors, presenting new challenges, as tuning the workload decompositions and load balancing, when dealing with heterogeneous systems. Higher-level programming is a very important asset in a multi-GPU environment, due to the complexity inherent to the currently used GPGPU APIs (OpenCL and CUDA), because of their low-level and code overhead. This can be obtained by introducing an abstraction layer, which has the advantage of enabling implicit optimizations and orchestrations such as transparent load balancing mechanism and reduced explicit code overhead. Algorithmic Skeletons, previously used in cluster environments, have recently been adapted to the GPGPU context. Skeletons abstract most sources of code overhead, by defining computation patterns of commonly used algorithms. The Marrow algorithmic skeleton library is one of these, taking advantage of the abstractions to automate the orchestration needed for an efficient GPU execution. This thesis proposes the extension of Marrow to leverage the use of algorithmic skeletons in the modular and efficient programming of multiple heterogeneous GPUs, within a single machine. We were able to achieve a good balance between simplicity of the programming model and performance, obtaining good scalability when using multiple GPUs, with an efficient load distribution, although at the price of some overhead when using a single-GPU.projects PTDC/EIA-EIA/102579/2008 and PTDC/EIA-EIA/111518/200

Repositório da Universidade Nova de Lisboa

A NOVEL SOFTWARE-BUILT PARALLEL MACHINES AND THEIR INTERCONNECTIONS

Author: AJIT SINGH
Alexander Christopher
Browne J. C.
Cole Mure
Darlington J.
Dehne F.
DHRUBAJYOTI GOSWAMI
Gamma Erich
Grama Ananth
HON FUNG LI
Johnson Ralph E.
MOHAMMAD MURSALIN AKON
Quinn Michael J.
Richard Stevens W.
XUEMIN (SHERMAN) SHEN
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date
Field of study

Crossref

Adaptive structured parallelism

Author: González Vélez Horacio
Publication venue: The University of Edinburgh
Publication date: 01/01/2008
Field of study

Algorithmic skeletons abstract commonly-used patterns of parallel computation, communication, and interaction. Parallel programs are expressed by interweaving parameterised skeletons analogously to the way in which structured sequential programs are developed, using well-defined constructs. Skeletons provide top-down design composition and control inheritance throughout the program structure. Based on the algorithmic skeleton concept, structured parallelism provides a high-level parallel programming technique which allows the conceptual description of parallel programs whilst fostering platform independence and algorithm abstraction. By decoupling the algorithm specification from machine-dependent structural considerations, structured parallelism allows programmers to code programs regardless of how the computation and communications will be executed in the system platform.Meanwhile, large non-dedicated multiprocessing systems have long posed a challenge to known distributed systems programming techniques as a result of the inherent heterogeneity and dynamism of their resources. Scant research has been devoted to the use of structural information provided by skeletons in adaptively improving program performance, based on resource utilisation. This thesis presents a methodology to improve skeletal parallel programming in heterogeneous distributed systems by introducing adaptivity through resource awareness. As we hypothesise that a skeletal program should be able to adapt to the dynamic resource conditions over time using its structural forecasting information, we have developed ASPara: Adaptive Structured Parallelism. ASPara is a generic methodology to incorporate structural information at compilation into a parallel program, which will help it to adapt at execution

Edinburgh Research Archive