A Modular Approach to Performance, Portability and Productivity for 3D Wave Models

Author: Bilbao Stefan
Dubach Christophe
Gray Alan
Steuwer Michel
Stoltzfus Larisa
Publication date: 01/01/2017

No abstract available

Edinburgh Research Explorer

Enlighten

Creating portable and efficient packet processing applications

Author: A Korobeynikov
AV Aho
B Wun
CW Fraser
D Bernstein
EA Lee
EJ Johnson
Fulvio Risso
G Memik
J Carlstrom
J Wagner
JA Fisher
JL Hennessy
L Ciminiera
L George
M Baldi
M Baldi
MK Chen
N Shah
Olivier Morandi
P Briggs
Paolo Veglia
Pierluigi Rolando
R Cytron
R Ennals
R Morris
Silvio Valenti
SS Muchnick
T Lindholm
Z Budimlic
Publication venue: Springer
Publication date: 01/01/2011

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift

Author: Dubach Christophe
Gorlatch Sergei
Hagedorn Bastian
Steuwer Michel
Stoltzfus Larisa
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/12/2019

Stencil computations are a widely used type of algorithm, found in applications from physical simulations to machine learning. Stencils are embarrassingly parallel and therefore fit modern hardware such as Graphics Processing Units perfectly. Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging. Domain-specific Languages (DSLs) have raised the programming abstraction and offer good performance; however, this method places the burden on DSL implementers to write almost full-fledged parallelizing compilers and optimizers. Lift has recently emerged as a promising approach to achieve performance portability by using a small set of reusable parallel primitives that DSL or library writers utilize. Lift’s key novelty is in its encoding of optimizations as a system of extensible rewrite rules which are used to explore the optimization space. This article demonstrates how complex multi-dimensional stencil code and optimizations are expressed using compositions of simple 1D Lift primitives and rewrite rules. We introduce two optimizations that provide high performance for stencils in particular: classical overlapped tiling for multi-dimensional stencils and 2.5D tiling specifically for 3D stencils. We provide an in-depth analysis of how the tiling optimizations affect stencils of different shapes and sizes across different applications. Our experimental results show that our approach outperforms existing compiler approaches and hand-tuned codes.

Edinburgh Research Explorer

Enlighten

Using the High Productivity Language Chapel to Target GPGPU Architectures

Author: Chamberlain Bradford L.
Garzaran Maria J.
Padua David
Sidelnik Albert
Publication date: 25/04/2011

It has been widely shown that GPGPU architectures offer large performance gains compared to their traditional CPU counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and challenges in performance optimization. In this paper, we present novel methods and compiler transformations that increase productivity by enabling users to easily program GPGPU architectures using the high productivity programming language Chapel. Rather than resorting to different parallel libraries or annotations for a given parallel platform, we leverage a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of being portable across distinct classes of parallel architectures, including desktop multicores, distributed memory clusters, large-scale shared memory, and now CPU-GPU hybrids. We present experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA. Funding: NSF CCF 0702260; Cray Inc. Cray-SRA-2010-01696; 2010-2011 Nvidia Research Fellowship. Unpublished; not peer reviewed.

Illinois Digital Environment for Access to Learning and Scholarship Repository

High Performance Stencil Code Generation with LIFT

Author: Dubach Christophe
Gorlatch Sergei
Hagedorn Bastian
Steuwer Michel
Stoltzfus Larisa
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

Stencil computations are widely used, from physical simulations to machine learning. They are embarrassingly parallel and perfectly fit modern hardware such as Graphics Processing Units. Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging. Domain Specific Languages (DSLs) have raised the programming abstraction and offer good performance. However, this places the burden on DSL implementers who have to write almost full-fledged parallelizing compilers and optimizers. Lift has recently emerged as a promising approach to achieve performance portability and is based on a small set of reusable parallel primitives that DSL or library writers can build upon. Lift’s key novelty is in its encoding of optimizations as a system of extensible rewrite rules which are used to explore the optimization space. However, Lift has mostly focused on linear algebra operations and it remains to be seen whether this approach is applicable for other domains. This paper demonstrates how complex multidimensional stencil code and optimizations such as tiling are expressible using compositions of simple 1D Lift primitives. By leveraging existing Lift primitives and optimizations, we only require the addition of two primitives and one rewrite rule to do so. Our results show that this approach outperforms existing compiler approaches and hand-tuned codes.

Edinburgh Research Explorer

Enlighten

PYDAC: A DISTRIBUTED RUNTIME SYSTEM AND PROGRAMMING MODEL FOR A HETEROGENEOUS MANY-CORE ARCHITECTURE

Author: Huang Bin
NC DOCKS at The University of North Carolina at Charlotte
Publication date: 01/01/2014

Heterogeneous many-core architectures that consist of big, fast cores and small, energy-efficient cores are very promising for future high-performance computing (HPC) systems. These architectures offer a good balance between single-threaded performance and multithreaded throughput. Such systems impose challenges on the design of the programming model and runtime system. Specifically, these challenges include (a) how to fully utilize the chip’s performance, (b) how to manage heterogeneous, unreliable hardware resources, and (c) how to generate and manage a large number of parallel tasks. This dissertation proposes and evaluates a Python-based programming framework called PyDac. PyDac supports a two-level programming model. At the high level, a programmer creates a very large number of tasks, using the divide-and-conquer strategy. At the low level, tasks are written in imperative programming style. The runtime system seamlessly manages the parallel tasks, system resilience, and inter-task communication with architecture support. PyDac has been implemented on both a field-programmable gate array (FPGA) emulation of an unconventional heterogeneous architecture and a conventional multicore microprocessor. To evaluate the performance, resilience, and programmability of the proposed system, several micro-benchmarks were developed. We found that (a) PyDac abstracts away task communication and achieves programmability, (b) the micro-benchmarks are scalable on the hardware prototype, but (predictably) serial operation limits some micro-benchmarks, and (c) the degree of protection versus speed could be varied in redundant threading that is transparent to programmers.

The University of North Carolina at Greensboro

Safe programming Languages for ABB Automation System 800xA

Author: Borg Markus
Publication venue: Lunds universitet/Institutionen för reglerteknik
Publication date: 01/01/2007

More than 90 % of all computers are embedded in different types of systems, for example mobile phones and industrial robots. Some of these systems are real-time systems; they have to produce their output within certain time constraints. They can also be safety critical; if something goes wrong, there is a risk that a great deal of damage is caused. Industrial Extended Automation System 800xA, developed by ABB, is a real-time control system intended for industrial use within a wide variety of applications where a certain focus on safety is required, for example power plants and oil platforms. The software is currently written in C and C++, languages that are not optimal from a safety point of view. This master's thesis investigates whether there are any plausible alternatives to using C/C++ for safety critical real-time systems. A number of requirements that programming languages used in this area have to fulfill are stated, and it is evaluated whether some candidate languages fulfill these requirements. The candidate languages, Java and Ada, are compared to C and C++. It is determined that the Java-to-C compiler LJRT (Lund Java-based Real Time) is a suitable alternative. The practical part of this thesis is concerned with the introduction of Java in 800xA. A module of the system is ported to Java and executed together with the original C/C++ solution. The functionality of the system is tested using a formal test suite, and the performance and memory footprint of our solution are measured. The results show that it is possible to gradually introduce Java in 800xA using LJRT, which is the main contribution of this thesis.

Leveraging Semantics Attached to Function Calls to Isolate Applications from Hardware

Author: Cohen Albert
Halle Sean
Publication venue: HAL CCSD
Publication date: 01/06/2010

To improve performance, computer systems are forcing more microarchitectural and parallel hardware details to be directly exploited by application programmers, exposing limitations in existing compiler and OS infrastructure, which is failing to maintain the software productivity of the past. In this paper we propose a pragmatic approach, motivated by our experience with BLIS [11], for building applications that tolerate changing hardware, delivering good performance from the same source across diverse parallel targets. Applications are coded in terms of generic parallel patterns using a “piggy back” language, embedded into a base sequential language by attaching semantics to function calls. Our approach allows programmers to leverage multiple processor-specific and domain-specific toolchains encapsulated in specialization modules, which extract their input information from the semantics of the function calls, creating an isolation layer between application and target platforms. Developers use existing sequential development tools and languages to code and debug, performing specialization as a separate step when shipping the code. We show this approach can successfully specialize a single source to diverse and evolving heterogeneous multi-core targets and enable aggressive compiler optimizations.

INRIA a CCSD electronic archive server