Search CORE

130 research outputs found

Self-adaptive OmpSs tasks in heterogeneous environments

Author: Ayguadé Parra Eduard
Badia Sala Rosa Maria
Labarta Mancho Jesús José
Planas Carbonell Judit
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

As new heterogeneous systems and hardware accelerators appear, high performance computers can reach a higher level of computational power. Nevertheless, this does not come for free: the more heterogeneity the system presents, the more complex becomes the programming task in terms of resource management. OmpSs is a task-based programming model and framework focused on the runtime exploitation of parallelism from annotated sequential applications. This paper presents a set of extensions to this framework: we show how the application programmer can expose different specialized versions of tasks (i.e. pieces of specific code targeted and optimized for a particular architecture) and how the system can choose between these versions at runtime to obtain the best performance achievable for the given application. From the results obtained in a multi-GPU system, we prove that our proposal gives flexibility to application's source code and can potentially increase application's performance.This work has been supported by the European Commission through the ENCORE project (FP7-248647), the TERAFLUX project (FP7-249013), the TEXT project (FP7-261580), the HiPEAC-3 Network of Excellence (FP7-ICT 287759), the Intel-BSC Exascale Lab collaboration project, the support of the Spanish Ministry of Education (CSD2007- 00050 and FPU program), the projects of Computación de Altas Prestaciones V and VI (TIN2007-60625, TIN2012-34557) and the Generalitat de Catalunya (2009-SGR-980).Peer ReviewedPostprint (author’s final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Extending OmpSs for OpenCL kernel co-execution in heterogeneous systems

Author: Ayguadé Parra Eduard
Beivide Palacio Ramon
Bosque Jose L.
Martorell Bofill Xavier
Mateo Sergi
Pérez Borja
Stafford Esteban
Teruel Xavier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Heterogeneous systems have a very high potential performance but present difficulties in their programming. OmpSs is a well known framework for task based parallel applications, which is an interesting tool to simplify the programming of these systems. However, it does not support the co-execution of a single OpenCL kernel instance on several compute devices. To overcome this limitation, this paper presents an extension of the OmpSs framework that solves two main objectives: the automatic division of datasets among several devices and the management of their memory address spaces. To adapt to different kinds of applications, the data division can be performed by the novel HGuided load balancing algorithm or by the well known Static and Dynamic. All this is accomplished with negligible impact on the programming. Experimental results reveal that there is always one load balancing algorithm that improves the performance and energy consumption of the system.This work has been supported by the University of Cantabria with grant CVE-2014-18166, the Generalitat de Catalunya under grant 2014-SGR-1051, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2016- 76635-C2-2-R (AEI/FEDER, UE) and TIN2015-65316-P. The Spanish Government through the Programa Severo Ochoa (SEV-2015-0493). The European Research Council under grant agreement No 321253 European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc Projects, grant agreement n 288777, 610402 and 671697 and the European HiPEAC Network.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

Author: Alvarez Carlos
Bautista Leonardo
Becker Tobias
Billung-Meyer Gunnar
Carpenter Paul
Christmann Wolfgang
Cristal Adrian
De La Cruz Raul
Dubhashi Devdatt
Etsion Yoav
Felber Pascal
Fetzer Christof
Gaydadjiev Georgi
Göttel Christian
Hadar Elad
Hagemeyer Jens
Jimenez Daniel
Jungeblut Thorsten
Kaiser Martin
Klawonn Frank
Krupop Stefan
Kucza Nils
Madonar Sergi
Martorell Xavier
Mihklafi Amani
Mudge Trevor
Mudge Trevor
Pasin Marcelo
Pericàs Miquel
Pnevmatikatos Dionisios N.
Porrmann Mario
Port Oron
Rocha Isabelly
Salami Behzad
Salomonsson Hans
Schiavoni Valerio
Trancoso Pedro
Unsal Osman S.
vor dem Berge Micha
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Chalmers Research

Publications at Bielefeld University

Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

Author: Ayguadé E.
Beivide Palacio Ramón
Bosque Orero José Luis
Martorell X.
Mateo S.
Pérez Pavón Borja
Stafford Fernández Esteban
Teruel X.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019
Field of study

The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task based parallel applications, that allows the execution of OpenCl kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues. First, the automatic distribution of datasets and the management of device memory address spaces. Second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results.This work has been supported by the University of Cantabria with grant CVE-2014-18166, the Generalitat de Catalunya under grant 2014-SGR-1051, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2016-76635-C2-2-R (AEI/FEDER, UE) and TIN2015-65316-P. The Spanish Government through the Programa Severo Ochoa (SEV-2015-0493

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UCrea

UPCommons. Portal del coneixement obert de la UPC

A taxonomy of task-based parallel programming technologies for high-performance computing

Author: Aguilar Xavier
Dichev Kiril
Fahringer Thomas
Gschwandtner Philipp
Hasanov Khalid
Heller Thomas
Iakymchuk Roman
Jordan Herbert
Katrinis Kostas
Laure Erwin
Lemarinier Pierre
Markidis Stefano
Nikolopoulos Dimitrios
Thoman Peter
Publication venue
Publication date: 19/12/2017
Field of study

Task-based programming models for shared memory -- such as Cilk Plus and OpenMP 3 -- are well established and documented. However, with the increase in parallel, many-core and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today

Queen's University Belfast Research Portal

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques

Author: Beltran Querol Vicenç
Clauss Carsten
Moschny Thomas
Peña Antonio J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

In this paper we describe the design of fault tolerance capabilities for general-purpose offload semantics, based on the OmpSs programming model. Using ParaStation MPI, a production MPI-3.1 implementation, we explore the features that, being standard compliant, an MPI stack must support to provide the necessary fault tolerance guarantees, based on MPI's dynamic process management. Our results, including synthetic benchmarks and applications, reveal low runtime overhead and efficient recovery, demonstrating that the existing MPI standard provided us with sufficient mechanisms to implement an effective and efficient fault-tolerant solution.This research received funding from the European Community’s 7th Framework Programme via the DEEP-ER project under Grant Agreement no. 610476. This work has also been supported by the Spanish Ministry of Science and Innovation (contract TIN2012-34557) and by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). Antonio J. Peña is cofinanced by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266. The authors thank Jorge Bell´on, from BSC, for his technical support with the Nanos++ internals.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

AMA: asynchronous management of accelerators for task-based programming models

Author: Ayguadé Parra Eduard
Badia Sala Rosa Maria
Labarta Mancho Jesús José
Planas Judit
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

Computational science has benefited in the last years from emerging accelerators that increase the performance of scientific simulations, but using these devices hinders the programming task. This paper presents AMA: a set of optimization techniques to efficiently manage multi-accelerator systems. AMA maximizes the overlap of computation and communication in a blocking-free way. Then, we can use such spare time to do other work while waiting for device operations. Implemented on top of a task-based framework, the experimental evaluation of AMA on a quad-GPU node shows that we reach the performance of a hand-tuned native CUDA code, with the advantage of fully hiding the device management. In addition, we obtain up to more than 2x performance speed-up with respect to the original framework implementation.Peer ReviewedPostprint (published version

Elsevier - Publisher Connector

UPCommons. Portal del coneixement obert de la UPC

Towards an Energy-Aware Framework for Application Development and Execution in Heterogeneous Parallel Architectures

Author: Aguilà A
Badia RM
De Landtsheer R
Deprez JC
Djemame K
Ejarque J
García Pérez D
Georgiou Y
Guedria L
Kavanagh R
Kelefouras V
Pezuela C
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/08/2018
Field of study

The Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation (TANGO) project’s goal is to characterise factors which affect power consumption in software development and operation for Heterogeneous Parallel Hardware (HPA) environments. Its main contribution is the combination of requirements engineering and design modelling for self-adaptive software systems, with power consumption awareness in relation to these environments. The energy efficiency and application quality factors are integrated into the application lifecycle (design, implementation and operation). To support this, the key novelty of the project is a reference architecture and its implementation. Moreover, a programming model with built-in support for various hardware architectures including heterogeneous clusters, heterogeneous chips and programmable logic devices is provided. This leads to a new cross-layer programming approach for heterogeneous parallel hardware architectures featuring software and hardware modelling. Application power consumption and performance, data location and time-criticality optimization, as well as security and dependability requirements on the target hardware architecture are supported by the architecture

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

White Rose Research Online

TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation

Author: Armstrong Django
Badia Rosa
Deprez Jean-Christophe
Djemame Karim
Ejarque Jorge
Ferrer Ana Juan
Georgiou Yiannis
Kavanagh Richard
Perez David Garcia
Sirvent Raul
Publication venue
Publication date: 04/03/2016
Field of study

The paper is concerned with the issue of how software systems actually use Heterogeneous Parallel Architectures (HPAs), with the goal of optimizing power consumption on these resources. It argues the need for novel methods and tools to support software developers aiming to optimise power consumption resulting from designing, developing, deploying and running software on HPAs, while maintaining other quality aspects of software to adequate and agreed levels. To do so, a reference architecture to support energy efficiency at application construction, deployment, and operation is discussed, as well as its implementation and evaluation plans.Comment: Part of the Program Transformation for Programmability in Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March 2016, 7 pages, LaTeX, 3 PNG figure

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC