4 research outputs found

    SOCL: An OpenCL Implementation with Automatic Multi-Device Adaptation Support

    To fully tap the potential of today's heterogeneous machines, offloading a few well-chosen parts of an application onto accelerators is not sufficient. The real challenge is to build systems in which the application is permanently spread across the entire machine, that is, in which parallel tasks are dynamically scheduled over the full set of available processing units. In this report we present SOCL, an OpenCL implementation that improves and simplifies the programming experience on heterogeneous architectures. SOCL lets applications dynamically dispatch computation kernels over processing devices so as to maximize their utilization. Because the provided extensions are lightweight and non-intrusive, existing OpenCL applications can adopt them incrementally to schedule kernels automatically, in a controlled manner, on multi-device architectures. A preliminary extension for automatic granularity adaptation is also provided. We demonstrate the relevance of this approach by experimenting with several OpenCL applications on a range of representative heterogeneous architectures, and we show that the SOCL extensions enhance performance portability.
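    The core idea of the abstract, dispatching kernels to devices so as to maximize utilization, can be illustrated with a toy greedy scheduler. This is my own minimal sketch, not the SOCL API: kernel names, costs, and device names are hypothetical, and real SOCL works through OpenCL contexts and command queues rather than a cost list.

```python
# Toy sketch (not the SOCL API): greedily dispatch each kernel to the
# device that becomes free first, mimicking automatic multi-device
# scheduling. Kernel costs and device names are made up for illustration.
import heapq

def dispatch(kernels, devices):
    """Assign each (name, cost) kernel to the earliest-available device.

    Returns a mapping device -> list of kernel names, and the makespan.
    """
    # Min-heap of (time_busy_until, device_name); all devices start idle.
    heap = [(0.0, d) for d in devices]
    heapq.heapify(heap)
    plan = {d: [] for d in devices}
    makespan = 0.0
    for name, cost in kernels:
        busy_until, dev = heapq.heappop(heap)
        plan[dev].append(name)
        busy_until += cost
        makespan = max(makespan, busy_until)
        heapq.heappush(heap, (busy_until, dev))
    return plan, makespan

kernels = [("gemm", 4.0), ("fft", 2.0), ("scan", 1.0), ("gemm2", 4.0)]
plan, makespan = dispatch(kernels, ["gpu0", "gpu1", "cpu"])
```

    A real implementation would measure per-device kernel performance at runtime instead of taking costs as input, which is what makes the scheduling decision dynamic.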

    Resource aggregation for task-based Cholesky Factorization on top of modern architectures

    This paper is submitted for review to the Parallel Computing special issue for the HCW and HeteroPar 16 workshops. Hybrid computing platforms are now commonplace, featuring a large number of CPU cores and accelerators. This trend makes balancing computations between these heterogeneous resources performance-critical. In this paper we propose aggregating several CPU cores in order to execute larger parallel tasks and improve load balancing between CPUs and accelerators. Additionally, we present our approach to exploiting internal parallelism within tasks by combining two runtime-system schedulers: a global runtime system that schedules the main task graph and a local one that copes with internal task parallelism. We demonstrate the relevance of our approach in the context of the dense Cholesky factorization kernel implemented on top of the StarPU task-based runtime system. We present experimental results showing that our solution outperforms state-of-the-art implementations on two architectures: a modern heterogeneous machine and the Intel Xeon Phi Knights Landing.
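    The load-balancing argument behind aggregation can be shown with a small simulation. This is an assumption-laden toy model, not the paper's StarPU implementation: it uses abstract relative speeds and assumes an aggregated worker of four cores runs a parallel task four times faster than one core. With coarse-grained tasks, individual slow cores are never worth using, so the GPU does everything; an aggregated CPU worker is fast enough to share the load and shorten the makespan.

```python
# Toy model (not the paper's StarPU-based system): greedy earliest-finish
# scheduling of equal-cost tasks, comparing four separate CPU cores
# against one aggregated worker combining those four cores.
def makespan(task_costs, workers):
    """workers maps name -> relative speed; returns the schedule length."""
    ready = {w: 0.0 for w in workers}  # time at which each worker is free
    for cost in sorted(task_costs, reverse=True):
        # Pick the worker that would finish this task earliest.
        best = min(ready, key=lambda w: ready[w] + cost / workers[w])
        ready[best] += cost / workers[best]
    return max(ready.values())

tasks = [8.0] * 8  # eight coarse-grained tile tasks
# Separate cores: each core is 8x slower than the GPU, so none is used.
split = makespan(tasks, {"gpu": 8.0, "c0": 1.0, "c1": 1.0, "c2": 1.0, "c3": 1.0})
# Aggregated worker: four cores acting as one resource of speed 4.
agg = makespan(tasks, {"gpu": 8.0, "cpu4": 4.0})
```

    Here `agg < split`: aggregation makes the CPU side competitive for large tasks, which is the load-balancing effect the paper targets (the real system also nests a second, local scheduler inside each aggregated worker).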

    Scheduling Dynamic Graphs

    In parallel and distributed computing, scheduling low-level tasks on the available hardware is a fundamental problem. Traditionally, one has assumed that the set of tasks to be executed is known beforehand. The scheduling constraints are then given by a precedence graph: nodes represent the elementary tasks and edges the dependencies among tasks. This static approach is not appropriate in situations where the set of tasks is not known exactly in advance, for example when a program may be continued in several different ways. In this paper a new model for parallel and distributed programs, the dynamic process graph, is introduced, which represents all possible executions of a program in a compact way. The size of this representation is small -- in many cases only logarithmic with respect to the size of any execution. An important feature of our model is that the encoded executions are directed acyclic graphs having a "regular" structure that is typical of parallel ..
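    The compactness claim can be made concrete with a toy encoding. This is my own illustrative sketch, not the paper's formal model: nodes are either AND (all successors run) or OR (the runtime picks exactly one successor), so a single small graph encodes several execution DAGs, and the number of encoded executions grows multiplicatively with nested choices.

```python
# Illustrative sketch (a toy, not the paper's dynamic process graph
# formalism): AND nodes run all successors, OR nodes run exactly one,
# so one compact graph encodes many possible executions.
from itertools import product

# node -> (kind, successors); leaves have an empty successor list.
graph = {
    "start": ("AND", ["a", "b"]),
    "a": ("OR", ["a1", "a2"]),   # runtime chooses one branch
    "b": ("OR", ["b1", "b2"]),
    "a1": ("AND", []), "a2": ("AND", []),
    "b1": ("AND", []), "b2": ("AND", []),
}

def executions(node):
    """Enumerate the sets of leaf tasks one execution may run."""
    kind, succ = graph[node]
    if not succ:
        return [{node}]
    if kind == "OR":  # exactly one alternative is taken
        return [e for s in succ for e in executions(s)]
    # AND: every successor runs; combine one execution per child.
    combos = product(*(executions(s) for s in succ))
    return [set().union(*c) for c in combos]

runs = executions("start")  # 2 choices x 2 choices = 4 executions
```

    Two independent binary choices already yield four distinct executions from a seven-node graph; with n nested choices the graph stays linear in n while the number of executions is exponential, which is the logarithmic-size phenomenon the abstract describes.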
