80 research outputs found

Exascale machines require new programming paradigms and runtimes

Author: Astsatryan Hrachya
Da Costa Georges
Fahringer Thomas
Grasso Ivan
Hristov Atanas
Karatza Helen D.
Lastovetsky Alexey
Marozzo Fabrizio
Petcu Dana
Rico-Gallego Juan-Antonio
Stavrinides Georgios L.
Talia Domenico
Trunfio Paolo
Publication venue
Publication date: 01/01/2015
Field of study

Extreme-scale parallel computing systems will have tens of thousands of optionally accelerator-equipped nodes with hundreds of cores each, as well as deep memory hierarchies and complex interconnect topologies. Such Exascale systems will provide hardware parallelism at multiple levels and will be energy constrained. Their extreme scale and the rapidly deteriorating reliability of their hardware components mean that Exascale systems will exhibit low mean-time-between-failure values. Furthermore, existing programming models already require heroic programming and optimisation efforts to achieve high efficiency on current supercomputers. Invariably, these efforts are platform-specific and non-portable. In this paper we explore the shortcomings of existing programming models and runtime systems for large-scale computing systems. We then propose and discuss important features of programming paradigms and runtime systems for large-scale computing systems, with a special focus on data-intensive applications and resilience. Finally, we discuss code sustainability issues and propose several software metrics that are of paramount importance for code development for large-scale computing systems.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Open Access Repository

Dynamic X10. Resource-Aware Programming for Higher Efficiency

Author: Braun Matthias
Buchwald Sebastian
Mohr Manuel
Zwinkau Andreas
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2014
Field of study

KITopen

The Landscape of Exascale Research: A Data-Driven Literature Analysis

Author: Belloum A.S.Z.
Heldens S.
Hijma P.
Maassen J.
Van Nieuwpoort R.V.
Van Werkhoven B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2020
Field of study

International Migration, Integration and Social Cohesion online publications

A Survey on Hardware-aware and Heterogeneous Computing on Multicore Processors and Accelerators

Author: Buchty Rainer
Heuveline Vincent
Karl Wolfgang
Weiß Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2009
Field of study

KITopen

Dynamic Power Management for Reactive Stream Processing on the SCC Tiled Architecture

Author: B Rountree
C Grelck
C Poellabauer
D Gusfield
DW Marquardt
E Bini
G Chen
I Buck
J-J Chen
Jens Knoop
JH Anderson
L Wang
M Bambagini
Michael Zolda
MY Lim
N Ioannou
N Kappiah
Nilesh Karavadara
P Gschwandtner
Q Cai
Raimund Kirner
V Nguyen
VTN Nguyen
Vu Thien Nga Nguyen
W Thies
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Dynamic voltage and frequency scaling (DVFS) is a means to adjust the computing capacity and power consumption of computing systems to the application demands. DVFS is generally useful for striking a compromise between computing demands and power consumption, especially in resource-constrained computing systems. Many modern processors support some form of DVFS. In this article we focus on the development of an execution framework that provides lightweight DVFS support for reactive stream-processing systems (RSPS). RSPS are a common form of embedded control systems, operating in direct response to inputs from their environment. Within the execution framework, we focus on support for many-core scheduling for parallel execution of concurrent programs. We provide a DVFS strategy for RSPS that is simple and lightweight, to be used for dynamic adaptation of the power consumption at runtime. The simplicity of the DVFS strategy is made possible by focusing solely on the application domain of RSPS. The presented DVFS strategy does not require specific assumptions about the message arrival rate or the underlying scheduling method. While DVFS is a very active field, in contrast to most existing research, our approach also works for platforms such as many-core processors, where the power settings typically cannot be controlled individually for each computational unit. We also support dynamic scheduling with variable workload. While many research results are obtained with simulators, we present a parallel execution framework with experiments conducted on real hardware, using the SCC many-core processor. The results of our experimental evaluation confirm that our simple DVFS strategy provides potential for significant energy savings on RSPS.
Peer reviewed

Crossref

Springer - Publisher Connector

University of Hertfordshire Research Archive

Trends in Data Locality Abstractions for HPC Systems

Author: Amir Kamil
Anshu Dubey
Bradford L. Chamberlain
Chris J. Newburn
Didem Unat
Emmanuel Jeannot
Frank Hannig
H. Carter Edwards
Hal Finkel
Hatem Ltaief
Jeff Keasler
John Shalf
Karl Fuerlinger
Mark Abraham
Mauro Bianco
Miquel Pericas
Naoya Maruyama
Paul H J Kelly
Romain Cledat
Torsten Hoefler
Vitus Leung
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

A system’s approach to cache hierarchy-aware decomposition of data-parallel computations

Author: Delgado Nuno Miguel de Brito
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2014
Field of study

Dissertation submitted for the degree of Master in Informatics Engineering.
The architecture of today's processors is very complex, comprising several computational cores and an intricate hierarchy of cache memories. The latter, in particular, differ considerably between the many processors currently available on the market, resulting in a wide variety of configurations. Application development is typically oblivious to this complexity and diversity, taking into consideration only the number of available execution cores. This obliviousness prevents such applications from fully harnessing the computing power available in these architectures. This problem has been recognized by the community, which has proposed languages and models to express and tune applications according to the underlying machine's hierarchy. These, however, lack the desired abstraction level, forcing the programmer to have deep knowledge of computer architecture and parallel programming in order to ensure performance portability across a wide range of architectures. Recognizing these limitations, the goal of this thesis is to delegate these hierarchy-aware optimizations to the runtime system. Accordingly, the programmer's responsibilities are confined to defining procedures for decomposing an application's domain into an arbitrary number of partitions. With this, the programmer has only to reason about the application's data representation and manipulation. We prototyped our proposal on top of a Java parallel programming framework and evaluated its performance against cache-neglectful domain decompositions. The results demonstrate that our optimizations deliver significant speedups over decomposition strategies based solely on the number of execution cores, without requiring the programmer to reason about the machine's hardware. These facts allow us to conclude that it is possible to obtain performance gains by transferring hierarchy-aware optimization concerns to the runtime system.
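The division of labour described in this abstract (the programmer supplies a decomposition procedure, the runtime picks the partition count from the cache hierarchy) can be illustrated with a minimal sketch. This is not the thesis's implementation, which was prototyped on a Java framework. The function names `partitions_for_cache` and `decompose`, and the heuristic of sizing each partition to fit one cache level, are assumptions made purely for illustration.

```python
def partitions_for_cache(domain_bytes: int, cache_bytes: int, cores: int) -> int:
    """Runtime side (hypothetical heuristic): choose a partition count so that
    each partition fits in the target cache level, using at least one
    partition per core and rounding up to a multiple of the core count."""
    if domain_bytes <= 0 or cache_bytes <= 0 or cores <= 0:
        raise ValueError("sizes and core count must be positive")
    needed = -(-domain_bytes // cache_bytes)   # ceiling division
    count = max(cores, needed)
    return -(-count // cores) * cores          # round up to a multiple of cores


def decompose(data: list, parts: int) -> list:
    """Programmer side: split the domain into `parts` contiguous slices.
    The procedure accepts an arbitrary partition count chosen by the runtime."""
    parts = min(parts, len(data)) or 1
    base, extra = divmod(len(data), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices


if __name__ == "__main__":
    # Assumed figures: an 8 MiB domain, a 256 KiB cache level, 4 cores.
    n = partitions_for_cache(8 * 1024 * 1024, 256 * 1024, 4)
    print(n, [len(s) for s in decompose(list(range(100)), n)][:4])
```

A core-count-only strategy would split the same domain into 4 partitions of 2 MiB each, none of which fits in the assumed 256 KiB cache level. The abstract contrasts its approach with that kind of cache-neglectful decomposition.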

Repositório da Universidade Nova de Lisboa