
Hal - Université Grenoble Alpes

Platform independent profiling of a QCD code

Author: Marinkovic Marina Krstic
Stanisic Luka
Publication date: 24/07/2016

The supercomputing platforms available for high performance computing based research evolve at a great rate. However, this rapid development of novel technologies requires constant adaptations and optimizations of the existing codes for each new machine architecture. In such a context, minimizing the time needed to port the code efficiently to a new platform is of crucial importance. A possible solution for this common challenge is to use simulations of the application that can assist in detecting performance bottlenecks. Due to the prohibitive costs of classical cycle-accurate simulators, coarse-grain simulations are more suitable for large parallel and distributed systems. We present a procedure for implementing the profiling of the openQCD code [1] through simulation, which will enable a global reduction of the cost of profiling and optimizing this code, commonly used in the lattice QCD community. Our approach is based on the well-known SimGrid simulator [2], which allows for fast and accurate performance predictions of HPC codes. Additionally, accurate estimations of the program behavior on some future machines, not yet accessible to us, are anticipated.

CERN Document Server

A Workflow for Fast Evaluation of Mapping Heuristics Targeting Cloud Infrastructures

Author: Cheptsov Alexey
Gamatie Abdoulaye
Khabi Dmitry
Latif Khalid
Novo David
Sassatelli Gilles
Selva Manuel
Ursu Roman
Publication date: 19/01/2016

Resource allocation is today an integral part of cloud infrastructure management, needed to exploit resources efficiently. Cloud infrastructure centers generally use custom-built heuristics to define resource allocations. The management tools of these centers therefore have an immediate need for a fast yet reasonably accurate simulation and evaluation platform to define the resource allocation for cloud applications. This work proposes a framework that allows users to easily specify mappings for cloud applications described in the AMALTHEA format, used in the context of the DreamCloud European project, and to assess the quality of these mappings. The two quality metrics provided by the framework are execution time and energy consumption.

Comment: 2nd International Workshop on Dynamic Resource Allocation and Management in Embedded, High Performance and Cloud Computing DREAMCloud 2016 (arXiv:cs/1601.04675)

HAL Descartes

Elastic Management of Byzantine Faults

Author: Arantes Luciana
Bournat Marjorie
Friedman Roy
Marin Olivier
Sens Pierre
Publication venue: HAL CCSD
Publication date: 07/09/2015

Tolerating Byzantine faults on a large scale is a challenge: in particular, Desktop Grid environments sustain large numbers of faults that range from crashes to Byzantine faults. Solutions in the literature that address Byzantine failures are costly and none of them scales to really large numbers of nodes. This paper proposes to distribute task scheduling on trusted nodes in a Cloud network and to have these nodes assess the reliability of worker nodes by means of a reputation system. The resulting architecture is built for scalability and adapts costs to the workload associated with client requests.

HAL Descartes

Performance Reproduction and Prediction of Selected Dynamic Loop Scheduling Experiments

Author: Ciorba Florina M.
Eleliemy Ahmed
Mohammed Ali
Publication date: 01/01/2018

Scientific applications are complex, large, and often exhibit irregular and stochastic behavior. The use of efficient loop scheduling techniques in computationally-intensive applications is crucial for improving their performance on high-performance computing (HPC) platforms. A number of dynamic loop scheduling (DLS) techniques were proposed between the late 1980s and early 2000s, and have been used efficiently in scientific applications. In most cases, the computing systems on which they were tested and validated are no longer available. This work is concerned with minimizing the sources of uncertainty in the implementation of DLS techniques to avoid unnecessary influences on the performance of scientific applications. Therefore, it is important to ensure that the DLS techniques employed in scientific applications today adhere to their original design goals and specifications. The goal of this work is to attain and increase trust in the implementation of DLS techniques in present studies. To achieve this goal, the performance of a selection of scheduling experiments from the original 1992 work that introduced factoring is reproduced and predicted via both simulative and native experimentation. The experiments show that the simulation reproduces the performance achieved on the past computing platform and accurately predicts the performance achieved on the present computing platform. The performance reproduction and prediction confirm that the present implementation of the DLS techniques considered, both in simulation and natively, adheres to their original description. The results confirm the hypothesis that reproducing experiments of identical scheduling scenarios on past and modern hardware leads to behavior entirely different from what is expected.

edoc

Platform independent profiling of a QCD code

Author: Krstic Marinkovic Marina
Stanisic Luka
Publication venue: HAL CCSD
Publication date: 24/07/2016

The supercomputing platforms available for high performance computing based research evolve at a great rate. However, this rapid development of novel technologies requires constant adaptations and optimizations of the existing codes for each new machine architecture. In such a context, minimizing the time needed to port the code efficiently to a new platform is of crucial importance. A possible solution for this common challenge is to use simulations of the application that can assist in detecting performance bottlenecks. Due to the prohibitive costs of classical cycle-accurate simulators, coarse-grain simulations are more suitable for large parallel and distributed systems. We present a procedure for implementing the profiling of the openQCD code [1] through simulation, which will enable a global reduction of the cost of profiling and optimizing this code, commonly used in the lattice QCD community. Our approach is based on the well-known SimGrid simulator [2], which allows for fast and accurate performance predictions of HPC codes. Additionally, accurate estimations of the program behavior on some future machines, not yet accessible to us, are anticipated.

SiL: An Approach for Adjusting Applications to Heterogeneous Systems Under Perturbations

Author: C.P. Kruskal
CD Polychronopoulos
H Casanova
I Banicescu
JB Rawlings
LC Canon
R Mehrotra
RL Cariño
S Ali
S Browne
S Flynn Hummel
Publication date: 01/01/2018

Scientific applications consist of large and computationally-intensive loops. Dynamic loop scheduling (DLS) techniques are used to load balance the execution of such applications. Load imbalance can be caused by variations in loop iteration execution times due to problem, algorithmic, or systemic characteristics (also, perturbations). The following question motivates this work: "Given an application, a high-performance computing (HPC) system, and both their characteristics and interplay, which DLS technique will achieve improved performance under unpredictable perturbations?" Existing work only considers perturbations caused by variations in the computational speeds delivered by the HPC system. However, perturbations in available network bandwidth or latency are inevitable on production HPC systems. Simulator in the loop (SiL) is introduced herein as a new control-theoretic inspired approach to dynamically select DLS techniques that improve the performance of applications on heterogeneous HPC systems under perturbations. The present work examines the performance of six applications on a heterogeneous system under all of the above system perturbations. The SiL proof of concept is evaluated using simulation. The performance results confirm the initial hypothesis that no single DLS technique can deliver the best performance in all scenarios, while the SiL-based DLS selection delivered improved application performance in most experiments.

edoc

Toward More Scalable Off-Line Simulations of MPI Applications

Author: Casanova Henri
Gupta Anshul
Suter Frédéric
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/09/2015

The off-line (or post-mortem) analysis of execution event traces is a popular approach to understand the performance of HPC applications that use the message passing paradigm. Combining this analysis with simulation makes it possible to "replay" the application execution to explore "what if?" scenarios, e.g., assessing application performance in a range of (hypothetical) execution environments. However, such off-line analysis faces scalability issues for acquiring, storing, or replaying large event traces. We first present two previously proposed and complementary frameworks for off-line replaying of MPI application event traces, each with its own objectives and limitations. We then describe how these frameworks can be combined so as to capitalize on their respective strengths while alleviating several of their limitations. We claim that the combined framework affords levels of scalability that are beyond those achievable by either one of the two individual frameworks. We evaluate this framework to illustrate the benefits of the proposed combination for a more scalable off-line analysis of MPI applications.

HAL-ENS-LYON

HAL-IN2P3