Search CORE

5,608 research outputs found

Simulation of MPI applications with time-independent traces

Author: Adve
Bagrodia
Browne
Chassin de Kergommeaux
Dickens
Gabriel
Geimer
Genovese
Gropp
Knüpfer
Núñez
Prakash
Reussner
Shende
Zheng
Publication venue: 'Wiley'
Publication date: 01/04/2015
Field of study

International audienceAnalyzing and understanding the performance behavior of parallel applications on parallel computing platforms is a long-standing concern in the High Performance Computing community. When the targeted platforms are not available , simulation is a reasonable approach to obtain objective performance indicators and explore various hypothetical scenarios. In the context of applications implemented with the Message Passing Interface, two simulation methods have been proposed, on-line simulation and off-line simulation, both with their own drawbacks and advantages. In this work we present an off-line simulation framework, i.e., one that simulates the execution of an application based on event traces obtained from an actual execution. The main novelty of this work, when compared to previously proposed off-line simulators, is that traces that drive the simulation can be acquired on large, distributed, heterogeneous , and non-dedicated platforms. As a result the scalability of trace acquisition is increased, which is achieved by enforcing that traces contain no time-related information. Moreover, our framework is based on an state-of-the-art scalable, fast, and validated simulation kernel. We introduce the notion of performing off-line simulation from time-independent traces, propose and evaluate several trace acquisition strategies, describe our simulation framework, and assess its quality in terms of trace acquisition scalability, simulation accuracy, and simulation time

HAL-ENS-LYON

HAL-IN2P3

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Evaluation of Profiling Tools for the Acquisition of Time Independent Traces

Author: Desprez Frédéric
Markomanolis George
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 08/07/2013
Field of study

In a previous work, we proposed a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is to rely on time-independent execution traces. Time-independent traces are an original way to estimate the performance of parallel applications. To acquire time-independent traces of the execution of MPI applications, we have to instrument them to log the necessary information. There exist many profiling tools which can instrument an application. In this report we propose a scoring system that corresponds to our framework specific requirements and evaluate the most well-known and open source profiling tools according to it. Furthermore we introduce an original tool called Minimal Instrumentation that was designed to fulfill the requirements of our framework.Dans nos précédents travaux, nous avons proposé un environnement pour la simulation hors-ligne d'applications MPI. Sa principale originalité vis-à-vis de la littérature est de s'appuyer sur des traces d'exécution indépendantes du temps. Cela constitue une manière originale d'estimer les performances d'applications parallèles. Pour acquérir de telles traces indépendantes du temps lors de l'exécution d'applications MPI, nous devns les instrumenter afin de recueillir toutes les informations nécessaires. Il existe de nombreux outils de profiling permettant d'instrumenter une application. Dans ce rapport, nous proposons une méthode de notation correspondant aux besoins spécifiques de notre environnement et évaluons les outils de profiling open-source les plus connus selon cette méthode. De plus, nous introduisons un outil original, appelé Minimal Instrumentation, spécialement conçu pour répondre aux besoins de notre environnement

HAL-ENS-LYON

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot

Assessing the Performance of MPI Applications Through Time-Independent Trace Replay

Author: Desprez Frédéric
Markomanolis George
Quinson Martin
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 01/01/2010
Field of study

International audienceSimulation is a popular approach to obtain objective performance indicators platforms that are not at one's disposal. It may help the dimensioning of compute clusters in large computing centers. In this work we present a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is to rely on time-independent execution traces. This allows us to completely decouple the acquisition process from the actual replay of the traces in a simulation context. Then we are able to acquire traces for large application instances without being limited to an execution on a single compute cluster. Finally our framework is built on top of a scalable, fast, and validated simulation kernel. In this paper, we present the used time-independent trace format, investigate several acquisition strategies, detail the developed trace replay tool, and assess the quality of our simulation framework in terms of accuracy, acquisition time, simulation time, and trace size.La simulation est une approche très populaire pour obtenir des indicateurs de performances objectifs sur des plates-formes qui ne sont pas disponibles. Cela peut permettre le dimensionnement de grappes de calculs au sein de grands centres de calcul. Dans cet article nous présentons un outil de simulation post-mortem d'applications MPI. Sa principale originalité au regard de la littérature est d'utiliser des traces d'exécution indépendantes du temps. Cela permet de découpler intégralement le processus d'acquisition des traces de celui de rejeu dans un contexte de simulation. Il est ainsi possible d'obtenir des traces pour de grandes instances de problèmes sans être limité à des exécutions au sein d'une unique grappe. Enfin notre outil est développé au dessus d'un noyau de simulation scalable, rapide et validé. Cet article présente le format de traces indépendantes du temps utilisé, étudie plusieurs stratégies d'acquisition, détaille l'outil de rejeu que nous avons dévelopé, et evalué la qualité de nos simulations en termes de précision, temps d'acuisition, temps de simulation et tailles de traces

HAL-ENS-LYON

CiteSeerX

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot

Improving the Accuracy and Efficiency of Time-Independent Trace Replay

Author: Desprez Frédéric
Markomanolis George
Suter Frédéric
Publication venue: HAL CCSD
Publication date: 01/10/2012
Field of study

Simulation is a popular approach to obtain objective performance indicators on platforms that are not at one's disposal. It may help the dimensioning of compute clusters in large computing centers. In a previous work, we proposed a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is to rely on time-independent execution traces. This allows us to completely decouple the acquisition process from the actual replay of the traces in a simulation context. Then we are able to acquire traces for large application instances without being limited to an execution on a single compute cluster. Finally our framework is built on top of a scalable, fast, and validated simulation kernel. In this paper, we detail the performance issues that we encountered with the first implementation of our trace replay framework. We propose several modifications to address these issues and analyze their impact. Results shows a clear improvement on the accuracy and efficiency with regard to the initial implementation.La simulation est une approche populaire pour obtenir des indicateurs de performance objectifs sur des plates-formes qui ne sont pas nécessairement accessibles. Elle peut par exemple aider au dimensionnement d'infrastructures dans de grands centres de calcul. Dans un article précédent, nous avons proposé un environnement pour la simulation hors-ligne d'applications MPI. La principale originalité de cet environnement par rapport à la littérature est de ne reposer que sur des traces indépendantes du temps. Cela nous permet de découpler totalement l'acquisition des traces de leur rejeu simulé effectif. Nous sommes ainsi capables d'obtenir des traces pour de très grandes instances d'applications sans être limités à une exécution au sein d'une seule grappe de machines. Enfin, cet environnement est fondé sur un noyau de simulation extensible, rapide et validé. Dans cet article nous détaillons les problèmes de performance rencontrés par la première implantation de notre environnement de rejeu de traces. Nous proposons plusieurs modifications pour résoudre ces problèmes et analysons leur impact. Les résultats obtenus montrent une amélioration notable à la fois en termes de précision et d'efficacité par rapport à l'implantation initiale

HAL-ENS-LYON

HAL-IN2P3

INRIA a CCSD electronic archive server

Hal-Diderot

Platform independent profiling of a QCD code

Author: Marinkovic Marina Krstic
Stanisic Luka
Publication venue
Publication date: 24/07/2016
Field of study

The supercomputing platforms available for high performance computing based research evolve at a great rate. However, this rapid development of novel technologies requires constant adaptations and optimizations of the existing codes for each new machine architecture. In such context, minimizing time of efficiently porting the code on a new platform is of crucial importance. A possible solution for this common challenge is to use simulations of the application that can assist in detecting performance bottlenecks. Due to prohibitive costs of classical cycle-accurate simulators, coarse-grain simulations are more suitable for large parallel and distributed systems. We present a procedure of implementing the profiling for openQCD code [1] through simulation, which will enable the global reduction of the cost of profiling and optimizing this code commonly used in the lattice QCD community. Our approach is based on well-known SimGrid simulator [2], which allows for fast and accurate performance predictions of HPC codes. Additionally, accurate estimations of the program behavior on some future machines, not yet accessible to us, are anticipated

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters

Author: Calore Enrico
Mantovani Filippo
Publication venue: 'MDPI AG'
Publication date: 01/01/2018
Field of study

Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques can be applicable on different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially founded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara-dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for the access to the COKA Cluster. We warmly thank the BSC tools group, supporting us for the smooth integration and test of our setup within Extrae and Paraver.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Archivio istituzionale della ricerca - Università di Ferrara