
    Platform independent profiling of a QCD code

    The supercomputing platforms available for research based on high performance computing evolve at a great rate. However, this rapid development of novel technologies requires constant adaptation and optimization of existing codes for each new machine architecture. In this context, minimizing the time needed to port a code efficiently to a new platform is of crucial importance. A possible solution to this common challenge is to use simulations of the application to help detect performance bottlenecks. Because classical cycle-accurate simulators are prohibitively expensive, coarse-grain simulations are more suitable for large parallel and distributed systems. We present a procedure for profiling the openQCD code [1] through simulation, which will reduce the overall cost of profiling and optimizing this code, widely used in the lattice QCD community. Our approach is based on the well-known SimGrid simulator [2], which allows fast and accurate performance predictions of HPC codes. Additionally, we anticipate accurate estimations of the program's behavior on future machines that are not yet accessible to us.
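    The key idea behind coarse-grain simulation can be sketched in a few lines: instead of simulating every hardware cycle, each communication is charged a latency-plus-bandwidth cost and each computation a flop-rate cost. The sketch below illustrates this for a lattice-QCD-like step (a local update followed by a halo exchange); all constants and the serialization of exchanges are illustrative assumptions, not SimGrid's actual models.

```python
# Coarse-grain (flow-level style) cost sketch. All numbers are
# hypothetical; real simulators like SimGrid use richer network models.

LATENCY = 1e-6        # seconds, assumed per-message link latency
BANDWIDTH = 10e9      # bytes/s, assumed link bandwidth
FLOP_RATE = 50e9      # flop/s, assumed per-core compute speed

def comm_time(nbytes):
    """Flow-level estimate of one point-to-point transfer."""
    return LATENCY + nbytes / BANDWIDTH

def compute_time(flops):
    """Delay model for a local computation phase."""
    return flops / FLOP_RATE

def halo_exchange_step(local_flops, halo_bytes, neighbours=8):
    """One lattice-QCD-like iteration: local update, then a halo
    exchange with each neighbour (exchanges assumed serialized)."""
    return compute_time(local_flops) + neighbours * comm_time(halo_bytes)

print(halo_exchange_step(1e9, 1e6))  # → 0.020808 seconds
```

With such a model, one can see at a glance whether a configuration is compute- or communication-bound, which is the kind of bottleneck detection the profiling procedure targets.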

    Versatile, Scalable, and Accurate Simulation of Distributed Applications and Platforms

    The study of parallel and distributed applications and platforms, whether in the cluster, grid, peer-to-peer, volunteer, or cloud computing domain, often mandates empirical evaluation of proposed algorithmic and system solutions via simulation. Unlike direct experimentation via an application deployment on a real-world testbed, simulation enables fully repeatable and configurable experiments for arbitrary hypothetical scenarios. Two key concerns are accuracy (so that simulation results are scientifically sound) and scalability (so that simulation experiments can be fast and memory-efficient). While the scalability of a simulator is easily measured, the accuracy of many state-of-the-art simulators is largely unknown because they have not been sufficiently validated. In this work we describe recent accuracy and scalability advances made in the context of the SimGrid simulation framework. A design goal of SimGrid is that it should be versatile, i.e., applicable across all aforementioned domains. We present quantitative results showing that SimGrid compares favorably to state-of-the-art domain-specific simulators in terms of scalability, accuracy, or the trade-off between the two. An important implication is that, contrary to popular wisdom, striving for versatility in a simulator is not an impediment but is instead conducive to improving both accuracy and scalability.

    Assessing the Performance of MPI Applications Through Time-Independent Trace Replay

    Simulation is a popular approach to obtain objective performance indicators on platforms that are not at one's disposal. It may help with the dimensioning of compute clusters in large computing centers. In this work we present a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is that it relies on time-independent execution traces. This allows us to completely decouple the acquisition process from the actual replay of the traces in a simulation context. We are then able to acquire traces for large application instances without being limited to an execution on a single compute cluster. Finally, our framework is built on top of a scalable, fast, and validated simulation kernel. In this paper, we present the time-independent trace format we use, investigate several acquisition strategies, detail the trace replay tool we developed, and assess the quality of our simulation framework in terms of accuracy, acquisition time, simulation time, and trace size.
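    A time-independent trace records what each rank did and how much work it involved (flops or bytes), but no timestamps, which is what makes acquisition portable across machines. The sketch below shows a plausible line-oriented format and a minimal parser; the action names and layout are illustrative assumptions, and the authors' actual on-disk format may differ.

```python
# Illustrative time-independent trace: one event per line, giving a
# rank, an action, and volumes (flops or bytes) -- no timestamps.
# This format is an assumption for illustration, not the authors' exact one.

SAMPLE_TRACE = """\
0 compute 1e9
0 send 1 1e6
1 compute 1e9
1 recv 0 1e6
"""

def parse_trace(text):
    """Parse trace lines into (rank, action, numeric-args) tuples."""
    events = []
    for line in text.splitlines():
        rank, action, *args = line.split()
        events.append((int(rank), action, [float(a) for a in args]))
    return events

for ev in parse_trace(SAMPLE_TRACE):
    print(ev)
```

Because no durations are stored, the same trace can be replayed against models of different target platforms, which is the decoupling the abstract describes.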

    Simulation of MPI applications with time-independent traces

    Analyzing and understanding the performance behavior of parallel applications on parallel computing platforms is a long-standing concern in the High Performance Computing community. When the targeted platforms are not available, simulation is a reasonable approach to obtain objective performance indicators and explore various hypothetical scenarios. In the context of applications implemented with the Message Passing Interface, two simulation methods have been proposed, on-line simulation and off-line simulation, each with its own drawbacks and advantages. In this work we present an off-line simulation framework, i.e., one that simulates the execution of an application based on event traces obtained from an actual execution. The main novelty of this work, compared to previously proposed off-line simulators, is that the traces that drive the simulation can be acquired on large, distributed, heterogeneous, and non-dedicated platforms. As a result, the scalability of trace acquisition is increased, which is achieved by enforcing that traces contain no time-related information. Moreover, our framework is based on a state-of-the-art scalable, fast, and validated simulation kernel. We introduce the notion of performing off-line simulation from time-independent traces, propose and evaluate several trace acquisition strategies, describe our simulation framework, and assess its quality in terms of trace acquisition scalability, simulation accuracy, and simulation time.
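    The replay side of off-line simulation can be pictured as advancing a per-rank clock by model-predicted durations for each recorded event. The toy loop below sketches this idea; the event format, the cost constants, and the simplification of charging communications to the sender only are all illustrative assumptions, far simpler than a validated simulation kernel.

```python
# Toy replay of time-independent events: per-rank clocks advance by
# durations predicted from a platform model. Event format and constants
# are hypothetical; real kernels model contention and synchronization.

CPU_SPEED = 1e9      # flop/s, assumed speed of the simulated target
LATENCY = 1e-6       # s, assumed per-message latency
BANDWIDTH = 1e9      # bytes/s, assumed link bandwidth

def replay(events, nranks):
    """events: (rank, action, amount) with amount in flops or bytes."""
    clock = [0.0] * nranks
    for rank, action, amount in events:
        if action == "compute":
            clock[rank] += amount / CPU_SPEED
        elif action == "send":   # simplification: charge the sender only
            clock[rank] += LATENCY + amount / BANDWIDTH
    return clock

events = [(0, "compute", 2e9), (0, "send", 1e6), (1, "compute", 1e9)]
print(replay(events, 2))  # rank 0: 2.001001 s, rank 1: 1.0 s
```

Changing CPU_SPEED or BANDWIDTH and replaying the same trace is exactly how such a framework explores hypothetical platforms without re-running the application.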

    Improving the Accuracy and Efficiency of Time-Independent Trace Replay

    Simulation is a popular approach to obtain objective performance indicators on platforms that are not at one's disposal. It may help with the dimensioning of compute clusters in large computing centers. In a previous work, we proposed a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is that it relies on time-independent execution traces. This allows us to completely decouple the acquisition process from the actual replay of the traces in a simulation context. We are then able to acquire traces for large application instances without being limited to an execution on a single compute cluster. Finally, our framework is built on top of a scalable, fast, and validated simulation kernel. In this paper, we detail the performance issues that we encountered with the first implementation of our trace replay framework. We propose several modifications to address these issues and analyze their impact. Results show a clear improvement in accuracy and efficiency with regard to the initial implementation.

    From Simulation to Experiment: A Case Study on Multiprocessor Task Scheduling

    Simulation is a popular approach for empirically evaluating the performance of algorithms and applications in the parallel computing domain. Most published works present results without quantifying simulation error. In this work we investigate accuracy issues when simulating the execution of parallel applications. This is a broad question, so we focus on a relevant case study: the evaluation of scheduling algorithms for executing mixed-parallel applications on clusters. Most such scheduling algorithms have been evaluated in simulation only. We compare simulations to real-world experiments with a view to identifying which features of a simulator are most critical for simulation accuracy. Our first finding is that simple yet popular analytical simulation models lead to simulation results that cannot be used for soundly comparing scheduling algorithms. We then show that, by contrast, simulation models instantiated based on brute-force measurements of the target execution environment lead to usable results. Finally, we develop empirical simulation models that provide a reasonable compromise between the two previous approaches.
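    The gap between the two kinds of models discussed above can be illustrated with a toy example: an idealized analytical model versus an Amdahl-style model whose parameters would be fitted from measurements. All numbers below are hypothetical and serve only to show how the two model families can diverge as core counts grow.

```python
# Sketch contrasting a naive analytical speedup model with a
# measurement-calibrated one (all parameter values are hypothetical).

def naive_model(seq_time, p):
    """Ideal linear speedup -- the kind of simple analytical model
    that can prove too inaccurate to rank scheduling algorithms."""
    return seq_time / p

def calibrated_model(seq_time, p, serial_frac=0.1, per_core_overhead=0.05):
    """Amdahl-style model whose serial fraction and per-core overhead
    would be fitted from measurements of the target environment."""
    return seq_time * (serial_frac + (1 - serial_frac) / p) + per_core_overhead * p

for p in (1, 4, 16):
    print(p, naive_model(100.0, p), calibrated_model(100.0, p))
```

At 16 cores the two predictions already differ by a factor of more than two, which is enough to flip the apparent ranking of two candidate schedules.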

    Validation of ESDS Using Epidemic-Based Data Dissemination Algorithms

    The study of Distributed Systems (DS) is important, as novel solutions in this area impact many sub-fields of Computer Science. However, studying DS is not an easy task. A common approach is to deploy a test-bed to perform a precise evaluation of the system, which can be costly and time-consuming for large-scale platforms. Another solution is to perform network simulations, which allow for more flexibility and simplicity. Simulators implement various models, such as wired/wireless network models and power consumption models. The Extensible Simulator for Distributed Systems (ESDS) is a simulator designed for systems that include edge platforms, namely the Internet of Things (IoT), Wireless Sensor Networks (WSN), and Cyber-Physical Systems (CPS). ESDS uses coarse-grained (flow-level) models for wired and wireless networks, and provides node power consumption models. However, to ensure accurate predictions, these models must be validated. In this paper, we propose to validate the flow-level wireless model and the power consumption model of ESDS using epidemic-based data dissemination simulations. We show that ESDS produces predictions similar to those of another validated flow-level network simulator, in terms of both network performance and energy consumption.
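    The core of epidemic-based dissemination is simple enough to sketch: each informed node pushes the data to a random peer every round until everyone is informed. The toy below captures only this logical layer; it is not ESDS, which additionally models the wireless network and per-node power consumption on top of such dynamics.

```python
import random

# Toy push-gossip dissemination (illustrative only): counts the rounds
# until all nodes are informed when each informed node pushes to one
# uniformly random peer per round. No network or energy model here.

def gossip_rounds(n, seed=0):
    """Return the number of rounds needed to inform all n nodes,
    starting from a single informed node (node 0)."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        newly = {rng.randrange(n) for _ in informed}
        informed |= newly
        rounds += 1
    return rounds

print(gossip_rounds(100))
```

Since the informed set can at most double each round, at least ceil(log2(n)) rounds are needed; in expectation dissemination completes in O(log n) rounds, which is why epidemic protocols scale well on large edge platforms.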
    • 
