19 research outputs found

    Improving Simulations of MPI Applications Using A Hybrid Network Model with Topology and Contention Support

    Proper modeling of collective communications is essential for understanding the behavior of medium- to large-scale parallel applications, and even minor differences in their implementation can drastically change the expected performance. We propose a hybrid network model that extends LogP-based approaches to account for topology and contention in high-speed TCP networks. This model is implemented and validated within SMPI, an MPI implementation provided by the SimGrid simulation toolkit. With SMPI, standard MPI applications can be compiled and run unmodified in a simulated network environment, and traces can be captured without the tracing overheads or clock-synchronization problems encountered in physical experiments. SMPI also provides features for simulating applications that require large amounts of time or memory, including selective execution, RAM folding, and off-line replay of execution traces. We validate our model by comparing traces produced by SMPI with traces of real-world executions, and we show the improvement it brings by also comparing them with traces obtained using the more classical models found in competing tools.
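
    For context, models in the LogP family estimate the time of a point-to-point message from a handful of platform parameters. The LaTeX formula below is a common LogGP-style baseline and is illustrative only; the hybrid model proposed in the paper adds topology- and contention-dependent terms on top of such a base:

        % Time to transfer a k-byte message under LogGP-style assumptions:
        %   L = link latency, o = per-message CPU overhead,
        %   G = gap per byte (inverse bandwidth)
        T(k) \approx L + 2o + (k - 1)\,G

    Contention is then typically captured by making the effective bandwidth term depend on how many flows share a link, something a pure LogP model cannot express.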

    How to verify the precision of density-functional-theory implementations via reproducible and universal workflows

    In the past decades, many density-functional-theory methods and codes adopting periodic boundary conditions have been developed and are now extensively used in condensed-matter physics and materials-science research. Only in 2016, however, was their precision (i.e., the extent to which properties computed with different codes agree with each other) systematically assessed on elemental crystals: a first crucial step in evaluating the reliability of such computations. We discuss here general recommendations for verification studies that aim to further test the precision and transferability of density-functional-theory computational approaches and codes. We illustrate these recommendations using a greatly expanded protocol covering the whole periodic table from Z=1 to 96 and characterizing 10 prototypical cubic compounds for each element: 4 unaries and 6 oxides, spanning a wide range of coordination numbers and oxidation states. The primary outcome is a reference dataset of 960 equations of state cross-checked between two all-electron codes, which is then used to verify and improve nine pseudopotential-based approaches. This effort is facilitated by deploying AiiDA common workflows that perform automatic input-parameter selection, provide identical input/output interfaces across codes, and ensure full reproducibility. Finally, we discuss the extent to which the current results for total energies can be reused for different goals (e.g., obtaining formation energies).
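
    The reference dataset mentioned above consists of equations of state. A typical verification step fits computed energy-volume points to an analytic equation of state, such as the third-order Birch-Murnaghan form, and compares the fitted parameters (V0, B0, B0') across codes. Below is a minimal Python sketch using scipy, with invented data points; the actual protocol and comparison metrics are defined in the paper:

        # Minimal Birch-Murnaghan fit; the E(V) data points are invented.
        import numpy as np
        from scipy.optimize import curve_fit

        def birch_murnaghan(V, E0, V0, B0, B0p):
            """Third-order Birch-Murnaghan equation of state E(V)."""
            eta = (V0 / V) ** (2.0 / 3.0)
            return E0 + 9.0 * V0 * B0 / 16.0 * (
                (eta - 1.0) ** 3 * B0p + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
            )

        # Hypothetical energy-volume points from one code (eV, A^3 per atom)
        V = np.array([15.0, 15.8, 16.6, 17.4, 18.2, 19.0])
        E = np.array([-3.61, -3.68, -3.71, -3.70, -3.66, -3.60])

        p0 = (E.min(), V[np.argmin(E)], 1.0, 4.0)  # rough initial guess
        (E0, V0, B0, B0p), _ = curve_fit(birch_murnaghan, V, E, p0=p0)
        print(f"V0 = {V0:.2f} A^3, B0 = {B0:.3f} eV/A^3, B0' = {B0p:.2f}")

    Two codes agree if their fitted (V0, B0, B0') triples are close; the paper defines quantitative metrics for making that comparison.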

    Common workflows for computing material properties using different quantum engines

    The prediction of material properties based on density-functional theory has become routine, thanks in part to the steady increase in the number and robustness of available simulation packages. This plurality of codes and methods is both a boon and a burden. While providing great opportunities for cross-verification, these packages adopt different methods, algorithms, and paradigms, making it challenging to choose, master, and efficiently use them. We demonstrate how developing common interfaces for workflows that automatically compute material properties greatly simplifies interoperability and cross-verification. We introduce design rules for reusable, code-agnostic workflow interfaces to compute well-defined material properties, which we implement for eleven quantum engines and use to compute various material properties. Each implementation encodes carefully selected simulation parameters and workflow logic, making the implementer's expertise with the quantum engine directly available to non-experts. All workflows are made available as open source, and full reproducibility is guaranteed through the use of the AiiDA infrastructure.

    This work is supported by the MARVEL National Centre of Competence in Research (NCCR) funded by the Swiss National Science Foundation (grant agreement ID 51NF40-182892) and by the European Union's Horizon 2020 research and innovation program under Grant Agreement No. 824143 (European MaX Centre of Excellence "Materials design at the Exascale") and Grant Agreement No. 814487 (INTERSECT project). We thank M. Giantomassi and J.-M. Beuken for their contributions in adding support for PseudoDojo tables to the aiida-pseudo (https://github.com/aiidateam/aiida-pseudo) plugin. We also thank X. Gonze, M. Giantomassi, M. Probert, C. Pickard, P. Hasnip, J. Hutter, M. Iannuzzi, D. Wortmann, S. Blügel, J. Hess, F. Neese, and P. Delugas for providing useful feedback on the various quantum engine implementations. S.P. acknowledges support from the European Union's Horizon 2020 Research and Innovation Programme, under the Marie Skłodowska-Curie Grant Agreement SELPH2D No. 839217 and computer time provided by the PRACE-21 resources MareNostrum at BSC-CNS. E.F.-L. acknowledges the support of the Norwegian Research Council (project number 262339) and computational resources provided by Sigma2. P.Z.-P. thanks the Faraday Institution CATMAT project (EP/S003053/1, FIRG016) for financial support. K.E. acknowledges the Swiss National Science Foundation (grant number 200020-182015). G.Pi. and K.E. acknowledge the swissuniversities "Materials Cloud" (project number 201-003). Work at ICMAB is supported by the Severo Ochoa Centers of Excellence Program (MICINN CEX2019-000917-S), by PGC2018-096955-B-C44 (MCIU/AEI/FEDER, UE), and by GenCat 2017SGR1506. B.Z. thanks the Faraday Institution FutureCat project (EP/S003053/1, FIRG017) for financial support. J.B. and V.T. acknowledge support by the Joint Lab Virtual Materials Design (JLVMD) of the Forschungszentrum Jülich.
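
    The design rule described here, a single code-agnostic interface backed by per-engine expertise, can be sketched as follows. All names in this Python sketch are invented for illustration; the actual AiiDA common-workflows API is structured differently:

        # Hypothetical sketch; all names are invented, not the real AiiDA API.
        from abc import ABC, abstractmethod

        class RelaxInputGenerator(ABC):
            """Per-engine expertise: turn a structure and a named protocol
            into a complete, sensible set of inputs for that engine."""
            @abstractmethod
            def get_inputs(self, structure, protocol: str) -> dict: ...

        class QuantumEspressoRelax(RelaxInputGenerator):
            def get_inputs(self, structure, protocol):
                # Expert-chosen cutoffs, k-point density, and thresholds
                # would be derived here from `protocol` and `structure`.
                return {"engine": "qe", "ecutwfc": 60.0, "kspacing": 0.15}

        ENGINES = {"quantum_espresso": QuantumEspressoRelax()}

        def common_relax(structure, engine: str, protocol: str = "moderate"):
            """Identical call signature regardless of the quantum engine."""
            return ENGINES[engine].get_inputs(structure, protocol)

        print(common_relax(structure=None, engine="quantum_espresso"))

    The point of the pattern is that a non-expert only ever calls the common entry point; the engine-specific defaults encoding the implementer's expertise stay behind the interface.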

    An open source tool chain for performance analysis

    Modern supercomputers with multi-core nodes enhanced by accelerators, as well as hybrid programming models, introduce more complexity into modern applications. Efficiently exploiting all of the available resources requires a complex performance analysis of applications in order to detect time-consuming or idle sections. This paper presents an open-source tool chain for analyzing the performance of parallel applications. It is composed of a trace-generation framework called eztrace, a generic interface for writing traces in multiple formats called gtg, and a trace visualizer called vite. These tools cover the main steps of performance analysis, from the instrumentation of applications to trace analysis, and are designed to maximize compatibility with other performance-analysis tools. Thus, these tools support multiple file formats and are not bound to a particular programming model. The evaluation of these tools shows that they provide performance similar to other analysis tools, while remaining generic.
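
    The role of gtg as a format-agnostic layer between instrumentation and visualization can be illustrated with a small adapter sketch. This is hypothetical Python, not the actual C API of gtg, and the output lines are schematic rather than valid Paje/OTF syntax:

        # Hypothetical sketch of a format-agnostic trace layer, illustrating
        # how instrumentation (eztrace's role) can stay independent of the
        # trace file format consumed by a visualizer (vite's role).
        class PajeWriter:
            def event(self, t, container, name):
                print(f'paje-event {t:.6f} "{name}" "{container}"')

        class OTFWriter:
            def event(self, t, container, name):
                print(f"otf-event t={t} loc={container} name={name}")

        class Trace:
            """Instrumentation talks to this single interface; the chosen
            backend decides the on-disk format."""
            def __init__(self, backend):
                self.backend = backend
            def record(self, t, container, name):
                self.backend.event(t, container, name)

        trace = Trace(PajeWriter())  # swap in OTFWriter() for another format
        trace.record(0.001, "rank 0", "MPI_Send")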

    Simulating MPI applications: the SMPI approach

    This article summarizes our recent work and developments on SMPI, a flexible simulator of MPI applications. In this tool, we took particular care to ensure that our simulator could be used to produce fast and accurate predictions in a wide variety of situations. Although we built SMPI on SimGrid, whose speed and accuracy had already been assessed in other contexts, moving such techniques to an HPC workload required significant additional effort. Obviously, an accurate modeling of communications and network topology was one of the keys to such achievements. Another, less obvious, key was the choice to combine in a single tool the possibility to do both offline and online simulation.
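
    The offline/online distinction can be made concrete: online simulation executes the unmodified application on top of the simulated platform, whereas offline simulation replays a previously captured trace of compute and communication events. Below is a toy replay loop in Python, with assumed link parameters and none of the send/receive matching, collectives, or contention modeling that a real replayer such as SMPI's performs:

        # Toy offline replay: advance a simulated clock over trace records.
        LATENCY = 1.0e-5        # seconds, assumed value
        BANDWIDTH = 1.25e9      # bytes/second, assumed value

        trace = [
            ("compute", 0.0021),   # seconds of modeled computation
            ("send", 1, 65536),    # destination rank, message size in bytes
            ("compute", 0.0007),
        ]

        clock = 0.0
        for record in trace:
            if record[0] == "compute":
                clock += record[1]
            elif record[0] == "send":
                _, dest, size = record
                clock += LATENCY + size / BANDWIDTH
        print(f"predicted time for this rank: {clock * 1e3:.3f} ms")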

    Characterizing the Performance of Modern Architectures Through Opaque Benchmarks: Pitfalls Learned the Hard Way

    Determining the key characteristics of High Performance Computing machines that would allow one to predict their performance is an old and recurrent dream. This was, for example, the rationale behind the design of the LogP model, which later evolved into many variants (LogGP, LogGPS, LoGPS) to cope with the evolution and complexity of network technology. Although the network has received a lot of attention, predicting the performance of computation kernels can be very challenging as well. In particular, the tremendous increase of internal parallelism and the deep memory hierarchies of modern multi-core architectures often leave applications limited by the memory access rate. In this context, determining the key characteristics of a machine, such as the peak bandwidth of each cache level, as well as how an application uses the memory hierarchy, can be the key to predicting or extrapolating the performance of applications. Based on such performance models, most high-level simulation-based frameworks characterize a machine and an application separately, later convolving both signatures to predict the overall performance. We evaluate the suitability of such approaches to modern architectures and applications by trying to reproduce the work of others. When trying to build our own framework, we realized that, regardless of the quality of the underlying models or software, most of these frameworks rely on "opaque" benchmarks to characterize the platform. In this article, we report the many pitfalls we encountered when trying to characterize both the network and the memory performance of modern machines. We claim that opaque benchmarks that do not clearly separate experiment design, measurements, and analysis should be avoided as much as possible. Likewise, an a priori identification of experimental factors should be done to make sure the experimental conditions are adequate.
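
    The separation the authors advocate, between experiment design, measurement, and analysis, can be sketched as follows: randomize the factor levels up front, dump raw observations, and keep all statistics out of the measurement loop. This is illustrative Python with a stand-in operation; a real network or memory benchmark would measure MPI calls or memory accesses with carefully chosen timers:

        import csv
        import random
        import time

        # 1. Experiment design: pick factor levels and randomize their order
        #    up front, so drift over time is not confounded with size.
        sizes = [2 ** k for k in range(0, 21, 2)] * 30  # replicated levels
        random.shuffle(sizes)

        # 2. Measurement: record raw observations only; no aggregation here.
        payload = bytes(2 ** 20)
        with open("raw_measurements.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["order", "size_bytes", "seconds"])
            for i, size in enumerate(sizes):
                t0 = time.perf_counter()
                _ = payload[:size]  # stand-in for the operation under study
                writer.writerow([i, size, time.perf_counter() - t0])

        # 3. Analysis runs separately on the raw CSV (model fitting, outlier
        #    inspection) and can be redone without re-running the experiment.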

    Performance Analysis of HPC Applications on Low-Power Embedded Platforms

    This paper presents a performance evaluation and analysis of well-known HPC applications and benchmarks running on low-power embedded platforms. Their performance-to-power-consumption ratios are compared with those of classical x86 systems. Scalability studies have been conducted on the Mont-Blanc Tibidabo cluster. We have also investigated optimization opportunities and pitfalls induced by the use of these new platforms, and we propose optimization strategies based on auto-tuning.
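
    Auto-tuning in this context usually means empirically searching a small space of implementation parameters (block sizes, unroll factors, thread counts) on each platform instead of hard-coding values tuned for x86. Below is a toy Python sketch of such a search loop, with an invented kernel:

        import time

        def blocked_sum(data, block):
            """Toy kernel whose best block size depends on the cache sizes."""
            total = 0
            for i in range(0, len(data), block):
                total += sum(data[i:i + block])
            return total

        data = list(range(1_000_000))
        best = None
        for block in (256, 1024, 4096, 16384, 65536):  # candidate parameters
            t0 = time.perf_counter()
            blocked_sum(data, block)
            elapsed = time.perf_counter() - t0
            if best is None or elapsed < best[1]:
                best = (block, elapsed)
        print(f"selected block size: {best[0]}")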