8 research outputs found
ARcode: HPC Application Recognition Through Image-encoded Monitoring Data
Knowing which HPC applications jobs run and analyzing their performance behavior
play important roles in system management and optimization. Existing
approaches detect and identify HPC applications through machine learning
models. However, these approaches rely heavily on manually extracted
features from resource utilization data to achieve high prediction accuracy. In
this study, we propose an innovative application recognition method, ARcode,
which encodes job monitoring data into images and leverages the automatic
feature learning capability of convolutional neural networks to detect and
identify applications. Our extensive evaluations based on the dataset collected
from a large-scale production HPC system show that ARcode outperforms the
state-of-the-art methodology by up to 18.87% in terms of accuracy at high
confidence thresholds. For some specific applications (BerkeleyGW and e3sm),
ARcode outperforms it by over 20% at a confidence threshold of 0.8.
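The abstract does not detail ARcode's exact encoding scheme. As a minimal sketch of the general idea, assuming one plausible design (one image row per monitored metric, min-max normalized and resampled to a fixed width), the function name and parameters below are illustrative, not the paper's API:

```python
import numpy as np

def encode_monitoring_data(series, width=64):
    """Encode per-metric job monitoring time series into a grayscale image.

    `series` maps metric names (e.g. CPU, memory, I/O utilization) to 1-D
    sample arrays; each becomes one image row, resampled to `width` columns
    and min-max normalized to the range [0, 255].
    """
    rows = []
    for values in series.values():
        v = np.asarray(values, dtype=float)
        # Resample to a fixed width so every job yields the same image shape.
        x = np.linspace(0, len(v) - 1, width)
        v = np.interp(x, np.arange(len(v)), v)
        lo, hi = v.min(), v.max()
        v = (v - lo) / (hi - lo) if hi > lo else np.zeros_like(v)
        rows.append((v * 255).astype(np.uint8))
    return np.stack(rows)  # shape (n_metrics, width), ready for a CNN input

image = encode_monitoring_data({
    "cpu": [10, 80, 90, 85, 20],
    "mem": [5, 5, 40, 40, 5],
})
```

A fixed-size image like this lets a convolutional network learn discriminative utilization patterns automatically, which is the feature-engineering step the manual approaches perform by hand.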
A UPC++ Actor Library and Its Evaluation on a Shallow Water Proxy Application
Programmability is one of the key challenges of Exascale Computing. Using the actor model for distributed computations may be one solution. The actor model separates computation from communication while still enabling their overlap. Each actor possesses specified communication endpoints to publish and receive information. Computations are undertaken based on the data available on these channels. We present a library that implements this programming model using UPC++, a PGAS library, and evaluate three different parallelization strategies: one based on rank-sequential execution, one based on multiple threads in a rank, and one based on OpenMP tasks. In an evaluation of our library using a shallow water proxy application, our solution compares favorably against an earlier implementation based on X10, and a BSP-based approach.
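The pattern the abstract describes, actors that own communication endpoints and compute only on data available on their channels, can be sketched in a few lines. This is a toy single-process illustration of the actor model, not the UPC++ library's API; all class and method names are invented for the example:

```python
import queue
import threading

class Actor:
    """Minimal actor: an owned input channel, plus out-ports to other actors.

    Computation (act) is driven purely by data arriving on the input channel,
    so communication and computation are separated yet can overlap.
    """
    def __init__(self):
        self.in_port = queue.Queue()   # this actor's receive endpoint
        self.out_ports = []            # endpoints it publishes to

    def connect(self, other):
        self.out_ports.append(other.in_port)

    def publish(self, msg):
        for port in self.out_ports:
            port.put(msg)

    def act(self, n_messages):
        # Compute only when data is available on the input channel.
        for _ in range(n_messages):
            msg = self.in_port.get()
            self.publish(msg + 1)      # stand-in for a real computation

producer, stage, sink = Actor(), Actor(), Actor()
producer.connect(stage)
stage.connect(sink)

t = threading.Thread(target=stage.act, args=(3,))
t.start()
for value in (0, 10, 20):
    producer.publish(value)
t.join()
results = [sink.in_port.get() for _ in range(3)]  # [1, 11, 21]
```

In a PGAS setting such as UPC++, the queues would instead be remotely accessible channels, which is what allows the same actor graph to be distributed across ranks.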
A fast, low-memory, and stable algorithm for implementing multicomponent transport in direct numerical simulations
Implementing multicomponent diffusion models in reacting-flow simulations is
computationally expensive due to the challenges involved in calculating
diffusion coefficients. Instead, mixture-averaged diffusion treatments are
typically used to avoid these costs. However, to our knowledge, the accuracy
and appropriateness of mixture-averaged diffusion models have not been
verified for three-dimensional turbulent premixed flames. In this study we
propose a fast, efficient, low-memory algorithm and use it to evaluate the
role of multicomponent mass diffusion in reacting-flow simulations. Direct
numerical simulation of these flames is performed by implementing the
Stefan-Maxwell equations in NGA. A semi-implicit algorithm decreases the
computational expense of inverting the full multicomponent ordinary diffusion
array while maintaining accuracy and fidelity. We first verify the method by
performing one-dimensional simulations of premixed hydrogen flames and compare
with matching cases in Cantera. We demonstrate the algorithm to be stable, and
its performance scales approximately with the number of species squared. Then,
as an initial study of multicomponent diffusion, we simulate premixed,
three-dimensional turbulent hydrogen flames, neglecting secondary Soret and
Dufour effects. Simulation conditions are carefully selected to match
previously published results and ensure valid comparison. Our results show that
using the mixture-averaged diffusion assumption leads to a 15% under-prediction
of the normalized turbulent flame speed for a premixed hydrogen-air flame. This
difference in the turbulent flame speed motivates further study into using the
mixture-averaged diffusion assumption for DNS of moderate-to-high Karlovitz
number flames.
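The semi-implicit Stefan-Maxwell solver itself is not reproduced in the abstract. For contrast, here is a minimal sketch of the mixture-averaged approximation the study evaluates against full multicomponent transport, using the standard Curtiss-Hirschfelder form; the values in the example are illustrative, not data from the paper:

```python
import numpy as np

def mixture_averaged_diffusivities(X, Y, D_bin):
    """Mixture-averaged diffusion coefficients (Curtiss-Hirschfelder form).

    X, Y: mole and mass fractions (length N); D_bin: symmetric N x N matrix
    of binary diffusion coefficients. This O(N^2) per-point approximation is
    the cheap alternative to full multicomponent (Stefan-Maxwell) transport,
    which requires working with the full ordinary-diffusion matrix.
    """
    N = len(X)
    D_mix = np.empty(N)
    for k in range(N):
        # D_mix[k] = (1 - Y_k) / sum_{j != k} X_j / D_kj
        denom = sum(X[j] / D_bin[k, j] for j in range(N) if j != k)
        D_mix[k] = (1.0 - Y[k]) / denom
    return D_mix

# Illustrative two-species mixture (made-up fractions and coefficients).
D = mixture_averaged_diffusivities(
    X=np.array([0.5, 0.5]),
    Y=np.array([0.1, 0.9]),
    D_bin=np.array([[1.0e-5, 2.0e-5],
                    [2.0e-5, 1.0e-5]]),
)
```

The observed quadratic scaling with species count is consistent with this structure: both the mixture-averaged sums and the multicomponent diffusion array involve all species pairs.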
Preparing NERSC users for Cori, a Cray XC40 system with Intel many integrated cores
The newest NERSC supercomputer Cori is a Cray XC40 system consisting of 2,388 Intel Xeon Haswell nodes and 9,688 Intel Xeon-Phi “Knights Landing” (KNL) nodes. Compared to the Xeon-based clusters NERSC users are familiar with, optimal performance on Cori requires consideration of KNL mode settings; process, thread, and memory affinity; fine-grain parallelization; vectorization; and use of the high-bandwidth MCDRAM memory. This paper describes our efforts preparing NERSC users for KNL through the NERSC Exascale Science Application Program, Web documentation, and user training. We discuss how we configured the Cori system for usability and productivity, addressing programming concerns, batch system configurations, and default KNL cluster and memory modes. System usage data, job completion analysis, programming and running jobs issues, and a few successful user stories on KNL are presented
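The affinity and mode considerations listed above typically surface in a batch script. The fragment below is an illustrative sketch, not NERSC's exact recommended settings: the node counts, thread counts, and binary name are assumptions for the example, though the quad/cache mode constraint and the process/thread binding options are the kinds of settings the paper discusses:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --constraint=knl,quad,cache   # default KNL cluster and memory mode

# Thread affinity: spread OpenMP threads across cores, one place per hw thread.
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

# 16 MPI ranks x 16 logical CPUs per rank (4 cores x 4 hyperthreads on KNL),
# binding each rank's threads to its own cores. ./myapp.x is a placeholder.
srun -n 16 -c 16 --cpu-bind=cores ./myapp.x
```

In cache mode the MCDRAM acts as a transparent last-level cache, so no code changes are needed to benefit from its bandwidth; flat mode would instead expose MCDRAM as an explicitly allocatable NUMA node.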
Performance Observability and Monitoring of High Performance Computing with Microservices
Traditionally, High Performance Computing (HPC) software has been built and deployed as bulk-synchronous, parallel executables based on the message-passing interface (MPI) programming model. The rise of data-oriented computing paradigms and an explosion in the variety of applications that need to be supported on HPC platforms have forced a rethink of the appropriate programming and execution models to integrate this new functionality. In situ workflows mark a paradigm shift in HPC software development methodologies, enabling a range of new applications, from user-level data services to machine learning (ML) workflows that run alongside traditional scientific simulations. By tracing the evolution of HPC software development over the past 30 years, this dissertation identifies the key elements and trends responsible for the emergence of coupled, distributed, in situ workflows. This dissertation focuses on coupled in situ workflows involving composable, high-performance microservices. After outlining the motivation to enable performance observability of these services, and why existing HPC performance tools and techniques cannot be applied in this context, this dissertation proposes a solution wherein a set of techniques gathers, analyzes, and orients performance data from different sources to generate observability. By leveraging microservice components initially designed to build high-performance data services, this dissertation demonstrates their broader applicability for building and deploying performance monitoring and visualization as services within an in situ workflow. The results from this dissertation suggest that: (1) integrating performance data from different sources is vital to understanding the performance of service components; (2) in situ (online) analysis of this performance data is needed to enable the adaptivity of distributed components and to manage monitoring data volume; (3) statistical modeling combined with performance observations can help generate better service configurations; and (4) services are a promising architectural choice for deploying in situ performance monitoring and visualization functionality. This dissertation includes previously published and co-authored material, as well as unpublished co-authored material.