8 research outputs found
ARcode: HPC Application Recognition Through Image-encoded Monitoring Data
Knowing which HPC applications jobs run and analyzing their performance behavior
play important roles in system management and optimization. Existing
approaches detect and identify HPC applications through machine learning
models. However, these approaches rely heavily on manually extracted
features from resource utilization data to achieve high prediction accuracy. In
this study, we propose an innovative application recognition method, ARcode,
which encodes job monitoring data into images and leverages the automatic
feature learning capability of convolutional neural networks to detect and
identify applications. Our extensive evaluations based on the dataset collected
from a large-scale production HPC system show that ARcode outperforms the
state-of-the-art methodology by up to 18.87% in terms of accuracy at high
confidence thresholds. For some specific applications (BerkeleyGW and e3sm),
ARcode outperforms it by over 20% at a confidence threshold of 0.8.
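The abstract does not detail ARcode's exact encoding scheme. As a minimal sketch of the general idea, assuming one plausible design (one image row per monitored metric, min-max normalized and resampled to a fixed width), the function name and parameters below are illustrative, not the paper's API:

```python
import numpy as np

def encode_monitoring_data(series, width=64):
    """Encode per-metric job monitoring time series into a grayscale image.

    `series` maps metric names (e.g. CPU, memory, I/O utilization) to 1-D
    sample arrays; each becomes one image row, resampled to `width` columns
    and min-max normalized to the range [0, 255].
    """
    rows = []
    for values in series.values():
        v = np.asarray(values, dtype=float)
        # Resample to a fixed width so every job yields the same image shape.
        x = np.linspace(0, len(v) - 1, width)
        v = np.interp(x, np.arange(len(v)), v)
        lo, hi = v.min(), v.max()
        v = (v - lo) / (hi - lo) if hi > lo else np.zeros_like(v)
        rows.append((v * 255).astype(np.uint8))
    return np.stack(rows)  # shape (n_metrics, width), ready for a CNN input

image = encode_monitoring_data({
    "cpu": [10, 80, 90, 85, 20],
    "mem": [5, 5, 40, 40, 5],
})
```

A fixed-size image like this lets a convolutional network learn discriminative utilization patterns automatically, which is the feature-engineering step the manual approaches perform by hand.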
A UPC++ Actor Library and Its Evaluation on a Shallow Water Proxy Application
Programmability is one of the key challenges of Exascale Computing. Using the actor model for distributed computations may be one solution. The actor model separates computation from communication while still enabling their overlap. Each actor possesses specified communication endpoints to publish and receive information. Computations are undertaken based on the data available on these channels. We present a library that implements this programming model using UPC++, a PGAS library, and evaluate three different parallelization strategies: one based on rank-sequential execution, one based on multiple threads in a rank, and one based on OpenMP tasks. In an evaluation of our library using a shallow water proxy application, our solution compares favorably against an earlier implementation based on X10, and a BSP-based approach.
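The pattern the abstract describes, actors that own communication endpoints and compute only on data available on their channels, can be sketched in a few lines. This is a toy single-process illustration of the actor model, not the UPC++ library's API; all class and method names are invented for the example:

```python
import queue
import threading

class Actor:
    """Minimal actor: an owned input channel, plus out-ports to other actors.

    Computation (act) is driven purely by data arriving on the input channel,
    so communication and computation are separated yet can overlap.
    """
    def __init__(self):
        self.in_port = queue.Queue()   # this actor's receive endpoint
        self.out_ports = []            # endpoints it publishes to

    def connect(self, other):
        self.out_ports.append(other.in_port)

    def publish(self, msg):
        for port in self.out_ports:
            port.put(msg)

    def act(self, n_messages):
        # Compute only when data is available on the input channel.
        for _ in range(n_messages):
            msg = self.in_port.get()
            self.publish(msg + 1)      # stand-in for a real computation

producer, stage, sink = Actor(), Actor(), Actor()
producer.connect(stage)
stage.connect(sink)

t = threading.Thread(target=stage.act, args=(3,))
t.start()
for value in (0, 10, 20):
    producer.publish(value)
t.join()
results = [sink.in_port.get() for _ in range(3)]  # [1, 11, 21]
```

In a PGAS setting such as UPC++, the queues would instead be remotely accessible channels, which is what allows the same actor graph to be distributed across ranks.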
A fast, low-memory, and stable algorithm for implementing multicomponent transport in direct numerical simulations
Implementing multicomponent diffusion models in reacting-flow simulations is
computationally expensive due to the challenges involved in calculating
diffusion coefficients. Instead, mixture-averaged diffusion treatments are
typically used to avoid these costs. However, to our knowledge, the accuracy
and appropriateness of mixture-averaged diffusion models have not been
verified for three-dimensional turbulent premixed flames. In this study we
propose a fast, efficient, low-memory algorithm and use it to evaluate the
role of multicomponent mass diffusion in reacting-flow simulations. Direct
numerical simulation of these flames is performed by implementing the
Stefan-Maxwell equations in NGA. A semi-implicit algorithm decreases the
computational expense of inverting the full multicomponent ordinary diffusion
array while maintaining accuracy and fidelity. We first verify the method by
performing one-dimensional simulations of premixed hydrogen flames and compare
with matching cases in Cantera. We demonstrate the algorithm to be stable, and
its performance scales approximately with the number of species squared. Then,
as an initial study of multicomponent diffusion, we simulate premixed,
three-dimensional turbulent hydrogen flames, neglecting secondary Soret and
Dufour effects. Simulation conditions are carefully selected to match
previously published results and ensure valid comparison. Our results show that
using the mixture-averaged diffusion assumption leads to a 15% under-prediction
of the normalized turbulent flame speed for a premixed hydrogen-air flame. This
difference in the turbulent flame speed motivates further study into using the
mixture-averaged diffusion assumption for DNS of moderate-to-high Karlovitz
number flames.
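The semi-implicit Stefan-Maxwell solver itself is not reproduced in the abstract. For contrast, here is a minimal sketch of the mixture-averaged approximation the study evaluates against full multicomponent transport, using the standard Curtiss-Hirschfelder form; the values in the example are illustrative, not data from the paper:

```python
import numpy as np

def mixture_averaged_diffusivities(X, Y, D_bin):
    """Mixture-averaged diffusion coefficients (Curtiss-Hirschfelder form).

    X, Y: mole and mass fractions (length N); D_bin: symmetric N x N matrix
    of binary diffusion coefficients. This O(N^2) per-point approximation is
    the cheap alternative to full multicomponent (Stefan-Maxwell) transport,
    which requires working with the full ordinary-diffusion matrix.
    """
    N = len(X)
    D_mix = np.empty(N)
    for k in range(N):
        # D_mix[k] = (1 - Y_k) / sum_{j != k} X_j / D_kj
        denom = sum(X[j] / D_bin[k, j] for j in range(N) if j != k)
        D_mix[k] = (1.0 - Y[k]) / denom
    return D_mix

# Illustrative two-species mixture (made-up fractions and coefficients).
D = mixture_averaged_diffusivities(
    X=np.array([0.5, 0.5]),
    Y=np.array([0.1, 0.9]),
    D_bin=np.array([[1.0e-5, 2.0e-5],
                    [2.0e-5, 1.0e-5]]),
)
```

The observed quadratic scaling with species count is consistent with this structure: both the mixture-averaged sums and the multicomponent diffusion array involve all species pairs.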
Preparing NERSC users for Cori, a Cray XC40 system with Intel many integrated cores
The newest NERSC supercomputer Cori is a Cray XC40 system consisting of 2,388 Intel Xeon Haswell nodes and 9,688 Intel Xeon-Phi “Knights Landing” (KNL) nodes. Compared to the Xeon-based clusters NERSC users are familiar with, optimal performance on Cori requires consideration of KNL mode settings; process, thread, and memory affinity; fine-grain parallelization; vectorization; and use of the high-bandwidth MCDRAM memory. This paper describes our efforts preparing NERSC users for KNL through the NERSC Exascale Science Application Program, Web documentation, and user training. We discuss how we configured the Cori system for usability and productivity, addressing programming concerns, batch system configurations, and default KNL cluster and memory modes. System usage data, job completion analysis, programming and running jobs issues, and a few successful user stories on KNL are presented
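The affinity and mode considerations listed above typically surface in a batch script. The fragment below is an illustrative sketch, not NERSC's exact recommended settings: the node counts, thread counts, and binary name are assumptions for the example, though the quad/cache mode constraint and the process/thread binding options are the kinds of settings the paper discusses:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --constraint=knl,quad,cache   # default KNL cluster and memory mode

# Thread affinity: spread OpenMP threads across cores, one place per hw thread.
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

# 16 MPI ranks x 16 logical CPUs per rank (4 cores x 4 hyperthreads on KNL),
# binding each rank's threads to its own cores. ./myapp.x is a placeholder.
srun -n 16 -c 16 --cpu-bind=cores ./myapp.x
```

In cache mode the MCDRAM acts as a transparent last-level cache, so no code changes are needed to benefit from its bandwidth; flat mode would instead expose MCDRAM as an explicitly allocatable NUMA node.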
Performance Observability and Monitoring of High Performance Computing with Microservices
Traditionally, High Performance Computing (HPC) software has been built and deployed as bulk-synchronous, parallel executables based on the message-passing interface (MPI) programming model. The rise of data-oriented computing paradigms and an explosion in the variety of applications that need to be supported on HPC platforms have forced a rethink of the appropriate programming and execution models to integrate this new functionality. In situ workflows mark a paradigm shift in HPC software development methodologies, enabling a range of new applications, from user-level data services to machine learning (ML) workflows that run alongside traditional scientific simulations. By tracing the evolution of HPC software development over the past 30 years, this dissertation identifies the key elements and trends responsible for the emergence of coupled, distributed, in situ workflows. This dissertation focuses on coupled in situ workflows involving composable, high-performance microservices. After outlining the motivation to enable performance observability of these services, and why existing HPC performance tools and techniques cannot be applied in this context, this dissertation proposes a solution wherein a set of techniques gathers, analyzes, and orients performance data from different sources to generate observability. By leveraging microservice components initially designed to build high-performance data services, this dissertation demonstrates their broader applicability for building and deploying performance monitoring and visualization as services within an in situ workflow. The results from this dissertation suggest that: (1) integrating performance data from different sources is vital to understanding the performance of service components; (2) in situ (online) analysis of this performance data is needed to enable the adaptivity of distributed components and to manage monitoring data volume; (3) statistical modeling combined with performance observations can help generate better service configurations; and (4) services are a promising architectural choice for deploying in situ performance monitoring and visualization functionality. This dissertation includes previously published and co-authored material, as well as unpublished co-authored material.