5 research outputs found
Adaptive Neural Network-Based Approximation to Accelerate Eulerian Fluid Simulation
Eulerian fluid simulation is an important HPC application, and neural networks
have been applied to accelerate it. However, the current methods that accelerate
fluid simulation with neural networks lack flexibility and generalization. In
this paper, we tackle this limitation and aim to enhance the applicability of
neural networks in Eulerian fluid simulation. We introduce Smartfluidnet, a
framework that automates model generation and application. Given an existing
neural network as input, Smartfluidnet generates multiple neural networks before
the simulation to meet the execution-time and simulation-quality requirements.
During the simulation, Smartfluidnet dynamically switches among the neural
networks to best meet the user's requirement on simulation quality. Evaluated on
20,480 input problems, Smartfluidnet achieves 1.46x and 590x speedups over a
state-of-the-art neural network model and the original fluid simulation,
respectively, on an NVIDIA Titan X Pascal GPU, while providing better simulation
quality than the state-of-the-art model.
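As an illustration of the dynamic model-switching idea described in this
abstract, here is a minimal PyTorch sketch. The candidate architectures, the
quality estimate, and the switching threshold are hypothetical placeholders,
not Smartfluidnet's actual models or API.

```python
# Hypothetical sketch of switching among pre-generated networks during a
# simulation to track a quality target; not Smartfluidnet's actual code.
import torch
import torch.nn as nn

QUALITY_TARGET = 0.95  # assumed user requirement on simulation quality

# Pre-generated candidate models, ordered from cheapest to most accurate.
candidate_models = [
    nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(8, 1, 3, padding=1)),
    nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 1, 3, padding=1)),
]

def estimate_quality(field: torch.Tensor) -> float:
    # Placeholder metric; a real system would use a physics-based check,
    # e.g. the divergence of the velocity field after the pressure projection.
    return 1.0 - field.abs().mean().item()

def simulate(field: torch.Tensor, num_steps: int) -> torch.Tensor:
    quality, idx = 1.0, 0
    for _ in range(num_steps):
        # Switch to a more accurate (slower) model when quality drops.
        if quality < QUALITY_TARGET and idx < len(candidate_models) - 1:
            idx += 1
        with torch.no_grad():
            field = candidate_models[idx](field)  # NN replaces a solver step
        quality = estimate_quality(field)
    return field

result = simulate(torch.randn(1, 1, 64, 64), num_steps=10)
```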
CFDNet: a deep learning-based accelerator for fluid simulations
CFD is widely used in physical system design and optimization to predict
engineering quantities of interest, such as the lift on a plane wing or the drag
on a motor vehicle. However, many systems of interest are prohibitively expensive
to optimize because each candidate design requires evaluating a CFD simulation.
To render the computation tractable, reduced-order or surrogate models are used
to accelerate simulations while respecting the convergence constraints provided
by the higher-fidelity solution. This paper introduces CFDNet, a coupled
physical-simulation and deep-learning framework for accelerating the convergence
of Reynolds-Averaged Navier-Stokes (RANS) simulations. CFDNet is designed to
predict the primary physical properties of the fluid, including velocity,
pressure, and eddy viscosity, using a single convolutional neural network at its
core. We evaluate CFDNet on a variety of use cases, both interpolative (test
geometries observed during training) and extrapolative (test geometries not
observed during training). Our results show that CFDNet meets the convergence
constraints of the domain-specific physics solver while outperforming it by
1.9-7.4x on both steady laminar and turbulent flows.
Moreover, we demonstrate the generalization capacity of CFDNet by testing its
prediction on new geometries unseen during training. In this case, the approach
meets the CFD convergence criterion while still providing significant speedups
over traditional domain-only models.
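One plausible way to organize such a solver/network coupling is sketched below.
The abstract only states that a single CNN predicts velocity, pressure, and eddy
viscosity and that the physics solver's convergence constraint is still met, so
the callables (run_solver, residual), the warmup length, and the tolerance are
assumptions made purely for illustration.

```python
# Minimal sketch of a solver/CNN coupling in the spirit of CFDNet; the
# function names and the warmup/refinement schedule are hypothetical.
CONVERGENCE_TOL = 1e-6  # assumed residual tolerance of the RANS solver

def accelerated_solve(initial_field, cnn, run_solver, residual):
    # 1) A few cheap solver iterations produce a physically plausible input.
    field = run_solver(initial_field, iterations=10)

    # 2) The CNN maps that intermediate field to a near-converged prediction
    #    of the flow variables (velocity, pressure, eddy viscosity).
    field = cnn(field)

    # 3) The physics solver refines the prediction until its original
    #    convergence criterion is met, so accuracy is not sacrificed.
    while residual(field) > CONVERGENCE_TOL:
        field = run_solver(field, iterations=1)
    return field
```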
HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC
This paper presents HALO 1.0, an open-ended, extensible multi-agent software
framework that implements a set of proposed hardware-agnostic accelerator
orchestration (HALO) principles. HALO implements a novel compute-centric message
passing interface (C^2MPI) specification to enable performance-portable execution
of a hardware-agnostic host application across heterogeneous accelerators.
Experimental results from evaluating eight widely used HPC subroutines on Intel
Xeon E5-2620 CPUs, Intel Arria 10 GX FPGAs, and NVIDIA GeForce RTX 2080 Ti GPUs
show that HALO 1.0 allows a unified host-program control flow to run across all
of the computing devices with the consistently highest performance-portability
score, up to five orders of magnitude higher than that of the OpenCL-based
solution.
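The sketch below illustrates, in schematic Python, what a unified host control
flow across heterogeneous devices can look like. The Orchestrator class and its
offload method are invented stand-ins for a HALO-style runtime and are not the
actual C^2MPI calls defined by HALO 1.0.

```python
# Invented stand-in for a HALO-style runtime; NOT the real C^2MPI API.
class Orchestrator:
    """Hypothetical runtime that hides device-specific details from the host."""
    def __init__(self, devices):
        self.devices = devices  # e.g. ["cpu", "fpga", "gpu"]

    def offload(self, kernel_name, payload):
        # The host only names the computation; the runtime (HALO's agents)
        # selects and drives a suitable accelerator behind this call.
        device = self.devices[0]  # placeholder selection policy
        print(f"running {kernel_name} on {device}")
        return payload            # placeholder result

# The host program's control flow stays identical regardless of which
# accelerator ends up executing the subroutine.
runtime = Orchestrator(devices=["cpu", "fpga", "gpu"])
vectors = (list(range(1024)), list(range(1024)))
result = runtime.offload("dot_product", vectors)
```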
SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates
CNN-based surrogates have become prevalent in scientific applications as
replacements for conventional, time-consuming physical approaches. Although these
surrogates can yield satisfactory results at significantly lower computation cost
when trained over small datasets, our benchmarking results show that data-loading
overhead becomes the major performance bottleneck when training surrogates with
large datasets. In practice, surrogates are usually trained with high-resolution
scientific data, which can easily reach the terabyte scale. Several
state-of-the-art data loaders have been proposed to improve loading throughput in
general CNN training; however, they are sub-optimal when applied to surrogate
training. In this work, we propose SOLAR, a surrogate data loader that can
substantially increase loading throughput during training. It leverages three key
observations from our benchmarking and contains three novel designs.
Specifically, SOLAR first generates a
pre-determined shuffled index list and accordingly optimizes the global access
order and the buffer eviction scheme to maximize the data reuse and the buffer
hit rate. It then exploits a tradeoff between lightweight computational imbalance
and heavyweight loading-workload imbalance to speed up the overall training. It
finally optimizes its data access pattern with HDF5 to achieve better parallel
I/O throughput. Our evaluation with three scientific surrogates and 32 GPUs shows
that SOLAR achieves up to a 24.4X speedup over the PyTorch Data Loader and a
3.52X speedup over state-of-the-art data loaders.
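The pre-determined shuffled index list is what makes far-sighted buffering
possible: since every epoch's sample order is known before training starts, the
loader can keep samples that will be needed again instead of re-reading them. The
sketch below illustrates that idea; the function names and the eviction policy
are invented for illustration and are not SOLAR's actual implementation, which
additionally balances loading work across workers and tunes HDF5 access.

```python
# Hypothetical sketch of pre-determined shuffling plus a reuse buffer;
# not SOLAR's actual code.
import random

def build_epoch_orders(num_samples: int, num_epochs: int, seed: int = 0):
    """Generate every epoch's shuffled index list ahead of time."""
    rng = random.Random(seed)
    orders = []
    for _ in range(num_epochs):
        order = list(range(num_samples))
        rng.shuffle(order)
        orders.append(order)
    return orders

def simulate_buffer_hit_rate(orders, buffer_size: int) -> float:
    """Replay the known access order against a small reuse buffer."""
    future = [idx for epoch in orders for idx in epoch]
    buffer, hits = set(), 0
    for pos, idx in enumerate(future):
        if idx in buffer:
            hits += 1
            continue
        if len(buffer) >= buffer_size:
            # Evict the sample whose next use is farthest in the future
            # (Belady-style), possible only because the order is known upfront.
            def next_use(i):
                try:
                    return future.index(i, pos + 1)
                except ValueError:
                    return float("inf")
            buffer.remove(max(buffer, key=next_use))
        buffer.add(idx)  # in a real loader this would be an HDF5 read
    return hits / len(future)

orders = build_epoch_orders(num_samples=1000, num_epochs=5)
print(simulate_buffer_hit_rate(orders, buffer_size=200))
```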