Deep learning based surrogate modeling for thermal plume prediction of groundwater heat pumps
The ability of groundwater heat pumps to meet space heating and cooling
demands without relying on fossil fuels has prompted their mass roll-out in
dense urban environments. In regions with high subsurface groundwater flow
rates, the thermal plume generated from a heat pump's injection well can
propagate downstream, affecting surrounding users and reducing their heat pump
efficiency. To reduce the probability of interference, regulators often rely on
simple analytical models or high fidelity groundwater simulations to determine
the impact that a heat pump has on the subsurface aquifer and surrounding heat
pumps. These are either too inaccurate or too computationally expensive for
everyday use. In this work, a surrogate model was developed to provide a quick,
high accuracy prediction tool of the thermal plume generated by a heat pump
within heterogeneous subsurface aquifers. Three variations of a convolutional
neural network were developed that accept the known groundwater Darcy
velocities as discrete two-dimensional inputs and predict the temperature
within the subsurface aquifer around the heat pump. A data set consisting of
800 numerical simulation samples, generated from random permeability fields and
pressure boundary conditions, was used to provide pseudo-randomized Darcy
velocity fields as input fields and the temperature field solution for training
the network. The subsurface temperature field output from the network provides
a more realistic temperature field that follows the Darcy velocity streamlines,
while being orders of magnitude faster than conventional high fidelity solvers.
Comment: 24 pages, 11 figures
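The core operation of such a surrogate can be illustrated in miniature. The sketch below is not the authors' network (whose architecture is not specified here); it shows only a single 2D convolution applied to a discrete velocity field, the building block such models stack many times with learned weights.

```python
# Minimal sketch: one 3x3 convolution over a discrete 2D field
# (valid padding, no learned weights; a smoothing kernel stands in).

def conv2d(field, kernel):
    """Apply a 3x3 kernel to a 2D grid, valid padding."""
    rows, cols = len(field), len(field[0])
    out = []
    for i in range(rows - 2):
        row = []
        for j in range(cols - 2):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * field[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out

# Averaging kernel as a stand-in for learned convolution weights.
kernel = [[1 / 9.0] * 3 for _ in range(3)]
# Hypothetical 5x5 Darcy-velocity magnitude field.
velocity = [[float(i + j) for j in range(5)] for i in range(5)]
smoothed = conv2d(velocity, kernel)
print(len(smoothed), len(smoothed[0]))  # 3 3
```

A trained surrogate replaces the fixed kernel with learned filters and maps the velocity channels to a predicted temperature field in one forward pass, which is what makes it orders of magnitude cheaper than a full simulation.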
A mass-conserving sparse grid combination technique with biorthogonal hierarchical basis functions for kinetic simulations
The exact numerical simulation of plasma turbulence is one of the assets and
challenges in fusion research. For grid-based solvers, sufficiently fine
resolutions are often unattainable due to the curse of dimensionality. The
sparse grid combination technique provides the means to alleviate the curse of
dimensionality for kinetic simulations. However, the hierarchical
representation for the combination step with the state-of-the-art hat functions
suffers from poor conservation properties and numerical instability.
The present work introduces two new variants of hierarchical multiscale basis
functions for use with the combination technique: the biorthogonal and full
weighting bases. The new basis functions conserve the total mass and are shown
to significantly increase accuracy for a finite-volume solution of constant
advection. Further numerical experiments based on the combination technique
applied to a semi-Lagrangian Vlasov--Poisson solver show a stabilizing effect
of the new bases on the simulations.
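The baseline the new bases improve on can be sketched in one dimension. With the standard hat-function basis, the hierarchical surplus at a grid point is its nodal value minus the average of its two coarser-level neighbours; this is a minimal illustrative implementation, not the paper's higher-dimensional combination-technique code.

```python
# Sketch: 1D hierarchization with the standard hat-function basis.
# Surplus at an odd multiple of the current stride = nodal value minus
# the linear interpolant from its two hierarchical ancestors.

def hierarchize_hat(u):
    """u: nodal values at x_i = i/n, i = 0..n, with n a power of two.
    Returns hierarchical surpluses; boundary values stay nodal."""
    n = len(u) - 1
    v = list(u)
    stride = 1
    while stride < n:
        for i in range(stride, n, 2 * stride):
            v[i] = u[i] - 0.5 * (u[i - stride] + u[i + stride])
        stride *= 2
    return v

n = 8
u = [2.0 * i / n for i in range(n + 1)]  # a linear function
v = hierarchize_hat(u)
print(v)  # all interior surpluses vanish for a linear function
```

With this hat basis, dropping or truncating surpluses during the combination step changes the integral of the represented function, i.e. the total mass; the biorthogonal and full-weighting bases introduced in the paper are constructed so that the total mass is conserved.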
Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL
Given the heterogeneity of available accelerator cards within current
supercomputers, ranging from NVIDIA to AMD and Intel GPUs, portability is a
key aspect of modern HPC applications. In Octo-Tiger, we rely on Kokkos and
its various execution spaces for portable compute kernels. In turn, we use HPX
to coordinate kernel launches, CPU tasks, and communication. This combination
allows us to have a fine interleaving between portable CPU/GPU computations and
communication, enabling scalability on various supercomputers. However, for HPX
and Kokkos to work together optimally, we need to be able to treat Kokkos
kernels as HPX tasks. Otherwise, instead of integrating asynchronous Kokkos
kernel launches into HPX's task graph, we would have to actively wait for them
with fence commands, which wastes CPU time better spent otherwise. Using an
integration layer called HPX-Kokkos, treating Kokkos kernels as tasks already
works for some Kokkos execution spaces (like the CUDA one), but not for others
(like the SYCL one). In this work, we started making Octo-Tiger and HPX itself
compatible with SYCL. To do so, we introduce numerous software changes, most
notably an HPX-SYCL integration. This integration allows us to treat SYCL
events as HPX tasks, which in turn allows us to better integrate Kokkos by
extending the support of HPX-Kokkos to also fully support Kokkos' SYCL
execution space. We show two ways to implement this HPX-SYCL integration and
test them using Octo-Tiger and its Kokkos kernels, on both an NVIDIA A100 and
an AMD MI100. We find modest, yet noticeable, speedups by enabling this
integration, even when just running simple single-node scenarios with
Octo-Tiger where communication and CPU utilization are not yet an issue.
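The key idea, turning a device completion event into a schedulable task instead of blocking on a fence, can be sketched as a conceptual analogy. The Python below is emphatically not the HPX/SYCL C++ API; the event class and helper are hypothetical stand-ins for illustration only.

```python
# Conceptual analogy (not the actual HPX-SYCL C++ integration):
# wrap a device completion event into a future so the runtime can
# schedule other work instead of blocking on a fence.

import threading
from concurrent.futures import Future

class FakeEvent:
    """Hypothetical stand-in for a device-side completion event."""
    def __init__(self):
        self._done = threading.Event()
    def complete(self):
        self._done.set()
    def wait(self):  # the blocking "fence"
        self._done.wait()

def event_to_future(event):
    """Turn an event into a future so callers can attach
    continuations rather than actively waiting."""
    fut = Future()
    def waiter():
        event.wait()
        fut.set_result(None)
    threading.Thread(target=waiter, daemon=True).start()
    return fut

ev = FakeEvent()
fut = event_to_future(ev)
fut.add_done_callback(lambda f: print("continuation runs after kernel"))
ev.complete()
fut.result()  # returns once the event has fired
```

A real runtime integration would avoid one thread per event (e.g. by polling events from the scheduler), but the analogy captures why fencing wastes CPU time that a task graph can reclaim.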
The Scalability-Efficiency/Maintainability-Portability Trade-off in Simulation Software Engineering: Examples and a Preliminary Systematic Literature Review
Large-scale simulations play a central role in science and the industry.
Several challenges arise when building simulation software, because simulations
require complex software developed in a dynamic construction process. This is
why simulation software engineering (SSE) has recently emerged as a research
focus. The dichotomous trade-off between scalability and efficiency (SE) on the
one hand and maintainability and portability (MP) on the other hand is one of
the core challenges. We report on the SE/MP trade-off in the context of an
ongoing systematic literature review (SLR). After characterizing the issue of
the SE/MP trade-off using two examples from our own research, we (1) review the
33 identified articles that assess the trade-off, (2) summarize the proposed
solutions for the trade-off, and (3) discuss the findings for SSE and future
work. Overall, we see evidence for the SE/MP trade-off and first solution
approaches. However, a strong empirical foundation has yet to be established;
general quantitative metrics and methods supporting software developers in
addressing the trade-off have to be developed. We foresee considerable future
work in SSE across scientific communities.
Comment: 9 pages, 2 figures. Accepted for presentation at the Fourth
International Workshop on Software Engineering for High Performance Computing
in Computational Science and Engineering (SEHPCCSE 2016)
Comparison of data-driven uncertainty quantification methods for a carbon dioxide storage benchmark scenario
A variety of methods is available to quantify uncertainties arising within
the modeling of flow and transport in carbon dioxide storage, but there is a
lack of thorough comparisons. Usually, raw data from such storage sites can
hardly be described by theoretical statistical distributions since only very
limited data is available. Hence, exact information on distribution shapes for
all uncertain parameters is very rare in realistic applications. We discuss and
compare four different methods tested for data-driven uncertainty
quantification based on a benchmark scenario of carbon dioxide storage. In the
benchmark, for which we provide data and code, carbon dioxide is injected into
a saline aquifer modeled by the nonlinear capillarity-free fractional flow
formulation for two incompressible fluid phases, namely carbon dioxide and
brine. To cover different aspects of uncertainty quantification, we incorporate
various sources of uncertainty such as uncertainty of boundary conditions, of
conceptual model definitions and of material properties. We consider recent
versions of the following non-intrusive and intrusive uncertainty
quantification methods: arbitrary polynomial chaos, spatially adaptive sparse
grids, kernel-based greedy interpolation and hybrid stochastic Galerkin. The
performance of each approach is demonstrated assessing expectation value and
standard deviation of the carbon dioxide saturation against a reference
statistic based on Monte Carlo sampling. We compare the convergence of all
methods reporting on accuracy with respect to the number of model runs and
resolution. Finally, we offer suggestions about the methods' advantages and
disadvantages that can guide the modeler in uncertainty quantification for
carbon dioxide storage and beyond.
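The Monte Carlo reference statistic against which the four methods are judged can be sketched as follows. The "model" here is a hypothetical one-line placeholder, not the two-phase fractional-flow simulator of the benchmark; only the sampling-and-moments pattern is the point.

```python
# Sketch: Monte Carlo estimate of expectation and standard deviation
# of a model output under an uncertain input parameter.

import random, math

def model(permeability):
    # Toy stand-in for a CO2-saturation response (hypothetical).
    return 1.0 - math.exp(-permeability)

random.seed(0)
samples = [model(random.uniform(0.5, 1.5)) for _ in range(10000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples)
                / (len(samples) - 1))
print(round(mean, 3), round(std, 3))
```

The surrogate-based methods in the paper aim to reproduce these two moments with far fewer model runs than the thousands of samples plain Monte Carlo needs.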
Evaluation of Pool-based Testing Approaches to Enable Population-wide Screening for COVID-19
Background: Rapid testing for an infection is paramount during a pandemic to
prevent continued viral spread and excess morbidity and mortality. This study
aimed to determine whether alternative testing strategies based on sample
pooling can increase the speed and throughput of screening for SARS-CoV-2.
Methods: A mathematical modelling approach was chosen to simulate six
different testing strategies based on key input parameters (infection rate,
test characteristics, population size, testing capacity etc.). The situations
in five countries (US, DE, UK, IT and SG) currently experiencing COVID-19
outbreaks were simulated to reflect a broad variety of population sizes and
testing capacities. The primary study outcome measurements that were finalised
prior to any data collection were time and number of tests required; number of
cases identified; and number of false positives.
Findings: The performance of all tested methods depends on the input
parameters, i.e. the specific circumstances of a screening campaign. To screen
one tenth of each country's population at an infection rate of 1% (e.g. when
prioritising frontline medical staff and public workers), realistic optimised
testing strategies enable such a campaign to be completed in ca. 29 days in the
US, 71 in the UK, 25 in Singapore, 17 in Italy and 10 in Germany (ca. eight
times faster compared to individual testing). When infection rates are
considerably lower, or when employing an optimal, yet logistically more complex
pooling method, the gains are more pronounced. Pool-based approaches also
reduce the number of false positive diagnoses by 50%.
Interpretation: The results of this study provide a clear rationale for
adoption of pool-based testing strategies to increase speed and throughput of
testing for SARS-CoV-2. The current individual testing approach unnecessarily
wastes valuable time and resources.
Comment: Revision; 16 pages, 3 figures, 2 tables, 2 supplementary figures
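The basic arithmetic behind pooling gains can be illustrated with classic two-stage (Dorfman) pooling and an assumed perfect test: a pool of k samples costs one test, plus k individual follow-ups only if the pool is positive. The study's six strategies are more elaborate, but this shows the order of magnitude at a 1% infection rate.

```python
# Sketch: expected tests per person under two-stage (Dorfman) pooling
# with a perfect test; p = infection rate, k = pool size.

def expected_tests_per_person(p, k):
    p_pool_positive = 1.0 - (1.0 - p) ** k
    return 1.0 / k + p_pool_positive

p = 0.01
best_k = min(range(2, 51), key=lambda k: expected_tests_per_person(p, k))
rate = expected_tests_per_person(p, best_k)
print(best_k, round(rate, 3))  # 11 0.196
```

Even this simplest pooling scheme needs only about a fifth of the tests of individual testing at a 1% infection rate; the optimised multi-stage strategies simulated in the study push the gains further.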
PDEBENCH: An Extensive Benchmark for Scientific Machine Learning
Machine learning-based modeling of physical systems has experienced increased
interest in recent years. Despite some impressive progress, there is still a
lack of benchmarks for Scientific ML that are easy to use but still challenging
and representative of a wide range of problems. We introduce PDEBench, a
benchmark suite of time-dependent simulation tasks based on Partial
Differential Equations (PDEs). PDEBench comprises both code and data to
benchmark the performance of novel machine learning models against both
classical numerical simulations and machine learning baselines. Our proposed
set of benchmark problems contributes the following unique features: (1) a much
wider range of PDEs compared to existing benchmarks, ranging from relatively
common examples to more realistic and difficult problems; (2) much larger
ready-to-use datasets compared to prior work, comprising multiple simulation
runs across a larger number of initial and boundary conditions and PDE
parameters; (3) more extensible source codes with user-friendly APIs for data
generation and baseline results with popular machine learning models (FNO,
U-Net, PINN, Gradient-Based Inverse Method). PDEBench allows researchers to
extend the benchmark freely for their own purposes using a standardized API and
to compare the performance of new models to existing baseline methods. We also
propose new evaluation metrics with the aim of providing a more holistic
understanding of learning methods in the context of Scientific ML. With those
metrics we identify tasks which are challenging for recent ML methods and
propose these tasks as future challenges for the community. The code is
available at https://github.com/pdebench/PDEBench.
Comment: 16 pages (main body) + 34 pages (supplemental material), accepted for
publication in the NeurIPS 2022 Datasets and Benchmarks Track
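A typical scalar score for such benchmarks is a normalized root-mean-square error between a model's predicted field and the reference simulation. The sketch below is a generic formulation for illustration, not necessarily the exact metric definition used by PDEBench.

```python
# Sketch: RMSE normalized by the RMS magnitude of the reference field.

import math

def nrmse(pred, true):
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, true)) / n
    ref = sum(t ** 2 for t in true) / n
    return math.sqrt(mse / ref)

# Hypothetical flattened solution fields.
true = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(round(nrmse(pred, true), 4))  # 0.0577
```

Normalizing by the reference magnitude makes scores comparable across PDEs whose solution fields differ in scale, which is what a cross-problem benchmark needs.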
Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku
The increasing availability of machines relying on non-GPU architectures,
such as ARM A64FX in high-performance computing, provides a set of interesting
challenges to application developers. In addition to requiring code portability
across different parallelization schemes, programs targeting these
architectures have to be highly adaptable in terms of compute kernel sizes to
accommodate different execution characteristics for various heterogeneous
workloads. In this paper, we demonstrate an approach to code and performance
portability that is based entirely on established standards in the industry. In
addition to applying Kokkos as an abstraction over the execution of compute
kernels on different heterogeneous execution environments, we show that the use
of standard C++ constructs as exposed by the HPX runtime system enables superb
portability in terms of code and performance based on the real-world Octo-Tiger
astrophysics application. We report our experience with porting Octo-Tiger to
the ARM A64FX architecture provided by Stony Brook's Ookami and Riken's
Supercomputer Fugaku and compare the resulting performance with that achieved
on well established GPU-oriented HPC machines such as ORNL's Summit, NERSC's
Perlmutter and CSCS's Piz Daint systems. Octo-Tiger scaled well on
Supercomputer Fugaku without any major code changes due to the abstraction
levels provided by HPX and Kokkos. Adding vectorization support for ARM's SVE
to Octo-Tiger was trivial thanks to using standard C++.
From Piz Daint to the Stars: Simulation of Stellar Mergers using High-Level Abstractions
We study the simulation of stellar mergers, which requires complex
simulations with high computational demands. We have developed Octo-Tiger, a
finite volume grid-based hydrodynamics simulation code with Adaptive Mesh
Refinement which is unique in conserving both linear and angular momentum to
machine precision. To face the challenge of increasingly complex, diverse, and
heterogeneous HPC systems, Octo-Tiger relies on high-level programming
abstractions.
We use HPX with its futurization capabilities to ensure scalability both
between and within nodes, and present first results replacing MPI with
libfabric achieving up to a 2.8x speedup. We extend Octo-Tiger to heterogeneous
GPU-accelerated supercomputers, demonstrating node-level performance and
portability. We show scalability up to full system runs on Piz Daint. For the
scenario's maximum resolution, the compute-critical parts (hydrodynamics and
gravity) achieve 68.1% parallel efficiency at 2048 nodes.
Comment: Accepted at SC1
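A parallel-efficiency figure like the 68.1% quoted above is computed from strong-scaling timings. The sketch below shows the standard formula; the timings are made-up illustrative numbers, not measurements from the paper.

```python
# Sketch: strong-scaling parallel efficiency relative to a baseline
# node count: efficiency = (T_ref * N_ref) / (T_N * N).

def parallel_efficiency(t_ref, n_ref, t_n, n):
    return (t_ref * n_ref) / (t_n * n)

# e.g. a hypothetical run: 100 s on 256 nodes vs. 17 s on 2048 nodes.
eff = parallel_efficiency(100.0, 256, 17.0, 2048)
print(round(eff, 3))  # 0.735
```

An efficiency of 1.0 would mean perfect speedup proportional to the node count; values below 1.0 quantify the overhead of communication and load imbalance at scale.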