100 research outputs found
Optimization of Finite-Differencing Kernels for Numerical Relativity Applications
A simple optimization strategy for the computation of 3D finite-differencing kernels on many-core architectures is proposed. The 3D finite-differencing computation is split direction by direction and exploits two levels of parallelism: in-core vectorization and multi-threaded shared-memory parallelization. The main application of this method is to accelerate high-order stencil computations in numerical relativity codes. Our proposed method provides substantial speedup in computations involving tensor contractions and 3D stencil calculations on different processor microarchitectures, including Intel Knights Landing.
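The direction-by-direction splitting described above can be illustrated with a minimal NumPy sketch (an illustration of the general technique, not the paper's kernel): a fourth-order centered second-derivative stencil is applied along one axis at a time, and the 3D Laplacian is the sum of the three one-directional sweeps.

```python
import numpy as np

def second_derivative_1d(u, axis, h):
    """Fourth-order centered second derivative along one axis.

    Stencil: (-u[i-2] + 16 u[i-1] - 30 u[i] + 16 u[i+1] - u[i+2]) / (12 h^2),
    applied to interior points only (boundary points are left as zero here).
    """
    d = np.zeros_like(u)
    n = u.shape[axis]

    def shifted(k):
        # View of u shifted by k points along `axis`, restricted to the interior.
        sl = [slice(None)] * u.ndim
        sl[axis] = slice(2 + k, n - 2 + k)
        return u[tuple(sl)]

    interior = [slice(None)] * u.ndim
    interior[axis] = slice(2, -2)
    d[tuple(interior)] = (-shifted(-2) + 16 * shifted(-1) - 30 * shifted(0)
                          + 16 * shifted(1) - shifted(2)) / (12.0 * h * h)
    return d

def laplacian(u, h):
    # Direction-by-direction sweep: each pass streams memory along one axis,
    # which is the access pattern the vectorized in-core loops exploit.
    return sum(second_derivative_1d(u, ax, h) for ax in range(3))
```

Each sweep is independently vectorizable, and in a threaded implementation the outer grid loop of each sweep would be divided among cores.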
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.
Comment: 18 pages, 4 figures, accepted for publication in Scientific Programming.
Kranc: a Mathematica application to generate numerical codes for tensorial evolution equations
We present a suite of Mathematica-based computer-algebra packages, termed
"Kranc", which comprise a toolbox to convert (tensorial) systems of partial
differential evolution equations to parallelized C or Fortran code. Kranc can
be used as a "rapid prototyping" system for physicists or mathematicians
handling very complicated systems of partial differential equations, but
through integration into the Cactus computational toolkit we can also produce
efficient parallelized production codes. Our work is motivated by the field of
numerical relativity, where Kranc is used as a research tool by the authors. In
this paper we describe the design and implementation of both the Mathematica
packages and the resulting code, we discuss some example applications, and
provide results on the performance of an example numerical code for the
Einstein equations.
Comment: 24 pages, 1 figure. Corresponds to journal version.
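The core idea behind Kranc (write the evolution equations symbolically, then emit compilable code for the right-hand sides) can be sketched in miniature with SymPy's C-code printer; this is a hypothetical toy for a scalar wave equation, not Kranc's actual Mathematica implementation, and the variable names are illustrative.

```python
import sympy as sp

# Evolved fields at one grid point, plus precomputed second spatial
# derivatives of phi (in a real code these come from finite differencing).
phi, pi_ = sp.symbols('phi pi_')
d2phi = sp.symbols('d2phi_xx d2phi_yy d2phi_zz')

# Scalar wave equation in first-order-in-time form:
#   d/dt phi = pi,   d/dt pi = Laplacian(phi)
rhs = {
    'phi_rhs': pi_,
    'pi_rhs': sum(d2phi),
}

# Emit one C assignment per right-hand side.
for name, expr in rhs.items():
    print(f"double {name} = {sp.ccode(expr)};")
```

For a tensorial system like the Einstein equations, the same pipeline is applied to every component after the tensor expressions have been expanded, which is where automation pays off most.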
Spatial support vector regression to detect silent errors in the exascale era
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant reliability challenges. Silent data corruptions (SDCs), or silent errors, are one of the major sources that corrupt the execution results of HPC applications without being detected. In this work, we explore a low-memory-overhead SDC detector, by leveraging epsilon-insensitive support vector machine regression, to detect SDCs occurring in HPC applications that can be characterized by an impact error bound. The key contributions are threefold. (1) Our design takes spatial features (i.e., neighbouring data values for each data point in a snapshot) into the training data, so that little memory overhead (less than 1%) is introduced. (2) We provide an in-depth study of the detection ability and performance under different parameters, and we optimize the detection range carefully. (3) Experiments with eight real-world HPC applications show that our detector can achieve detection sensitivity (i.e., recall) of up to 99% while suffering a false positive rate of less than 1% in most cases. Our detector incurs low performance overhead, 5% on average, across all benchmarks studied in the paper. Compared with other state-of-the-art techniques, our detector exhibits the best tradeoff between detection ability and overheads.
This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research Program, under Contract DE-AC02-06CH11357, by an FI-DGR 2013 scholarship, by a HiPEAC PhD Collaboration Grant, by the European Community's Seventh Framework Programme [FP7/2007-2013] under the Mont-Blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402, and by TIN2015-65316-P.
Peer reviewed. Postprint (author's final draft).
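The spatial-feature detection idea can be sketched with scikit-learn's epsilon-insensitive SVR (a hypothetical 1D toy, not the paper's detector; the 0.1 detection range and the injected spike are illustrative choices): each interior point is predicted from its immediate neighbours, and a point whose observed value falls outside the predicted value plus or minus the detection range is flagged as a suspected SDC.

```python
import numpy as np
from sklearn.svm import SVR

# A smooth 1D "snapshot" standing in for application state.
x = np.linspace(0.0, 2.0 * np.pi, 200)
clean = np.sin(x)

# Spatial features: (left neighbour, right neighbour) -> centre value.
features = np.column_stack([clean[:-2], clean[2:]])
targets = clean[1:-1]
model = SVR(kernel='rbf', epsilon=0.01).fit(features, targets)

# Inject a silent error (bit-flip-like spike) into one data point.
corrupted = clean.copy()
corrupted[100] += 0.5

# Flag points whose observed value deviates from the spatial prediction
# by more than the detection range (0.1 here, chosen for this toy).
pred = model.predict(np.column_stack([corrupted[:-2], corrupted[2:]]))
residual = np.abs(pred - corrupted[1:-1])
suspects = np.flatnonzero(residual > 0.1) + 1   # +1: features skip the edges
```

Because only neighbouring values of the current snapshot are used, no extra copy of past state needs to be kept, which is what keeps the memory overhead small.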
A pseudospectral matrix method for time-dependent tensor fields on a spherical shell
We construct a pseudospectral method for the solution of time-dependent,
non-linear partial differential equations on a three-dimensional spherical
shell. The problem we address is the treatment of tensor fields on the sphere.
As a test case we consider the evolution of a single black hole in numerical
general relativity. A natural strategy would be the expansion in tensor
spherical harmonics in spherical coordinates. Instead, we consider the simpler
and potentially more efficient possibility of a double Fourier expansion on the
sphere for tensors in Cartesian coordinates. As usual for the double Fourier
method, we employ a filter to address time-step limitations and certain
stability issues. We find that a tensor filter based on spin-weighted spherical
harmonics is successful, while two simplified, non-spin-weighted filters do not
lead to stable evolutions. The derivatives and the filter are implemented by
matrix multiplication for efficiency. A key technical point is the construction
of a matrix multiplication method for the spin-weighted spherical harmonic
filter. As example for the efficient parallelization of the double Fourier,
spin-weighted filter method we discuss an implementation on a GPU, which
achieves a speed-up of up to a factor of 20 compared to a single core CPU
implementation.
Comment: 33 pages, 9 figures.
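The "derivatives and filters as matrix multiplication" pattern can be shown with a minimal 1D sketch (an illustration of the general pseudospectral technique, not the paper's spin-weighted construction): a dense Fourier differentiation matrix D is built once, after which differentiating any sampled periodic function is a single matrix-vector product.

```python
import numpy as np

n = 16                                   # even number of grid points
x = 2.0 * np.pi * np.arange(n) / n       # periodic grid on [0, 2*pi)
k = np.fft.fftfreq(n, d=1.0 / n)         # integer wavenumbers

# Columns of D are the derivatives of the cardinal (delta) functions:
# transform the identity to Fourier space, multiply each mode by i*k,
# and transform back. Any filter diagonal in mode space (like a
# spherical-harmonic filter) yields a matrix the same way.
D = np.real(np.fft.ifft(1j * k[:, None] * np.fft.fft(np.eye(n), axis=0),
                        axis=0))

f = np.sin(3.0 * x)
df = D @ f                               # spectral derivative of f
```

Precomputing D trades memory for speed: applying it is a dense matrix multiply, which maps naturally onto GPU hardware, consistent with the speedup reported above.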
The Coyote Universe III: Simulation Suite and Precision Emulator for the Nonlinear Matter Power Spectrum
Many of the most exciting questions in astrophysics and cosmology, including
the majority of observational probes of dark energy, rely on an understanding
of the nonlinear regime of structure formation. In order to fully exploit the
information available from this regime and to extract cosmological constraints,
accurate theoretical predictions are needed. Currently such predictions can
only be obtained from costly, precision numerical simulations. This paper is
the third in a series aimed at constructing an accurate calibration of the
nonlinear mass power spectrum on Mpc scales for a wide range of currently
viable cosmological models, including dark energy. The first two papers
addressed the numerical challenges, and the scheme by which an interpolator was
built from a carefully chosen set of cosmological models. In this paper we
introduce the "Coyote Universe" simulation suite which comprises nearly 1,000
N-body simulations at different force and mass resolutions, spanning 38 wCDM
cosmologies. This large simulation suite enables us to construct a prediction
scheme, or emulator, for the nonlinear matter power spectrum accurate at the
percent level out to k~1 h/Mpc. We describe the construction of the emulator,
explain the tests performed to ensure its accuracy, and discuss how the central
ideas may be extended to a wider range of cosmological models and applications.
A power spectrum emulator code is released publicly as part of this paper.
Comment: 10 pages, 10 figures, minor changes to address the referee report. Version v1.1 of the power spectrum emulator code can be downloaded at http://www.hep.anl.gov/cosmology/CosmicEmu/emu.html; it now includes a Fortran wrapper and a choice of any redshift between z=0 and z=1 (note: the webpage is now maintained at Argonne National Laboratory).
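The emulation idea (run an expensive simulation only at carefully chosen design points, then interpolate) can be sketched with a Gaussian-process regressor; this is a one-parameter toy for illustration, not the Coyote pipeline, and the function and parameter names are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_simulation(omega_m):
    # Stand-in for an N-body run: some smooth response to one parameter.
    return np.sin(3.0 * omega_m) + omega_m**2

# Design points: the few parameter values where we can afford to "simulate".
train_x = np.linspace(0.2, 0.5, 8).reshape(-1, 1)
train_y = expensive_simulation(train_x.ravel())

# The emulator interpolates between the design points.
emulator = GaussianProcessRegressor(kernel=RBF(length_scale=0.1))
emulator.fit(train_x, train_y)

# Prediction at a new parameter value costs microseconds, not CPU-years.
pred = emulator.predict(np.array([[0.33]]))
```

The real problem is higher-dimensional (38 wCDM cosmologies spanning several parameters) and the quantity being emulated is a full power spectrum rather than a scalar, but the design-then-interpolate structure is the same.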
Exploring the capabilities of support vector machines in detecting silent data corruptions
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions (SDCs), or silent errors, are one of the major sources that corrupt the execution results of HPC applications without being detected.
In this work, we explore a set of novel SDC detectors – by leveraging epsilon-insensitive support vector machine regression – to detect SDCs that occur in HPC applications. The key contributions are threefold. (1) Our exploration takes temporal, spatial, and spatiotemporal features into account and analyzes different detectors based on different features. (2) We provide an in-depth study of the detection ability and performance under different parameters, and we optimize the detection range carefully. (3) Experiments with eight real-world HPC applications show that support-vector-machine-based detectors can achieve detection sensitivity (i.e., recall) of up to 99% while suffering a false positive rate of less than 1% in most cases. Our detectors incur low performance overhead, 5% on average, across all benchmarks studied in this work.
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number 66905, program manager Lucy Nowell. Pacific Northwest National Laboratory is operated by Battelle for DOE under Contract DE-AC05-76RL01830. In addition, this material is based upon work supported by the National Science Foundation under Grant No. 1619253, and also by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, program manager Lucy Nowell, under Contract Number DE-AC02-06CH11357 (DOE Catalog project), and in part by European Union FEDER funds under contract TIN2015-65316-P.
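The three feature families compared above can be sketched for one point i of a series of 1D snapshots (the window size and function names here are illustrative, not the paper's exact definitions): temporal features come from the same point in earlier snapshots, spatial features from neighbouring points in the current snapshot, and spatiotemporal features combine both.

```python
import numpy as np

def temporal_features(snapshots, i, t, w=2):
    # Values of point i in the previous w snapshots.
    return np.array([snapshots[t - k][i] for k in range(1, w + 1)])

def spatial_features(snapshots, i, t):
    # Values of the immediate neighbours in the current snapshot.
    return np.array([snapshots[t][i - 1], snapshots[t][i + 1]])

def spatiotemporal_features(snapshots, i, t, w=2):
    # Concatenation of the two: history of the point plus its neighbourhood.
    return np.concatenate([temporal_features(snapshots, i, t, w),
                           spatial_features(snapshots, i, t)])
```

Each choice trades memory for context: temporal features require retaining past snapshots, whereas purely spatial features need only the current one, which is why the spatial variant in the companion work has the smallest memory overhead.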
- …