15,396 research outputs found
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.Comment: 44 pages. 1 of USQCD whitepapers
High performance Python for direct numerical simulations of turbulent flows
Direct Numerical Simulations (DNS) of the Navier Stokes equations is an
invaluable research tool in fluid dynamics. Still, there are few publicly
available research codes and, due to the heavy number crunching implied,
available codes are usually written in low-level languages such as C/C++ or
Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS
code that nearly matches the performance of C++ for thousands of processors and
billions of unknowns. We also describe a version optimized through Cython, that
is found to match the speed of C++. The solvers are written from scratch in
Python, both the mesh, the MPI domain decomposition, and the temporal
integrators. The solvers have been verified and benchmarked on the Shaheen
supercomputer at the KAUST supercomputing laboratory, and we are able to show
very good scaling up to several thousand cores.
A very important part of the implementation is the mesh decomposition (we
implement both slab and pencil decompositions) and 3D parallel Fast Fourier
Transforms (FFT). The mesh decomposition and FFT routines have been implemented
in Python using serial FFT routines (either NumPy, pyFFTW or any other serial
FFT module), NumPy array manipulations and with MPI communications handled by
MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT
in Python for a slab mesh decomposition using 4 lines of compact Python code,
for which the parallel performance on Shaheen is found to be slightly better
than similar routines provided through the FFTW library. For a pencil mesh
decomposition 7 lines of code is required to execute a transform
- …