Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping
FLUSEPA (Registered trademark in France No. 134009261) is an advanced
simulation tool used for a wide range of aerodynamic studies. It is the
unstructured finite-volume solver developed by Airbus Safran Launchers
to calculate compressible, multidimensional, unsteady, viscous and reactive
flows around bodies in relative motion. The time integration in FLUSEPA is done
using an explicit temporal adaptive method. The current production version of
the code is based on MPI and OpenMP. This implementation incurs significant
synchronization overhead that must be reduced. To tackle this problem, we
present the study of a task-based parallelization of the aerodynamic solver of
FLUSEPA using the runtime system StarPU and combining up to three levels of
parallelism. We validate our solution by simulating the propagation of a
take-off blast wave for the Ariane 5 launcher, using a finite-volume mesh with
80 million cells.
Comment: Accepted manuscript of a paper in Journal of Computational Science
Hierarchical Parallelisation of Functional Renormalisation Group Calculations -- hp-fRG
The functional renormalisation group (fRG) has evolved into a versatile tool
in condensed matter theory for studying important aspects of correlated
electron systems. Practical applications of the method often involve a high
numerical effort, motivating the question of to what extent High Performance
Computing (HPC) can benefit the approach. In this work we report on a multi-level
parallelisation of the underlying computational machinery and show that this
can speed up the code by several orders of magnitude. This in turn can extend
the applicability of the method to otherwise inaccessible cases. We exploit
three levels of parallelisation: distributed computing by means of the Message
Passing Interface (MPI), shared-memory computing using OpenMP, and vectorisation
by means of SIMD units (single-instruction-multiple-data). Results are provided
for two distinct HPC platforms, namely the IBM-based
BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster.
We discuss how certain issues and obstacles were overcome in the course of
adapting the code. Most importantly, we conclude that this vast improvement can
actually be accomplished by introducing only moderate changes to the code, such
that this strategy may serve as a guideline for other researchers to likewise
improve the efficiency of their codes.
A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence
A hybrid scheme that utilizes MPI for distributed memory parallelism and
OpenMP for shared memory parallelism is presented. The work is motivated by the
desire to achieve exceptionally high Reynolds numbers in pseudospectral
computations of fluid turbulence on emerging petascale, high core-count,
massively parallel processing systems. The hybrid implementation derives from
and augments a well-tested scalable MPI-parallelized pseudospectral code. The
hybrid paradigm leads to a new picture for the domain decomposition of the
pseudospectral grids, which is helpful in understanding, among other things,
the 3D transpose of the global data that is necessary for the parallel fast
Fourier transforms that are the central component of the numerical
discretizations. Details of the hybrid implementation are provided, and
performance tests illustrate the utility of the method. It is shown that the
hybrid scheme achieves near ideal scalability up to ~20000 compute cores with a
maximum mean efficiency of 83%. Data are presented that demonstrate how to
choose the optimal number of MPI processes and OpenMP threads in order to
optimize code performance on two different platforms.
Comment: Submitted to Parallel Computing
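The trade-off between MPI processes and OpenMP threads mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `hybrid_layouts` and `parallel_efficiency`, the machine size, and the timings are all hypothetical, and the efficiency formula is the usual strong-scaling definition rather than the authors' own metric.

```python
from typing import List, Tuple


def hybrid_layouts(total_cores: int, cores_per_node: int) -> List[Tuple[int, int]]:
    """Return (mpi_processes, omp_threads) pairs that use exactly total_cores.

    Threads of one process share memory, so they must fit inside a single
    node; here we only keep thread counts that divide cores_per_node.
    """
    if total_cores % cores_per_node != 0:
        raise ValueError("total_cores must be a whole number of nodes")
    layouts = []
    for threads in range(1, cores_per_node + 1):
        if cores_per_node % threads == 0:
            layouts.append((total_cores // threads, threads))
    return layouts


def parallel_efficiency(t_ref: float, cores_ref: int, t: float, cores: int) -> float:
    """Strong-scaling efficiency relative to a reference run.

    With ideal scaling the run time falls in proportion to the core count,
    so efficiency = (t_ref * cores_ref) / (t * cores).
    """
    return (t_ref * cores_ref) / (t * cores)


if __name__ == "__main__":
    # Hypothetical machine: 48 cores in total, 12 cores per node.
    for procs, threads in hybrid_layouts(48, 12):
        print(f"{procs:3d} MPI processes x {threads:2d} OpenMP threads")
    # Hypothetical timings: 100 s on 12 cores, 32 s on 48 cores.
    print(f"efficiency = {parallel_efficiency(100.0, 12, 32.0, 48):.2f}")
```

In practice each candidate layout would be timed on the target machine and the fastest one kept; the sketch only enumerates the candidates and shows how an efficiency figure is derived from two timings.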
LFRic: meeting the challenges of scalability and performance portability in weather and climate models
This paper describes LFRic: the new weather and climate modelling
system being developed by the UK Met Office to replace the existing
Unified Model in preparation for exascale computing in the 2020s.
LFRic uses the GungHo dynamical core and runs on a semi-structured
cubed-sphere mesh. The design of the supporting infrastructure follows
object-oriented principles to facilitate modularity and the use of
external libraries where possible. In particular, a 'separation of concerns'
between the science code and parallel code is imposed to promote
performance portability. An application called PSyclone, developed at the
STFC Hartree Centre, can generate the parallel code, enabling deployment of
a single-source science code onto different machine architectures.
This paper provides an overview of the scientific requirements, the design
of the software infrastructure, and examples of PSyclone usage. Preliminary
performance results show strong scaling and an indication that hybrid
MPI/OpenMP performs better than pure MPI.