
    Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

    GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighbor searching, and we discuss the present and future challenges we see for exascale simulation - in particular very fine-grained task parallelism. We also discuss the software management, code peer review, and continuous integration testing required for a project of this complexity. Comment: EASC 2014 conference proceedings.
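
    As a rough illustration of the MPI-between-nodes / OpenMP-within-node layering described above, the following generic C++ sketch (not GROMACS code) distributes work across MPI ranks and threads within each rank; the particle count and the placeholder force kernel are invented for illustration only.

        #include <mpi.h>
        #include <omp.h>
        #include <cstdio>
        #include <vector>

        int main(int argc, char** argv) {
            // Hybrid setup: MPI ranks between nodes, OpenMP threads within a rank.
            int provided = 0;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

            int rank = 0, nranks = 1;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nranks);

            // Hypothetical per-rank slice of particles; a real MD code would use
            // domain decomposition to assign particles to ranks.
            const int nlocal = 100000;
            std::vector<double> fx(nlocal, 0.0);

            // Thread-level parallelism over the local particles.
            #pragma omp parallel for
            for (int i = 0; i < nlocal; ++i) {
                fx[i] += 1.0;  // placeholder for a non-bonded force kernel
            }

            // Rank-level reduction across the whole machine.
            double local_sum = 0.0;
            for (int i = 0; i < nlocal; ++i) local_sum += fx[i];
            double global_sum = 0.0;
            MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

            if (rank == 0)
                std::printf("%d ranks x %d threads, force accumulator = %g\n",
                            nranks, omp_get_max_threads(), global_sum);
            MPI_Finalize();
            return 0;
        }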

    Hardware acceleration of reaction-diffusion systems:a guide to optimisation of pattern formation algorithms using OpenACC

    Reaction-diffusion systems (RDS) have widespread applications in computational ecology, biology, computer graphics, and the visual arts. For the former applications, a major barrier to the development of effective simulation models is their computational complexity: it takes a great deal of processing power to simulate enough replicates that reliable conclusions can be drawn. Optimizing the computation is thus highly desirable in order to obtain more results with fewer resources. Existing optimizations of RDS tend to be low-level and GPGPU-based. Here we apply the higher-level OpenACC framework to two case studies: a simple RDS to learn the ‘workings’ of OpenACC, and a more realistic and complex example. Our results show that simple parallelization directives and minimal data transfer can produce a useful performance improvement. The relative simplicity of porting OpenACC code between heterogeneous hardware is a key benefit to the scientific computing community in terms of speed-up and portability.
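
    The directive-based approach the abstract describes can be sketched with a single diffusing species updated by an explicit finite-difference stencil. This minimal C++ illustration is not one of the paper's case studies, and the grid size, diffusion coefficient, and time step are invented values; keeping both buffers resident on the accelerator across time steps is the "minimal data transfer" idea, and when compiled without OpenACC support the directives are simply ignored and the code runs serially on the host, which is the portability point the authors highlight.

        #include <cstdio>
        #include <cstdlib>

        int main() {
            const int N = 512;                      // illustrative grid size
            const double D = 0.1, dt = 0.01, dx = 1.0;
            double* u     = (double*)std::calloc(N * N, sizeof(double));
            double* u_new = (double*)std::calloc(N * N, sizeof(double));
            u[(N / 2) * N + N / 2] = 1.0;           // point source at the centre

            // Keep both arrays on the device for the whole run; only u is copied back.
            #pragma acc data copy(u[0:N*N]) create(u_new[0:N*N])
            for (int step = 0; step < 1000; ++step) {
                #pragma acc parallel loop collapse(2) present(u, u_new)
                for (int i = 1; i < N - 1; ++i)
                    for (int j = 1; j < N - 1; ++j) {
                        const double lap = (u[(i + 1) * N + j] + u[(i - 1) * N + j]
                                          + u[i * N + j + 1] + u[i * N + j - 1]
                                          - 4.0 * u[i * N + j]) / (dx * dx);
                        u_new[i * N + j] = u[i * N + j] + dt * D * lap;
                    }
                // Copy the updated field back into u on the device for the next step.
                #pragma acc parallel loop collapse(2) present(u, u_new)
                for (int i = 1; i < N - 1; ++i)
                    for (int j = 1; j < N - 1; ++j)
                        u[i * N + j] = u_new[i * N + j];
            }
            std::printf("u at centre after 1000 steps: %g\n", u[(N / 2) * N + N / 2]);
            std::free(u);
            std::free(u_new);
            return 0;
        }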

    Parallelization and integration of fire simulations in the Uintah PSE

    A physics-based stand-alone serial code for fire simulations is integrated into a unified computational framework to couple with other disciplines and to achieve massively parallel computation. Uintah, the computational framework used, is a component-based visual problem-solving environment developed at the University of Utah. It provides the framework for large-scale parallelization for different applications. The integration of the legacy fire code in Uintah is built on three principles: 1) develop different reusable physics-based components that can be used interchangeably and interact with other components, 2) reuse the legacy stand-alone fire code (written in Fortran) as much as possible, and 3) use components developed by third parties, specifically non-linear and linear solvers designed for solving complex-flow problems. A helium buoyant plume is simulated using the Nirvana machine at Los Alamos National Laboratory. Linear scalability is achieved up to 128 processors. Issues related to scaling beyond 128 processors are also discussed.
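
    The second principle above, reusing the legacy Fortran solver inside a C++ component framework, typically comes down to binding its subroutines behind a C++ wrapper with reference-passed arguments. The sketch below is purely illustrative: fire_step_, FireComponent, and the argument list are invented and are not Uintah's actual interface, and a C++ stand-in plays the role the compiled Fortran object would normally fill.

        #include <cstdio>
        #include <vector>

        // The legacy Fortran routine would normally be declared and linked in from
        // its compiled object files (gfortran-style lower-case name with a trailing
        // underscore, all arguments passed by reference):
        //
        //   extern "C" void fire_step_(double* temperature, double* fuel,
        //                              const int* ncells, const double* dt);
        //
        // To keep this sketch self-contained, a C++ stand-in plays the Fortran role.
        extern "C" void fire_step_(double* temperature, double* fuel,
                                   const int* ncells, const double* dt) {
            for (int i = 0; i < *ncells; ++i) {
                const double burn = 0.1 * fuel[i] * (*dt);  // toy combustion rate
                fuel[i] -= burn;
                temperature[i] += 100.0 * burn;
            }
        }

        // A thin C++ component wrapping the legacy routine behind a framework-style API.
        class FireComponent {
        public:
            void advance(std::vector<double>& temperature,
                         std::vector<double>& fuel, double dt) {
                const int ncells = static_cast<int>(temperature.size());
                fire_step_(temperature.data(), fuel.data(), &ncells, &dt);
            }
        };

        int main() {
            std::vector<double> temperature(8, 300.0), fuel(8, 1.0);
            FireComponent fire;
            for (int step = 0; step < 10; ++step) fire.advance(temperature, fuel, 0.01);
            std::printf("T[0] = %g, fuel[0] = %g\n", temperature[0], fuel[0]);
            return 0;
        }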

    Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale

    The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method, used in numerical simulations of fluids in astrophysics and computational fluid dynamics, among many other fields. SPH simulations with detailed physics represent computationally demanding calculations. The parallelization of SPH codes is not trivial due to the absence of a structured grid. Additionally, the performance of SPH codes can, in general, be adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. This work presents insights into the current performance and functionalities of three SPH codes: SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. To gain such insights, a rotating square patch test was implemented as a common test simulation for the three SPH codes and analyzed on two modern HPC systems. Furthermore, to stress the differences with the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an additional test case, the Evrard collapse, has also been carried out. This work extrapolates the common basic SPH features in the three codes for the purpose of consolidating them into a pure-SPH, Exascale-ready, optimized mini-app. Moreover, the outcome of this work serves as direct feedback to the parent codes, to improve their performance and overall scalability. Comment: 18 pages, 4 figures, 5 tables, 2018 IEEE International Conference on Cluster Computing proceedings for WRAp1
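
    At the core of any SPH code, including a mini-app, sits a kernel-weighted summation over neighbouring particles. The sketch below is generic C++, not code from SPHYNX, ChaNGa, or SPH-flow: it computes densities with the standard cubic-spline kernel using a brute-force O(N^2) pair loop, and it is precisely this inner loop that production codes replace with tree- or cell-based neighbour search, which is where the parallelization difficulties mentioned above arise.

        #include <cmath>
        #include <cstdio>
        #include <vector>

        // Standard cubic-spline (M4) kernel in 3D, normalisation 1/(pi h^3).
        double cubic_spline(double r, double h) {
            const double kPi = 3.14159265358979323846;
            const double q = r / h;
            const double sigma = 1.0 / (kPi * h * h * h);
            if (q < 1.0) return sigma * (1.0 - 1.5 * q * q + 0.75 * q * q * q);
            if (q < 2.0) { const double t = 2.0 - q; return sigma * 0.25 * t * t * t; }
            return 0.0;
        }

        struct Particle { double x, y, z, mass, rho; };

        // Brute-force density summation: rho_i = sum_j m_j W(|r_i - r_j|, h).
        // Production SPH codes replace the inner loop with a tree or cell-list
        // neighbour search, which is where most of the parallelization effort goes.
        void compute_density(std::vector<Particle>& p, double h) {
            for (auto& pi : p) {
                pi.rho = 0.0;
                for (const auto& pj : p) {
                    const double dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
                    const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
                    pi.rho += pj.mass * cubic_spline(r, h);
                }
            }
        }

        int main() {
            // Small uniform lattice of unit-mass particles; values are illustrative only.
            std::vector<Particle> particles;
            for (int i = 0; i < 5; ++i)
                for (int j = 0; j < 5; ++j)
                    for (int k = 0; k < 5; ++k)
                        particles.push_back({double(i), double(j), double(k), 1.0, 0.0});
            compute_density(particles, 1.5);
            std::printf("density of centre particle: %g\n", particles[62].rho);
            return 0;
        }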
