Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighbor searching, and we discuss the present and future challenges we see for
exascale simulation, in particular very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.
Comment: EASC 2014 conference proceedings
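The neighbor searching mentioned in the abstract can be illustrated with a minimal cell-list sketch. This is a generic illustration, not GROMACS code; the function name, the non-periodic cubic box, and coordinates assumed to lie in [0, box) are all assumptions for the example.

```python
def cell_list_neighbors(positions, box, cutoff):
    """Find all particle pairs within `cutoff` using a cell (grid)
    decomposition, avoiding the O(N^2) all-pairs scan.
    Assumes a non-periodic cubic box with coordinates in [0, box)."""
    ncell = max(1, int(box // cutoff))  # cells per dimension
    size = box / ncell
    cells = {}
    for i, (x, y, z) in enumerate(positions):
        key = (int(x // size), int(y // size), int(z // size))
        cells.setdefault(key, []).append(i)

    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # search the cell itself and its 26 adjacent cells
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                        for i in members:
                            if i < j:
                                pi, pj = positions[i], positions[j]
                                r2 = sum((a - b) ** 2 for a, b in zip(pi, pj))
                                if r2 <= cutoff ** 2:
                                    pairs.add((i, j))
    return pairs
```

Because each particle is only compared against particles in neighboring cells, the cost stays near-linear in N for roughly uniform densities, which is what makes cutoff-based force loops scale.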
Fast algorithms for biophysically-constrained inverse problems in medical imaging
We present algorithms and software for parameter estimation for forward and inverse tumor growth problems and for diffeomorphic image registration. Our methods target the following scenarios: automatic registration of healthy images to tumor-bearing medical images, and parameter estimation/calibration of tumor models. This thesis focuses on robust and scalable algorithms for these problems.
Although the proposed framework applies to many problems in oncology, we focus on primary brain tumors, in particular low- and high-grade gliomas. For the tumor model, the main quantity of interest is the extent of tumor infiltration into the brain beyond what is visible in imaging.
The inverse tumor problem assumes that we have patient images at two (or more) well-separated times so that we can observe the tumor growth. The inverse problem also requires that the two images be segmented, but in a clinical setting such information is usually not available. In a typical case, we just have multimodal magnetic resonance images with no segmentation. We address this lack of information by solving a coupled inverse registration and tumor problem. The role of image registration is to find a plausible mapping between the patient's
tumor-bearing image and a normal brain (atlas), with known segmentation. Solving this coupled inverse problem has a prohibitive computational cost, especially in 3D. To address this challenge we have developed novel schemes, scaled up to 200K cores.
Our main contribution is the design and implementation of fast solvers for these problems. We also study the performance of the tumor parameter estimation and registration solvers and their algorithmic scalability. In particular, we introduce the following novel algorithms: an adjoint formulation for tumor-growth problems with and without mass effect; the first parallel 3D Newton-Krylov method for large diffeomorphic image registration; a novel parallel semi-Lagrangian algorithm for solving advection equations in image registration, implemented on shared- and distributed-memory architectures; and Accelerated FFT (AccFFT), an open-source parallel FFT library for CPUs and GPUs, scaled up to 131,000 cores with optimized kernels for computing spectral operators.
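The semi-Lagrangian idea from the contribution list can be sketched in one dimension. This is a minimal sketch, not the thesis implementation; the periodic grid, constant velocity, and linear interpolation are simplifying assumptions for the example.

```python
def semi_lagrangian_step(u, v, dt, dx):
    """One semi-Lagrangian step for the advection equation u_t + v u_x = 0
    on a periodic 1D grid. Each grid point traces its characteristic back
    to the departure point x - v*dt and linearly interpolates u there.
    Unlike explicit upwind schemes, this remains stable for large CFL numbers."""
    n = len(u)
    shift = v * dt / dx  # departure-point offset in grid units
    out = []
    for i in range(n):
        x = (i - shift) % n          # departure point (periodic wrap)
        i0 = int(x) % n              # left interpolation node
        i1 = (i0 + 1) % n            # right interpolation node
        w = x - int(x)               # interpolation weight
        out.append((1 - w) * u[i0] + w * u[i1])
    return out
```

The same backtrace-and-interpolate structure carries over to the 3D transport equations in image registration, where the interpolation step dominates the cost and is what gets parallelized.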
The scientific outcomes of this thesis have appeared in the proceedings of three ACM/IEEE SCxy conferences (two best student paper finalists and one ACM SRC gold medal), two journal papers, two papers in review, four papers in preparation (coupling, mass effect, segmentation, and multi-species tumor model), and seven conference presentations.
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up various algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics, and on quantum simulations for
electronic structure calculations using density functional theory, wave
function techniques, and quantum field theory.
Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012
GPU fast multipole method with lambda-dynamics features
One of the most computationally demanding parts of molecular dynamics simulations is the calculation of long-range electrostatic interactions. Such interactions can be evaluated directly by the naïve pairwise summation algorithm, which is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the pairwise summation has O(N^2) computational complexity for N interacting particles; thus, an approximation method with better scaling is required. Today, the prevalent approximation method in the field is particle mesh Ewald (PME). PME takes advantage of fast Fourier transforms (FFTs) to approximate the solution efficiently. However, as the underlying FFTs require all-to-all communication between ranks, PME runs into a communication bottleneck. This communication overhead is negligible only at moderate parallelization; at the high parallelization needed for high-performance applications, PME becomes unprofitable. Another PME drawback is its inability to perform constant-pH simulations efficiently. In such simulations, the protonation states of a protein are allowed to change dynamically during the simulation. Describing this process requires a separate evaluation of the energies for each protonation state. This cannot be done efficiently with PME, as the algorithm requires a repeated FFT for each state, leading to an overhead linear in the number of states. For a fast approximation of pairwise Coulombic interactions that does not suffer from these PME drawbacks, the Fast Multipole Method (FMM) has been implemented and fully parallelized with CUDA. To ensure optimal FMM performance for diverse MD systems, multiple parallelization strategies have been developed. The algorithm has been efficiently incorporated into GROMACS and subsequently tested to determine the optimal FMM parameter set for MD simulations.
Finally, the FMM integration in GROMACS allows for out-of-the-box electrostatic calculations. The performance of the single-GPU FMM implementation, tested in GROMACS 2019, achieves about a third of the highly optimized CUDA PME performance when simulating systems with uniform particle distributions. However, the FMM is expected to outperform PME at high parallelization because the FMM's global communication overhead is minimal compared to that of PME. Further, the FMM has been enhanced to provide the energies of an arbitrary number of titratable sites, as needed in the constant-pH method. The extension is not fully optimized yet, but the first results show the strength of the FMM for constant-pH simulations. For a relatively large system with half a million particles and more than a hundred titratable sites, a straightforward approach to computing alternative energies requires repeating the simulation for each state of the sites. The FMM calculates all energy terms only a factor of 1.5 slower than a single simulation step. Further improvements to the GPU implementation are expected to yield even more speedup compared to the current implementation.
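The naïve O(N^2) pairwise summation that PME and FMM approximate can be written down directly. This is a reference sketch, not GROMACS or thesis code; the function name and reduced units with the Coulomb constant k = 1 are assumptions for the example.

```python
import math

def coulomb_energy(positions, charges, k=1.0):
    """Direct O(N^2) pairwise Coulomb sum: E = k * sum_{i<j} q_i q_j / r_ij.
    Exact but quadratic in the particle count; PME and FMM approximate
    this same quantity with better scaling."""
    e = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])  # Euclidean distance
            e += k * charges[i] * charges[j] / r
    return e
```

For a constant-pH use case, each protonation state amounts to a different `charges` vector over the same positions, which is exactly the situation where re-evaluating energies per state becomes the bottleneck for FFT-based methods.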
N-body simulations of gravitational dynamics
We describe the astrophysical and numerical basis of N-body simulations, both
of collisional stellar systems (dense star clusters and galactic centres) and
collisionless stellar dynamics (galaxies and large-scale structure). We explain
and discuss the state-of-the-art algorithms used for these quite different
regimes, attempt to give a fair critique, and point out possible directions of
future improvement and development. We briefly touch upon the history of N-body
simulations and their most important results.
Comment: invited review (28 pages), to appear in European Physical Journal Plus
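A minimal direct-summation N-body step illustrates the two regimes the review distinguishes: exact forces for collisional systems, and softened forces for collisionless ones. This is a generic sketch, not code from any work described here; the function names and G = 1 units are assumptions.

```python
def accelerations(pos, masses, G=1.0, eps=0.0):
    """Direct-summation gravitational accelerations, O(N^2).
    A nonzero Plummer softening eps suppresses close encounters,
    as appropriate for collisionless (galaxy-scale) runs."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc

def leapfrog_step(pos, vel, masses, dt, G=1.0, eps=0.0):
    """One kick-drift-kick leapfrog step: second-order and symplectic,
    the standard time integrator in N-body codes."""
    acc = accelerations(pos, masses, G, eps)
    vel = [[v[k] + 0.5 * dt * a[k] for k in range(3)] for v, a in zip(vel, acc)]
    pos = [[p[k] + dt * v[k] for k in range(3)] for p, v in zip(pos, vel)]
    acc = accelerations(pos, masses, G, eps)
    vel = [[v[k] + 0.5 * dt * a[k] for k in range(3)] for v, a in zip(vel, acc)]
    return pos, vel
```

Production codes replace the O(N^2) force loop with tree (Barnes-Hut), fast-multipole, or particle-mesh solvers, but keep this same kick-drift-kick time-stepping structure.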