Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighbor searching, and we discuss the present and future challenges we see for
exascale simulation, in particular very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity. Comment: EASC 2014 conference proceedings
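As a rough illustration of the multi-level parallelism this abstract describes, the following minimal C++ sketch layers MPI ranks over OpenMP threads with a SIMD-friendly inner loop. It is not GROMACS code; the particle arrays, the force-kernel body, and the reduced observable are placeholders invented for the example.

```cpp
// Hypothetical sketch of MPI + OpenMP + SIMD layering; not GROMACS code.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);                  // MPI: one rank per domain-decomposition cell
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int nLocal = 100000;               // particles owned by this rank (illustrative)
    std::vector<float> x(nLocal, 1.0f), f(nLocal, 0.0f);

    // OpenMP threads share the rank-local particles; the loop body stands in
    // for a SIMD-friendly nonbonded kernel.
    #pragma omp parallel for simd schedule(static)
    for (int i = 0; i < nLocal; ++i)
        f[i] += 0.5f * x[i] * x[i];

    // MPI combines a per-rank partial quantity (e.g. an energy term) across nodes.
    double localSum = 0.0, globalSum = 0.0;
    for (int i = 0; i < nLocal; ++i) localSum += f[i];
    MPI_Allreduce(&localSum, &globalSum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("ranks=%d threads=%d sum=%.3f\n",
                    nranks, omp_get_max_threads(), globalSum);
    MPI_Finalize();
    return 0;
}
```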
DISPATCH: A Numerical Simulation Framework for the Exa-scale Era. I. Fundamentals
We introduce a high-performance simulation framework that permits the
semi-independent, task-based solution of sets of partial differential
equations, typically manifesting as updates to a collection of `patches' in
space-time. A hybrid MPI/OpenMP execution model is adopted, where work tasks
are controlled by a rank-local `dispatcher' which selects, from a set of tasks
generally much larger than the number of physical cores (or hardware threads),
tasks that are ready for updating. The definition of a task can vary, for
example, with some solving the equations of ideal magnetohydrodynamics (MHD),
others non-ideal MHD, radiative transfer, or particle motion, and yet others
applying particle-in-cell (PIC) methods. Tasks do not have to be grid-based,
while tasks that are, may use either Cartesian or orthogonal curvilinear
meshes. Patches may be stationary or moving. Mesh refinement can be static or
dynamic. A feature of decisive importance for the overall performance of the
framework is that time steps are determined and applied locally; this allows
potentially large reductions in the total number of updates required in cases
when the signal speed varies greatly across the computational domain, and
therefore a corresponding reduction in computing time. Another feature is a
load balancing algorithm that operates `locally' and aims to simultaneously
minimise load and communication imbalance. The framework generally relies on
already existing solvers, whose performance is augmented when run under the
framework, due to more efficient cache usage, vectorisation, local
time-stepping, plus near-linear and, in principle, unlimited OpenMP and MPI
scaling. Comment: 17 pages, 8 figures. Accepted by MNRAS
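The dispatcher-with-local-time-steps idea can be illustrated with a small sketch. The Task struct, the readiness test, and the solver stand-in below are assumptions made for the example, not the DISPATCH implementation; the point is only that each patch advances with its own dt and the scheduler keeps picking the task furthest behind in time.

```cpp
// Toy rank-local dispatcher with per-task (local) time steps; illustrative only.
#include <vector>
#include <cstdio>

struct Task {
    int    id;
    double t;                                  // current local time of this patch
    double dt;                                 // local step, set by the patch's own signal speed
    bool ready(double tEnd) const { return t < tEnd; }   // simplistic readiness test
    void update() { t += dt; }                 // stand-in for an MHD / RT / PIC solver call
};

int main() {
    const double tEnd = 1.0;
    std::vector<Task> tasks = { {0, 0.0, 0.05}, {1, 0.0, 0.20}, {2, 0.0, 0.01} };

    // Repeatedly pick, among tasks that still need updating, the one furthest
    // behind in time: regions with small dt get many more updates than regions
    // with large dt, which is the source of the savings described above.
    while (true) {
        Task* next = nullptr;
        for (auto& tsk : tasks)
            if (tsk.ready(tEnd) && (next == nullptr || tsk.t < next->t)) next = &tsk;
        if (next == nullptr) break;
        next->update();
    }

    for (const auto& tsk : tasks)
        std::printf("task %d reached t = %.2f using dt = %.2f\n", tsk.id, tsk.t, tsk.dt);
    return 0;
}
```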
Modelling drug coatings: A parallel cellular automata model of ethylcellulose-coated microspheres
Pharmaceutical companies today face a growing demand for more complex drug designs. In the past few decades, a number of probabilistic models have been developed with the aim of improving insight into the microscopic features of these complex designs. Of particular interest are models of controlled release systems, which provide tools to study targeted dose delivery. Controlled release is achieved by using polymers with different dissolution characteristics. We present here an approach for parallelising a large-scale model of a drug delivery system based on Monte Carlo methods, used as a framework for cellular automata mobility. The model simulates drug release in the gastro-intestinal tract from coated ethylcellulose microspheres. The objective is high-performance simulation of coated drugs for targeted delivery, with the overall aim of understanding the importance of various molecular effects with respect to system evolution over time. Important underlying mechanisms of the process, such as erosion and diffusion, are described.
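A toy version of the mechanism described here, a Monte Carlo-driven cellular automaton in which polymer cells erode and drug cells diffuse into water, might look like the sketch below. The lattice size, rate constants, geometry, and all names are invented for illustration and are not taken from the paper.

```cpp
// Illustrative Monte Carlo cellular automaton of erosion and diffusion;
// not the authors' model. Drug core surrounded by a polymer shell in water.
#include <vector>
#include <random>
#include <utility>
#include <cstdio>

enum Cell : int { Water = 0, Drug = 1, Polymer = 2 };

int main() {
    const int N = 64;                                  // lattice size (illustrative)
    const double pErode = 0.02, pMove = 0.25;          // assumed rate constants
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::uniform_int_distribution<int> dir(0, 3), site(1, N - 2);

    // Initialise: drug core with a polymer (ethylcellulose-like) boundary shell.
    std::vector<std::vector<int>> grid(N, std::vector<int>(N, Water));
    for (int i = N / 4; i < 3 * N / 4; ++i)
        for (int j = N / 4; j < 3 * N / 4; ++j)
            grid[i][j] = (i == N / 4 || i == 3 * N / 4 - 1 ||
                          j == N / 4 || j == 3 * N / 4 - 1) ? Polymer : Drug;

    const int di[4] = {1, -1, 0, 0}, dj[4] = {0, 0, 1, -1};
    for (int step = 0; step < 100000; ++step) {
        int i = site(rng), j = site(rng);
        int d = dir(rng), ni = i + di[d], nj = j + dj[d];
        if (grid[i][j] == Polymer && uni(rng) < pErode)
            grid[i][j] = Water;                        // erosion of the coating
        else if (grid[i][j] == Drug && grid[ni][nj] == Water && uni(rng) < pMove)
            std::swap(grid[i][j], grid[ni][nj]);       // diffusion into water
    }

    // Count drug cells that have left the original core region.
    int released = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (grid[i][j] == Drug &&
                (i < N / 4 || i >= 3 * N / 4 || j < N / 4 || j >= 3 * N / 4)) ++released;
    std::printf("drug cells released from the core region: %d\n", released);
    return 0;
}
```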
BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images
In cryo-electron microscopy (EM), molecular structures are determined from
large numbers of projection images of individual particles. To harness the full
power of this single-molecule information, we use the Bayesian inference of EM
(BioEM) formalism. By ranking structural models using posterior probabilities
calculated for individual images, BioEM in principle addresses the challenge of
working with highly dynamic or heterogeneous systems not easily handled in
traditional EM reconstruction. However, the calculation of these posteriors for
large numbers of particles and models is computationally demanding. Here we
present highly parallelized, GPU-accelerated computer software that performs
this task efficiently. Our flexible formulation employs CUDA, OpenMP, and MPI
parallelization combined with both CPU and GPU computing. The resulting BioEM
software scales nearly ideally both on pure CPU and on CPU+GPU architectures,
thus enabling Bayesian analysis of tens of thousands of images in a reasonable
time. The general mathematical framework and robust algorithms are not limited
to cryo-electron microscopy but can be generalized for electron tomography and
other imaging experiments.
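The computational pattern this abstract points to, evaluating a likelihood independently for each particle image and accumulating it into a per-model posterior, can be sketched as follows. The Gaussian toy likelihood, the flat prior, and all names are assumptions for the example, not the BioEM formalism; the sketch only shows why the problem parallelises so well over images.

```cpp
// Toy per-image posterior accumulation, parallelised over images with OpenMP.
#include <omp.h>
#include <vector>
#include <cmath>
#include <cstdio>

// Stand-in log-likelihood of one image given one model "projection"
// (a placeholder for the real comparison over orientations, offsets, CTF, ...).
double logLikelihood(double imageFeature, double modelFeature, double sigma) {
    double r = imageFeature - modelFeature;
    return -0.5 * r * r / (sigma * sigma) - std::log(sigma);
}

int main() {
    const int nImages = 10000, nModels = 3;
    std::vector<double> images(nImages);
    for (int i = 0; i < nImages; ++i) images[i] = std::sin(0.001 * i);  // synthetic data
    const double models[nModels] = {0.0, 0.5, 1.0};

    for (int m = 0; m < nModels; ++m) {
        double logPosterior = 0.0;               // flat prior assumed
        // Each image contributes independently, so the reduction scales over cores.
        #pragma omp parallel for reduction(+ : logPosterior)
        for (int i = 0; i < nImages; ++i)
            logPosterior += logLikelihood(images[i], models[m], 0.3);
        std::printf("model %d: log posterior %.1f\n", m, logPosterior);
    }
    return 0;
}
```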
Towards a Mini-App for Smoothed Particle Hydrodynamics at Exascale
The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian
method, used in numerical simulations of fluids in astrophysics and
computational fluid dynamics, among many other fields. SPH simulations with
detailed physics represent computationally demanding calculations. The
parallelization of SPH codes is not trivial due to the absence of a structured
grid. Additionally, the performance of the SPH codes can be, in general,
adversely impacted by several factors, such as multiple time-stepping,
long-range interactions, and/or boundary conditions. This work presents
insights into the current performance and functionalities of three SPH codes:
SPHYNX, ChaNGa, and SPH-flow. These codes are the starting point of an
interdisciplinary co-design project, SPH-EXA, for the development of an
Exascale-ready SPH mini-app. To gain such insights, a rotating square patch
test was implemented as a common test simulation for the three SPH codes and
analyzed on two modern HPC systems. Furthermore, to stress the differences with
the codes stemming from the astrophysics community (SPHYNX and ChaNGa), an
additional test case, the Evrard collapse, has also been carried out. This work
extrapolates the common basic SPH features in the three codes for the purpose
of consolidating them into a pure-SPH, Exascale-ready, optimized, mini-app.
Moreover, the outcome of this serves as direct feedback to the parent codes, to
improve their performance and overall scalability. Comment: 18 pages, 4 figures, 5 tables; 2018 IEEE International Conference on Cluster Computing proceedings for WRAp1
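To make the "no structured grid" point concrete, here is a minimal sketch of the density summation at the heart of any SPH code, with a deliberately naive O(N^2) neighbour loop that production codes such as those above replace with trees or cell lists. The 2D setup, the particle lattice, and all parameters are illustrative assumptions, not code from SPHYNX, ChaNGa, or SPH-flow.

```cpp
// Toy SPH density summation with a brute-force neighbour search; illustrative only.
#include <vector>
#include <cmath>
#include <cstdio>

// Standard 2D cubic-spline kernel W(r, h), normalisation 10 / (7 * pi * h^2).
double cubicSpline2D(double r, double h) {
    const double pi = std::acos(-1.0);
    double q = r / h, sigma = 10.0 / (7.0 * pi * h * h);
    if (q < 1.0) return sigma * (1.0 - 1.5 * q * q + 0.75 * q * q * q);
    if (q < 2.0) return sigma * 0.25 * (2.0 - q) * (2.0 - q) * (2.0 - q);
    return 0.0;
}

int main() {
    const int nSide = 50;                                         // 50 x 50 particle lattice
    const double dx = 1.0 / nSide, h = 1.3 * dx, mass = dx * dx;  // unit-density configuration
    std::vector<double> x, y;
    for (int i = 0; i < nSide; ++i)
        for (int j = 0; j < nSide; ++j) {
            x.push_back((i + 0.5) * dx);
            y.push_back((j + 0.5) * dx);
        }

    // Brute-force O(N^2) neighbour loop: the part real SPH codes replace with
    // cell lists or trees, and the reason parallelisation is non-trivial.
    std::vector<double> rho(x.size(), 0.0);
    #pragma omp parallel for
    for (long i = 0; i < (long)x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) {
            double r = std::hypot(x[i] - x[j], y[i] - y[j]);
            if (r < 2.0 * h) rho[i] += mass * cubicSpline2D(r, h);
        }

    // An interior particle (row nSide/2, column nSide/2) should recover density ~1.
    std::printf("density at an interior particle: %.3f (expected ~1)\n",
                rho[x.size() / 2 + nSide / 2]);
    return 0;
}
```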
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of
the demand at the 2025 timescale is at least two orders of magnitude greater, and in some cases much more, than what is currently available. 2) The growth rate of data
produced by simulations is overwhelming the current ability of both facilities
and researchers to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources the
experimental HEP program needs a) an established long-term plan for access to
ASCR computational and data resources, b) an ability to map workflows onto HPC
resources, c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members, d) to transition
codes to the next-generation HPC platforms that will be available at ASCR
facilities, e) to build up and train a workforce capable of developing and
using simulations and analysis to support HEP scientific research on
next-generation systems. Comment: 77 pages, 13 figures; draft report, subject to further revision