2,154 research outputs found
86 PFLOPS Deep Potential Molecular Dynamics simulation of 100 million atoms with ab initio accuracy
We present the GPU version of DeePMD-kit, which, upon training a deep neural
network model using ab initio data, can drive extremely large-scale molecular
dynamics (MD) simulation with ab initio accuracy. Our tests show that the GPU
version is 7 times faster than the CPU version with the same power consumption.
The code can scale up to the entire Summit supercomputer. For a copper system
of 113, 246, 208 atoms, the code can perform one nanosecond MD simulation per
day, reaching a peak performance of 86 PFLOPS (43% of the peak). Such
unprecedented ability to perform MD simulation with ab initio accuracy opens up
the possibility of studying many important issues in materials and molecules,
such as heterogeneous catalysis, electrochemical cells, irradiation damage,
crack propagation, and biochemical reactions.Comment: 29 pages, 11 figure
More Bang for Your Buck: Improved use of GPU Nodes for GROMACS 2018
We identify hardware that is optimal to produce molecular dynamics
trajectories on Linux compute clusters with the GROMACS 2018 simulation
package. Therefore, we benchmark the GROMACS performance on a diverse set of
compute nodes and relate it to the costs of the nodes, which may include their
lifetime costs for energy and cooling. In agreement with our earlier
investigation using GROMACS 4.6 on hardware of 2014, the performance to price
ratio of consumer GPU nodes is considerably higher than that of CPU nodes.
However, with GROMACS 2018, the optimal CPU to GPU processing power balance has
shifted even more towards the GPU. Hence, nodes optimized for GROMACS 2018 and
later versions enable a significantly higher performance to price ratio than
nodes optimized for older GROMACS versions. Moreover, the shift towards GPU
processing allows to cheaply upgrade old nodes with recent GPUs, yielding
essentially the same performance as comparable brand-new hardware.Comment: 41 pages, 13 figures, 4 tables. This updated version includes the
following improvements: - most notably, added benchmarks for two coarse grain
MARTINI systems VES and BIG, resulting in a new Figure 13 - fixed typos -
made text clearer in some places - added two more benchmarks for MEM and RIB
systems (E3-1240v6 + RTX 2080 / 2080Ti
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations
The molecular dynamics simulation package GROMACS runs efficiently on a wide
variety of hardware from commodity workstations to high performance computing
clusters. Hardware features are well exploited with a combination of SIMD,
multi-threading, and MPI-based SPMD/MPMD parallelism, while GPUs can be used as
accelerators to compute interactions offloaded from the CPU. Here we evaluate
which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most
economical way. We have assembled and benchmarked compute nodes with various
CPU/GPU combinations to identify optimal compositions in terms of raw
trajectory production rate, performance-to-price ratio, energy efficiency, and
several other criteria. Though hardware prices are naturally subject to trends
and fluctuations, general tendencies are clearly visible. Adding any type of
GPU significantly boosts a node's simulation performance. For inexpensive
consumer-class GPUs this improvement equally reflects in the
performance-to-price ratio. Although memory issues in consumer-class GPUs could
pass unnoticed since these cards do not support ECC memory, unreliable GPUs can
be sorted out with memory checking tools. Apart from the obvious determinants
for cost-efficiency like hardware expenses and raw performance, the energy
consumption of a node is a major cost factor. Over the typical hardware
lifetime until replacement of a few years, the costs for electrical power and
cooling can become larger than the costs of the hardware itself. Taking that
into account, nodes with a well-balanced ratio of CPU and consumer-class GPU
resources produce the maximum amount of GROMACS trajectory over their lifetime
Multi-Architecture Monte-Carlo (MC) Simulation of Soft Coarse-Grained Polymeric Materials: SOft coarse grained Monte-carlo Acceleration (SOMA)
Multi-component polymer systems are important for the development of new
materials because of their ability to phase-separate or self-assemble into
nano-structures. The Single-Chain-in-Mean-Field (SCMF) algorithm in conjunction
with a soft, coarse-grained polymer model is an established technique to
investigate these soft-matter systems. Here we present an im- plementation of
this method: SOft coarse grained Monte-carlo Accelera- tion (SOMA). It is
suitable to simulate large system sizes with up to billions of particles, yet
versatile enough to study properties of different kinds of molecular
architectures and interactions. We achieve efficiency of the simulations
commissioning accelerators like GPUs on both workstations as well as
supercomputers. The implementa- tion remains flexible and maintainable because
of the implementation of the scientific programming language enhanced by
OpenACC pragmas for the accelerators. We present implementation details and
features of the program package, investigate the scalability of our
implementation SOMA, and discuss two applications, which cover system sizes
that are difficult to reach with other, common particle-based simulation
methods
Performance and Power Analysis of HPC Workloads on Heterogenous Multi-Node Clusters
Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest in the High Performance Computing (HPC) community towards energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques can be applicable on different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially founded by “Contributo 5 per mille assegnato all’Università degli Studi di Ferrara-dichiarazione dei redditi dell’anno 2014”. We thank the University of Ferrara and INFN Ferrara for the access to the COKA Cluster. We warmly thank the BSC tools group, supporting us for the smooth integration and test of our setup within Extrae and Paraver.Peer ReviewedPostprint (published version
- …