1,396 research outputs found

    More Bang for Your Buck: Improved use of GPU Nodes for GROMACS 2018

    Get PDF
    We identify hardware that is optimal to produce molecular dynamics trajectories on Linux compute clusters with the GROMACS 2018 simulation package. Therefore, we benchmark the GROMACS performance on a diverse set of compute nodes and relate it to the costs of the nodes, which may include their lifetime costs for energy and cooling. In agreement with our earlier investigation using GROMACS 4.6 on hardware of 2014, the performance to price ratio of consumer GPU nodes is considerably higher than that of CPU nodes. However, with GROMACS 2018, the optimal CPU to GPU processing power balance has shifted even more towards the GPU. Hence, nodes optimized for GROMACS 2018 and later versions enable a significantly higher performance to price ratio than nodes optimized for older GROMACS versions. Moreover, the shift towards GPU processing allows to cheaply upgrade old nodes with recent GPUs, yielding essentially the same performance as comparable brand-new hardware.Comment: 41 pages, 13 figures, 4 tables. This updated version includes the following improvements: - most notably, added benchmarks for two coarse grain MARTINI systems VES and BIG, resulting in a new Figure 13 - fixed typos - made text clearer in some places - added two more benchmarks for MEM and RIB systems (E3-1240v6 + RTX 2080 / 2080Ti

    GPU fast multipole method with lambda-dynamics features

    Get PDF
    A significant and computationally most demanding part of molecular dynamics simulations is the calculation of long-range electrostatic interactions. Such interactions can be evaluated directly by the naĂŻve pairwise summation algorithm, which is a ubiquitous showcase example for the compute power of graphics processing units (GPUS). However, the pairwise summation has O(N^2) computational complexity for N interacting particles; thus, an approximation method with a better scaling is required. Today, the prevalent method for such approximation in the field is particle mesh Ewald (PME). PME takes advantage of fast Fourier transforms (FFTS) to approximate the solution efficiently. However, as the underlying FFTS require all-to-all communication between ranks, PME runs into a communication bottleneck. Such communication overhead is negligible only for a moderate parallelization. With increased parallelization, as needed for high-performance applications, the usage of PME becomes unprofitable. Another PME drawback is its inability to perform constant pH simulations efficiently. In such simulations, the protonation states of a protein are allowed to change dynamically during the simulation. The description of this process requires a separate evaluation of the energies for each protonation state. This can not be calculated efficiently with PME as the algorithm requires a repeated FFT for each state, which leads to a linear overhead with respect to the number of states. For a fast approximation of pairwise Coulombic interactions, which does not suffer from PME drawbacks, the Fast Multipole Method (FMM) has been implemented and fully parallelized with CUDA. To assure the optimal FMM performance for diverse MD systems multiple parallelization strategies have been developed. The algorithm has been efficiently incorporated into GROMACS and subsequently tested to determine the optimal FMM parameter set for MD simulations. Finally, the FMM has been incorporated into GROMACS to allow for out-of-the-box electrostatic calculations. The performance of the single-GPU FMM implementation, tested in GROMACS 2019, achieves about a third of highly optimized CUDA PME performance when simulating systems with uniform particle distributions. However, the FMM is expected to outperform PME at high parallelization because the FMM global communication overhead is minimal compared to that of PME. Further, the FMM has been enhanced to provide the energies of an arbitrary number of titratable sites as needed in the constant-pH method. The extension is not fully optimized yet, but the first results show the strength of the FMM for constant pH simulations. For a relatively large system with half a million particles and more than a hundred titratable sites, a straightforward approach to compute alternative energies requires the repetition of a simulation for each state of the sites. The FMM calculates all energy terms only a factor 1.5 slower than a single simulation step. Further improvements of the GPU implementation are expected to yield even more speedup compared to the actual implementation.2021-11-1

    The ReaxFF reactive force-field : development, applications and future directions

    Get PDF
    The reactive force-field (ReaxFF) interatomic potential is a powerful computational tool for exploring, developing and optimizing material properties. Methods based on the principles of quantum mechanics (QM), while offering valuable theoretical guidance at the electronic level, are often too computationally intense for simulations that consider the full dynamic evolution of a system. Alternatively, empirical interatomic potentials that are based on classical principles require significantly fewer computational resources, which enables simulations to better describe dynamic processes over longer timeframes and on larger scales. Such methods, however, typically require a predefined connectivity between atoms, precluding simulations that involve reactive events. The ReaxFF method was developed to help bridge this gap. Approaching the gap from the classical side, ReaxFF casts the empirical interatomic potential within a bond-order formalism, thus implicitly describing chemical bonding without expensive QM calculations. This article provides an overview of the development, application, and future directions of the ReaxFF method

    Multi-Architecture Monte-Carlo (MC) Simulation of Soft Coarse-Grained Polymeric Materials: SOft coarse grained Monte-carlo Acceleration (SOMA)

    Full text link
    Multi-component polymer systems are important for the development of new materials because of their ability to phase-separate or self-assemble into nano-structures. The Single-Chain-in-Mean-Field (SCMF) algorithm in conjunction with a soft, coarse-grained polymer model is an established technique to investigate these soft-matter systems. Here we present an im- plementation of this method: SOft coarse grained Monte-carlo Accelera- tion (SOMA). It is suitable to simulate large system sizes with up to billions of particles, yet versatile enough to study properties of different kinds of molecular architectures and interactions. We achieve efficiency of the simulations commissioning accelerators like GPUs on both workstations as well as supercomputers. The implementa- tion remains flexible and maintainable because of the implementation of the scientific programming language enhanced by OpenACC pragmas for the accelerators. We present implementation details and features of the program package, investigate the scalability of our implementation SOMA, and discuss two applications, which cover system sizes that are difficult to reach with other, common particle-based simulation methods

    Best bang for your buck: GPU nodes for GROMACS biomolecular simulations

    Full text link
    The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well exploited with a combination of SIMD, multi-threading, and MPI-based SPMD/MPMD parallelism, while GPUs can be used as accelerators to compute interactions offloaded from the CPU. Here we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Though hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed since these cards do not support ECC memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime

    Strong scaling of general-purpose molecular dynamics simulations on GPUs

    Get PDF
    We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., J. Comp. Phys. 227, 2008). The software supports short-ranged pair force and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We are able to demonstrate equivalent or superior scaling on up to 3,375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5x.Comment: 30 pages, 14 figure
    • …
    corecore