
    Scaling of the GROMACS 4.6 molecular dynamics code on SuperMUC.

    Here we report on the performance of GROMACS 4.6 on the SuperMUC cluster at the Leibniz Rechenzentrum in Garching. We carried out benchmarks with three biomolecular systems, ranging from eighty thousand to twelve million atoms, each in a strong scaling test. The twelve-million-atom system reached a performance of 49 nanoseconds per day on 32,768 cores.
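    The abstract quotes absolute throughput only; a strong-scaling result is easier to interpret as an efficiency relative to a smaller reference run. The sketch below shows that arithmetic under stated assumptions: the 49 ns/day at 32,768 cores is taken from the abstract, while the reference run is a hypothetical placeholder, not a measured value.

```python
# Minimal sketch: strong-scaling efficiency from two throughput measurements.
# The 32,768-core / 49 ns/day point comes from the abstract; the reference
# run below is a HYPOTHETICAL placeholder, not a measured value.

def strong_scaling_efficiency(ref_cores, ref_ns_per_day, cores, ns_per_day):
    """Achieved speedup divided by ideal speedup (1.0 = perfect scaling)."""
    return (ns_per_day / ref_ns_per_day) / (cores / ref_cores)

ref_cores, ref_perf = 4_096, 10.0   # hypothetical reference run
cores, perf = 32_768, 49.0          # reported in the abstract
eff = strong_scaling_efficiency(ref_cores, ref_perf, cores, perf)
print(f"strong-scaling efficiency at {cores} cores: {eff:.0%}")
```

    An efficiency of 1.0 would mean the throughput grew in exact proportion to the core count.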

    TweTriS: Twenty trillion-atom simulation

    Significant improvements are presented for the molecular dynamics code ls1 mardyn, a linked cell-based code for simulating large numbers of small, rigid molecules with application areas in chemical engineering. The changes consist of a redesign of the SIMD vectorization via wrappers, MPI improvements, and a software redesign that allows memory-efficient execution with the production trunk and increases portability and extensibility. Two novel, memory-efficient OpenMP schemes for the linked cell-based force calculation are presented, which retain the Newton's third law optimization. Comparisons to well-optimized Verlet list-based codes, such as LAMMPS and GROMACS, demonstrate the viability of the linked cell-based approach. The present version of ls1 mardyn is used to run simulations on entire supercomputers, maximizing the number of sampled atoms. Compared to the preceding version of ls1 mardyn on the entire set of 9216 nodes of SuperMUC Phase 1, 27% more atoms are simulated. Weak scaling performance is increased by up to 40% and strong scaling performance by more than 220%. On Hazel Hen, a strong scaling efficiency of up to 81% and 189 billion molecule updates per second are attained when scaling from 8 to 7168 nodes. Moreover, a total of 20 trillion atoms is simulated at up to 88% weak scaling efficiency, running at up to 1.33 PFLOPS. This represents a fivefold increase in the number of atoms simulated to date.
    Funding: BMBF, 01IH16008, joint project TaLPas (Task-based Load Balancing and Auto-Tuning in Particle Simulation).
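    The key data-structure choice above is the linked-cell traversal that retains Newton's third law, i.e. each particle pair is evaluated once and the resulting force is applied to both partners. The following is a minimal serial sketch of that idea, not ls1 mardyn code: the Lennard-Jones parameters, the cubic box, and the half neighbour stencil are illustrative assumptions, and the paper's memory-efficient layouts and OpenMP colouring schemes are not reproduced.

```python
# Minimal serial sketch of a linked-cell pair traversal that retains
# Newton's third law: each pair is evaluated once, the force applied twice.
# Not ls1 mardyn code; cutoff, box and LJ parameters are illustrative.
import itertools
import numpy as np

def build_cells(pos, box, rc):
    """Sort particle indices into cubic cells with edge length >= rc."""
    n = int(box // rc)   # assumes box >= 3 * rc, so no cell pair is visited twice
    cells = {}
    for i, r in enumerate(pos):
        key = tuple((r // (box / n)).astype(int) % n)
        cells.setdefault(key, []).append(i)
    return cells, n

def lj_force(rij, eps=1.0, sigma=1.0):
    """Lennard-Jones force on particle i due to j, with rij = pos_i - pos_j."""
    r2 = np.dot(rij, rij)
    sr6 = (sigma * sigma / r2) ** 3
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2 * rij

def accumulate(pos, f, i, j, box, rc):
    rij = pos[i] - pos[j]
    rij -= box * np.round(rij / box)   # minimum-image convention
    if np.dot(rij, rij) < rc * rc:
        fij = lj_force(rij)
        f[i] += fij                    # Newton's third law:
        f[j] -= fij                    # one evaluation, two force updates

def forces(pos, box, rc):
    cells, n = build_cells(pos, box, rc)
    f = np.zeros_like(pos)
    # Half of the 26-neighbour stencil, so each cell pair is visited once.
    half = [d for d in itertools.product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]
    for key, members in cells.items():
        for a, i in enumerate(members):            # pairs inside the cell
            for j in members[a + 1:]:
                accumulate(pos, f, i, j, box, rc)
        for d in half:                             # pairs with half the neighbours
            nb = tuple((k + dk) % n for k, dk in zip(key, d))
            for i in members:
                for j in cells.get(nb, ()):
                    accumulate(pos, f, i, j, box, rc)
    return f

rng = np.random.default_rng(0)
positions = rng.random((500, 3)) * 10.0   # 500 particles in a 10x10x10 box
print(forces(positions, box=10.0, rc=2.5).shape)
```

    In production codes this same traversal is vectorized with SIMD wrappers and parallelized with OpenMP, which is exactly where the schemes described above come in.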

    Algorithmic and Code Optimizations of Molecular Dynamics Simulations for Process Engineering

    The focus of this work lies on implementation improvements and, in particular, node-level performance optimization of the simulation software ls1-mardyn. Through data structure improvements, SIMD vectorization and, especially, OpenMP parallelization, the world's first simulation of 2×10^13 molecules at over 1 PFLOP/s was enabled. To allow for long-range interactions, the Fast Multipole Method was introduced to ls1-mardyn. The algorithm was optimized for sequential, shared-memory, and distributed-memory execution on up to 32,768 MPI processes.
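    To see why node-level memory efficiency dominates at this scale, a back-of-envelope estimate of the particle-data footprint for 2×10^13 molecules is useful. The per-molecule byte count below is an assumption (single-precision position and velocity of a single-site molecule), and the 32,768 ranks are reused purely as an illustrative process count; neither figure reflects the thesis' actual data layout.

```python
# Back-of-envelope sketch: aggregate particle-data footprint at 2e13 molecules.
# bytes_per_molecule is an ASSUMPTION (6 single-precision floats: position and
# velocity of a single-site molecule); the actual layout may differ, and the
# rank count is reused here purely for illustration.

n_molecules = 2e13           # from the abstract
n_ranks = 32_768             # MPI process count mentioned in the abstract
bytes_per_molecule = 6 * 4   # assumed: x, y, z, vx, vy, vz in single precision

total_bytes = n_molecules * bytes_per_molecule
print(f"aggregate particle data: {total_bytes / 1024**4:,.0f} TiB")
print(f"per MPI rank:            {total_bytes / n_ranks / 1024**3:,.1f} GiB")
```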

    GPU fast multipole method with lambda-dynamics features

    One of the most computationally demanding parts of molecular dynamics simulations is the calculation of long-range electrostatic interactions. Such interactions can be evaluated directly by the naïve pairwise summation algorithm, which is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the pairwise summation has O(N^2) computational complexity for N interacting particles; thus, an approximation method with better scaling is required. Today, the prevalent method for such approximation in the field is particle mesh Ewald (PME). PME takes advantage of fast Fourier transforms (FFTs) to approximate the solution efficiently. However, as the underlying FFTs require all-to-all communication between ranks, PME runs into a communication bottleneck. This communication overhead is negligible only for moderate parallelization. With increasing parallelization, as needed for high-performance applications, the use of PME becomes unprofitable. Another PME drawback is its inability to perform constant-pH simulations efficiently. In such simulations, the protonation states of a protein are allowed to change dynamically during the simulation. The description of this process requires a separate evaluation of the energies for each protonation state. This cannot be calculated efficiently with PME, as the algorithm requires a repeated FFT for each state, which leads to a linear overhead with respect to the number of states. For a fast approximation of pairwise Coulombic interactions that does not suffer from these PME drawbacks, the Fast Multipole Method (FMM) has been implemented and fully parallelized with CUDA. To ensure optimal FMM performance for diverse MD systems, multiple parallelization strategies have been developed. The algorithm has been efficiently incorporated into GROMACS, allowing for out-of-the-box electrostatic calculations, and subsequently tested to determine the optimal FMM parameter set for MD simulations. The performance of the single-GPU FMM implementation, tested in GROMACS 2019, achieves about a third of the highly optimized CUDA PME performance when simulating systems with uniform particle distributions. However, the FMM is expected to outperform PME at high parallelization because the FMM's global communication overhead is minimal compared to that of PME. Further, the FMM has been enhanced to provide the energies of an arbitrary number of titratable sites, as needed in the constant-pH method. The extension is not fully optimized yet, but the first results show the strength of the FMM for constant-pH simulations. For a relatively large system with half a million particles and more than a hundred titratable sites, a straightforward approach to computing alternative energies requires repeating the simulation for each state of the sites. The FMM calculates all energy terms only a factor of 1.5 slower than a single simulation step. Further improvements of the GPU implementation are expected to yield even more speedup compared to the current implementation.
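    The reference point for both PME and the FMM is the naïve pairwise Coulomb sum mentioned at the start of the abstract. A minimal NumPy sketch of that O(N^2) baseline is shown below; the random positions, unit charges, and reduced units (Coulomb constant set to 1) are illustrative assumptions, not GROMACS conventions.

```python
# Minimal NumPy sketch of the naive O(N^2) pairwise Coulomb sum that both PME
# and the FMM approximate. Reduced units (Coulomb constant = 1) and random
# unit charges are illustrative choices, not GROMACS conventions.
import numpy as np

def direct_coulomb_energy(pos, q):
    """Total electrostatic energy, summing every unique pair exactly once."""
    energy = 0.0
    for i in range(len(q) - 1):
        rij = pos[i + 1:] - pos[i]                 # vectors to all later particles
        dist = np.linalg.norm(rij, axis=1)
        energy += q[i] * np.sum(q[i + 1:] / dist)  # k_e = 1 in reduced units
    return energy

rng = np.random.default_rng(0)
positions = rng.random((1_000, 3))          # 1000 particles in a unit box
charges = rng.choice([-1.0, 1.0], 1_000)    # illustrative unit charges
print(direct_coulomb_energy(positions, charges))
```

    Because the pair count grows quadratically, this direct sum is only practical for small systems, which is what motivates the PME and FMM approximations discussed above.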

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer's series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA's first funding phase, and provides an overview of SPPEXA's contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    Multi-Scale Simulations of Collagen Failure and Mechanoradicals

    Collagen, the most abundant protein in the human body, must withstand high mechanical loads due to its structural role in tendons, skin, bones, and other connective tissue. It was recently found that tensed collagen creates mechanoradicals by homolytic bond scission in the sub-failure regime. The locations and types of the initial rupture sites critically determine both the mechanical and chemical impact of these micro-ruptures on the tissue, but have yet to be explored. Here we employ hybrid scale-bridging simulations to determine these first breakage points in collagen, combining existing and newly developed methods tailored towards collagen's hierarchical structure. We improved our Kinetic Monte Carlo/Molecular Dynamics scheme to simulate bond scissions at the all-atom level, and also developed a mesoscopic, ultra-coarse-grained description of a collagen fibril. We find collagen crosslinks to rupture first, and identify individual sacrificial bonds in trivalent crosslinks that break preferentially without compromising structural integrity. Collagen's weak bonds funnel ruptures such that the potentially harmful mechanoradicals are readily stabilized. Our simulations further suggest that the length of the helices between pairs of crosslinks determines the trade-off between overall strength and breakage specificity. The combined results suggest that this unique failure mode of collagen is tailored towards combating an early onset of macroscopic failure and material ageing.
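    The bond-scission step in a scheme of this kind is typically driven by kinetic Monte Carlo: given per-bond rupture rates, one scission event and its waiting time are drawn at random. The sketch below is a generic, textbook Gillespie-style selection step, not the paper's hybrid KMC/MD method, and the rates are hypothetical placeholders.

```python
# Generic Gillespie-style kinetic Monte Carlo step for bond scission: given
# per-bond rupture rates, draw which bond breaks and the waiting time until
# it does. A textbook KMC sketch, not the paper's hybrid KMC/MD scheme; the
# rates are hypothetical placeholders.
import numpy as np

def kmc_step(rates, rng):
    """Return (index of the bond that breaks, waiting time) for one event."""
    total = rates.sum()
    waiting_time = rng.exponential(1.0 / total)        # time to next scission
    chosen = rng.choice(len(rates), p=rates / total)   # rate-weighted choice
    return chosen, waiting_time

rng = np.random.default_rng(1)
rates = np.array([1e-3, 5e-2, 2e-4])   # hypothetical rupture rates (1/ns)
bond, dt = kmc_step(rates, rng)
print(f"bond {bond} breaks after {dt:.1f} ns")
```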

    FabSim3: An automation toolkit for verified simulations using high performance computing

    A common feature of computational modelling and simulation research is the need to perform many tasks in complex sequences to achieve a usable result. This will typically involve tasks such as preparing input data, pre-processing, running simulations on a local or remote machine, post-processing, and performing coupling communications, validations and/or optimisations. Tasks like these can involve manual steps which are time- and effort-intensive, especially when they involve the management of large ensemble runs. Additionally, human errors become more likely and more numerous as the research work becomes more complex, increasing the risk of damaging the credibility of simulation results. Automation tools can help ensure the credibility of simulation results by reducing the manual time and effort required to perform these research tasks, by making more rigorous procedures tractable, and by reducing the probability of human error due to a reduced number of manual actions. In addition, efficiency gained through automation can help researchers to perform more research within the budget and effort constraints imposed by their projects. This paper presents the main software release of FabSim3, and explains how our automation toolkit can improve and simplify a range of tasks for researchers and application developers. FabSim3 helps to prepare, submit, execute, retrieve, and analyze simulation workflows. By providing a suitable level of abstraction, FabSim3 reduces the complexity of setting up and managing a large-scale simulation scenario, while still providing transparent access to the underlying layers for effective debugging. The tool also facilitates job submission and management (including staging and curation of files and environments) for a range of different supercomputing environments. Although FabSim3 itself is application-agnostic, it supports a provably extensible plugin system with which users automate simulation and analysis workflows for their own application domains. To highlight this, we briefly describe a selection of these plugins and demonstrate the efficiency of the toolkit in handling large ensemble workflows.
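    The first half of the ensemble pattern described above (prepare inputs for each parameter set, then submit each run) can be illustrated with a short, self-contained sketch. The code below is not FabSim3 and does not use its API; the directory layout, template format, and stubbed scheduler call are hypothetical placeholders that merely mirror the kind of repetitive bookkeeping the toolkit automates.

```python
# Generic illustration of an ensemble workflow: render one input directory per
# parameter set, then hand each run to a scheduler. This is NOT FabSim3 code
# or its API; every name and path here is a hypothetical placeholder.
from pathlib import Path

def prepare_inputs(template: str, params: dict, run_dir: Path) -> Path:
    """Render the input template for one ensemble member."""
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "input.cfg").write_text(template.format(**params))
    return run_dir

def submit(run_dir: Path) -> None:
    """Stub submission step; a real workflow would call SLURM/PBS etc. here."""
    print(f"[stub] would submit {run_dir} to the scheduler")

def run_ensemble(template: str, sweep: list, base: Path) -> None:
    """Prepare and submit every member of a parameter sweep."""
    for i, params in enumerate(sweep):
        submit(prepare_inputs(template, params, base / f"run_{i:03d}"))

if __name__ == "__main__":
    template = "temperature = {temperature}\nsteps = {steps}\n"
    sweep = [{"temperature": t, "steps": 10_000} for t in (280, 300, 320)]
    run_ensemble(template, sweep, Path("ensemble_demo"))
```

    Retrieval and post-processing of results would follow the same repetitive pattern, which is exactly the manual, error-prone bookkeeping the paper argues should be automated.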