359 research outputs found

    TweTriS: Twenty trillion-atom simulation

    Get PDF
    Significant improvements are presented for the molecular dynamics code ls1 mardyn, a linked cell-based code for simulating a large number of small, rigid molecules with application areas in chemical engineering. The changes consist of a redesign of the SIMD vectorization via wrappers, MPI improvements and a software redesign to allow memory-efficient execution with the production trunk to increase portability and extensibility. Two novel, memory-efficient OpenMP schemes for the linked cell-based force calculation are presented, which are able to retain the Newton's third law optimization. Comparisons to well-optimized Verlet list-based codes, such as LAMMPS and GROMACS, demonstrate the viability of the linked cell-based approach. The present version of ls1 mardyn is used to run simulations on entire supercomputers, maximizing the number of sampled atoms. Compared to the preceding version of ls1 mardyn on the entire set of 9216 nodes of SuperMUC Phase 1, 27% more atoms are simulated. Weak scaling performance is increased by up to 40% and strong scaling performance by more than 220%. On Hazel Hen, a strong scaling efficiency of up to 81% and 189 billion molecule updates per second are attained when scaling from 8 to 7168 nodes. Moreover, a total of 20 trillion atoms is simulated at up to 88% weak scaling efficiency, running at up to 1.33 PFLOPS. This represents a fivefold increase in the number of atoms simulated to date.
    Funding: BMBF, 01IH16008, Verbundprojekt TaLPas - Task-basierte Lastverteilung und Auto-Tuning in der Partikelsimulation (joint project: task-based load balancing and auto-tuning in particle simulation).
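    The Newton's-third-law point is worth unpacking: evaluating each pair only once halves the number of force evaluations but creates write conflicts when neighboring cells are processed by different threads, which is what colored OpenMP traversals avoid. Below is a minimal C++/OpenMP sketch of that idea, using simplified cell and particle structures of our own; it is an illustration, not the ls1 mardyn implementation.

```cpp
// Minimal sketch of a colored OpenMP traversal over linked cells.
// All types and names here are illustrative, not taken from ls1 mardyn.
#include <vector>
#include <omp.h>

struct Particle { double x, y, z, fx, fy, fz; };
struct Cell { std::vector<Particle*> members; };

// Pairwise Lennard-Jones force with Newton's third law: each pair is evaluated
// once, and the force is added to one particle and subtracted from the other.
inline void addPairForce(Particle& a, Particle& b, double cutoff2) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    const double r2 = dx*dx + dy*dy + dz*dz;
    if (r2 > cutoff2 || r2 == 0.0) return;
    const double inv2 = 1.0 / r2, inv6 = inv2 * inv2 * inv2;
    const double scale = 24.0 * inv6 * (2.0 * inv6 - 1.0) * inv2;  // eps = sigma = 1
    a.fx += scale*dx; a.fy += scale*dy; a.fz += scale*dz;
    b.fx -= scale*dx; b.fy -= scale*dy; b.fz -= scale*dz;
}

// Cells are partitioned into colors such that cells of the same color are far
// enough apart that their interaction regions do not overlap; threads can then
// process one color in parallel without write conflicts, while still exploiting
// the Newton's third law optimization above.
void coloredTraversal(const std::vector<std::vector<int>>& colors,   // cell ids per color
                      std::vector<Cell>& cells,
                      const std::vector<std::vector<int>>& forwardNeighbors,
                      double cutoff2) {
    for (const auto& color : colors) {                 // colors processed sequentially
        #pragma omp parallel for schedule(dynamic)
        for (std::size_t i = 0; i < color.size(); ++i) {   // cells of one color in parallel
            Cell& c = cells[color[i]];
            for (std::size_t a = 0; a < c.members.size(); ++a) {
                for (std::size_t b = a + 1; b < c.members.size(); ++b)
                    addPairForce(*c.members[a], *c.members[b], cutoff2);
                for (int n : forwardNeighbors[color[i]])    // half neighbor shell
                    for (Particle* p : cells[n].members)
                        addPairForce(*c.members[a], *p, cutoff2);
            }
        }
    }
}
```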

    Algorithmic and Code Optimizations of Molecular Dynamics Simulations for Process Engineering

    Get PDF
    The focus of this work lies on implementational improvements and, in particular, node-level performance optimization of the simulation software ls1-mardyn. Through data structure improvements, SIMD vectorization and, especially, OpenMP parallelization, the world's first simulation of 2×10^13 molecules at over 1 PFLOP/sec was enabled. To allow for long-range interactions, the Fast Multipole Method was introduced to ls1-mardyn. The algorithm was optimized for sequential, shared-memory, and distributed-memory execution on up to 32,768 MPI processes.
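    As an illustration of the "vectorization via wrappers" approach referred to above and in the previous entry, the sketch below writes a kernel once against a small vector wrapper type that maps either to AVX intrinsics or to a scalar fallback at compile time; the wrapper class and kernel are hypothetical and are not taken from ls1-mardyn.

```cpp
// Illustrative SIMD wrapper: kernels are written once against a small vector
// type, and the wrapper maps it to the instruction set available at compile
// time. Names are hypothetical.
#include <immintrin.h>
#include <cstddef>

#if defined(__AVX__)
struct VecDouble {
    __m256d v;
    static constexpr std::size_t width = 4;
    static VecDouble broadcast(double s) { return { _mm256_set1_pd(s) }; }
    static VecDouble load(const double* p) { return { _mm256_loadu_pd(p) }; }
    void store(double* p) const { _mm256_storeu_pd(p, v); }
    VecDouble operator+(VecDouble o) const { return { _mm256_add_pd(v, o.v) }; }
    VecDouble operator*(VecDouble o) const { return { _mm256_mul_pd(v, o.v) }; }
};
#else
struct VecDouble {            // scalar fallback keeps the kernel portable
    double v;
    static constexpr std::size_t width = 1;
    static VecDouble broadcast(double s) { return { s }; }
    static VecDouble load(const double* p) { return { *p }; }
    void store(double* p) const { *p = v; }
    VecDouble operator+(VecDouble o) const { return { v + o.v }; }
    VecDouble operator*(VecDouble o) const { return { v * o.v }; }
};
#endif

// Example kernel written once against the wrapper: y[i] += a * x[i].
void axpy(double a, const double* x, double* y, std::size_t n) {
    const VecDouble va = VecDouble::broadcast(a);
    std::size_t i = 0;
    for (; i + VecDouble::width <= n; i += VecDouble::width)
        (VecDouble::load(y + i) + va * VecDouble::load(x + i)).store(y + i);
    for (; i < n; ++i) y[i] += a * x[i];    // scalar remainder loop
}
```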

    Modelling solid/fluid interactions in hydrodynamic flows: a hybrid multiscale approach

    Get PDF
    With the advent of high performance computing (HPC), we can simulate nature at time and length scales that we could only dream of a few decades ago. Through the development of theory and numerical methods over the last fifty years, we have at our disposal a plethora of mathematical and computational tools to make powerful predictions about the world that surrounds us. These range from quantum methods like Density Functional Theory (DFT), through atomistic methods such as Molecular Dynamics (MD) and Monte Carlo (MC), up to more traditional macroscopic techniques based on the discretization of Partial Differential Equations (PDEs), like the Finite Element Method (FEM) and the Finite Volume Method (FVM), which are, respectively, the foundations of computational Structural Analysis and Computational Fluid Dynamics (CFD). Many modern scientific computing challenges in physics stem from appropriately combining two or more of these methods in order to tackle problems that could not be solved using any one of them alone. This is known as multi-scale modelling, which aims to achieve a trade-off between computational cost and accuracy by combining two or more physical models at different scales. In this work, a multi-scale domain decomposition technique based on coupling MD and CFD methods has been developed to make the study of slip and friction affordable, with atomistic detail, at length scales otherwise inaccessible to fully atomistic methods alone. A software framework has been developed to facilitate the execution of this kind of simulation on HPC clusters. This has been made possible by employing the in-house developed CPL_LIBRARY software library, which provides key functionality to implement coupling through domain decomposition.
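    A hedged sketch of the kind of data exchange such a domain-decomposition coupling performs is given below: particle velocities are averaged over an overlap region and handed to the continuum solver as boundary data, while the continuum velocity constrains the particles in return. The routines are generic illustrations and do not reproduce the CPL_LIBRARY interface.

```cpp
// Generic illustration of the two exchange directions in an MD-CFD
// domain-decomposition coupling; not the CPL_LIBRARY API.
#include <vector>

struct MDParticle { double vx; double mass; };

// MD -> CFD direction: mass-weighted average of the x-velocity of the
// particles in one overlap bin, used as boundary data for the continuum solver.
double binAverageVelocity(const std::vector<MDParticle>& bin) {
    double momentum = 0.0, mass = 0.0;
    for (const auto& p : bin) { momentum += p.mass * p.vx; mass += p.mass; }
    return mass > 0.0 ? momentum / mass : 0.0;
}

// CFD -> MD direction: relax the particles in the overlap bin towards the
// continuum velocity; 'alpha' is a hypothetical constraint strength per step.
void constrainBin(std::vector<MDParticle>& bin, double continuumVx, double alpha) {
    for (auto& p : bin) p.vx += alpha * (continuumVx - p.vx);
}
```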

    ICASE

    Get PDF
    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in the areas of (1) applied and numerical mathematics, including numerical analysis and algorithm development; (2) theoretical and computational research in fluid mechanics in selected areas of interest, including acoustics and combustion; (3) experimental research in transition and turbulence and aerodynamics involving Langley facilities and scientists; and (4) computer science.

    Implementation of a coupled computational chain to the combustion chamber's heat transfer

    Get PDF
    The design of aeronautical engines is subject to many constraints that cover performance gains as well as increasingly sensitive environmental issues. These often contradicting objectives are currently being addressed through an increase in the local and global temperature in the hot stages of the engine. As a result, the solid parts encounter very high temperature levels and gradients that are critical for the engine lifespan. Combustion chamber walls in particular are subject to large thermal constraints. It is thus essential for designers to characterize accurately the local thermal state of such devices. Today, wall temperature evaluation is obtained experimentally by complex thermocolor tests. To limit such expensive experiments, efforts are currently directed at providing high-fidelity numerical tools able to predict the combustion chamber wall temperature. This specific thermal field, however, requires the consideration of all the modes of heat transfer (convection, conduction and radiation) and of the heat production (through the chemical reaction) within the burner. The resolution of such a multi-physics problem can be done numerically through the use of several dedicated numerical and algorithmic approaches. In this manuscript, the methodology relies on a partitioned coupling approach, based on a Large Eddy Simulation (LES) solver to resolve the flow motion and the chemical reactions, a Discrete Ordinate Method (DOM) radiation solver and an unsteady solid conduction code. The various issues related to the distribution of computer resources, as well as the coupling methodology employed to deal with the disparity of time and space scales present in each mode of heat transfer, are addressed in this manuscript. High-performance studies of the coupled application, carried out both on a toy model and on an industrial burner configuration, evidence the parameters of importance as well as potential paths for improvement. The thermal coupling approach is then considered from a physical point of view on two distinct configurations. First, the impact of the methodology and of the thermal equilibrium state between a reacting fluid and a solid is addressed for a simple academic flame-holder case. The effect of the flame-holder wall temperature on the flame stabilization pattern is studied through fluid-only predictions. Interestingly, these simulations highlight three different theoretical equilibrium states. The physical relevance of these three states is then assessed through the computation of several conjugate heat transfer (CHT) simulations for different initial solutions and solid conductivities. It is shown that only two equilibrium states are physical and that the bifurcation between the two possible physical states depends both on the solid conductivity and on the initial condition. Furthermore, the coupling methodology is shown to have no impact on the solutions within the range of parameters tested. A similar methodology is then applied to a helicopter combustor for which radiative heat transfer is additionally considered. Different computations are presented to assess the role of each heat transfer process on the temperature field: a reference adiabatic fluid-only simulation, Conjugate Heat Transfer, Radiation-Fluid Thermal Interaction and fully coupled simulations are performed. It is shown that coupling LES with conduction in the walls is feasible in an industrial context with acceptable CPU costs and gives good trends of the temperature distribution.
    Then, for the combustor geometry and operating point studied, the computations illustrate that radiation plays an important role in the wall temperature distribution. Comparisons with thermocolor tests are globally in better agreement when the three solvers are coupled.
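    To illustrate how a partitioned coupling can bridge the disparity of time scales mentioned above, the sketch below sub-cycles a flow solver within each solid conduction step and exchanges wall temperature and heat-flux data at the coupling instants; the solver classes and method names are placeholders, not the actual LES, DOM or conduction codes used in the manuscript.

```cpp
// Hedged sketch of a partitioned conjugate-heat-transfer coupling loop:
// the solid thermal time scale is much larger than the fluid one, so the flow
// (and radiation) solver is sub-cycled many times per solid conduction step.
#include <vector>

struct WallField { std::vector<double> values; };

struct FluidSolver {                               // placeholder for the LES (+ DOM) side
    void setWallTemperature(const WallField&) {}
    void advance(double dt) {}                     // one flow / radiation step
    WallField wallHeatFlux() const { return {}; }
};
struct SolidSolver {                               // placeholder for the conduction side
    void setWallHeatFlux(const WallField&) {}
    void advance(double dt) {}                     // one unsteady conduction step
    WallField wallTemperature() const { return {}; }
};

void coupledRun(FluidSolver& fluid, SolidSolver& solid,
                double dtFluid, int subcycles, int couplingSteps) {
    WallField Twall = solid.wallTemperature();
    for (int k = 0; k < couplingSteps; ++k) {
        fluid.setWallTemperature(Twall);               // solid -> fluid: wall temperature
        for (int s = 0; s < subcycles; ++s)            // fluid runs on its own, smaller dt
            fluid.advance(dtFluid);
        solid.setWallHeatFlux(fluid.wallHeatFlux());   // fluid -> solid: wall heat flux
        solid.advance(dtFluid * subcycles);            // one large conduction step
        Twall = solid.wallTemperature();
    }
}
```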

    Software for Exascale Computing - SPPEXA 2016-2019

    Get PDF
    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer's series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA's first funding phase, and provides an overview of SPPEXA's contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    FPGA-based range-limited molecular dynamics acceleration

    Full text link
    Molecular Dynamics (MD) is a computer simulation technique that executes iteratively over discrete, infinitesimal time intervals. It has been a widely utilized application in the fields of materials science and computer-aided drug design for many years, serving as a crucial benchmark in high-performance computing (HPC). Numerous MD packages have been developed and effectively accelerated using GPUs. However, as the limits of Moore's Law are reached, the performance of an individual computing node has reached its bottleneck, while the performance of multiple nodes is primarily hindered by scalability issues, particularly when dealing with small datasets. In this thesis, acceleration with respect to small datasets is the main focus. With the recent COVID-19 pandemic, drug discovery has gained significant attention, and MD has emerged as a crucial tool in this process. In particular, in the critical domain of drug discovery, small simulations involving approximately 50K particles are frequently employed. However, it is important to note that small simulations do not necessarily translate to faster results, as long-term simulations comprising billions of MD iterations and more are essential in this context. In addition to dataset size, the problem of interest is further constrained. Referred to as the most computationally demanding aspect of MD, the evaluation of range-limited (RL) forces not only accounts for 90% of the MD computation workload but also involves irregular mapping patterns of 3-D data onto 2-D processor networks. To emphasize, this thesis centers on the acceleration of RL MD specifically for small datasets. In order to address the single-node bottleneck and the multi-node scaling challenges, the thesis is organized into two progressive stages of investigation. The first stage delves extensively into enhancing single-node efficiency by examining factors such as workload mapping from 3-D to 2-D, data routing, and data locality. The second stage focuses on multi-node scalability, with a particular emphasis on strong scaling, bandwidth demands, and the synchronization mechanisms between nodes. Our results show that our design on a Xilinx U280 FPGA achieves 51.72x and 4.17x speedups with respect to an Intel Xeon Gold 6226R CPU and a Quadro RTX 8000 GPU, respectively. Our research towards strong scaling also demonstrates that 8 Xilinx U280 FPGAs connected to a switch achieve a 4.67x speedup compared to an Nvidia V100 GPU.
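    Range-limited force evaluation is commonly organized as a cheap distance filter followed by the expensive force computation, which is also how FPGA pipelines typically split the work between filter units and force pipelines. The C++ sketch below illustrates that two-stage structure under simplified assumptions; it is a software illustration, not the hardware design described in the thesis.

```cpp
// Two-stage structure of range-limited (RL) force evaluation: filter candidate
// pairs by the cutoff distance, then evaluate forces only for survivors.
#include <vector>
#include <utility>
#include <cstddef>

struct Pos { double x, y, z; };
struct PairRef { std::size_t i, j; double r2; };

// Stage 1: filter candidate pairs produced by the cell/neighbor search.
std::vector<PairRef> filterPairs(const std::vector<Pos>& p,
                                 const std::vector<std::pair<std::size_t, std::size_t>>& candidates,
                                 double cutoff2) {
    std::vector<PairRef> survivors;
    for (auto [i, j] : candidates) {
        const double dx = p[i].x - p[j].x, dy = p[i].y - p[j].y, dz = p[i].z - p[j].z;
        const double r2 = dx*dx + dy*dy + dz*dz;
        if (r2 <= cutoff2) survivors.push_back({i, j, r2});  // only these reach the force stage
    }
    return survivors;
}

// Stage 2: evaluate the short-range force magnitude for surviving pairs
// (Lennard-Jones with eps = sigma = 1 as a stand-in for the real potential).
double forceMagnitude(double r2) {
    const double inv2 = 1.0 / r2, inv6 = inv2 * inv2 * inv2;
    return 24.0 * inv6 * (2.0 * inv6 - 1.0) * inv2;
}
```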

    A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories

    Full text link
    As parallel algorithms and architectures drive the longest molecular dynamics (MD) simulations towards the millisecond scale, traditional sequential post-simulation data analysis methods are becoming increasingly untenable. Inspired by the programming interface of Google's MapReduce, we have built a new parallel analysis framework called HiMach, which allows users to write trajectory analysis programs sequentially, and carries out the parallel execution of the programs automatically. We introduce (1) a new MD trajectory data analysis model that is amenable to parallel processing, (2) a new interface for defining trajectories to be analyzed, (3) a novel method to make use of an existing sequential analysis tool called VMD, and (4) an extension to the original MapReduce model to support multiple rounds of analysis. Performance evaluations on up to 512 cores demonstrate the efficiency and scalability of the HiMach framework on a Linux cluster.
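    The map/reduce structure can be made concrete with a small sketch: a user-defined per-frame analysis (the map) runs independently on each trajectory frame, and a reduction combines the per-frame results. The types and the OpenMP parallelization below are generic illustrations of the pattern, not the HiMach API.

```cpp
// Generic map/reduce over MD trajectory frames: the per-frame "map" is
// embarrassingly parallel, and the "reduce" combines its outputs.
#include <vector>
#include <numeric>
#include <omp.h>

struct Frame { std::vector<double> x, y, z; };   // one trajectory snapshot

// Map: compute a per-frame observable, here simply the mean x-coordinate.
double mapFrame(const Frame& f) {
    return f.x.empty() ? 0.0
                       : std::accumulate(f.x.begin(), f.x.end(), 0.0) / f.x.size();
}

// Reduce: average the per-frame values over the whole trajectory.
double analyzeTrajectory(const std::vector<Frame>& trajectory) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)    // frames are independent work items
    for (std::size_t i = 0; i < trajectory.size(); ++i)
        sum += mapFrame(trajectory[i]);
    return trajectory.empty() ? 0.0 : sum / trajectory.size();
}
```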

    Supercomputing Frontiers

    Get PDF
    This open access book constitutes the refereed proceedings of the 7th Asian Conference Supercomputing Frontiers, SCFA 2022, which took place in Singapore in March 2022. The 8 full papers presented in this book were carefully reviewed and selected from 21 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platforms, container image configuration workflows, large-scale applications, and scheduling.