
    GPU acceleration of the Variational Monte Carlo Method for Many Body Physics

    High-performance computing is one of the major areas driving large-scale simulation. Applications such as 3D nuclear-test simulation, molecular dynamics, and quantum Monte Carlo are now developed on supercomputers using the latest computing technologies. According to the TOP500 ranking, most of today's supercomputers are heterogeneous: multi-core CPUs paired with massively parallel graphics processing units (GPUs) to increase computational capacity. The Variational Monte Carlo (VMC) method is used in many-body physics to study the ground-state properties of a system. The wavefunction depends on a set of variational parameters, which encode the physics needed for a better prediction. In general, the variational parameters are chosen to realize some sort of order or broken symmetry, such as superconductivity or magnetism. The variational approach is computationally expensive and requires a large number of Markov chains (MCs) to reach convergence. The MCs exhibit abundant data parallelism, but parallelizing across CPU clusters is expensive and does not scale with system size; the method is therefore a suitable candidate for a massively parallel GPU. In this work, we discuss the optimization and parallelization strategies adopted to port the VMC method to an NVIDIA GPU using CUDA. We obtained a speedup of nearly 3.85x compared to the MPI implementation [4] and of up to 19x compared to an object-oriented C++ code.
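
    The data-parallel structure described above, with many independent Markov chains each performing Metropolis updates of a walker against a trial wavefunction, can be sketched as follows. This is a minimal illustration assuming a 1D harmonic oscillator with trial wavefunction psi(x) = exp(-alpha*x^2); the function names and parameters are illustrative and not taken from the paper, and each array row stands in for the per-thread chain a CUDA port would use.

        # Minimal sketch of data-parallel Variational Monte Carlo (VMC).
        # Assumption: 1D harmonic oscillator, trial wavefunction psi(x) = exp(-alpha x^2),
        # so the local energy is E_L(x) = alpha + x^2 (1/2 - 2 alpha^2).
        # Each row of the arrays is one Markov chain, mirroring a one-chain-per-GPU-thread layout.
        import numpy as np

        def local_energy(x, alpha):
            return alpha + x**2 * (0.5 - 2.0 * alpha**2)

        def run_vmc(alpha=0.45, n_chains=4096, n_steps=2000, n_burn=500, step=1.0, seed=0):
            rng = np.random.default_rng(seed)
            x = rng.normal(size=n_chains)              # one walker per chain
            energies = []
            for it in range(n_steps):
                x_new = x + step * rng.uniform(-1.0, 1.0, size=n_chains)
                # Metropolis acceptance with probability |psi(x_new)/psi(x)|^2
                accept = rng.uniform(size=n_chains) < np.exp(-2.0 * alpha * (x_new**2 - x**2))
                x = np.where(accept, x_new, x)
                if it >= n_burn:
                    energies.append(local_energy(x, alpha).mean())
            return np.mean(energies)                   # variational energy estimate

        if __name__ == "__main__":
            # The exact ground-state energy is 0.5, reached at the optimal alpha = 0.5.
            print("E(alpha=0.45) ~", run_vmc())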

    Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

    In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculations using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones. Comment: 44 pages; 1 of the USQCD whitepapers.

    CompF2: Theoretical Calculations and Simulation Topical Group Report

    This report summarizes the work of the Computational Frontier topical group on theoretical calculations and simulation for Snowmass 2021. We discuss the challenges, potential solutions, and needs facing six diverse but related topical areas that span the subject of theoretical calculations and simulation in high energy physics (HEP): cosmic calculations, particle accelerator modeling, detector simulation, event generators, perturbative calculations, and lattice QCD (quantum chromodynamics). The challenges arise from the next generations of HEP experiments, which will include more complex instruments, provide larger data volumes, and perform more precise measurements; calculations and simulations will need to keep up with these increased requirements. The other aspect of the challenge is the evolution of the computing landscape away from general-purpose computing on CPUs and toward special-purpose accelerators and coprocessors such as GPUs and FPGAs. These newer devices can provide substantial improvements for certain categories of algorithms, at the expense of more specialized programming and memory and data access patterns. Comment: Report of the Computational Frontier Topical Group on Theoretical Calculations and Simulation for Snowmass 2021.

    Strong scaling of general-purpose molecular dynamics simulations on GPUs

    We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., J. Comp. Phys. 227, 2008). The software supports short-ranged pair and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We demonstrate equivalent or superior scaling on up to 3,375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double-precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5x. Comment: 30 pages, 14 figures.
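
    The central ingredient above, spatial domain decomposition, can be illustrated with a small sketch. This is a conceptual illustration only and does not use HOOMD-blue's actual API: the box is split into a 3D grid of domains, each particle is assigned to the domain owning its position, and particles within one cutoff of a domain face are flagged for ghost exchange with the neighboring rank. All names and parameters are illustrative assumptions.

        # Conceptual sketch of spatial domain decomposition for MD (not HOOMD-blue's API).
        import numpy as np

        def assign_domains(positions, box_length, grid=(2, 2, 2)):
            """Return the (i, j, k) domain index that owns each particle."""
            grid = np.asarray(grid)
            cell = box_length / grid                      # edge lengths of one domain
            idx = np.floor(positions / cell).astype(int)
            return np.clip(idx, 0, grid - 1)              # guard particles exactly on the upper edge

        def ghost_candidates(positions, box_length, grid, r_cut):
            """Flag particles within r_cut of any domain boundary (they need ghost copies)."""
            grid = np.asarray(grid)
            cell = box_length / grid
            frac = positions % cell                       # position relative to the owning domain
            return ((frac < r_cut) | ((cell - frac) < r_cut)).any(axis=1)

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            pos = rng.uniform(0.0, 20.0, size=(1000, 3))  # 1000 particles in a 20^3 box
            print("domain of particle 0:", assign_domains(pos, 20.0)[0])
            print("particles needing ghost exchange:",
                  ghost_candidates(pos, 20.0, (2, 2, 2), r_cut=2.5).sum())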

    Visual Simulation of Flow

    We have adopted a numerical method from computational fluid dynamics, the Lattice Boltzmann Method (LBM), for real-time simulation and visualization of flow and amorphous phenomena, such as clouds, smoke, fire, haze, dust, radioactive plumes, and airborne biological or chemical agents. Unlike other approaches, LBM discretizes the micro-physics of local interactions and can handle very complex boundary conditions, such as deep urban canyons, curved walls, indoor spaces, and the dynamic boundaries of moving objects. Due to its discrete nature, LBM lends itself to multi-resolution approaches, and its computational pattern, which is similar to cellular automata, is easily parallelizable. We have accelerated LBM on commodity graphics processing units (GPUs), achieving real-time or even faster-than-real-time performance on a single GPU or on a GPU cluster. We have implemented a 3D urban navigation system and applied it in New York City with real-time live sensor data. In addition to a pivotal application in the simulation of airborne contaminants in urban environments, this approach will enable the development of other superior prediction and simulation capabilities, computer graphics and games, and a novel technology for computational science and engineering.
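
    The collide-and-stream pattern that makes LBM cellular-automaton-like, and hence easy to map onto GPUs, can be shown in a few lines. The sketch below is a minimal single-relaxation-time (BGK) D2Q9 model with periodic boundaries; the grid size, relaxation time, and initial density bump are illustrative assumptions, not details of the system described above.

        # Minimal D2Q9 lattice Boltzmann sketch (BGK collision, periodic boundaries).
        import numpy as np

        # D2Q9 lattice velocities and weights
        c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
        w = np.array([4/9] + [1/9]*4 + [1/36]*4)

        def equilibrium(rho, ux, uy):
            cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy     # (9, NX, NY)
            return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*(ux**2 + uy**2))

        def step(f, tau=0.8):
            rho = f.sum(axis=0)
            ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
            uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
            f += -(f - equilibrium(rho, ux, uy)) / tau                   # collide (purely local)
            for i in range(9):                                           # stream to neighbors
                f[i] = np.roll(f[i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))
            return f

        if __name__ == "__main__":
            NX, NY = 128, 64
            rho0 = np.ones((NX, NY))
            rho0[60:68, 28:36] += 0.05                                   # small density bump
            f = equilibrium(rho0, np.zeros((NX, NY)), np.zeros((NX, NY)))
            for _ in range(200):
                f = step(f)
            print("mass conserved:", np.isclose(f.sum(), rho0.sum()))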

    High Lundquist Number Simulations of Parker's Model of Coronal Heating: Scaling and Current Sheet Statistics Using Heterogeneous Computing Architectures

    Parker's model [Parker, Astrophys. J., 174, 499 (1972)] is one of the most discussed mechanisms for coronal heating and has generated much debate. We have recently obtained new scaling results for a 2D version of this problem suggesting that the heating rate becomes independent of resistivity in a statistical steady state [Ng and Bhattacharjee, Astrophys. J., 675, 899 (2008)]. Our numerical work has now been extended to 3D using high-resolution MHD simulations. Random photospheric footpoint motion is applied for a time much longer than the correlation time of the motion to obtain converged average coronal heating rates. Simulations are run for different values of the Lundquist number to determine scaling. In the high-Lundquist-number limit (S > 1000), the coronal heating rate obtained is consistent with a trend that is independent of the Lundquist number, as predicted by previous analysis and 2D simulations. We present a scaling analysis showing that when the dissipation time is comparable to or larger than the correlation time of the random footpoint motion, the heating rate tends to become independent of the Lundquist number, and the magnetic energy production is also reduced significantly. We also present a comprehensive reprogramming of our simulation code to run on NVIDIA graphics processing units using the Compute Unified Device Architecture (CUDA) and report code performance on several large-scale heterogeneous machines.

    Structures and dynamics investigation of phase selection in metallic alloy systems

    Different phases of metallic alloys have a wide range of applications. However, the driving mechanisms behind phase selection can be complex; for example, the detailed pathways of the phase transitions in the devitrification process still lack a comprehensive interpretation, so understanding these driving mechanisms is very important. In this thesis, we focus on the study of the Al-Sm and other related metallic alloy systems by simulation and experiment. A procedure to evaluate the free energy has been developed within the framework of thermodynamic integration, coupled with extensive GPU-accelerated molecular dynamics (MD) simulations. A "spatially-correlated site occupancy" has been observed and measured in the ε-Al60Sm11 phase. Contrary to the common belief that nonstoichiometry is often the outcome of the interplay of enthalpy of formation and configurational entropy at finite temperatures, our results from Monte Carlo (MC) and MD simulations imply that kinetic effects, especially the limited diffusivity of Sm, are crucial for the appearance of the observed spatial correlations in the nonstoichiometric ε phase. Moreover, in order to overcome the time limitation in MD simulation of the nucleation process, a "persistent-embryo method" has been developed, which opens a new avenue to study solidification under realistic experimental conditions via atomistic computer simulation. Based on this thesis study, we have achieved a deeper understanding of the driving mechanisms of phase selection, and laid a foundation for further prediction and control of the fabrication of novel metallic alloy materials.

    This thesis consists of the following seven chapters. Chapter 1 briefly introduces the history of the development of metallic alloy materials and their significant impact on human civilization; in particular, the research background of metallic glasses is reviewed and some unsolved questions are raised. Chapter 2 is the literature review: its first section covers simulation methods, including molecular dynamics (MD) and Monte Carlo (MC) simulation and the related technical issues in mimicking real systems, while its second section covers analysis methods, including structural and dynamical analysis, classical nucleation theory, free-energy algorithms, and experimental techniques. Chapter 3 reports our work on a self-contained procedure to evaluate the free energy of the liquid and solid phases of an alloy system. We start from the Einstein crystal as the reference system, using thermodynamic integration to obtain the free energy of a single-element solid phase; we then compute the free-energy difference between the solid and liquid phases using Gibbs-Duhem integration, and finally construct an "alchemical" path connecting a pure liquid and a liquid alloy to calculate the mixing enthalpy and entropy. This procedure is important because the evaluation of free energy is fundamental to a microscopic understanding of freezing and melting. Chapter 4 elucidates the origin of the spatially-correlated site occupancy in the non-stoichiometric metastable ε-Al60Sm11 phase. This STEM-observed spatially-correlated site occupancy cannot be explained by the "average crystal" description from Rietveld analysis of diffraction data, nor by the lowest-free-energy structure established in MC simulations. MD simulations of the growth of ε-Al60Sm11 in the undercooled liquid show that when the diffusion range of Sm is limited to about 4 Å, the correlation function of the as-grown crystal structure agrees well with that of the STEM images. We thus conclude that kinetic effects, especially the limited diffusivity of Sm atoms, play an important role in determining the non-stoichiometric site occupancy. In addition to the free-energy point of view, this result gives a deeper understanding of phase selection from structural and dynamical points of view. Chapter 5 describes the "persistent-embryo" method (PEM) for nucleation simulation. The PEM facilitates crystal nucleation in MD simulations by preventing small crystal embryos from melting using external spring forces, so that the early stage of rare nucleation events can be accessed; this opens a new avenue to study solidification under realistic experimental conditions. The nucleation rates of pure Ni and of the B2 phase of the glass-forming Cu-Zr alloy have been computed using PEM. We also apply PEM to the Al-Sm system to study the nucleation of the ε-Al60Sm11 phase in the undercooled Al-Sm liquid; complex and interesting behaviors, different from the Ni case, have been found. Chapter 6 presents an implementation of EAM and FS interatomic potentials in HOOMD-blue, a GPU software package designed to perform classical molecular dynamics simulations. The accuracy of the code has been verified in a broad variety of tests, and its performance is significantly faster than LAMMPS running on a typical CPU cluster. Furthermore, our hoomd.metal module follows the HOOMD-blue code conventions, which allows it to be coupled with extensive Python libraries; this package makes the MD simulations in this thesis and related fields faster and more convenient. Chapter 7 summarizes the thesis and proposes a plan for future work.
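
    As a concrete illustration of the thermodynamic-integration step outlined for Chapter 3, the sketch below computes a free-energy difference in a toy 1D setting: a harmonic reference well stands in for the Einstein crystal and a quartic well for the target system, with U_lambda = (1 - lambda) U_ref + lambda U_tgt and Delta F given by the integral of <dU/dlambda> over lambda in [0, 1]. The potentials, sampling lengths, and parameters are illustrative assumptions, not the thesis's actual workflow.

        # Toy thermodynamic integration: harmonic reference -> quartic target (1D).
        import numpy as np

        def u_ref(x, k=1.0): return 0.5 * k * x**2        # harmonic ("Einstein"-like) reference
        def u_tgt(x, a=1.0): return a * x**4              # target potential

        def sample_dudl(lam, beta=1.0, n_steps=20000, step=0.5, seed=0):
            """Metropolis estimate of <U_tgt - U_ref> under U_lambda at inverse temperature beta."""
            rng = np.random.default_rng(seed + int(1000 * lam))
            u_mix = lambda x: (1.0 - lam) * u_ref(x) + lam * u_tgt(x)
            x, samples = 0.0, []
            for i in range(n_steps):
                x_new = x + step * rng.uniform(-1.0, 1.0)
                if rng.uniform() < np.exp(-beta * (u_mix(x_new) - u_mix(x))):
                    x = x_new
                if i > n_steps // 4:                       # discard burn-in
                    samples.append(u_tgt(x) - u_ref(x))
            return np.mean(samples)

        def free_energy_difference(n_nodes=8):
            # Gauss-Legendre quadrature of <dU/dlambda> over lambda in [0, 1]
            nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
            lams = 0.5 * (nodes + 1.0)                     # map [-1, 1] -> [0, 1]
            return 0.5 * sum(wt * sample_dudl(lam) for lam, wt in zip(lams, weights))

        if __name__ == "__main__":
            print("Delta F (harmonic -> quartic) ~", free_energy_difference())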

    Particle-resolved thermal lattice Boltzmann simulation using OpenACC on multi-GPUs

    We utilize the Open Accelerator (OpenACC) approach for graphics processing unit (GPU) accelerated particle-resolved thermal lattice Boltzmann (LB) simulation. We adopt the momentum-exchange method to calculate fluid-particle interactions in order to preserve the simplicity of the LB method. To address load-imbalance issues, we extend the indirect addressing method, collecting fluid-particle link information at each timestep and storing the indices of fluid-particle links in a fixed index array. We simulate the sedimentation of 4,800 hot particles in cold fluids with a domain size of 4000^2, and the simulation achieves 1750 million lattice updates per second (MLUPS) on a single GPU. Furthermore, we implement a hybrid OpenACC and message passing interface (MPI) approach for multi-GPU accelerated simulation. This approach incorporates four optimization strategies: building domain lists, utilizing request-answer communication, overlapping communication with computation, and executing computation tasks concurrently. By reducing data communication between GPUs, hiding communication latency through overlapping computation, and increasing the utilization of GPU resources, we achieve improved performance, reaching 10846 MLUPS on 8 GPUs. Our results demonstrate that OpenACC-based GPU acceleration is promising for particle-resolved thermal lattice Boltzmann simulation. Comment: 45 pages, 18 figures.
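
    The momentum-exchange step described above can be illustrated with a small sketch. The version below assumes a simplified setting (D2Q9, a stationary particle, half-way bounce-back), in which each fluid-solid boundary link stored in a flat index array (mirroring the indirect addressing described in the abstract) contributes 2 f_i c_i of momentum to the particle per time step. The array layout and names are illustrative, not taken from the paper's code.

        # Momentum-exchange force on a stationary particle (D2Q9, half-way bounce-back).
        import numpy as np

        # D2Q9 lattice velocities
        c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])

        def particle_force(f, link_nodes, link_dirs):
            """f          : (9, n_nodes) post-collision distributions on flattened fluid nodes
               link_nodes : (n_links,) flat indices of the fluid nodes of boundary links
               link_dirs  : (n_links,) direction index of each link (pointing into the solid)"""
            f_links = f[link_dirs, link_nodes]             # gather via the fixed link arrays
            return 2.0 * (f_links[:, None] * c[link_dirs]).sum(axis=0)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            n_nodes, n_links = 10000, 64
            f = rng.uniform(0.0, 0.1, size=(9, n_nodes))   # dummy post-collision distributions
            link_nodes = rng.integers(0, n_nodes, size=n_links)
            link_dirs = rng.integers(1, 9, size=n_links)   # skip the rest direction 0
            print("force on particle:", particle_force(f, link_nodes, link_dirs))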