9 research outputs found

    SU(2) Lattice Gauge Theory Simulations on Fermi GPUs

    In this work we explore the performance of CUDA in quenched lattice SU(2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi) are also presented. To obtain high performance, the code must be optimized for the GPU architecture, i.e., the implementation must exploit the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU(2) lattice gauge configurations, for the mean plaquette, for the Polyakov loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50 000) without smearing and almost 2 000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200× the speed of one CPU in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2× slower) than single precision computations. Comment: 20 pages, 11 figures, 3 tables, accepted in Journal of Computational Physics
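    For readers unfamiliar with the observable, the mean plaquette mentioned in this abstract can be sketched on the CPU with NumPy. This is an illustrative sketch only, not the authors' CUDA code: the function names (`random_su2`, `dagger`, `mean_plaquette`), the link-array layout, and the periodic boundary conditions are all assumptions made for the example.

```python
import numpy as np

def random_su2(rng, shape):
    """Random SU(2) matrices a0*I + i*(a1*s1 + a2*s2 + a3*s3), with |a| = 1."""
    a = rng.normal(size=shape + (4,))
    a /= np.linalg.norm(a, axis=-1, keepdims=True)
    u = np.empty(shape + (2, 2), dtype=complex)
    u[..., 0, 0] = a[..., 0] + 1j * a[..., 3]
    u[..., 0, 1] = a[..., 2] + 1j * a[..., 1]
    u[..., 1, 0] = -a[..., 2] + 1j * a[..., 1]
    u[..., 1, 1] = a[..., 0] - 1j * a[..., 3]
    return u

def dagger(m):
    """Hermitian conjugate of the trailing 2x2 matrices."""
    return np.conj(np.swapaxes(m, -1, -2))

def mean_plaquette(links):
    """Average of (1/2) Re Tr U_mu(x) U_nu(x+mu) U_mu(x+nu)^+ U_nu(x)^+.

    links has shape (ndim, L_0, ..., L_{ndim-1}, 2, 2) (assumed layout),
    and the lattice is taken to be periodic in every direction.
    """
    ndim = links.shape[0]
    total, count = 0.0, 0
    for mu in range(ndim):
        for nu in range(mu + 1, ndim):
            u_nu_fwd = np.roll(links[nu], -1, axis=mu)  # U_nu(x + mu)
            u_mu_fwd = np.roll(links[mu], -1, axis=nu)  # U_mu(x + nu)
            plaq = links[mu] @ u_nu_fwd @ dagger(u_mu_fwd) @ dagger(links[nu])
            total += 0.5 * np.trace(plaq, axis1=-2, axis2=-1).real.sum()
            count += plaq[..., 0, 0].size
    return total / count

# Cold start: every link is the identity, so every plaquette equals 1.
dims = (4, 4, 4, 4)
cold = np.broadcast_to(np.eye(2, dtype=complex), (4,) + dims + (2, 2)).copy()
print(mean_plaquette(cold))
```

    On a GPU the same sum would typically be split across threads by lattice site; how the paper organizes that is not stated in the abstract.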

    Colour flux-tubes in static Pentaquark and Tetraquark systems

    The colour fields created by the static tetraquark and pentaquark systems are computed in quenched SU(3) lattice QCD, with gauge invariant lattice operators, in a 24^3 x 48 lattice at beta=6.2. We generate our quenched configurations with GPUs, and detail the respective benchmarks in different SU(N) groups. While at smaller distances the Coulomb potential is expected to dominate, at larger distances it is expected that fundamental flux tubes, similar to the flux tube between a quark and an antiquark, emerge and confine the quarks. In order to minimize the potential, the fundamental flux tubes should connect at 120° angles. We compute the square of the colour fields utilizing plaquettes, and locate the static sources with generalized Wilson loops and with APE smearing. The tetraquark system is well described by a double-Y-shaped flux tube with two Steiner points, but when quark-antiquark pairs are close enough the two junctions collapse and we have an X-shaped flux tube with one Steiner point. The pentaquark system is well described by a three-Y-shaped flux tube where the three flux-tube junctions are Steiner points. Comment: 7 pages, 6 figures, contribution to the International School of Nuclear Physics, 33rd Course From Quarks and Gluons to Hadrons and Nuclei, Erice-Sicily: 16 - 24 September 201

    QCD simulations with staggered fermions on GPUs

    We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two staggered flavors on Graphics Processing Units, using the NVIDIA CUDA programming language. The main feature of our code is that the GPU is not used just as an accelerator; instead, the whole Molecular Dynamics trajectory is performed on it. After pointing out the main bottlenecks and how to circumvent them, we discuss the performance obtained. We present some preliminary results regarding OpenCL and multi-GPU extensions of our code and discuss future perspectives. Comment: 22 pages, 14 eps figures, final version to be published in Computer Physics Communications

    Computational Physics on Graphics Processing Units

    The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up a variety of algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics, and on quantum simulations for electronic structure calculations using density functional theory, wave function techniques, and quantum field theory. Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012

    Advanced Optimization Techniques For Monte Carlo Simulation On Graphics Processing Units

    The objective of this work is to design and implement a self-adaptive, parallel, GPU-optimized Monte Carlo algorithm for the simulation of adsorption in porous materials. We focus on Nvidia's GPUs and CUDA's Fermi architecture specifically. The resulting package supports the different ensemble methods for the Monte Carlo simulation, which will allow for the simulation of multi-component adsorption in porous solids. Such an algorithm will have broad applications to the development of novel porous materials for the sequestration of CO2 and the filtration of toxic industrial chemicals. The primary objective of this work is the release of a massively parallel open source Monte Carlo simulation engine implemented using GPUs, called GOMC. The code will utilize the canonical ensemble and the Gibbs ensemble method, which will allow for the simulation of multiple phenomena, including liquid-vapor phase coexistence, and single and multi-component adsorption in porous materials. In addition, the grand canonical ensemble and the configurational-bias algorithms have been implemented so that polymeric materials and small proteins may be simulated. This simulation engine is the only open source GPU-optimized Monte Carlo code available for the generalized simulation of adsorption and phase equilibria on a very large scale. By applying many optimization techniques and allowing the system to adjust to changes in the simulation state, the original MC algorithm has been rewritten, based on an existing serial algorithm, to suit massively parallel devices, resulting in reductions in computational time. This large time reduction allows for the simulation of significantly larger systems for longer timescales than is currently possible with existing implementations. Extensive research and device-specific optimizations resulted in significant speedup.
First, for the NVT method, a fully optimized serial algorithm has been implemented and its performance results have been compared to Towhee. A speedup of about 438 times has been achieved for a relatively small problem of 4096 particles. In addition, two algorithms to run on the GPU, with and without a cell list structure, have been implemented. The parallel code with cell list was more than 160x faster than the serial code. Moreover, for the grand canonical ensemble, a serial and two parallel algorithms have been developed. The simulation box in this method can be resized, which required the algorithm to adapt to the box size and adjust itself. The CUDA code with cell list runs 130 times faster than the serial code that doesn't have a cell list structure. More MC ensembles have been transferred to the GPU. The Gibbs ensemble method has two simulation boxes and three types of moves. This method has been studied carefully, and the GPU algorithm has been implemented to port the computation-intensive functions to the GPU. The GPU code was about 50x faster than the serial code. Finally, an extension of the Gibbs method has been implemented on the GPU. The particle transfer from one box to the other is the move type affected by this extension. CUDA streams are used to parallelize K trials for this method. A speedup of a factor of three for the particle transfer move has been achieved in the best case. However, because the particle transfer move is executed infrequently, accounting for just 10% of the total moves, the speedup has minimal effect on the overall execution time of the simulation. Furthermore, a different run with all move types on a Kepler K20c card has been executed, and a speedup of a factor of 2 has been reported over the CUDA code on the GeForce GTX 480 card.
The main contribution of this work to society is the release of the above implementations as open source to the public through http://gomc.eng.wayne.edu. Also, other researchers can take advantage of the lessons learned with advanced optimizations and self-adapting mechanisms specific to the GPU. On the application level, the current code can be used by the chemical engineering community to explore accurate and affordable simulations that were not possible before.
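The remark in this abstract that a threefold speedup of the particle transfer move barely changes total run time can be illustrated with Amdahl's law. The sketch below assumes, purely for illustration, that the 10% share of moves also corresponds to 10% of the run time; the abstract does not state this, and the function name `overall_speedup` is hypothetical.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: total speedup when only `fraction` of the run time
    is accelerated by a factor of `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Assumed: transfer moves take 10% of the run time and become 3x faster.
print(round(overall_speedup(0.10, 3.0), 3))  # 1.071
```

Under that assumption the whole simulation gains only about 7%, which is consistent with the "minimal effect" described above.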