143 research outputs found

    Lattice QCD based on OpenCL

    Get PDF
    We present an OpenCL-based Lattice QCD application using a heatbath algorithm for the pure gauge case and Wilson fermions in the twisted mass formulation. The implementation is platform independent and can be used on AMD or NVIDIA GPUs, as well as on classical CPUs. On the AMD Radeon HD 5870 our double precision dslash implementation performs at 60 GFLOPS over a wide range of lattice sizes. The hybrid Monte-Carlo presented reaches a speedup of four over the reference code running on a server CPU.Comment: 19 pages, 11 figure

    QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

    Full text link
    The multi-GPU open-source package QCDGPU for lattice Monte Carlo simulations of pure SU(N) gluodynamics in external magnetic field at finite temperature and O(N) model is developed. The code is implemented in OpenCL, tested on AMD and NVIDIA GPUs, AMD and Intel CPUs and may run on other OpenCL-compatible devices. The package contains minimal external library dependencies and is OS platform-independent. It is optimized for heterogeneous computing due to the possibility of dividing the lattice into non-equivalent parts to hide the difference in performances of the devices used. QCDGPU has client-server part for distributed simulations. The package is designed to produce lattice gauge configurations as well as to analyze previously generated ones. QCDGPU may be executed in fault-tolerant mode. Monte Carlo procedure core is based on PRNGCL library for pseudo-random numbers generation on OpenCL-compatible devices, which contains several most popular pseudo-random number generators.Comment: Presented at the Third International Conference "High Performance Computing" (HPC-UA 2013), Kyiv, Ukraine; 9 pages, 2 figure

    QCD simulations with staggered fermions on GPUs

    Full text link
    We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two staggered flavors on Graphics Processing Units, using the NVIDIA CUDA programming language. The main feature of our code is that the GPU is not used just as an accelerator, but instead the whole Molecular Dynamics trajectory is performed on it. After pointing out the main bottlenecks and how to circumvent them, we discuss the obtained performances. We present some preliminary results regarding OpenCL and multiGPU extensions of our code and discuss future perspectives.Comment: 22 pages, 14 eps figures, final version to be published in Computer Physics Communication

    Design and optimization of a portable LQCD Monte Carlo code using OpenACC

    Full text link
    The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) become necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.Comment: 26 pages, 2 png figures, preprint of an article submitted for consideration in International Journal of Modern Physics

    Computational Physics on Graphics Processing Units

    Full text link
    The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up various different algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics, and on quantum simulations for electronic structure calculations using the density functional theory, wave function techniques, and quantum field theory.Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 201

    First Application of Lattice QCD to Pezy-SC Processor

    Get PDF
    AbstractPezy-SC processor is a novel new architecture developed by Pezy Computing K. K. that has achieved large computational power with low electric power consumption. It works as an accelerator device similarly to GPGPUs. A programming environment that resembles OpenCL is provided. Using a hybrid parallel system “Suiren” installed at KEK, we port and tune a simulation code of lattice QCD, which is computational elementary particle physics based on Monte Carlo method. We offload an iterative solver of a linear equation for a fermion matrix, which is in general the most time consuming part of the lattice QCD simulations. On single and multiple Pezy-SC devices, the sustained performance is measured for the matrix multiplications and a BiCGStab solver. We examine how the data layout affects the performance. The results demonstrate that the Pezy-SC processors provide a feasible environment to perform numerical lattice QCD simulations
    corecore