
    Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

    Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed-precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the "9g" cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for overlapping communication with computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops. Comment: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing 2010 (submitted April 12, 2010).
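
    Overlapping communication with computation for a nearest-neighbour stencil such as the Wilson-Dirac operator is commonly done by splitting each GPU's local sub-lattice into interior sites, which need no data from other GPUs, and boundary sites on partitioned faces, which need halo data. The Python sketch below illustrates only that classification. It assumes a 4D local lattice, a stencil reach of one site, and a hypothetical helper name `split_sites`; it is not QUDA's API or necessarily its exact scheme.

```python
from itertools import product


def split_sites(local_dims, partitioned_dims):
    """Classify the sites of a local 4D sub-lattice (hypothetical helper).

    A site is 'boundary' if it lies on a face shared with a neighbouring GPU
    along one of the partitioned dimensions, so a nearest-neighbour stencil
    needs halo data there. All other sites are 'interior' and can be updated
    while the halo exchange is still in flight.
    """
    interior, boundary = [], []
    for site in product(*(range(n) for n in local_dims)):
        on_face = any(site[mu] in (0, local_dims[mu] - 1) for mu in partitioned_dims)
        (boundary if on_face else interior).append(site)
    return interior, boundary


if __name__ == "__main__":
    interior, boundary = split_sites((4, 4, 4, 8), partitioned_dims=[3])
    print(len(interior), len(boundary))
```

    With this split, a solver would post the face exchanges (for example as non-blocking MPI sends and receives), apply the operator to the interior sites while the messages travel, then wait and finish the boundary sites. For a 4×4×4×8 local volume partitioned only along the last dimension, 128 of the 512 sites are boundary sites.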

    PetaQCD : En Route for the automatic code generation for lattice QCD

    New computer architectures, each with its own strengths and weaknesses, are appearing at an increasing pace. We present work in progress on a tool chain aimed at rapid prototyping of novel Dirac matrix inversion algorithms for emerging architectures. Starting from a scientific description of the algorithm at the front end and targeting several back ends, we discuss how symbolic manipulation may be used to create and optimize lattice calculations on the fly.

    Steering in computational science: mesoscale modelling and simulation

    This paper outlines the benefits of computational steering for high-performance computing applications. Lattice-Boltzmann mesoscale fluid simulations of binary and ternary amphiphilic fluids in two and three dimensions are used to illustrate the substantial improvements that computational steering offers in terms of resource efficiency and time to discover new physics. We discuss details of our current steering implementations and outline their prospects with the advent of computational grids. Comment: 40 pages, 11 figures. Accepted for publication in Contemporary Physics

    Simulating Heisenberg Interactions in the Ising Model with Strong Drive Fields

    The time evolution of an Ising model with large driving fields over discrete time intervals is shown to be reproduced by an effective XXZ-Heisenberg model at leading order in the inverse field strength. For specific orientations of the drive field, the dynamics of the XXX-Heisenberg model is reproduced. These approximate equivalences, valid above a critical driving field strength set by dynamical phase transitions in the Ising model, are expected to enable quantum devices that natively evolve qubits according to the Ising model to simulate more complex systems. Comment: 10 pages, 5 figures, accepted version

    Solving Lattice QCD systems of equations using mixed precision solvers on GPUs

    Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40 Gflops, 135 Gflops and 212 Gflops for double, single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed-precision approach for Krylov solvers using reliable updates, which allows for full double-precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision. Comment: 30 pages, 7 figures
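
    The reliable-update idea can be illustrated with a toy conjugate gradient (CG) solver. In this sketch, which is an assumption for illustration only, the matrix-vector products and vector updates are rounded to IEEE single precision via Python's `array('f', ...)`, while the accumulated solution and the periodically recomputed true residual stay in double precision. A reliable update is triggered when the iterated residual falls below `delta` times its largest value since the last update. The threshold `delta=0.1` and all function names are hypothetical choices, not the paper's parameters or code, and half precision is not modelled.

```python
from array import array


def to_single(v):
    """Round every entry to IEEE-754 single precision (emulated low precision)."""
    return list(array("f", v))


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mixed_precision_cg(A, b, tol=1e-10, delta=0.1, max_iter=1000):
    """Toy CG: inner iterations in emulated single precision, with reliable
    updates that recompute the true residual b - A x in double precision."""
    n = len(b)
    b_norm = dot(b, b) ** 0.5
    x = [0.0] * n                         # double-precision solution
    if b_norm == 0.0:
        return x, 0
    A_lo = [to_single(row) for row in A]  # single-precision copy of the matrix
    y = [0.0] * n                         # single-precision correction since last update
    r_lo = to_single(b)                   # iterated residual (single precision)
    p = r_lo[:]
    rr = dot(r_lo, r_lo)
    r_max = rr ** 0.5                     # largest residual norm since last update
    for k in range(1, max_iter + 1):
        Ap = to_single(matvec(A_lo, p))   # "sloppy" single-precision matvec
        alpha = rr / dot(p, Ap)
        y = to_single([yi + alpha * pi for yi, pi in zip(y, p)])
        r_lo = to_single([ri - alpha * ai for ri, ai in zip(r_lo, Ap)])
        rr_new = dot(r_lo, r_lo)
        r_norm = rr_new ** 0.5
        r_max = max(r_max, r_norm)
        if r_norm < delta * r_max:
            # Reliable update: fold y into x and recompute b - A x in double.
            x = [xi + yi for xi, yi in zip(x, y)]
            r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
            true_norm = dot(r, r) ** 0.5
            if true_norm < tol * b_norm:
                return x, k
            y = [0.0] * n
            r_lo = to_single(r)
            rr_new = dot(r_lo, r_lo)
            r_max = true_norm
        beta = rr_new / rr
        p = to_single([ri + beta * pi for ri, pi in zip(r_lo, p)])
        rr = rr_new
    return [xi + yi for xi, yi in zip(x, y)], max_iter
```

    On a small symmetric positive-definite system such as `A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]]`, `b = [1, 2, 3]`, the returned solution should meet the double-precision residual tolerance even though the search direction, iterated residual and correction are all stored in single precision. The final convergence test is made only on the recomputed double-precision residual, which is what lets low-precision iterations deliver a high-precision answer.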

    The BAGEL assembler generation library


    Preparation for Quantum Simulation of the 1+1D O(3) Non-linear σ-Model using Cold Atoms

    The 1+1D O(3) non-linear σ-model is a model system for future quantum lattice simulations of other asymptotically free theories, such as non-Abelian gauge theories. We find that utilizing dimensional reduction can make efficient use of two-dimensional layouts presently available on cold atom quantum simulators. A new definition of the renormalized coupling is introduced, which is applicable to systems with open boundary conditions and can be measured using analog quantum simulators. Monte Carlo and tensor network calculations are performed to determine the quantum resources required to reproduce perturbative short-distance observables. In particular, we show that a rectangular array of 48 Rydberg atoms with existing quantum hardware capabilities should be able to adiabatically prepare low-energy states of the perturbatively matched theory. These states can then be used to simulate non-perturbative observables in the continuum limit that lie beyond the reach of classical computers. Comment: 12 pages, 5 figures, 2 tables, published version

    An FPGA-based Torus Communication Network

    We describe the design and FPGA implementation of a 3D torus network (TNW) to provide nearest-neighbor communications between commodity multi-core processors. The aim of this project is to build tightly interconnected and scalable parallel systems for scientific computing. The design includes VHDL code that implements, on the latest FPGA devices, a network processor which the CPU accesses through a PCIe interface and which controls the external PHYs of the physical links. In addition, a Linux driver and a library implementing custom communication APIs are provided. The TNW has been successfully integrated into two recent parallel machine projects, QPACE and AuroraScience. We describe some details of porting the TNW to the AuroraScience system and report performance results. Comment: 7 pages, 3 figures, proceedings of the XXVIII International Symposium on Lattice Field Theory, Lattice2010, June 14-19, 2010, Villasimius, Sardinia, Italy
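
    In a 3D torus each node has six nearest-neighbour links, two per axis, with periodic wrap-around at the edges of the machine. The Python sketch below shows how a node's neighbours and a linear rank can be derived from its coordinates. The lexicographic rank ordering and the function names are assumptions for illustration, not part of the TNW driver or its communication APIs.

```python
def torus_neighbors(coord, dims):
    """Return the six nearest neighbours of node `coord` in a 3D torus of shape `dims`.

    Keys are 'x+', 'x-', 'y+', 'y-', 'z+', 'z-'; coordinates wrap around
    periodically, so the last node along an axis links back to the first.
    """
    neighbors = {}
    for axis, name in enumerate("xyz"):
        for step, sign in ((+1, "+"), (-1, "-")):
            nb = list(coord)
            nb[axis] = (nb[axis] + step) % dims[axis]
            neighbors[name + sign] = tuple(nb)
    return neighbors


def node_rank(coord, dims):
    """Hypothetical lexicographic rank: x varies fastest, then y, then z."""
    x, y, z = coord
    return x + dims[0] * (y + dims[1] * z)


if __name__ == "__main__":
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))
    print(node_rank((3, 3, 3), (4, 4, 4)))
```

    For a 4×4×4 torus, the `x-` neighbour of node (0, 0, 0) wraps around to (3, 0, 0), and node (3, 3, 3) gets rank 63 under this ordering.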