Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics
Graphics Processing Units (GPUs) are having a transformational effect on
numerical lattice quantum chromodynamics (LQCD) calculations of importance in
nuclear and particle physics. The QUDA library provides a package of mixed
precision sparse matrix linear solvers for LQCD applications, supporting single
GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This
library, interfaced to the QDP++/Chroma framework for LQCD calculations, is
currently in production use on the "9g" cluster at the Jefferson Laboratory,
enabling unprecedented price/performance for a range of problems in LQCD.
Nevertheless, memory constraints on current GPU devices limit the problem sizes
that can be tackled. In this contribution we describe the parallelization of
the QUDA library onto multiple GPUs using MPI, including strategies for the
overlapping of communication and computation. We report on both weak and strong
scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in
excess of 4 Tflops.
Comment: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing 2010 (submitted April 12, 2010)
PetaQCD : En Route for the automatic code generation for lattice QCD
New computer architectures, each with its own strengths and weaknesses, are appearing at an increasing pace. We present work in progress on a tool-chain for rapid prototyping of novel Dirac matrix inversion algorithms on emerging architectures. From the scientific description of the algorithm at the front end to the several back ends, we discuss how symbolic manipulation may be used to create and optimize lattice calculations on the fly.
Steering in computational science: mesoscale modelling and simulation
This paper outlines the benefits of computational steering for high
performance computing applications. Lattice-Boltzmann mesoscale fluid
simulations of binary and ternary amphiphilic fluids in two and three
dimensions are used to illustrate the substantial improvements which
computational steering offers in terms of resource efficiency and time to
discover new physics. We discuss details of our current steering
implementations and describe their future outlook with the advent of
computational grids.
Comment: 40 pages, 11 figures. Accepted for publication in Contemporary Physics
Simulating Heisenberg Interactions in the Ising Model with Strong Drive Fields
The time-evolution of an Ising model with large driving fields over discrete
time intervals is shown to be reproduced by an effective XXZ-Heisenberg model
at leading order in the inverse field strength. For specific orientations of
the drive field, the dynamics of the XXX-Heisenberg model is reproduced. These
approximate equivalences, valid above a critical driving field strength set by
dynamical phase transitions in the Ising model, are expected to enable quantum
devices that natively evolve qubits according to the Ising model to simulate
more complex systems.
Comment: 10 pages, 5 figures, accepted version
Solving Lattice QCD systems of equations using mixed precision solvers on GPUs
Modern graphics hardware is designed for highly parallel numerical tasks and
promises significant cost and performance benefits for many scientific
applications. One such application is lattice quantum chromodynamics (lattice
QCD), where the main computational challenge is to efficiently solve the
discretized Dirac equation in the presence of an SU(3) gauge field. Using
NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector
product that performs at up to 40 Gflops, 135 Gflops and 212 Gflops for double,
single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have
developed a new mixed precision approach for Krylov solvers using reliable
updates which allows for full double precision accuracy while using only single
or half precision arithmetic for the bulk of the computation. The resulting
BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations
until convergence, perform better than the usual defect-correction approach for
mixed precision.
Comment: 30 pages, 7 figures
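For orientation, the "usual defect-correction approach" that the abstract compares against can be sketched as follows. This is a minimal illustration in NumPy on a small dense symmetric positive-definite system, not the authors' reliable-update scheme and not code from their CUDA implementation; the function names `cg` and `defect_correction_solve` and all tolerances are assumptions chosen for the example.

```python
import numpy as np

def cg(A, b, tol, max_iter=1000):
    """Plain conjugate gradient, run in whatever precision A and b carry."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    b_norm = np.sqrt(b @ b)
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def defect_correction_solve(A, b, tol=1e-12, inner_tol=1e-4, max_outer=50):
    """Hypothetical helper: outer loop in float64, inner solves in float32."""
    A_lo = A.astype(np.float32)            # low-precision copy for the inner solver
    x = np.zeros_like(b, dtype=np.float64)
    b_norm = np.linalg.norm(b)
    for _ in range(max_outer):
        r = b - A @ x                      # true residual, computed in double
        if np.linalg.norm(r) <= tol * b_norm:
            break
        # Solve A e = r loosely in single precision, then correct x in double.
        e = cg(A_lo, r.astype(np.float32), inner_tol)
        x += e.astype(np.float64)
    return x

# Small well-conditioned test system (assumed for illustration only).
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50.0 * np.eye(50)
b = rng.standard_normal(50)
x = defect_correction_solve(A, b)
rel_residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

Each outer pass restarts the inner Krylov solver from scratch, which is the cost the abstract's reliable-update approach is said to reduce in terms of iterations until convergence.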
Preparation for Quantum Simulation of the 1+1D O(3) Non-linear σ-Model using Cold Atoms
The 1+1D O(3) non-linear σ-model is a model system for future quantum
lattice simulations of other asymptotically-free theories, such as non-Abelian
gauge theories. We find that utilizing dimensional reduction can make efficient
use of two-dimensional layouts presently available on cold atom quantum
simulators. A new definition of the renormalized coupling is introduced, which
is applicable to systems with open boundary conditions and can be measured
using analog quantum simulators. Monte Carlo and tensor network calculations
are performed to determine the quantum resources required to reproduce
perturbative short-distance observables. In particular, we show that a
rectangular array of 48 Rydberg atoms with existing quantum hardware
capabilities should be able to adiabatically prepare low-energy states of the
perturbatively-matched theory. These states can then be used to simulate
non-perturbative observables in the continuum limit that lie beyond the reach
of classical computers.
Comment: 12 pages, 5 figures, 2 tables, published version
An FPGA-based Torus Communication Network
We describe the design and FPGA implementation of a 3D torus network (TNW) to
provide nearest-neighbor communications between commodity multi-core
processors. The aim of this project is to build up tightly interconnected and
scalable parallel systems for scientific computing. The design includes the
VHDL code to implement a network processor on the latest FPGA devices; the CPU
accesses it through a PCIe interface, and it controls the external
PHYs of the physical links. Moreover, a Linux driver and a library implementing
custom communication APIs are provided. The TNW has been successfully
integrated in two recent parallel machine projects, QPACE and AuroraScience. We
describe some details of the porting of the TNW for the AuroraScience system
and report performance results.
Comment: 7 pages, 3 figures, proceedings of the XXVIII International Symposium on Lattice Field Theory, Lattice2010, June 14-19, 2010, Villasimius, Sardinia, Italy
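As a small illustration of what nearest-neighbor communication on a 3D torus means, the sketch below lists the six neighbors of a node, with coordinates wrapping around at the edges. It only illustrates the topology; the function name `torus_neighbors` is hypothetical and has nothing to do with the TNW communication APIs, which the abstract does not describe.

```python
def torus_neighbors(coord, dims):
    """Return the six nearest neighbors of `coord` on a 3D torus of size `dims`.

    Both arguments are (x, y, z) tuples; coordinates wrap around periodically.
    """
    neighbors = []
    for axis in range(3):
        for step in (+1, -1):
            n = list(coord)
            # Modulo arithmetic closes each dimension into a ring.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors

corner = torus_neighbors((0, 0, 0), (4, 4, 4))
```

A node in the corner of a 4x4x4 machine therefore still has six links, three of which wrap to the opposite face.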