73 research outputs found

Towards Lattice Quantum Chromodynamics on FPGA devices

Author: Korcyl Grzegorz
Korcyl Piotr
Publication venue: 'Elsevier BV'
Publication date: 04/12/2019

In this paper we describe a single-node, double-precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal, we numerically invert the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: the Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator, and the largest device available on the market, the VU13P. In our implementation we separate the software and hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find that the FPGA implementation can offer performance comparable with that obtained using current CPUs or Intel's many-core Xeon Phi accelerators. A possible multi-node FPGA-based system is discussed, and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only. Comment: 17 pages, 4 figures
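The host/accelerator split described in this abstract can be pictured with a minimal Conjugate Gradient sketch in which the operator application is passed in as a callback. This is an illustration only, not the authors' code: `toy_operator` is a hypothetical stand-in (a small symmetric positive-definite stencil, not the Dirac-Wilson operator), and the names `conjugate_gradient`, `apply_op`, `tol` and `max_iter` are assumptions introduced here.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain inner product, computed on the host."""
    return sum(x * y for x, y in zip(a, b))


def toy_operator(v: Vector) -> Vector:
    """Hypothetical stand-in for the offloaded operator.

    A periodic 1D stencil 2.5*v[i] - v[i-1] - v[i+1], which is symmetric
    positive definite. It is NOT the Dirac-Wilson operator.
    """
    n = len(v)
    return [2.5 * v[i] - v[i - 1] - v[(i + 1) % n] for i in range(n)]


def conjugate_gradient(
    apply_op: Callable[[Vector], Vector],
    b: Vector,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Tuple[Vector, int]:
    """Solve A x = b for a symmetric positive-definite A given only as apply_op.

    Only apply_op touches the operator; the dot products and vector updates
    below correspond to the part that would stay on the host.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rr = dot(r, r)
    b_norm2 = dot(b, b)
    for iteration in range(max_iter):
        if rr <= tol * tol * b_norm2:
            return x, iteration
        ap = apply_op(p)     # the one operator application per iteration
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter
```

Calling `conjugate_gradient(toy_operator, b)` returns the approximate solution and the iteration count. Replacing `toy_operator` with a function that dispatches to an accelerator would leave the host-side loop unchanged, which is the kind of separation the abstract describes.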

arXiv.org e-Print Archive

University of Regensburg Publication Server

Jagiellonian University Repository

Implementation of the conjugate gradient algorithm on FPGA devices

Author: Korcyl Grzegorz
Korcyl Piotr
Publication date: 08/11/2018

Results of porting parts of the Lattice Quantum Chromodynamics code to modern FPGA devices are presented. A single-node, double-precision implementation of the Conjugate Gradient algorithm is used to numerically invert the Dirac-Wilson operator on a 4-dimensional grid on a Xilinx Zynq evaluation board. The code is divided into software and hardware parts in such a way that the entire multiplication by the Dirac operator is performed in programmable logic, and the rest of the algorithm runs on the ARM cores. Optimized data blocks are used to make efficient use of the data movement infrastructure, allowing intervals of 1 clock cycle to be reached. We show that the FPGA implementation can offer performance comparable to that obtained using Intel Xeon Phi KNL. Comment: Proceedings of the 36th Annual International Symposium on Lattice Field Theory - LATTICE201

arXiv.org e-Print Archive

Crossref

Jagiellonian University Repository

Investigating the Dirac operator evaluation with FPGAs

Author: Korcyl G.
Korcyl P.
Publication venue: 'FSAEIHE South Ural State University (National Research University)'
Publication date: 01/01/2019

In recent years the computational capacity of single Field Programmable Gate Array (FPGA) devices, as well as their versatility, has increased significantly. Combined with High Level Synthesis frameworks, which allow such processors to be programmed in a high-level language like C++, this makes modern FPGA devices a serious candidate as building blocks of a general-purpose High Performance Computing solution. In this contribution we describe benchmarks which we performed using a Lattice QCD code, a highly compute-demanding HPC academic code for elementary particle simulations. We benchmark the performance of a single FPGA device running in two modes: using the external or the embedded memory. We discuss both approaches in detail using the Xilinx U250 device and provide estimates for the necessary memory throughput and the minimal amount of resources needed to deliver optimal performance depending on the available hardware platform. Comment: 8 pages, 5 figures

arXiv.org e-Print Archive

Jagiellonian University Repository

Implementation of the conjugate gradient algorithm in Lattice QCD on FPGA devices

Author: Korcyl Grzegorz
Korcyl Piotr
Publication venue: 'Sissa Medialab'
Publication date: 01/01/2019

Crossref

Jagiellonian University Repository

Solving Lattice QCD systems of equations using mixed precision solvers on GPUs

Author: Barros
Brannick
Bulava
Bunk
C. Rebbi
Clark
De Forcrand
DeGrand
Edwards
Egri
Holmgren
K. Barros
Kahan
M.A. Clark
Martin
NVIDIA Corporation
R. Babich
R.C. Brower
Sleijpen
Publication venue: 'Elsevier BV'
Publication date: 21/12/2009

Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40 Gflops, 135 Gflops and 212 Gflops for double, single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed precision approach for Krylov solvers using reliable updates, which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision. Comment: 30 pages, 7 figures
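For orientation, the defect-correction approach that this abstract uses as its baseline can be sketched as follows. This is not the reliable-updates scheme from the paper and not the authors' code. It is a minimal illustration under stated assumptions: single precision is emulated by rounding through `struct`, the inner solver is a fixed number of Jacobi sweeps rather than a Krylov method, the operator is a hypothetical toy stencil rather than the Wilson-Dirac matrix, and every function name is introduced here for illustration.

```python
import math
import struct
from typing import List, Tuple

Vector = List[float]
DIAG = 2.5  # diagonal of the hypothetical toy operator


def to_single(x: float) -> float:
    """Round a double to the nearest IEEE-754 single (emulated low precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def apply_op(v: Vector) -> Vector:
    """Toy periodic stencil DIAG*v[i] - v[i-1] - v[i+1], evaluated in double."""
    n = len(v)
    return [DIAG * v[i] - v[i - 1] - v[(i + 1) % n] for i in range(n)]


def inner_solve_single(r: Vector, sweeps: int = 20) -> Vector:
    """Approximately solve A e = r with Jacobi sweeps, rounding to single.

    Jacobi is used only to keep the sketch short; it converges here because
    the toy operator is strictly diagonally dominant.
    """
    n = len(r)
    r32 = [to_single(ri) for ri in r]
    e = [0.0] * n
    for _ in range(sweeps):
        e = [
            to_single((r32[i] + e[i - 1] + e[(i + 1) % n]) / DIAG)
            for i in range(n)
        ]
    return e


def defect_correction(
    b: Vector, tol: float = 1e-12, max_outer: int = 50
) -> Tuple[Vector, int]:
    """Outer loop in double precision, inner corrections in emulated single."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(sum(bi * bi for bi in b))
    for outer in range(max_outer):
        ax = apply_op(x)
        r = [bi - axi for bi, axi in zip(b, ax)]   # true residual, in double
        if math.sqrt(sum(ri * ri for ri in r)) <= tol * b_norm:
            return x, outer
        e = inner_solve_single(r)                  # low-precision correction
        x = [xi + ei for xi, ei in zip(x, e)]      # accumulate in double
    return x, max_outer
```

The point of the structure is that the residual and the accumulated solution stay in double precision, so the final accuracy is not limited by the single-precision inner solve. Each outer iteration restarts the inner solver from scratch, which is the cost the abstract's reliable-updates approach is said to improve on in terms of iteration count.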

arXiv.org e-Print Archive

Crossref

Recent development and perspectives of machines for lattice QCD

Author: Aglietti
Ammendola
Aoki
Aoki
APE
Arndt
Bartoloni
Bhanot
Bodin
Bodin
Boyle
Boyle
Boyle
Brickner
Chen
Chiu
Christ
Christ
Christ
CP-PACS
Csikor
Fischer
Fodor
Fodor
Gellrich
Gottlieb
Gottlieb
Hasenbusch
Holmgren
Iwasaki
Iwasaki
Lindahl
Luo
Luscher
Marinari
Marinari
Mawhinney
Meuer
Negrassus
Ridge
Sexton
Singh
Sroczynski
Sroczynski
Th Lippert
Watson
Watson
Weingarten
Weingarten
Publication venue: 'Elsevier BV'
Publication date: 10/11/2003

I highlight recent progress in cluster computer technology and assess the status and prospects of cluster computers for lattice QCD with respect to the development of QCDOC and apeNEXT. Taking the LatFor test case, I specify a 512-processor QCD-cluster better than $1/Mflops. Comment: 14 pages, 17 figures, Lattice2003(plenary)

arXiv.org e-Print Archive

Crossref

Juelich Shared Electronic Resources

CERN Document Server

ParaFPGA 2017 : enlarging the scope of parallel programming with FPGAs

Author: D'Hollander Erik
Touhafi Abdellah
Publication venue: 'IOS Press'
Publication date: 01/01/2018

Ghent University Academic Bibliography

The QPACE Supercomputer : Applications of Random Matrix Theory in Two-Colour Quantum Chromodynamics

Author: Meyer Nils
Publication date: 15/06/2016

QPACE is a massively parallel and scalable supercomputer designed to meet the requirements of applications in Lattice Quantum Chromodynamics. The project was carried out by several academic institutions in collaboration with IBM Germany and other industrial partners. In November 2009 and June 2010 QPACE was the leading architecture on the Green 500 list of the most energy-efficient supercomputers in the world.

University of Regensburg Publication Server

CompF2: Theoretical Calculations and Simulation Topical Group Report

Author: Boyle Peter
Pedro Kevin
Qiang Ji
Publication date: 16/09/2022

This report summarizes the work of the Computational Frontier topical group on theoretical calculations and simulation for Snowmass 2021. We discuss the challenges, potential solutions, and needs facing six diverse but related topical areas that span the subject of theoretical calculations and simulation in high energy physics (HEP): cosmic calculations, particle accelerator modeling, detector simulation, event generators, perturbative calculations, and lattice QCD (quantum chromodynamics). The challenges arise from the next generations of HEP experiments, which will include more complex instruments, provide larger data volumes, and perform more precise measurements. Calculations and simulations will need to keep up with these increased requirements. The other aspect of the challenge is the evolution of the computing landscape away from general-purpose computing on CPUs and toward special-purpose accelerators and coprocessors such as GPUs and FPGAs. These newer devices can provide substantial improvements for certain categories of algorithms, at the expense of more specialized programming and memory and data access patterns. Comment: Report of the Computational Frontier Topical Group on Theoretical Calculations and Simulation for Snowmass 2021

arXiv.org e-Print Archive