5 research outputs found
Towards Lattice Quantum Chromodynamics on FPGA devices
In this paper we describe a single-node, double precision Field Programmable
Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the
context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we
invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three
Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250
accelerator and the largest device available on the market, the VU13P device.
In our implementation we separate software/hardware parts in such a way that
the entire multiplication by the Dirac operator is performed in hardware, and
the rest of the algorithm runs on the host. We find out that the FPGA
implementation can offer a performance comparable with that obtained using
current CPU or Intel's many core Xeon Phi accelerators. A possible multiple
node FPGA-based system is discussed and we argue that power-efficient High
Performance Computing (HPC) systems can be implemented using FPGA devices only.Comment: 17 pages, 4 figure
Implementation of the conjugate gradient algorithm on FPGA devices
Results of porting parts of the Lattice Quantum Chromodynamics code to modern
FPGA devices are presented. A single-node, double precision implementation of
the Conjugate Gradient algorithm is used to invert numerically the Dirac-Wilson
operator on a 4-dimensional grid on a Xilinx Zynq evaluation board. The code is
divided into two software/hardware parts in such a way that the entire
multiplication by the Dirac operator is performed in programmable logic, and
the rest of the algorithm runs on the ARM cores. Optimized data blocks are used
to efficiently use data movement infrastructure allowing to reach intervals of
1 clock cycle. We show that the FPGA implementation can offer a comparable
performance compared to that obtained using Intel Xeon Phi KNL.Comment: Proceedings of the 36th Annual International Symposium on Lattice
Field Theory - LATTICE201
Investigating the Dirac operator evaluation with FPGAs
In recent years the computational capacity of single Field Programmable Gate
Arrays (FPGA) devices as well as their versatility has increased significantly.
Adding to that the High Level Synthesis frameworks allowing to program such
processors in a high level language like C++, makes modern FPGA devices a
serious candidate as building blocks of a general purpose High Performance
Computing solution. In this contribution we describe benchmarks which we
performed using a Lattice QCD code, a highly compute-demanding HPC academic
code for elementary particle simulations. We benchmark the performance of a
single FPGA device running in two modes: using the external or embedded memory.
We discuss both approaches in detail using the Xilinx U250 device and provide
estimates for the necessary memory throughput and the minimal amount of
resources needed to deliver optimal performance depending on the available
hardware platform.Comment: 8 pages, 5 figure
Implementation of the conjugate gradient algorithm in Lattice QCD on FPGA devices
Results of porting parts of the Lattice Quantum Chromodynamics code to modern FPGA devices are presented. A single-node, double precision implementation of the Conjugate Gradient algorithm is used to invert numerically the Dirac-Wilson operator on a 4-dimensional grid on a Xilinx Zynq evaluation board. The code is divided into two software/hardware parts in such a way that the entire multiplication by the Dirac operator is performed in programmable logic, and the rest of the algorithm runs on the ARM cores. Optimized data blocks are used to efficiently use data movement infrastructure allowing to reach intervals of 1 clock cycle. We show that the FPGA implementation can offer a comparable performance compared to that obtained using Intel Xeon Phi KN