Lemon: an MPI parallel I/O library for data encapsulation using LIME
We introduce Lemon, an MPI parallel I/O library that is intended to allow for
efficient parallel I/O of both binary data and metadata on massively parallel
architectures. Motivated by the demands of the Lattice Quantum Chromodynamics
community, the data is stored in the SciDAC Lattice QCD Interchange Message
Encapsulation format. This format allows for storing large blocks of binary
data and corresponding metadata in the same file. Although designed for LQCD
needs, this format might be useful for any application with this type of data
profile. The design, implementation and application of Lemon are described. We
conclude by presenting the excellent scaling properties of Lemon on
state-of-the-art high-performance computers.
The Static Quark-Antiquark Potential: A ``Classical'' Experiment On The Connection Machine CM-2
We describe the Wuppertal university pilot project in applied parallel
computing. We report on a comprehensive high statistics determination of the
static quark-antiquark potential and related quantities from quenched quantum
chromodynamics. New data for the string tension and the plaquette action for
the region 5.5 < beta < 6.8 are presented. Comment: (Talk K. Schilling), 11 pages, postscript (\approx 250K
Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing
We investigate implementation of lattice Quantum Chromodynamics (QCD) code on
the Intel Xeon Phi Knights Landing (KNL). The most time consuming part of the
numerical simulations of lattice QCD is the solver of a linear equation for a large
sparse matrix that represents the strong interaction among quarks. To establish
widely applicable prescriptions, we examine rather general methods for the SIMD
architecture of KNL, such as using intrinsics and manual prefetching, applied to the
matrix multiplication and iterative solver algorithms. Based on the performance
measured on the Oakforest-PACS system, we discuss the performance tuning on KNL
as well as the code design for facilitating such tuning on SIMD architectures
and massively parallel machines. Comment: 8 pages, 12 figures. Talk given at LHAM'17 "5th International
Workshop on Legacy HPC Application Migration" in CANDAR'17 "The Fifth
International Symposium on Computing and Networking" and to appear in the
proceeding
FFT for the APE Parallel Computer
We present a parallel FFT algorithm for SIMD systems following the `Transpose
Algorithm' approach. The method is based on the assignment of the data field
onto a 1-dimensional ring of systolic cells. The systolic array can be
universally mapped onto any parallel system. In particular for systems with
next-neighbour connectivity our method has the potential to improve the
efficiency of matrix transposition by use of hyper-systolic communication. We
have realized a scalable parallel FFT on the APE100/Quadrics massively parallel
computer, where our implementation is part of a 2-dimensional hydrodynamics
code for turbulence studies. A possible generalization to 4-dimensional FFT is
presented, having in mind QCD applications. Comment: 17 pages, 13 figures, figures include
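The 'Transpose Algorithm' idea named in this abstract can be illustrated with a minimal serial sketch: transform every row, transpose, transform every row again, and transpose back. The function names (`fft1d`, `transpose`, `fft2d_transpose`) are hypothetical and chosen only for illustration. The systolic ring mapping and the hyper-systolic communication described in the abstract are not modelled here. In a parallel setting the transpose step is where the inter-node communication would take place.

```python
import cmath


def fft1d(values):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft1d(values[0::2])
    odd = fft1d(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def transpose(grid):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*grid)]


def fft2d_transpose(grid):
    """2-dimensional FFT built only from row-wise 1-dimensional FFTs."""
    rows_done = [fft1d(row) for row in grid]      # transform along the second index
    swapped = transpose(rows_done)                # columns become rows
    cols_done = [fft1d(row) for row in swapped]   # transform along the first index
    return transpose(cols_done)                   # restore the original layout
```

Because each pass only ever touches complete rows, every processing element can hold a block of rows and run its 1-dimensional transforms without communication; only the transpose moves data between elements.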
Implementation of the conjugate gradient algorithm on FPGA devices
Results of porting parts of the Lattice Quantum Chromodynamics code to modern
FPGA devices are presented. A single-node, double precision implementation of
the Conjugate Gradient algorithm is used to invert numerically the Dirac-Wilson
operator on a 4-dimensional grid on a Xilinx Zynq evaluation board. The code is
divided into two software/hardware parts in such a way that the entire
multiplication by the Dirac operator is performed in programmable logic, and
the rest of the algorithm runs on the ARM cores. Optimized data blocks make
efficient use of the data movement infrastructure, allowing intervals of
1 clock cycle to be reached. We show that the FPGA implementation can offer
performance comparable to that obtained using Intel Xeon Phi KNL. Comment: Proceedings of the 36th Annual International Symposium on Lattice
Field Theory - LATTICE201
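For readers unfamiliar with the Conjugate Gradient algorithm mentioned in this abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The operator `apply_operator` is a hypothetical stand-in: a 1-dimensional periodic lattice Laplacian with a mass term, chosen because it is symmetric and positive definite, which the textbook algorithm requires. It is not the Dirac-Wilson operator. The split between the operator application and the remaining vector updates mirrors, loosely, the hardware/software division described above.

```python
def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def apply_operator(v, mass_squared=0.5):
    """Toy stand-in operator: periodic 1-d Laplacian plus a mass term."""
    n = len(v)
    return [
        (2.0 + mass_squared) * v[i] - v[(i + 1) % n] - v[(i - 1) % n]
        for i in range(n)
    ]


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite operator A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = apply_a(p)    # the only place the operator is applied
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

Each iteration applies the operator exactly once, so its cost dominates the loop; this is why offloading only that step to dedicated hardware can be a reasonable design.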
Towards Lattice Quantum Chromodynamics on FPGA devices
In this paper we describe a single-node, double precision Field Programmable
Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the
context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we
invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three
Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250
accelerator and the largest device available on the market, the VU13P device.
In our implementation we separate software/hardware parts in such a way that
the entire multiplication by the Dirac operator is performed in hardware, and
the rest of the algorithm runs on the host. We find that the FPGA
implementation can offer a performance comparable with that obtained using
current CPUs or Intel's many-core Xeon Phi accelerators. A possible multiple
node FPGA-based system is discussed and we argue that power-efficient High
Performance Computing (HPC) systems can be implemented using FPGA devices only. Comment: 17 pages, 4 figure
The QPACE Supercomputer : Applications of Random Matrix Theory in Two-Colour Quantum Chromodynamics
QPACE is a massively parallel and scalable supercomputer designed to meet the requirements of applications in Lattice Quantum Chromodynamics. The project was carried out by several academic institutions in collaboration with IBM Germany and other industrial partners. In November 2009 and June 2010 QPACE was the leading architecture on the Green 500 list of the most energy efficient supercomputers in the world.