
539 research outputs found

Lemon: an MPI parallel I/O library for data encapsulation using LIME

Author: Albert Deuzeman
Siebren Reker
Carsten Urbach
Publication venue: 'Elsevier BV'
Publication date: 21/06/2011

We introduce Lemon, an MPI parallel I/O library intended to allow efficient parallel I/O of both binary data and metadata on massively parallel architectures. Motivated by the demands of the Lattice Quantum Chromodynamics (LQCD) community, the data is stored in the SciDAC Lattice QCD Interchange Message Encapsulation (LIME) format. This format allows large blocks of binary data and the corresponding metadata to be stored in the same file. Although designed for LQCD needs, the format might be useful for any application with this type of data profile. The design, implementation and application of Lemon are described. We conclude by presenting the excellent scaling properties of Lemon on state-of-the-art high performance computers.

arXiv.org e-Print Archive

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Bern Open Repository and Information System (BORIS)

Dissertations of the University of Groningen

The Static Quark-Antiquark Potential: A "Classical" Experiment On The Connection Machine CM-2

Author: Bali G. S.
Schilling K.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/1993

We describe the Wuppertal University pilot project in applied parallel computing. We report on a comprehensive high-statistics determination of the static quark-antiquark potential and related quantities from quenched quantum chromodynamics. New data for the string tension and the plaquette action in the region 5.5 < beta < 6.8 are presented. Comment: (Talk K. Schilling), 11 pages, postscript (\approx 250K

arXiv.org e-Print Archive

Crossref

CERN Document Server

Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing

Author: Kanamori Issaku
Matsufuru Hideo
Publication date: 05/12/2017

We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel Xeon Phi Knights Landing (KNL). The most time-consuming part of numerical lattice QCD simulations is solving a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine how rather general methods for the SIMD architecture of KNL, such as intrinsics and manual prefetching, apply to the matrix multiplication and iterative solver algorithms. Based on the performance measured on the Oakforest-PACS system, we discuss performance tuning on KNL as well as code design that facilitates such tuning on SIMD architectures and massively parallel machines. Comment: 8 pages, 12 figures. Talk given at LHAM'17 "5th International Workshop on Legacy HPC Application Migration" in CANDAR'17 "The Fifth International Symposium on Computing and Networking" and to appear in the proceeding

arXiv.org e-Print Archive

Crossref

FFT for the APE Parallel Computer

Author: Thomas Lippert
Klaus Schilling
Federico Toschi
Sven Trentmann
Raffaele Tripiccione
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/1997

We present a parallel FFT algorithm for SIMD systems following the 'Transpose Algorithm' approach. The method is based on assigning the data field to a 1-dimensional ring of systolic cells. The systolic array can be mapped universally onto any parallel system. In particular, for systems with next-neighbour connectivity our method has the potential to improve the efficiency of matrix transposition through hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. A possible generalization to a 4-dimensional FFT is presented with QCD applications in mind. Comment: 17 pages, 13 figures, figures include
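
As an illustration of the general 'Transpose Algorithm' idea named in this abstract, the following is a minimal serial sketch: a 2-dimensional FFT is built from 1-dimensional FFTs along rows, a matrix transposition, and a second pass of row FFTs. It is not the authors' implementation. The systolic ring and hyper-systolic communication are not modelled, the function names (fft_1d, transpose, fft_2d_transpose) are hypothetical, and the row length is assumed to be a power of two. In a parallel setting the transposition is the step that would require communication between processors.

```python
import cmath


def fft_1d(values):
    """Recursive radix-2 FFT; assumes len(values) is a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft_1d(values[0::2])
    odd = fft_1d(values[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def transpose(matrix):
    """Swap rows and columns (the communication step in a parallel run)."""
    return [list(column) for column in zip(*matrix)]


def fft_2d_transpose(grid):
    """2D FFT: row FFTs, transpose, row FFTs again, transpose back."""
    rows_done = [fft_1d(row) for row in grid]
    swapped = transpose(rows_done)           # columns are now stored as rows
    columns_done = [fft_1d(row) for row in swapped]
    return transpose(columns_done)           # restore the original layout


if __name__ == "__main__":
    # A unit impulse at the origin transforms to a grid of ones.
    impulse = [[1 + 0j if (r, c) == (0, 0) else 0j for c in range(4)]
               for r in range(4)]
    for row in fft_2d_transpose(impulse):
        print([round(abs(v), 6) for v in row])
```

Because every local operation is a 1-dimensional FFT over contiguous data, the only non-local step is the transposition, which is where the abstract locates the potential gain from hyper-systolic communication.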

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

Juelich Shared Electronic Resources

CERN Document Server

Implementation of the conjugate gradient algorithm on FPGA devices

Author: Korcyl Grzegorz
Korcyl Piotr
Publication date: 08/11/2018

Results of porting parts of the Lattice Quantum Chromodynamics code to modern FPGA devices are presented. A single-node, double-precision implementation of the Conjugate Gradient algorithm is used to numerically invert the Dirac-Wilson operator on a 4-dimensional grid on a Xilinx Zynq evaluation board. The code is divided into software and hardware parts in such a way that the entire multiplication by the Dirac operator is performed in programmable logic, and the rest of the algorithm runs on the ARM cores. Optimized data blocks make efficient use of the data movement infrastructure, making it possible to reach intervals of 1 clock cycle. We show that the FPGA implementation can offer performance comparable to that obtained using Intel Xeon Phi KNL. Comment: Proceedings of the 36th Annual International Symposium on Lattice Field Theory - LATTICE201
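
The split described in this abstract can be pictured with a minimal Conjugate Gradient sketch in which the matrix-vector product is passed in as a callback: the callback stands in for the part that would run in programmable logic, and the surrounding loop for the part that would run on the CPU cores. This is an illustrative sketch only, not the authors' code. The names conjugate_gradient and laplacian_1d are hypothetical, and a small symmetric positive-definite tridiagonal operator is used in place of the Dirac-Wilson operator, which would in practice need to be combined with its adjoint before plain CG applies.

```python
def dot(a, b):
    """Inner product of two real vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_operator, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A.

    apply_operator(v) returns A v; it is the only place the matrix is used,
    so it is the natural piece to offload to an accelerator.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0 initially
    p = list(r)              # search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = apply_operator(p)            # "hardware" part
        alpha = rr / dot(p, ap)           # everything else: "host" part
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def laplacian_1d(v):
    """Toy stand-in operator: tridiagonal matrix with 2 on the diagonal, -1 beside it."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


if __name__ == "__main__":
    solution = conjugate_gradient(laplacian_1d, [1.0, 0.0, 0.0, 1.0])
    print([round(value, 6) for value in solution])
```

Keeping the operator behind a single function call means that each iteration exchanges only vectors between the two sides, which is why the operator application is the usual candidate for offloading.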

arXiv.org e-Print Archive

Crossref

Jagiellonian University Repository

Towards Lattice Quantum Chromodynamics on FPGA devices

Author: Korcyl Grzegorz
Korcyl Piotr
Publication venue: 'Elsevier BV'
Publication date: 04/12/2019

In this paper we describe a single-node, double-precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we numerically invert the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: the Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator and the largest device available on the market, the VU13P device. In our implementation we separate the software and hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find that the FPGA implementation can offer performance comparable to that obtained using current CPUs or Intel's many-core Xeon Phi accelerators. A possible multiple-node FPGA-based system is discussed, and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only. Comment: 17 pages, 4 figure

arXiv.org e-Print Archive

University of Regensburg Publication Server

Jagiellonian University Repository

The QPACE Supercomputer: Applications of Random Matrix Theory in Two-Colour Quantum Chromodynamics

Author: Meyer Nils
Publication date: 15/06/2016

QPACE is a massively parallel and scalable supercomputer designed to meet the requirements of applications in Lattice Quantum Chromodynamics. The project was carried out by several academic institutions in collaboration with IBM Germany and other industrial partners. In November 2009 and June 2010 QPACE was the leading architecture on the Green 500 list of the most energy-efficient supercomputers in the world.

University of Regensburg Publication Server