Search CORE

541 research outputs found

The HPCG benchmark: analysis, shared memory preliminary improvements and evaluation on an Arm-based platform

Author: Casas Marc
Labarta Mancho Jesús José
Mantovani Filippo
Ruiz Daniel
Spiga Filippo
Publication venue
Publication date: 01/01/2018
Field of study

The High-Performance Conjugate Gradient (HPCG) benchmark complements the LINPACK benchmark in the performance evaluation coverage of large High-Performance Computing (HPC) systems. Due to its lower arithmetic intensity and higher memory pressure, HPCG is recognized as a more representative benchmark for data-center and irregular memory access pattern workloads, therefore its popularity and acceptance is raising within the HPC community. As only a small fraction of the reference version of the HPCG benchmark is parallelized with shared memory techniques (OpenMP), we introduce in this report two OpenMP parallelization methods. Due to the increasing importance of Arm architecture in the HPC scenario, we evaluate our HPCG code at scale on a state-of-the-art HPC system based on Cavium ThunderX2 SoC. We consider our work as a contribution to the Arm ecosystem: along with this technical report, we plan in fact to release our code for boosting the tuning of the HPCG benchmark within the Arm community.Postprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

Author: Christ Norman H.
Detmold William
Edwards Robert G.
Joó Bálint
Jung Chulwoo
Savage Martin
Shanahan Phiala
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/11/2019
Field of study

In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculation using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones.Comment: 44 pages. 1 of USQCD whitepapers

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

An Automatic and Symbolic Parallelization System for Distributed Memory Parallel Computers

Author: Flower J. W.
Fox G. C.
Ikudome K.
Kolawa A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/1990
Field of study

This paper describes ASPAR (Automatic and Symbolic PARallelization) which consists of a source-to-source parallelizer and a set of interactive graphic tools. While the issues of data dependency have already been explored and used in many parallel computer systems such as vector and shared memory machines, distributed memory parallel computers require, in addition, explicit data decomposition. New symbolic analysis and data-dependency analysis methods are used to determine an explicit data decomposition scheme. Automatic parallelization models using high level communications are also described in this paper. The target applications are of the “regular-mesh" type typical of many scientific calculations. The system has been implemented for the language C, and is designed for easy modification for other languages such as Fortran

Caltech Authors

Towards Lattice Quantum Chromodynamics on FPGA devices

Author: Korcyl Grzegorz
Korcyl Piotr
Publication venue: 'Elsevier BV'
Publication date: 04/12/2019
Field of study

In this paper we describe a single-node, double precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator and the largest device available on the market, the VU13P device. In our implementation we separate software/hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find out that the FPGA implementation can offer a performance comparable with that obtained using current CPU or Intel's many core Xeon Phi accelerators. A possible multiple node FPGA-based system is discussed and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only.Comment: 17 pages, 4 figure

arXiv.org e-Print Archive

University of Regensburg Publication Server

Jagiellonian Univeristy Repository

Polyhedral+Dataflow Graphs

Author: Davis Eddie C.
Publication venue: 'IUScholarWorks'
Publication date: 01/05/2020
Field of study

This research presents an intermediate compiler representation that is designed for optimization, and emphasizes the temporary storage requirements and execution schedule of a given computation to guide optimization decisions. The representation is expressed as a dataflow graph that describes computational statements and data mappings within the polyhedral compilation model. The targeted applications include both the regular and irregular scientific domains. The intermediate representation can be integrated into existing compiler infrastructures. A specification language implemented as a domain specific language in C++ describes the graph components and the transformations that can be applied. The visual representation allows users to reason about optimizations. Graph variants can be translated into source code or other representation. The language, intermediate representation, and associated transformations have been applied to improve the performance of differential equation solvers, or sparse matrix operations, tensor decomposition, and structured multigrid methods

Boise State University - ScholarWorks