
14,392 research outputs found

Gauge Field Generation on Large-Scale GPU-Enabled Systems

Author: Winter Frank
Publication date: 05/12/2012

Over the past years, GPUs have been successfully applied to the task of inverting the fermion matrix in lattice QCD calculations. Even strong scaling to capability-level supercomputers, corresponding to O(100) GPUs or more, has been achieved. However, strong scaling a whole gauge field generation algorithm to this regime requires significantly more functionality than just having the matrix inverter utilize the GPUs, and it has not yet been accomplished. This contribution extends QDP-JIT, the migration of SciDAC QDP++ to GPU-enabled parallel systems, to help strong scale the whole Hybrid Monte Carlo algorithm to this regime. Initial results are shown for gauge field generation with Chroma simulating pure Wilson fermions on OLCF TitanDev.

Comment: The 30th International Symposium on Lattice Field Theory, June 24-29, 2012, Cairns, Australia (Acknowledgment and Citation added)

arXiv.org e-Print Archive

Crossref

Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU

Author: Fang Ye
Feng Sheng
Jarrell Mark
Moreno Juana
Ramanujam J.
Tam Ka-Ming
Yun Zhifeng
Publication venue: 'Elsevier BV'
Publication date: 21/11/2013

Monte Carlo simulations of the Ising model play an important role in the field of computational statistical physics, and they have revealed many properties of the model over the past few decades. However, the effect of frustration due to random disorder, in particular the possible spin glass phase, remains a crucial but poorly understood problem. One of the obstacles in the Monte Carlo simulation of random frustrated systems is their long relaxation time, which makes an efficient parallel implementation on state-of-the-art computation platforms highly desirable. The Graphics Processing Unit (GPU) is such a platform, providing an opportunity to significantly enhance the computational performance and thus gain new insight into this problem. In this paper, we present optimization and tuning approaches for the CUDA implementation of the spin glass simulation on GPUs. We discuss the integration of various design alternatives, such as GPU kernel construction with minimal communication, memory tiling, and look-up tables. We present a binary data format, Compact Asynchronous Multispin Coding (CAMSC), which provides an additional 28.4% speedup compared with the traditionally used Asynchronous Multispin Coding (AMSC). Our overall design sustains a performance of 33.5 picoseconds per spin flip attempt for simulating the three-dimensional Edwards-Anderson model with parallel tempering, which significantly improves the performance over existing GPU implementations.

Comment: 15 pages, 18 figures
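The idea behind multispin coding mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's CAMSC or AMSC layout: it assumes a periodic one-dimensional chain instead of the three-dimensional Edwards-Anderson lattice, stores one replica per bit of a 64-bit word, and assumes all replicas share the same couplings (as they would in parallel tempering). All function names are hypothetical.

```python
WORD_BITS = 64  # assumption: one replica per bit of a 64-bit word


def pack_replicas(spins):
    """Pack spins[r][i] in {0, 1} (replica r, site i) into one integer word per site."""
    n_sites = len(spins[0])
    words = [0] * n_sites
    for r, replica in enumerate(spins):
        for i, s in enumerate(replica):
            words[i] |= s << r
    return words


def unsatisfied_bond_words(words, bonds):
    """Bit r of result[i] is 1 if bond (i, i+1) is unsatisfied in replica r.

    bonds[i] is 0 for a ferromagnetic and 1 for an antiferromagnetic coupling.
    One XOR per bond handles all replicas at once.
    """
    full = (1 << WORD_BITS) - 1
    n = len(words)
    return [
        words[i] ^ words[(i + 1) % n] ^ (full if bonds[i] else 0)
        for i in range(n)
    ]


def energies(words, bonds, n_replicas):
    """Energy of each replica: +1 per unsatisfied bond, -1 per satisfied bond."""
    unsat = unsatisfied_bond_words(words, bonds)
    return [sum(2 * ((w >> r) & 1) - 1 for w in unsat) for r in range(n_replicas)]


def naive_energy(replica, bonds):
    """Reference energy E = -sum_i J_i s_i s_{i+1} with explicit +/-1 values."""
    n = len(replica)
    sigma = [1 - 2 * s for s in replica]   # bit 0 -> +1, bit 1 -> -1
    coupling = [1 - 2 * b for b in bonds]  # bit 0 -> +1, bit 1 -> -1
    return -sum(coupling[i] * sigma[i] * sigma[(i + 1) % n] for i in range(n))
```

The bitwise version evaluates every replica's bond state with a single XOR per bond, which is the source of the throughput gain that multispin schemes exploit. The naive function is included only as a cross-check.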

arXiv.org e-Print Archive

Crossref

Louisiana State University

Morphological diagram of diffusion driven aggregate growth in plane: competition of anisotropy and adhesion

Author: A.Yu. Menshutin
L.N. Shchur
Publication venue: 'Elsevier BV'
Publication date: 20/08/2010

Two-dimensional structures grown with the Witten and Sander algorithm are investigated. We analyze clusters grown off-lattice and clusters grown with the antenna method with $N_{fp}=3,4,5,6,7$ and 8 allowed growth directions. With the help of the variable probe particles technique we measure the fractal dimension of such clusters, $D(N)$, as a function of their size $N$. We propose that in the thermodynamic limit of infinite cluster size the aggregates grown with a high degree of anisotropy ($N_{fp}=3,4,5$) tend to have fractal dimension $D$ equal to 3/2, while off-lattice aggregates and aggregates with lower anisotropy ($N_{fp}>6$) have $D \approx 1.710$. The noise-reduction procedure results in a change of universality class for DLA. For a high enough noise-reduction value, clusters with $N_{fp} \ge 6$ have fractal dimension going to $3/2$ when $N\rightarrow\infty$.

Comment: 6 pages, 8 figures, conference CCP201

arXiv.org e-Print Archive

Crossref

A Verified Information-Flow Architecture

Author: Collins Nathan
de Amorim Arthur Azevedo
DeHon André
Demange Delphine
Hritcu Catalin
Pichardie David
Pierce Benjamin C.
Pollack Randy
Tolmach Andrew
Publication date: 01/01/2016

SAFE is a clean-slate design for a highly secure computer system, with pervasive mechanisms for tracking and limiting information flows. At the lowest level, the SAFE hardware supports fine-grained programmable tags, with efficient and flexible propagation and combination of tags as instructions are executed. The operating system virtualizes these generic facilities to present an information-flow abstract machine that allows user programs to label sensitive data with rich confidentiality policies. We present a formal, machine-checked model of the key hardware and software mechanisms used to dynamically control information flow in SAFE and an end-to-end proof of noninterference for this model. We use a refinement proof methodology to propagate the noninterference property of the abstract machine down to the concrete machine level. We use an intermediate layer in the refinement chain that factors out the details of the information-flow control policy and devise a code generator for compiling such information-flow policies into low-level monitor code. Finally, we verify the correctness of this generator using a dedicated Hoare logic that abstracts from low-level machine instructions into a reusable set of verified structured code generators.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

PDXScholar (Portland State University)

HAL-Rennes 1

Matched filters for coalescing binaries detection on massively parallel computers

Author: A. Viceré
E. Calzavarini
F. Schifano
L. Sartori
R. Tripiccione
Publication venue: 'Elsevier BV'
Publication date: 18/07/2002

We discuss some computational problems associated with matched filtering of experimental signals from gravitational wave interferometric detectors in a parallel-processing environment. We then specialize our discussion to the use of the APEmille and apeNEXT processors for this task. Finally, we accurately estimate the performance of an APEmille system on a computational load appropriate for the LIGO and VIRGO experiments, and extrapolate our results to apeNEXT.

Comment: 19 pages, 6 figures
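As a rough illustration of the matched filtering that the abstract above refers to, the following sketch correlates a known template against a data stream at every offset and reports the best match. It is a simplification, not the paper's method: it assumes white noise (so no noise-spectrum weighting is applied), works in the time domain rather than with FFTs, and uses hypothetical function names.

```python
import math


def matched_filter(data, template):
    """Normalized correlation of `template` with `data` at each valid offset.

    Assumes white noise, so the filter is the template itself scaled to unit norm.
    """
    norm = math.sqrt(sum(t * t for t in template))
    n, m = len(data), len(template)
    return [
        sum(data[k + j] * template[j] for j in range(m)) / norm
        for k in range(n - m + 1)
    ]


def best_match(data, template):
    """Return (offset, filter output) where the filter output is largest."""
    out = matched_filter(data, template)
    k = max(range(len(out)), key=lambda i: out[i])
    return k, out[k]
```

In a search over many candidate waveforms, each template can be filtered independently of the others, which is what makes this workload a natural fit for a massively parallel machine.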

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Urbino

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

CERN Document Server

Towards Lattice Quantum Chromodynamics on FPGA devices

Author: Korcyl Grzegorz
Korcyl Piotr
Publication venue: 'Elsevier BV'
Publication date: 04/12/2019

In this paper we describe a single-node, double-precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we numerically invert the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: the Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator, and the largest device available on the market, the VU13P device. In our implementation we separate the software and hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find that the FPGA implementation can offer performance comparable with that obtained using current CPUs or Intel's many-core Xeon Phi accelerators. A possible multi-node FPGA-based system is discussed, and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only.

Comment: 17 pages, 4 figures
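The split described in the abstract above, with the operator multiplication offloaded and the remaining Conjugate Gradient steps on the host, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `apply_operator` is a hypothetical stand-in for the hardware-offloaded multiplication, and the toy operator is a real, symmetric positive-definite 1D lattice Laplacian with a mass term rather than the Dirac-Wilson operator (which is not Hermitian positive definite, so CG is commonly applied to a normal-equation form instead).

```python
def dot(u, v):
    """Inner product of two real vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(apply_operator, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x for the initial guess x = 0
    p = list(r)   # search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = apply_operator(p)  # the only step that would run on the accelerator
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def make_toy_operator(n, mass_sq=0.5):
    """Hypothetical toy operator: periodic 1D Laplacian plus mass term (SPD)."""
    def apply_operator(v):
        return [
            (2.0 + mass_sq) * v[i] - v[(i - 1) % n] - v[(i + 1) % n]
            for i in range(n)
        ]
    return apply_operator
```

Passing the operator as a callable keeps the solver independent of where the multiplication runs, which mirrors the host/hardware separation the abstract describes.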

arXiv.org e-Print Archive

University of Regensburg Publication Server

Jagiellonian University Repository

Investigating the Dirac operator evaluation with FPGAs

Author: Korcyl G.
Korcyl P.
Publication venue: 'FSAEIHE South Ural State University (National Research University)'
Publication date: 01/01/2019

In recent years the computational capacity of single Field Programmable Gate Array (FPGA) devices, as well as their versatility, has increased significantly. In addition, High Level Synthesis frameworks, which allow such processors to be programmed in a high-level language like C++, make modern FPGA devices serious candidates as building blocks of a general-purpose High Performance Computing solution. In this contribution we describe benchmarks which we performed using a Lattice QCD code, a highly compute-demanding HPC academic code for elementary particle simulations. We benchmark the performance of a single FPGA device running in two modes: using the external or the embedded memory. We discuss both approaches in detail using the Xilinx U250 device and provide estimates for the necessary memory throughput and the minimal amount of resources needed to deliver optimal performance, depending on the available hardware platform.

Comment: 8 pages, 5 figures

arXiv.org e-Print Archive

Jagiellonian University Repository