Towards Lattice Quantum Chromodynamics on FPGA devices

Author: Korcyl Grzegorz
Korcyl Piotr
Publication venue: 'Elsevier BV'
Publication date: 04/12/2019

In this paper we describe a single-node, double-precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark, we numerically invert the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware platforms: the Zynq UltraScale+ evaluation board, the Alveo U250 accelerator and the largest device available on the market, the VU13P. In our implementation we partition the software and hardware parts so that the entire multiplication by the Dirac operator is performed in hardware, while the rest of the algorithm runs on the host. We find that the FPGA implementation can offer performance comparable with that obtained using current CPUs or Intel's many-core Xeon Phi accelerators. A possible multi-node FPGA-based system is discussed, and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only.

Comment: 17 pages, 4 figures
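The host/hardware split described above can be illustrated with a minimal Conjugate Gradient sketch in which the operator application is hidden behind a callback (the part the paper offloads to the FPGA) and all vector bookkeeping stays on the host. Assumptions, not taken from the paper: CG is applied to the normal equations D†D x = D†b (a common way to use CG with the non-Hermitian Wilson-Dirac operator; the abstract does not say which variant the authors use), a small dense random complex matrix stands in for the lattice operator, and the names `conjugate_gradient` and `apply_DdagD` are hypothetical.

```python
import numpy as np
from typing import Callable, Tuple


def conjugate_gradient(apply_A: Callable[[np.ndarray], np.ndarray],
                       b: np.ndarray, tol: float = 1e-10,
                       max_iter: int = 1000) -> Tuple[np.ndarray, int]:
    """Solve A x = b for Hermitian positive-definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b - apply_A(x)              # initial residual
    p = r.copy()                    # initial search direction
    rr = np.vdot(r, r).real
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            return x, k
        Ap = apply_A(p)             # the only operator call per iteration
        alpha = rr / np.vdot(p, Ap).real
        x += alpha * p
        r -= alpha * Ap
        rr_new = np.vdot(r, r).real
        p = r + (rr_new / rr) * p   # new A-conjugate direction
        rr = rr_new
    return x, max_iter


# Hypothetical stand-in for the Dirac operator: identity plus a small random part.
rng = np.random.default_rng(0)
n = 48
D = np.eye(n) + 0.02 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))


def apply_DdagD(v: np.ndarray) -> np.ndarray:
    # In the paper's design this product would run on the FPGA.
    return D.conj().T @ (D @ v)


b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x, iters = conjugate_gradient(apply_DdagD, D.conj().T @ b)
print(iters, np.linalg.norm(D @ x - b))
```

Keeping the operator behind a single callable is what makes the split clean: only `apply_DdagD` would need a hardware implementation, while the scalar products and vector updates remain ordinary host code.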

arXiv.org e-Print Archive

University of Regensburg Publication Server

Jagiellonian University Repository

Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD

Author: Aglietti
Alfieri
Aoki
Arsenin
Avico
Bartoloni
Bartonoli
Bodin
Boyle
Chen
Christ
Csikor
Di Pierro
Eicker
Fodor
Gottlieb
Gábor Papp
Iwasaki
Lüscher
Sándor D. Katz
Tripiccione
Ukawa
Yoshie
Zoltán Fodor
Publication venue: 'Elsevier BV'
Publication date: 21/05/2003

We study the feasibility of a PC-based parallel computer for medium- to large-scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single-precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost-effective (only 30% of the hardware cost is spent on communication). According to our benchmark measurements, this type of communication results in a communication time fraction of around 40% for lattices up to 48³×96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size that can fit in our parallel computer. The communication software is freely available upon request for non-profit organizations.

Comment: 14 pages, 3 figures, final version to appear in Comp.Phys.Com
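The relation between the compute-only rate, the communication time fraction and the price/sustained-performance ratio quoted above can be worked through with a short back-of-the-envelope sketch. Assumptions: communication and computation do not overlap (so a fraction f of wall time spent communicating scales the sustained rate by 1 - f), the cluster is homogeneous, and the node cost is a hypothetical placeholder, not a figure from the paper.

```python
def sustained_mflops(compute_mflops: float, comm_fraction: float) -> float:
    """Per-node rate when comm_fraction of wall time is non-overlapped communication."""
    return compute_mflops * (1.0 - comm_fraction)


def usd_per_mflops(cost_per_node_usd: float, sustained_mflops_per_node: float) -> float:
    """Price/sustained-performance ratio; the node count cancels for a homogeneous cluster."""
    return cost_per_node_usd / sustained_mflops_per_node


# Wilson fermions: 1510 Mflops/node without communication, ~40% communication time.
wilson = sustained_mflops(1510.0, 0.40)   # 906.0 Mflops/node under the no-overlap assumption
HYPOTHETICAL_NODE_COST_USD = 800.0        # placeholder value, not from the paper
print(wilson, usd_per_mflops(HYPOTHETICAL_NODE_COST_USD, wilson))
```

Under the no-overlap assumption, any per-node cost (including its share of the network) below the sustained Mflops figure keeps the ratio under $1/Mflops; overlapping communication with computation would raise the sustained rate and improve the ratio further.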

arXiv.org e-Print Archive

Crossref

Beyond XSPEC: Towards Highly Configurable Analysis

Author: Collins H. M.
Gilat A.
Gropp W.
Houck J. C.
Kiczales G.
Nowak M. A.
Noble M. S.
Publication venue: 'University of Chicago Press'
Publication date: 03/06/2008

We present a quantitative comparison between the software features of the de facto standard X-ray spectral analysis tool, XSPEC, and ISIS, the Interactive Spectral Interpretation System. Our emphasis is on customized analysis, with ISIS offered as a strong example of configurable software. While noting that XSPEC has been of immense value to astronomers, and that its scientific core is moderately extensible (most commonly via the inclusion of user-contributed "local models"), we identify a series of limitations with its use beyond conventional spectral modeling. We argue that, from the viewpoint of the astronomical user, the XSPEC internal structure presents a Black Box Problem, with many of its important features hidden from the top-level interface, thus discouraging user customization. Drawing on examples in custom modeling, numerical analysis, parallel computation, visualization, data management, and automated code generation, we show how a numerically scriptable, modular, and extensible analysis platform such as ISIS facilitates many forms of advanced astrophysical inquiry.

Comment: Accepted by PASP, for July 2008 (15 pages

arXiv.org e-Print Archive

Crossref

End to end numerical simulations of the MAORY multiconjugate adaptive optics system

Author: Arcidiacono Carmelo
Bregoli Giovanni
Butler R. C.
Ciliegi Paolo
Cosentino Giuseppe
Diolaiti Emiliano
Foppiani Italo
Lombini Matteo
Schreiber Laura
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/2014

MAORY is the adaptive optics module of the E-ELT that will feed the MICADO imaging camera through a gravity-invariant exit port. MAORY is planned to implement multi-conjugate adaptive optics (MCAO) correction through three high-order deformable mirrors driven by the reference signals of six Laser Guide Stars (LGSs), which feed as many Shack-Hartmann Wavefront Sensors. A system of three Natural Guide Stars (NGSs) will provide the low-order correction. We develop a code for the end-to-end simulation of the MAORY adaptive optics (AO) system in order to obtain high-fidelity modeling of the system performance. It is based on the IDL language and makes extensive use of GPUs. Here we present the architecture of the simulation tool and its achieved and expected performance.

Comment: 8 pages, 4 figures, presented at SPIE Astronomical Telescopes + Instrumentation 2014 in Montréal, Quebec, Canada, paper number 9148-25

arXiv.org e-Print Archive

OA@INAF - Istituto Nazionale di Astrofisica

ArrayBridge: Interweaving declarative array processing with high-performance computing

Author: Blanas Spyros
Brown Paul
Byna Suren
Floratos Sofoklis
Prabhat
Wu Kesheng
Xing Haoyuan
Publication date: 01/01/2017

Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats that aims to make declarative array manipulations interoperable with imperative, file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and integrates seamlessly into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it reads from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates data between array versions for space efficiency. Our extensive performance evaluation at NERSC, a large-scale scientific computing facility, shows that ArrayBridge achieves performance and I/O scalability statistically indistinguishable from those of the native SciDB storage engine.

Comment: 12 pages, 13 figures
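The abstract mentions time travel over array versions and deduplication between versions without detailing the mechanism. The sketch below shows one generic way such features can be combined, content-addressed chunk-level deduplication, and should not be read as ArrayBridge's actual design. Assumptions: array versions are treated as raw byte strings rather than HDF5 datasets, and the class `VersionedChunkStore` and its methods are hypothetical names.

```python
import hashlib
from typing import Dict, List


class VersionedChunkStore:
    """Keeps every committed version; identical chunks are stored only once."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}   # content hash -> chunk bytes
        self.versions: List[List[str]] = []  # per version: ordered chunk hashes

    def commit(self, data: bytes) -> int:
        """Split data into fixed-size chunks, store unseen ones, return a version id."""
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # deduplication happens here
            refs.append(digest)
        self.versions.append(refs)
        return len(self.versions) - 1

    def read(self, version: int) -> bytes:
        """'Time travel': reassemble any earlier version from its chunk list."""
        return b"".join(self.chunks[d] for d in self.versions[version])


store = VersionedChunkStore(chunk_size=4)
v0 = store.commit(b"AAAABBBBCCCCDDDD")
v1 = store.commit(b"AAAABBBBXXXXDDDD")  # only the third chunk differs
print(store.read(v0), store.read(v1), len(store.chunks))
```

Because the second version changes a single chunk, the store holds five unique chunks instead of eight, while both versions remain readable in full.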

arXiv.org e-Print Archive

eScholarship - University of California