Towards Lattice Quantum Chromodynamics on FPGA devices
In this paper we describe a single-node, double precision Field Programmable
Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the
context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we
invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three
Xilinx hardware solutions: the Zynq UltraScale+ evaluation board, the Alveo
U250 accelerator, and the largest device available on the market, the VU13P.
In our implementation we separate software/hardware parts in such a way that
the entire multiplication by the Dirac operator is performed in hardware, and
the rest of the algorithm runs on the host. We find that the FPGA
implementation can offer performance comparable with that obtained using
current CPUs or Intel's many-core Xeon Phi accelerators. A possible
multiple-node FPGA-based system is discussed, and we argue that power-efficient
High Performance Computing (HPC) systems can be implemented using FPGA devices
only.
Comment: 17 pages, 4 figures
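The software/hardware split described above (the CG loop on the host, the operator product offloaded to the accelerator) can be sketched as follows. Here `apply_operator` is a hypothetical stand-in for the offloaded Dirac-Wilson multiplication, and the tiny symmetric matrix is only a placeholder, not a Dirac operator.

```python
def apply_operator(v, matrix):
    # Stand-in for the offloaded matrix-vector product: in the paper's
    # design, this single call is the part that runs on the FPGA.
    n = len(v)
    return [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conjugate_gradient(matrix, b, tol=1e-10, max_iter=100):
    # Host-side CG loop; everything except apply_operator stays in software.
    x = [0.0] * len(b)
    r = b[:]          # residual r = b - A x, with x = 0 initially
    p = r[:]
    rs_old = dot(r, r)
    for _ in range(max_iter):
        Ap = apply_operator(p, matrix)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Small symmetric positive-definite example (solution: x = [1/11, 7/11]).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

Because only `apply_operator` touches the accelerator, the host/device interface stays narrow: one vector in, one vector out per iteration, which is what makes the split practical over a slow host-device link.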
Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD
We study the feasibility of a PC-based parallel computer for medium to large
scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster
consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single
precision sustained performance for dynamical QCD without communication is 1510
Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives
a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD,
respectively (for 64-bit applications the performance is approximately halved).
The novel feature of our system is its communication architecture. In order to
have a scalable, cost-effective machine we use Gigabit Ethernet cards for
nearest-neighbor communications in a two-dimensional mesh. This type of
communication is cost-effective (only 30% of the hardware cost is spent on
communication). According to our benchmark measurements, this type of
communication results in a communication time fraction of around 40% for lattices
up to 48^3·96 in full QCD simulations. The price/sustained-performance ratio
for full QCD is better than $1/Mflops for Wilson ($1.5/Mflops for
staggered) quarks for practically any lattice size that fits in our
parallel computer. The communication software is freely available upon request
for non-profit organizations.
Comment: 14 pages, 3 figures, final version to appear in Comp. Phys. Commun.
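The nearest-neighbor pattern on the two-dimensional mesh comes down to simple rank bookkeeping, sketched below. The periodic wraparound and the row-major rank ordering are assumptions typical of lattice QCD layouts, not details taken from the cluster's actual communication software.

```python
def mesh_neighbors(rank, nx, ny):
    """Four nearest neighbors of `rank` on an nx-by-ny periodic 2D mesh.

    Each node only ever exchanges boundary data with these four ranks,
    which is why point-to-point Gigabit Ethernet links suffice in place
    of an expensive switched interconnect.
    """
    x, y = rank % nx, rank // nx           # row-major rank -> (x, y)
    return {
        "left":  ((x - 1) % nx) + y * nx,  # wrap at the mesh edges
        "right": ((x + 1) % nx) + y * nx,
        "down":  x + ((y - 1) % ny) * nx,
        "up":    x + ((y + 1) % ny) * nx,
    }

# Corner node 0 on a 4x4 mesh talks to ranks 3, 1, 12, and 4 (with wraparound).
corner = mesh_neighbors(0, 4, 4)
```

Since the neighbor set is fixed for the whole run, each node can open its four links once at startup and reuse them for every halo exchange.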
Beyond XSPEC: Towards Highly Configurable Analysis
We present a quantitative comparison between software features of the de facto
standard X-ray spectral analysis tool, XSPEC, and ISIS, the Interactive
Spectral Interpretation System. Our emphasis is on customized analysis, with
ISIS offered as a strong example of configurable software. While noting that
XSPEC has been of immense value to astronomers, and that its scientific core is
moderately extensible--most commonly via the inclusion of user contributed
"local models"--we identify a series of limitations with its use beyond
conventional spectral modeling. We argue that from the viewpoint of the
astronomical user, the XSPEC internal structure presents a Black Box Problem,
with many of its important features hidden from the top-level interface, thus
discouraging user customization. Drawing from examples in custom modeling,
numerical analysis, parallel computation, visualization, data management, and
automated code generation, we show how a numerically scriptable, modular, and
extensible analysis platform such as ISIS facilitates many forms of advanced
astrophysical inquiry.
Comment: Accepted by PASP, for July 2008 (15 pages)
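What a user-contributed "local model" mechanism buys the analyst can be illustrated with a minimal registry. The decorator API, model names, and multiplicative composition below are illustrative assumptions only; they do not reproduce the actual XSPEC or ISIS interfaces.

```python
import math

# Hypothetical model registry: user functions are registered by name and
# composed with built-in ones, mimicking the extensibility that local
# models provide in spectral-fitting packages.
MODELS = {}

def register(name):
    def wrap(fn):
        MODELS[name] = fn
        return fn
    return wrap

@register("powerlaw")
def powerlaw(energy, norm, index):
    # Simple power-law continuum: norm * E^(-index)
    return norm * energy ** (-index)

@register("cutoff")
def cutoff(energy, e_fold):
    # Exponential high-energy cutoff with e-folding energy e_fold
    return math.exp(-energy / e_fold)

def evaluate(energy, spec):
    # spec: list of (model_name, parameters); factors are multiplied,
    # as in multiplicative model composition for spectral fitting.
    result = 1.0
    for name, params in spec:
        result *= MODELS[name](energy, **params)
    return result

flux = evaluate(2.0, [("powerlaw", {"norm": 1.0, "index": 2.0}),
                      ("cutoff", {"e_fold": 10.0})])
```

The point of the sketch is that nothing here is hidden: every model is an ordinary function the user can inspect, replace, or script over, which is the opposite of the black-box structure the abstract criticizes.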
End to end numerical simulations of the MAORY multiconjugate adaptive optics system
MAORY is the adaptive optics module of the E-ELT that will feed the MICADO
imaging camera through a gravity-invariant exit port. MAORY is foreseen to
implement MCAO correction through three high-order deformable mirrors driven
by the reference signals of six Laser Guide Stars (LGSs), each feeding its own
Shack-Hartmann Wavefront Sensor. A three-Natural-Guide-Star (NGS) system
will provide the low-order correction. We develop a code for the end-to-end
simulation of the MAORY adaptive optics (AO) system in order to obtain
high-fidelity modeling of the system performance. It is based on the IDL
language and makes extensive use of GPUs. Here we present the architecture of the
simulation tool and its achieved and expected performance.
Comment: 8 pages, 4 figures, presented at SPIE Astronomical Telescopes +
Instrumentation 2014 in Montréal, Quebec, Canada, with number 9148-25
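The per-subaperture measurement at the heart of a Shack-Hartmann sensor can be sketched as a center-of-gravity computation on the spot image. This toy version ignores noise, thresholding, and the per-frame GPU batching over thousands of subapertures that a real end-to-end simulator performs.

```python
def subaperture_centroid(image):
    """Center-of-gravity spot position for one Shack-Hartmann subaperture.

    The displacement of the centroid from the subaperture center is
    proportional to the local wavefront slope; collecting these slopes
    over all subapertures is the sensor's raw measurement.
    """
    total = sum(sum(row) for row in image)
    cx = sum(i * v for row in image for i, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    return cx, cy

# A spot displaced toward the lower-right of a 3x3 subaperture:
spot = [[0, 0, 0],
        [0, 1, 2],
        [0, 2, 4]]
cx, cy = subaperture_centroid(spot)  # both coordinates exceed the center (1, 1)
```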
ArrayBridge: Interweaving declarative array processing with high-performance computing
Scientists are increasingly turning to datacenter-scale computers to produce
and analyze massive arrays. Despite decades of database research that extols
the virtues of declarative query processing, scientists still write, debug and
parallelize imperative HPC kernels even for the most mundane queries. This
impedance mismatch has been partly attributed to the cumbersome data loading
process; in response, the database community has proposed in situ mechanisms to
access data in scientific file formats. Scientists, however, desire more than a
passive access method that reads arrays from files.
This paper describes ArrayBridge, a bi-directional array view mechanism for
scientific file formats that aims to make declarative array manipulations
interoperable with imperative file-centric analyses. Our prototype
implementation of ArrayBridge uses HDF5 as the underlying array storage library
and seamlessly integrates into the SciDB open-source array database system. In
addition to fast querying over external array objects, ArrayBridge produces
arrays in the HDF5 file format just as easily as it can read from it.
ArrayBridge also supports time travel queries from imperative kernels through
the unmodified HDF5 API, and automatically deduplicates between array versions
for space efficiency. Our extensive performance evaluation at NERSC, a
large-scale scientific computing facility, shows that ArrayBridge exhibits
performance and I/O scalability statistically indistinguishable from those of
the native SciDB storage engine.
Comment: 12 pages, 13 figures
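Deduplicating between array versions, as described above, can be sketched as content-addressed chunk storage. The fixed chunk size, SHA-256 keys, and in-memory layout below are assumptions for illustration; the abstract does not specify ArrayBridge's internal format.

```python
import hashlib

def chunk_hashes(data, chunk=4):
    """Hash each fixed-size chunk of a byte string by its content."""
    return [hashlib.sha256(data[i:i + chunk]).hexdigest()
            for i in range(0, len(data), chunk)]

class VersionedStore:
    """Content-addressed store: chunks shared between array versions are
    kept once, so any version remains reachable (time travel) while
    unchanged regions cost no extra space."""

    def __init__(self):
        self.blobs = {}     # content hash -> chunk bytes (stored once)
        self.versions = []  # version id -> ordered list of chunk hashes

    def commit(self, data, chunk=4):
        hashes = chunk_hashes(data, chunk)
        for h, i in zip(hashes, range(0, len(data), chunk)):
            self.blobs.setdefault(h, data[i:i + chunk])  # dedupe here
        self.versions.append(hashes)
        return len(self.versions) - 1

    def checkout(self, version):
        # Reassemble any historical version from its chunk list.
        return b"".join(self.blobs[h] for h in self.versions[version])

store = VersionedStore()
v0 = store.commit(b"AAAABBBBCCCC")
v1 = store.commit(b"AAAAXXXXCCCC")  # only the middle chunk differs from v0
```

After both commits the store holds four unique chunks rather than six: the first and last chunks of the two versions are shared, which is the space saving version deduplication is after.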