Optimization of Lattice QCD codes for the AMD Opteron processor
We report our experience of the optimization of the lattice QCD codes for the
new Opteron cluster at DESY Hamburg, including benchmarks. Details of the
optimization using SSE/SSE2 instructions and the effective use of prefetch
instructions are discussed.
Comment: 5 pages, 4 figures, espcrc2.cls, Proceedings of X International
Workshop on Advanced Computing and Analysis Techniques in Physics Research
(ACAT 2005), DESY Zeuthen, Germany, May 22 - 27, 200
Radiation therapy calculations using an on-demand virtual cluster via cloud computing
Computer hardware costs are the limiting factor in producing highly accurate
radiation dose calculations on convenient time scales. Because of this,
large-scale, full Monte Carlo simulations and other resource-intensive
algorithms are often considered infeasible for clinical settings. The emerging
cloud computing paradigm promises to fundamentally alter the economics of such
calculations by providing relatively cheap, on-demand, pay-as-you-go computing
resources over the Internet. We believe that cloud computing will usher in a
new era, in which very large scale calculations will be routinely performed by
clinics and researchers using cloud-based resources. In this research, several
proof-of-concept radiation therapy calculations were successfully performed on
a cloud-based virtual Monte Carlo cluster. Performance evaluations were made of
a distributed processing framework developed specifically for this project. The
expected 1/n performance was observed with some caveats. The economics of
cloud-based virtual computing clusters versus traditional in-house hardware is
also discussed. For most situations, cloud computing can provide substantial
cost savings for distributed calculations.
Comment: 12 pages, 4 figure
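The "expected 1/n performance" mentioned in the abstract above can be illustrated with a small sketch: a fixed number of Monte Carlo histories is divided evenly among n workers, so the ideal wall time falls as 1/n. The function names and the constant `overhead` term are assumptions made for illustration only; the abstract does not say what its caveats were, and this is not the framework developed in the paper.

```python
def split_histories(total_histories, n_workers):
    """Divide a number of Monte Carlo histories as evenly as possible.

    Hypothetical helper: the first `extra` workers receive one additional
    history so that the shares always sum to `total_histories`.
    """
    base, extra = divmod(total_histories, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def estimated_wall_time(t_single, n_workers, overhead=0.0):
    """Ideal 1/n scaling plus an assumed constant per-run overhead (seconds).

    With overhead == 0 this is pure 1/n scaling. A non-zero overhead
    (for example start-up or result collection) is one possible reason,
    assumed here, for measured times to depart from the ideal curve.
    """
    return t_single / n_workers + overhead


def speedup(t_single, n_workers, overhead=0.0):
    """Ratio of single-worker time to the estimated n-worker time."""
    return t_single / estimated_wall_time(t_single, n_workers, overhead)


if __name__ == "__main__":
    shares = split_histories(1_000_003, 8)
    print(shares, sum(shares))
    for n in (1, 2, 5, 10):
        print(n, estimated_wall_time(3600.0, n, overhead=60.0))
```

In this model a one-hour single-worker job takes 360 s on ten workers with no overhead, and 420 s if a 60 s fixed overhead is assumed, so the speedup flattens as n grows.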
Towards Lattice Quantum Chromodynamics on FPGA devices
In this paper we describe a single-node, double-precision Field Programmable
Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the
context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we
invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three
Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250
accelerator and the largest device available on the market, the VU13P device.
In our implementation we separate software/hardware parts in such a way that
the entire multiplication by the Dirac operator is performed in hardware, and
the rest of the algorithm runs on the host. We find that the FPGA
implementation can offer performance comparable with that obtained using
current CPUs or Intel's many-core Xeon Phi accelerators. A possible multi-node
FPGA-based system is discussed, and we argue that power-efficient High
Performance Computing (HPC) systems can be implemented using FPGA devices only.
Comment: 17 pages, 4 figure
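The software/hardware split described in the abstract above can be sketched as a Conjugate Gradient loop that receives the operator application as a separate function: that single call is the part the authors place in hardware, while the vector updates stay on the host. Everything below is an illustrative assumption rather than the paper's code. In particular, `apply_operator` is a stand-in symmetric positive-definite tridiagonal matrix, not the Dirac-Wilson operator. CG requires a Hermitian positive-definite operator, which for the Wilson operator is commonly arranged by solving the normal equations; this sketch does not attempt that.

```python
def apply_operator(v):
    """Stand-in for the operator application (the offloaded part).

    Assumption: a 1-D tridiagonal matrix with 2 on the diagonal and -1 on
    the off-diagonals, chosen only because it is symmetric positive definite.
    """
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * v[i] - left - right)
    return out


def dot(a, b):
    """Plain real inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A given only x -> A x.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rr = dot(r, r)
    for iteration in range(max_iter):
        if rr ** 0.5 < tol:
            return x, iteration
        ap = apply_a(p)  # the only call that touches the operator
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


if __name__ == "__main__":
    b = [1.0] * 8
    x, iterations = conjugate_gradient(apply_operator, b)
    print(iterations, x)
```

Because the solver sees the operator only through `apply_a`, replacing the stand-in with a call into an accelerator would leave the host-side loop unchanged, which is the kind of separation the abstract describes.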