    Optimization of Lattice QCD codes for the AMD Opteron processor

    We report our experience of optimizing lattice QCD codes for the new Opteron cluster at DESY Hamburg, including benchmarks. Details of the optimization using SSE/SSE2 instructions and the effective use of prefetch instructions are discussed. Comment: 5 pages, 4 figures, espcrc2.cls, Proceedings of X International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2005), DESY Zeuthen, Germany, May 22 - 27, 2005

    Radiation therapy calculations using an on-demand virtual cluster via cloud computing

    Computer hardware costs are the limiting factor in producing highly accurate radiation dose calculations on convenient time scales. Because of this, large-scale, full Monte Carlo simulations and other resource-intensive algorithms are often considered infeasible for clinical settings. The emerging cloud computing paradigm promises to fundamentally alter the economics of such calculations by providing relatively cheap, on-demand, pay-as-you-go computing resources over the Internet. We believe that cloud computing will usher in a new era, in which very large scale calculations will be routinely performed by clinics and researchers using cloud-based resources. In this research, several proof-of-concept radiation therapy calculations were successfully performed on a cloud-based virtual Monte Carlo cluster. Performance evaluations were made of a distributed processing framework developed specifically for this project. The expected 1/n performance was observed with some caveats. The economics of cloud-based virtual computing clusters versus traditional in-house hardware are also discussed. For most situations, cloud computing can provide substantial cost savings for distributed calculations. Comment: 12 pages, 4 figures

    Towards Lattice Quantum Chromodynamics on FPGA devices

    In this paper we describe a single-node, double-precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we numerically invert the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: a Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator, and the largest device available on the market, the VU13P. In our implementation we separate the software and hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find that the FPGA implementation can offer performance comparable with that obtained using current CPUs or Intel's many-core Xeon Phi accelerators. A possible multi-node FPGA-based system is discussed, and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only. Comment: 17 pages, 4 figures
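The host/hardware split described in the abstract above can be illustrated with a minimal Conjugate Gradient sketch in which the operator application is passed in as a callback. This is not the paper's implementation: the names `conjugate_gradient`, `apply_operator`, and `toy_operator` are hypothetical, and the toy operator (a periodic 1D Laplacian plus a mass term) merely stands in for a symmetric positive-definite matrix; it is not the Dirac-Wilson operator.

```python
def dot(u, v):
    """Plain dot product of two equal-length real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(apply_operator, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite operator A.

    apply_operator(v) returns A v. In the setup the abstract describes,
    this is the step that would be offloaded to hardware; the vector
    updates below correspond to the part left on the host.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = apply_operator(p)               # the only operator application per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def toy_operator(v, mass=0.5):
    """Hypothetical stand-in operator: periodic 1D Laplacian plus a mass term."""
    n = len(v)
    return [(2.0 + mass) * v[i] - v[(i - 1) % n] - v[(i + 1) % n] for i in range(n)]


b = [1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, -1.0]
x = conjugate_gradient(toy_operator, b)
residual = [bi - ai for bi, ai in zip(b, toy_operator(x))]
residual_norm = dot(residual, residual) ** 0.5
```

Because the solver touches the operator only through `apply_operator`, swapping the toy function for a call into an accelerator would leave the rest of the loop unchanged, which is the design point the abstract makes.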