90,020 research outputs found

    Towards Lattice Quantum Chromodynamics on FPGA devices

    Get PDF
    In this paper we describe a single-node, double precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator and the largest device available on the market, the VU13P device. In our implementation we separate software/hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find out that the FPGA implementation can offer a performance comparable with that obtained using current CPU or Intel's many core Xeon Phi accelerators. A possible multiple node FPGA-based system is discussed and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only.Comment: 17 pages, 4 figure

    Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD

    Full text link
    We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. The E\"otv\"os Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost effective (only 30% of the hardware costs is spent on the communication). According to our benchmark measurements this type of communication results in around 40% communication time fraction for lattices upto 48^3\cdot96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than 1/MflopsforWilson(andaround1/Mflops for Wilson (and around 1.5/Mflops for staggered) quarks for practically any lattice size, which can fit in our parallel computer. The communication software is freely available upon request for non-profit organizations.Comment: 14 pages, 3 figures, final version to appear in Comp.Phys.Com

    Beyond XSPEC: Towards Highly Configurable Analysis

    Full text link
    We present a quantitative comparison between software features of the defacto standard X-ray spectral analysis tool, XSPEC, and ISIS, the Interactive Spectral Interpretation System. Our emphasis is on customized analysis, with ISIS offered as a strong example of configurable software. While noting that XSPEC has been of immense value to astronomers, and that its scientific core is moderately extensible--most commonly via the inclusion of user contributed "local models"--we identify a series of limitations with its use beyond conventional spectral modeling. We argue that from the viewpoint of the astronomical user, the XSPEC internal structure presents a Black Box Problem, with many of its important features hidden from the top-level interface, thus discouraging user customization. Drawing from examples in custom modeling, numerical analysis, parallel computation, visualization, data management, and automated code generation, we show how a numerically scriptable, modular, and extensible analysis platform such as ISIS facilitates many forms of advanced astrophysical inquiry.Comment: Accepted by PASP, for July 2008 (15 pages

    End to end numerical simulations of the MAORY multiconjugate adaptive optics system

    Full text link
    MAORY is the adaptive optics module of the E-ELT that will feed the MICADO imaging camera through a gravity invariant exit port. MAORY has been foreseen to implement MCAO correction through three high order deformable mirrors driven by the reference signals of six Laser Guide Stars (LGSs) feeding as many Shack-Hartmann Wavefront Sensors. A three Natural Guide Stars (NGSs) system will provide the low order correction. We develop a code for the end-to-end simulation of the MAORY adaptive optics (AO) system in order to obtain high-delity modeling of the system performance. It is based on the IDL language and makes extensively uses of the GPUs. Here we present the architecture of the simulation tool and its achieved and expected performance.Comment: 8 pages, 4 figures, presented at SPIE Astronomical Telescopes + Instrumentation 2014 in Montr\'eal, Quebec, Canada, with number 9148-25

    ArrayBridge: Interweaving declarative array processing with high-performance computing

    Full text link
    Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aims to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation in NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability to the native SciDB storage engine.Comment: 12 pages, 13 figure
    corecore