
    General Algorithm For Improved Lattice Actions on Parallel Computing Architectures

    Quantum field theories underlie all of our understanding of the fundamental forces of nature. There are relatively few first-principles approaches to the study of quantum field theories [such as quantum chromodynamics (QCD), relevant to the strong interaction] away from the perturbative (i.e., weak-coupling) regime. Currently the most common method is the use of Monte Carlo methods on a hypercubic space-time lattice. These methods consume enormous computing power for large lattices, and it is essential that increasingly efficient algorithms be developed to perform standard tasks in these lattice calculations. Here we present a general algorithm for QCD that allows one to put any planar improved gluonic lattice action onto a parallel computing architecture. High-performance masks for specific actions (including non-planar actions) are also presented. These algorithms have been successfully employed by us in a variety of lattice QCD calculations using improved lattice actions on a 128-node Thinking Machines CM-5. Keywords: quantum field theory; quantum chromodynamics; improved actions; parallel computing algorithms
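    The abstract does not spell out the masks themselves. As a hypothetical illustration of the underlying constraint only (not the authors' CM-5 algorithm), the sketch below assumes that two links in the same direction may be updated concurrently only if no loop of the action contains both. It enumerates the planar rectangular loops of an action, collects the offsets between interacting links, and searches for the smallest linear mask color(x) = a.x mod p that never gives two interacting links the same color. The function names and the restriction to rectangular loops are assumptions made for this example.

```python
from itertools import product


def rect_loops(d, sizes):
    """Yield every planar rectangular loop anchored at the origin as a set of links.

    A link is (site, direction) with site a d-tuple. Each (w, h) in sizes is the
    loop's extent along the plane directions (a, b) with a < b; list both (2, 1)
    and (1, 2) to include both orientations of a rectangle.
    """
    def unit(k, n=1):
        return tuple(n if i == k else 0 for i in range(d))

    def add(u, v):
        return tuple(x + y for x, y in zip(u, v))

    for a in range(d):
        for b in range(a + 1, d):
            for w, h in sizes:
                links = set()
                for i in range(w):  # bottom and top edges (direction a)
                    links.add((unit(a, i), a))
                    links.add((add(unit(a, i), unit(b, h)), a))
                for j in range(h):  # left and right edges (direction b)
                    links.add((unit(b, j), b))
                    links.add((add(unit(a, w), unit(b, j)), b))
                yield links


def interacting_offsets(d, sizes):
    """Offsets y - x between distinct same-direction links sharing at least one loop."""
    offsets = set()
    for links in rect_loops(d, sizes):
        for x, mu in links:
            for y, nu in links:
                if mu == nu and x != y:
                    offsets.add(tuple(yi - xi for xi, yi in zip(x, y)))
    return offsets


def find_linear_mask(d, sizes, max_colors=16):
    """Smallest p (and coefficients a) such that color(x) = a.x mod p never
    assigns one color to two interacting links. Assumes an infinite lattice,
    or extents L with a_i * L divisible by p so the mask is periodic."""
    offsets = interacting_offsets(d, sizes)
    for p in range(2, max_colors + 1):
        for a in product(range(p), repeat=d):
            if all(sum(ai * di for ai, di in zip(a, delta)) % p != 0
                   for delta in offsets):
                return p, a
    return None
```

    For the plain plaquette action in four dimensions, find_linear_mask(4, [(1, 1)]) recovers the familiar even/odd checkerboard, p = 2 with a = (1, 1, 1, 1). Adding 2x1 rectangles forces more colors, which is one reason action-specific masks matter for performance.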

    Lattice QCD Production on Commodity Clusters at Fermilab

    We describe the construction and results to date of Fermilab's three Myrinet-networked lattice QCD production clusters (an 80-node dual Pentium III cluster, a 48-node dual Xeon cluster, and a 128-node dual Xeon cluster). We examine a number of aspects of the performance of the MILC lattice QCD code running on these clusters. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003, 6 pages, LaTeX, 8 eps figures. PSN TUIT00

    Job Management and Task Bundling

    High Performance Computing is often performed on scarce and shared computing resources. To ensure computers are used to their full capacity, administrators often incentivize large workloads that are not possible on smaller systems. Measurements in Lattice QCD frequently do not scale to machine-size workloads. By bundling tasks together we can create large jobs suitable for very large partitions. We discuss METAQ and mpi_jm, software developed to dynamically group computational tasks together, which can intelligently backfill to consume idle time without substantial changes to users' current workflows or executables. Comment: 8 pages, 3 figures, LATTICE 2017 proceedings
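    The abstract does not describe how METAQ or mpi_jm schedule work internally. As a hypothetical illustration of the bundling-with-backfill idea only, the sketch below greedily packs independent tasks, each needing a node count and a run time, into one allocation of total_nodes nodes and walltime minutes. A task is launched whenever enough nodes are idle and it can finish before the allocation ends. The Task type and the bundle function are invented for this example and are not part of either tool.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    nodes: int
    minutes: int


def bundle(tasks, total_nodes, walltime):
    """Greedy backfill of tasks into a single allocation (hypothetical sketch).

    Returns (schedule, leftover): schedule is a list of (start_minute, task);
    leftover holds tasks that never fit in the idle nodes and remaining time.
    """
    # Try the largest tasks (by node-minutes) first; the sort is stable.
    pending = sorted(tasks, key=lambda t: t.nodes * t.minutes, reverse=True)
    running = []  # min-heap of (end_minute, nodes) for launched tasks
    free, now, schedule = total_nodes, 0, []
    while True:
        # Launch every pending task that fits the idle nodes and remaining time.
        launched = True
        while launched:
            launched = False
            for t in pending:
                if t.nodes <= free and now + t.minutes <= walltime:
                    pending.remove(t)
                    heapq.heappush(running, (now + t.minutes, t.nodes))
                    free -= t.nodes
                    schedule.append((now, t))
                    launched = True
                    break
        if not running:
            break
        # Advance to the next completion and reclaim its nodes for backfill.
        now, nodes = heapq.heappop(running)
        free += nodes
    return schedule, pending
```

    Tasks left over, for example because they need more time than remains in the allocation, are returned so they can be submitted with a later job.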

    HotQCD on Multi-GPU Systems

    We present SIMULATeQCD, HotQCD's software for performing lattice QCD calculations on GPUs. Started in late 2017 and intended as a full replacement of the previous single-GPU lattice QCD code used by the HotQCD collaboration, our software has been developed into an extensive framework for lattice QCD calculations distributed on multiple GPUs over many compute nodes. The code is built on C++, CUDA, and MPI and leverages modern C++ language features to provide high-level data structures, objects, and algorithms that allow users to express lattice QCD calculations in an intuitive way without sacrificing performance. Implemented algorithms range from gradient flow, correlator measurements, and mixed-precision conjugate gradient solvers all the way to full HISQ gauge field configuration generation using RHMC. After successful deployment in large-scale computing projects, we want to share the result of our efforts with the lattice QCD community by making it publicly available. In these proceedings, we present some of the key features of our code, demonstrate its ease of use, and show benchmarks of performance-critical kernels on state-of-the-art supercomputers. Comment: 7 pages, 3 figures, presented at the 38th International Symposium on Lattice Field Theory
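    Of the algorithms listed, the mixed-precision conjugate gradient solver is the easiest to illustrate in isolation. The sketch below is a generic defect-correction scheme, not SIMULATeQCD's C++/CUDA implementation: the true residual and the accumulated solution are kept in double precision, while each correction is obtained by a single-precision CG solve. A small dense symmetric positive definite matrix stands in, as an assumption, for the lattice Dirac normal operator; the function names and tolerances are illustrative choices.

```python
import numpy as np


def cg(A, b, tol, maxiter):
    """Plain conjugate gradient; runs in whatever dtype A and b carry."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    bnorm = np.sqrt(b @ b)
    for _ in range(maxiter):
        if np.sqrt(rr) <= tol * bnorm:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x


def mixed_precision_cg(A, b, tol=1e-10, inner_tol=1e-4, max_outer=50):
    """Defect correction: float64 outer residuals, float32 inner CG solves."""
    A32 = A.astype(np.float32)
    x = np.zeros_like(b, dtype=np.float64)
    bnorm = np.linalg.norm(b)
    for _ in range(max_outer):
        r = b - A @ x  # true residual, computed in double precision
        if np.linalg.norm(r) <= tol * bnorm:
            break
        # Cheap, low-accuracy correction solved in single precision.
        e = cg(A32, r.astype(np.float32), inner_tol, maxiter=10 * len(b))
        x += e.astype(np.float64)
    return x
```

    The design choice is that most matrix-vector products run in the cheaper precision, while recomputing the residual in double precision at each outer step lets the final solution reach double-precision accuracy.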

    Investigating the Dirac operator evaluation with FPGAs

    In recent years the computational capacity and versatility of single Field Programmable Gate Array (FPGA) devices have increased significantly. Together with High Level Synthesis frameworks, which allow such devices to be programmed in a high-level language like C++, this makes modern FPGAs serious candidates as building blocks of a general-purpose High Performance Computing solution. In this contribution we describe benchmarks performed using a Lattice QCD code, a highly compute-demanding academic HPC code for elementary particle simulations. We benchmark the performance of a single FPGA device running in two modes: using external or embedded memory. We discuss both approaches in detail using the Xilinx U250 device and provide estimates of the necessary memory throughput and the minimal amount of resources needed to deliver optimal performance, depending on the available hardware platform. Comment: 8 pages, 5 figures
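    The memory-throughput estimate mentioned above is, at its core, an arithmetic-intensity calculation. The sketch below is a hypothetical back-of-the-envelope version, not the paper's model. It assumes a Wilson-type Dirac operator with the commonly quoted 1320 flops per lattice site, and a no-reuse traffic model in which each output site reads 8 neighbouring gauge links and 8 neighbouring spinors and writes one spinor, with no caching or gauge-field compression. The constant and function names are invented for the example.

```python
# Hypothetical no-reuse traffic model for a Wilson-type Dirac operator.
# Assumptions: no caching of neighbours, no gauge-field compression.
FLOPS_PER_SITE = 1320      # commonly quoted Wilson-Dirac flop count per site
REALS_PER_LINK = 18        # 3x3 complex SU(3) matrix
REALS_PER_SPINOR = 24      # 4 spins x 3 colours, complex


def bytes_per_site(bytes_per_real=4):
    """Bytes moved per output site: 8 links + 8 spinors read, 1 spinor written."""
    reals = 8 * REALS_PER_LINK + 8 * REALS_PER_SPINOR + REALS_PER_SPINOR
    return reals * bytes_per_real


def arithmetic_intensity(bytes_per_real=4):
    """Flops per byte of memory traffic."""
    return FLOPS_PER_SITE / bytes_per_site(bytes_per_real)


def required_bandwidth(target_flops, bytes_per_real=4):
    """Bytes per second the memory must deliver to sustain target_flops."""
    return target_flops / arithmetic_intensity(bytes_per_real)
```

    Under these assumptions, single precision moves 1440 bytes per site (an intensity of about 0.92 flop/byte), so sustaining 100 GFLOP/s would need roughly 109 GB/s of memory bandwidth. Any on-chip reuse of links or spinors lowers that figure.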