
    Continuing Progress on a Lattice QCD Software Infrastructure

    We report on the progress of the software effort in the QCD Application Area of SciDAC. In particular, we discuss how the software developed under SciDAC enabled the aggressive exploitation of leadership computers, and we report on progress in the area of QCD software for multi-core architectures.
    Comment: 5 pages, to appear in the Proceedings of the SciDAC 2008 conference (Seattle, July 13-17, 2008); conference poster presentation.

    MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, deeper memory hierarchies, and greater programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and the gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
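
    The hybrid MPI+OpenMP approach mentioned above typically threads over lattice sites while relying on SIMD units for the per-site arithmetic. Purely as an illustration (this is not MILC, QOPQDP, or QPhiX code, and the names are hypothetical), a minimal sketch of the kind of threaded, vectorizable update that appears inside a staggered conjugate-gradient iteration could look like this:

        #include <cstddef>
        #include <vector>

        // Hypothetical sketch: the x <- x + a*p vector update from one CG
        // iteration, threaded over lattice sites with OpenMP and vectorized
        // with SIMD. Real lattice-QCD kernels work on color vectors per site
        // and add MPI halo exchange, none of which is shown here.
        void axpy_over_sites(std::vector<double>& x,
                             const std::vector<double>& p, double a)
        {
            const std::size_t n = x.size();
            #pragma omp parallel for simd
            for (std::size_t i = 0; i < n; ++i) {
                x[i] += a * p[i];   // independent across sites
            }
        }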

    QPACE 2 and Domain Decomposition on the Intel Xeon Phi

    We give an overview of QPACE 2, a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration between Regensburg University and Eurotech. We give some general recommendations for how to write high-performance code for the Xeon Phi and then discuss our implementation of a domain-decomposition-based solver and present a number of benchmarks.
    Comment: plenary talk at Lattice 2014, to appear in the conference proceedings PoS(LATTICE2014); 15 pages, 9 figures.
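
    Two of the standard recommendations for the Xeon Phi are to keep hot data 64-byte aligned, matching its 512-bit vector registers, and to make that alignment known to the compiler so inner loops vectorize cleanly. The fragment below is a generic, hypothetical illustration of those two points rather than code from the QPACE 2 software stack:

        #include <cstddef>
        #include <cstdlib>

        int main()
        {
            const std::size_t n = 1 << 20;
            // 64-byte-aligned allocations match the 512-bit vector width.
            double* a = static_cast<double*>(std::aligned_alloc(64, n * sizeof(double)));
            double* b = static_cast<double*>(std::aligned_alloc(64, n * sizeof(double)));
            for (std::size_t i = 0; i < n; ++i) { a[i] = 1.0; b[i] = 2.0; }

            // Telling the compiler about the alignment avoids peel/remainder
            // loops and lets the body map onto full-width fused multiply-adds.
            #pragma omp simd aligned(a, b : 64)
            for (std::size_t i = 0; i < n; ++i)
                a[i] += 0.5 * b[i];

            std::free(a);
            std::free(b);
            return 0;
        }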

    GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP

    Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider (LHC). In the early 2010s, the projections were that simulation demands would scale linearly with increasing luminosity, compensated only partially by an increase in computing resources. Extending fast simulation approaches to more use cases, covering a larger fraction of the simulation budget, is only part of the solution due to intrinsic precision limitations. The remainder corresponds to speeding up the simulation software by several factors, which is out of reach using simple optimizations on the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport codes so that they benefit from fine-grained parallelism features such as vectorization, as well as from increased code and data locality. This paper presents in detail the results and achievements of this R&D, as well as the conclusions and lessons learnt from the beta prototype.
    Comment: 34 pages, 26 figures, 24 tables.
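
    A central ingredient of this kind of redesign is keeping particle tracks in structure-of-arrays form so that one physics operation can be applied to a whole group of tracks with SIMD instructions. The sketch below is a generic illustration of that layout under assumed field names; it is not GeantV's actual track container:

        #include <cstddef>
        #include <vector>

        // Hypothetical structure-of-arrays group of tracks: each component is
        // stored contiguously, so a loop over tracks streams through memory
        // and vectorizes cleanly.
        struct TrackBasket {
            std::vector<double> x, y, z;     // positions
            std::vector<double> dx, dy, dz;  // direction cosines
        };

        // Advance every track by a straight-line step; applying the same
        // operation to all tracks is what enables SIMD execution.
        void propagate(TrackBasket& t, double step)
        {
            const std::size_t n = t.x.size();
            #pragma omp simd
            for (std::size_t i = 0; i < n; ++i) {
                t.x[i] += step * t.dx[i];
                t.y[i] += step * t.dy[i];
                t.z[i] += step * t.dz[i];
            }
        }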

    Best bang for your buck: GPU nodes for GROMACS biomolecular simulations

    The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware, from commodity workstations to high-performance computing clusters. Hardware features are well exploited with a combination of SIMD, multi-threading, and MPI-based SPMD/MPMD parallelism, while GPUs can be used as accelerators to compute interactions offloaded from the CPU. Here we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Though hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs, this improvement is reflected in the performance-to-price ratio as well. Although memory errors in consumer-class GPUs could pass unnoticed, since these cards do not support ECC memory, unreliable GPUs can be sorted out with memory-checking tools. Apart from the obvious determinants of cost-efficiency, such as hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over a typical hardware lifetime of a few years until replacement, the costs for electrical power and cooling can become larger than the cost of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime.
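
    The lifetime-cost argument above is easy to check with a back-of-the-envelope calculation. All numbers in the sketch below are illustrative assumptions, not figures from the paper:

        #include <iostream>

        int main()
        {
            // Assumed values for a single GPU node; adjust to local conditions.
            const double node_price_eur   = 2500.0;  // purchase price
            const double draw_kw          = 0.5;     // average power draw under load
            const double cooling_overhead = 1.5;     // PUE-style factor for cooling
            const double eur_per_kwh      = 0.20;    // electricity price
            const double lifetime_years   = 5.0;

            const double hours = lifetime_years * 365.0 * 24.0;
            const double energy_cost = draw_kw * cooling_overhead * hours * eur_per_kwh;

            // With these assumptions the power and cooling bill (about 6570 EUR)
            // exceeds the hardware price, consistent with the argument above.
            std::cout << "Energy + cooling over lifetime: " << energy_cost << " EUR\n";
            std::cout << "Hardware purchase price:        " << node_price_eur << " EUR\n";
            return 0;
        }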

    Parallel performance results for the OpenMOC neutron transport code on multicore platforms

    The shift toward multicore architectures has ushered in a new era of shared-memory parallelism for scientific applications. This transition has introduced challenges for the nuclear engineering community as it seeks to design high-fidelity, full-core reactor physics simulation tools. This article describes the parallel transport sweep algorithm in the OpenMOC method of characteristics (MOC) neutron transport code for multicore platforms using OpenMP. Strong and weak scaling studies are performed for both Intel Xeon and IBM Blue Gene/Q (BG/Q) multicore processors. The results demonstrate 100% parallel efficiency with 12 threads on 12 cores on Intel Xeon platforms and over 90% parallel efficiency with 64 threads on 16 cores on the IBM BG/Q. These results illustrate the potential for hardware acceleration of MOC neutron transport on modern multicore and future many-core architectures. In addition, this work highlights the pitfalls of programming for multicore architectures, with a focus on false sharing.
    Funding: National Science Foundation (U.S.) Graduate Research Fellowship Program (Grant 1122374); United States Department of Energy, Center for Exascale Simulation of Advanced Reactors (Contract DE-AC02-06CH11357).
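
    The false-sharing pitfall highlighted above appears when per-thread accumulators happen to live on the same cache line, so every update by one thread invalidates the line for the others. A minimal, generic illustration of the usual padding fix (not OpenMOC code; names are hypothetical):

        #include <cstddef>
        #include <omp.h>
        #include <vector>

        // If each thread updated an element of a plain double array, adjacent
        // accumulators would share a 64-byte cache line and the updates would
        // ping-pong the line between cores (false sharing). Padding each
        // accumulator to a full cache line keeps the lines private.
        struct alignas(64) PaddedDouble {
            double value = 0.0;
        };

        double sum_tallies(const std::vector<double>& tallies)
        {
            std::vector<PaddedDouble> partial(omp_get_max_threads());

            #pragma omp parallel
            {
                const int tid = omp_get_thread_num();
                #pragma omp for
                for (std::size_t i = 0; i < tallies.size(); ++i)
                    partial[tid].value += tallies[i];  // each thread touches its own line
            }

            double total = 0.0;
            for (const PaddedDouble& p : partial)
                total += p.value;
            return total;
        }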