Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.
Comment: 44 pages. 1 of USQCD whitepapers
Implementation of a Particle Accelerator Beam Dynamics Code on Multi-Node GPUs
Heterogeneous CPU/GPU Memory Hierarchy Analysis and Optimization
In this master's thesis, we propose a scheduling reordering for heterogeneous processors, based on a hysteresis detector, that gives some fairness and speedup to the memory request threads by taking advantage of the bank-level parallelism in the memory system organization.
Improving Mobile SOC's Performance as an Energy Efficient DSP Platform with Heterogeneous Computing
Mobile system-on-chip (SOC) technology is improving at a staggering rate, spurred primarily by the adoption of smartphones and tablets. This rapid innovation has allowed the mobile SOC to be considered in everything from high performance computing to embedded applications. In this work, the heterogeneous computing capabilities of modern SOCs are evaluated with a focus on digital signal processing (DSP). Evaluation is conducted on modern consumer devices running the Android operating system and leveraging the relatively new RenderScript Compute to utilize CPU resources alongside other compute resources such as graphics processing units (GPUs) and digital signal processors. To benchmark these concepts, several implementations of both the discrete Fourier transform (DFT) and the fast Fourier transform (FFT) are tested across devices. The results show improvements in both performance and energy efficiency on many devices compared to traditional Java implementations and indicate that the mobile SOC is a relevant platform for DSP applications.
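For readers unfamiliar with the two transforms benchmarked in this abstract, the following is a minimal Python sketch of a direct DFT and a recursive radix-2 FFT. It is not the thesis's RenderScript or Java code; the function names `dft` and `fft` are chosen here for illustration, and the FFT assumes the input length is a power of two.

```python
import cmath


def dft(x):
    """Direct O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT (length must be a power of two)."""
    n = len(x)
    if n <= 1:
        return list(x)
    if n % 2:
        raise ValueError("input length must be a power of two")
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for j in range(n // 2):
        # Twiddle factor combines the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + t
        out[j + n // 2] = even[j] - t
    return out
```

Both functions compute the same transform; the FFT does so in O(n log n) operations rather than O(n^2), which is why the two are natural candidates for comparing performance across devices.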
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations
The molecular dynamics simulation package GROMACS runs efficiently on a wide
variety of hardware from commodity workstations to high performance computing
clusters. Hardware features are well exploited with a combination of SIMD,
multi-threading, and MPI-based SPMD/MPMD parallelism, while GPUs can be used as
accelerators to compute interactions offloaded from the CPU. Here we evaluate
which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most
economical way. We have assembled and benchmarked compute nodes with various
CPU/GPU combinations to identify optimal compositions in terms of raw
trajectory production rate, performance-to-price ratio, energy efficiency, and
several other criteria. Though hardware prices are naturally subject to trends
and fluctuations, general tendencies are clearly visible. Adding any type of
GPU significantly boosts a node's simulation performance. For inexpensive
consumer-class GPUs this improvement equally reflects in the
performance-to-price ratio. Although memory issues in consumer-class GPUs could
pass unnoticed since these cards do not support ECC memory, unreliable GPUs can
be sorted out with memory checking tools. Apart from the obvious determinants
for cost-efficiency like hardware expenses and raw performance, the energy
consumption of a node is a major cost factor. Over the typical hardware
lifetime until replacement of a few years, the costs for electrical power and
cooling can become larger than the costs of the hardware itself. Taking that
into account, nodes with a well-balanced ratio of CPU and consumer-class GPU
resources produce the maximum amount of GROMACS trajectory over their lifetime
- …
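The lifetime-cost reasoning in this abstract can be sketched as simple arithmetic. The Python below is an illustration only, not the authors' methodology: the function names, the `cooling_overhead` multiplier, and any input values a reader supplies are assumptions, and no figures from the paper are reproduced.

```python
HOURS_PER_YEAR = 365 * 24


def lifetime_cost(hardware_price, avg_power_watts, years, price_per_kwh,
                  cooling_overhead=1.0):
    """Hardware price plus energy cost over the node's lifetime.

    cooling_overhead is a hypothetical multiplier (>= 1.0) standing in
    for the extra power spent on cooling.
    """
    energy_kwh = avg_power_watts / 1000 * HOURS_PER_YEAR * years * cooling_overhead
    return hardware_price + energy_kwh * price_per_kwh


def trajectory_per_cost(ns_per_day, hardware_price, avg_power_watts, years,
                        price_per_kwh, cooling_overhead=1.0):
    """Nanoseconds of trajectory produced per currency unit over the lifetime."""
    total_ns = ns_per_day * 365 * years
    cost = lifetime_cost(hardware_price, avg_power_watts, years,
                         price_per_kwh, cooling_overhead)
    return total_ns / cost
```

With inputs measured for two candidate nodes, comparing their `trajectory_per_cost` values shows how energy costs can shift the ranking relative to a comparison based on hardware price alone.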