Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD
We study the feasibility of a PC-based parallel computer for medium to large
scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster
consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single
precision sustained performance for dynamical QCD without communication is 1510
Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives
a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD (for 64-bit applications the performance is approximately halved).
The novel feature of our system is its communication architecture. In order to
have a scalable, cost-effective machine we use Gigabit Ethernet cards for
nearest-neighbor communications in a two-dimensional mesh. This approach is cost effective: only about 30% of the hardware cost is spent on communication. According to our benchmark measurements, it results in a communication time fraction of around 40% for lattices of up to 48^3·96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size that fits in our parallel computer. The communication software is freely available upon request to non-profit organizations.
Comment: 14 pages, 3 figures; final version to appear in Comput. Phys. Commun.
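As a hedged illustration of the communication pattern described in this abstract, the minimal C sketch below performs a nearest-neighbor boundary exchange on a periodic two-dimensional process mesh. It uses standard MPI (a Cartesian communicator and MPI_Sendrecv) purely for concreteness; the cluster described above runs its own Gigabit Ethernet communication layer rather than MPI, and the buffer size N is invented for the example.

    /* Sketch: nearest-neighbor halo exchange on a periodic 2D mesh (MPI). */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int dims[2] = {0, 0}, periods[2] = {1, 1};   /* periodic 2D mesh */
        int nprocs;
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Dims_create(nprocs, 2, dims);

        MPI_Comm mesh;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &mesh);

        enum { N = 4096 };                           /* illustrative boundary size */
        float *send = malloc(N * sizeof *send);
        float *recv = malloc(N * sizeof *recv);
        for (int i = 0; i < N; i++) send[i] = (float)i;

        /* Exchange boundaries with the four nearest neighbors,
         * one dimension at a time. */
        for (int dim = 0; dim < 2; dim++) {
            int lo, hi;
            MPI_Cart_shift(mesh, dim, 1, &lo, &hi);
            MPI_Sendrecv(send, N, MPI_FLOAT, hi, 0,
                         recv, N, MPI_FLOAT, lo, 0, mesh, MPI_STATUS_IGNORE);
            MPI_Sendrecv(send, N, MPI_FLOAT, lo, 1,
                         recv, N, MPI_FLOAT, hi, 1, mesh, MPI_STATUS_IGNORE);
        }

        free(send); free(recv);
        MPI_Finalize();
        return 0;
    }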
A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters
Sustaining a large fraction of single GPU performance in parallel
computations is considered to be the major problem of GPU-based clusters. In
this article, this topic is addressed in the context of a lattice Boltzmann
flow solver that is integrated in the WaLBerla software framework. We propose a
multi-GPU implementation using a block-structured MPI parallelization, suitable
for load balancing and heterogeneous computations on CPUs and GPUs. The
overhead required for multi-GPU simulations is discussed in detail and it is
demonstrated that the kernel performance can be sustained to a large extent.
With our GPU implementation, we achieve nearly perfect weak scalability on
InfiniBand clusters. However, in strong scaling scenarios multi-GPU setups make less efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost
analysis must determine the best course of action for a particular simulation
task. Additionally, weak scaling results of heterogeneous simulations conducted
on CPUs and GPUs simultaneously are presented using clusters equipped with
varying node configurations.
Comment: 20 pages, 12 figures
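The load-balancing idea this abstract describes, assigning lattice blocks to processes in proportion to their measured throughput so that GPU ranks receive more work than CPU ranks, can be sketched as below. This is a hypothetical illustration, not WaLBerla's actual API; the MLUPS figures, rank count, and block count are invented for the example.

    /* Sketch: distribute lattice blocks proportionally to per-rank throughput. */
    #include <stdio.h>

    int main(void)
    {
        const double mlups[] = { 650.0, 650.0, 35.0, 35.0 }; /* 2 GPU + 2 CPU ranks */
        const int nranks = 4, nblocks = 256;

        double total = 0.0;
        for (int r = 0; r < nranks; r++) total += mlups[r];

        int assigned = 0;
        for (int r = 0; r < nranks; r++) {
            /* Round each share; the last rank absorbs the remainder. */
            int share = (r == nranks - 1)
                      ? nblocks - assigned
                      : (int)(nblocks * mlups[r] / total + 0.5);
            assigned += share;
            printf("rank %d: %d blocks\n", r, share);
        }
        return 0;
    }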
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.
Comment: 44 pages; one of the USQCD whitepapers
GPU peer-to-peer techniques applied to a cluster interconnect
Modern GPUs support special protocols to exchange data directly across the
PCI Express bus. While these protocols could be used to reduce GPU data
transmission times, essentially by avoiding staging through host memory, they require specific hardware features that are not available on current-generation network adapters. In this paper we describe the architectural modifications
required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class
GPUs on an FPGA-based cluster interconnect. We also discuss the current software implementation, which integrates this feature by minimally extending the RDMA programming model, as well as some issues raised while employing it in a higher-level API like MPI. Finally, the current limits of the technique
are studied by analyzing the performance improvements on low-level benchmarks
and on two GPU-accelerated applications, showing when and how they seem to
benefit from the GPU peer-to-peer method.
Comment: paper accepted to CASS 201
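For readers unfamiliar with the baseline mechanism, the sketch below shows GPU peer-to-peer transfers over PCI Express using the stock CUDA runtime API (cudaDeviceEnablePeerAccess and cudaMemcpyPeer), callable from plain C host code. The paper itself extends peer-to-peer access to an FPGA-based network adapter, which this standard-API example does not attempt to reproduce; device indices and buffer size are illustrative.

    /* Sketch: direct GPU-to-GPU copy over PCIe, no staging in host memory. */
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);
        if (!can01 || !can10) {
            fprintf(stderr, "peer access between GPU 0 and GPU 1 unsupported\n");
            return 1;
        }

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);   /* flags must be 0 */
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);

        size_t bytes = 1 << 20;             /* illustrative 1 MiB buffer */
        void *buf0, *buf1;
        cudaSetDevice(0); cudaMalloc(&buf0, bytes);
        cudaSetDevice(1); cudaMalloc(&buf1, bytes);

        /* The copy travels GPU-to-GPU across the PCIe bus. */
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
        cudaDeviceSynchronize();

        cudaFree(buf1);
        cudaSetDevice(0); cudaFree(buf0);
        return 0;
    }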
Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications
Energy efficiency is becoming increasingly important for computing systems,
in particular for large-scale HPC facilities. In this work we evaluate, from a user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS)
techniques, assisted by the power and energy monitoring capabilities of modern
processors in order to tune applications for energy efficiency. We run selected
kernels and a full HPC application on two high-end processors widely used in
the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate
the available trade-offs between energy-to-solution and time-to-solution,
attempting a function-by-function frequency tuning. Finally, we estimate the benefits obtainable by running the full code on an HPC multi-GPU node, relative to the default clock frequency governors. We instrument our code to accurately monitor power consumption and execution time without the need for any additional hardware, and we enable it to change CPU and GPU clock frequencies while running. We analyze our results on the different architectures using a simple energy-performance model and derive a number of energy-saving strategies that can easily be adopted on recent high-end HPC systems for generic applications.
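A minimal sketch of the kind of instrumentation this abstract alludes to, assuming NVIDIA's NVML library on a Kepler-class GPU such as the K80: read the current power draw, then request lower application clocks to trade time-to-solution for energy-to-solution. The clock pair is illustrative (2505 MHz memory with a 562 MHz SM clock), and changing clocks normally requires administrative privileges; link with -lnvidia-ml.

    /* Sketch: power readout and DVFS via NVML application clocks. */
    #include <nvml.h>
    #include <stdio.h>

    int main(void)
    {
        nvmlDevice_t dev;
        if (nvmlInit() != NVML_SUCCESS) return 1;
        nvmlDeviceGetHandleByIndex(0, &dev);

        unsigned int mw = 0;
        nvmlDeviceGetPowerUsage(dev, &mw);          /* milliwatts */
        printf("power draw: %.1f W\n", mw / 1000.0);

        /* Request lower application clocks; the exact supported pairs
         * are device-specific, these values are only an example. */
        if (nvmlDeviceSetApplicationsClocks(dev, 2505, 562) != NVML_SUCCESS)
            fprintf(stderr, "setting clocks failed (needs root?)\n");

        nvmlShutdown();
        return 0;
    }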
Marketing Percolation
A percolation model is presented, with computer simulations for illustration, to show how the sales of a new product may penetrate the
consumer market. We review the traditional approach in the marketing
literature, which is based on differential or difference equations similar to
the logistic equation (Bass 1969). This mean-field approach is contrasted with
the discrete percolation on a lattice, with simulations of "social percolation"
(Solomon et al 2000) in two to five dimensions giving power laws instead of
exponential growth, and strong fluctuations right at the percolation threshold.
Comment: to appear in Physica
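A toy C sketch of the social-percolation mechanism this abstract contrasts with the mean-field approach: a product of quality q spreads across a two-dimensional lattice of consumers with random preferences, and a consumer adopts when informed by a neighbor and q exceeds their preference. Lattice size, random seed, and q are illustrative; near the square-lattice site-percolation threshold (about 0.593) the adopted cluster fluctuates strongly, as the abstract notes.

    /* Sketch: product spread as site percolation on a 2D consumer lattice. */
    #include <stdio.h>
    #include <stdlib.h>

    #define L 200

    static double pref[L][L];             /* each consumer's preference */
    static int informed[L][L];            /* 1 once the consumer adopts */
    static int qx[L * L], qy[L * L];      /* BFS queue of adopters */

    int main(void)
    {
        const double q = 0.60;            /* product quality, just above ~0.593 */
        srand(12345);
        for (int x = 0; x < L; x++)
            for (int y = 0; y < L; y++)
                pref[x][y] = rand() / (double)RAND_MAX;

        int head = 0, tail = 0, sales = 0;
        informed[L / 2][L / 2] = 1;       /* seed one adopter in the center */
        qx[tail] = L / 2; qy[tail] = L / 2; tail++;

        const int dx[4] = { 1, -1, 0, 0 }, dy[4] = { 0, 0, 1, -1 };
        while (head < tail) {
            int x = qx[head], y = qy[head]; head++;
            sales++;
            for (int k = 0; k < 4; k++) {
                int nx = x + dx[k], ny = y + dy[k];
                if (nx < 0 || nx >= L || ny < 0 || ny >= L) continue;
                /* Neighbor adopts if informed and the quality beats
                 * their personal preference. */
                if (!informed[nx][ny] && q > pref[nx][ny]) {
                    informed[nx][ny] = 1;
                    qx[tail] = nx; qy[tail] = ny; tail++;
                }
            }
        }
        printf("q = %.2f: %d of %d consumers bought\n", q, sales, L * L);
        return 0;
    }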