Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD
We study the feasibility of a PC-based parallel computer for medium to large
scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster
consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single
precision sustained performance for dynamical QCD without communication is 1510
Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives
a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD,
respectively (for 64-bit applications the performance is approximately halved).
The novel feature of our system is its communication architecture. In order to
have a scalable, cost-effective machine we use Gigabit Ethernet cards for
nearest-neighbor communications in a two-dimensional mesh. This type of
communication is cost effective (only 30% of the hardware cost is spent on the
communication). According to our benchmark measurements this type of
communication results in around 40% communication time fraction for lattices
up to 48^3·96 in full QCD simulations. The price/sustained-performance ratio
for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for
staggered) quarks for practically any lattice size that can fit in our
parallel computer. The communication software is freely available upon request
for non-profit organizations.
Comment: 14 pages, 3 figures, final version to appear in Comp.Phys.Com
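As a rough check of the figures quoted in this abstract, the aggregate rates follow from the per-node rates and the node count. The sketch below is illustrative only: the function and variable names are ours, and treating the roughly 40% communication time fraction as a simple multiplicative factor is an assumption, not something the abstract states.

```python
# Per-node sustained rates (Mflops, 32-bit, no communication) and node count,
# as quoted in the abstract.
NODES = 137
MFLOPS_PER_NODE = {"wilson": 1510, "staggered": 970}
COMM_FRACTION = 0.40  # "around 40%" communication time fraction


def aggregate_gflops(mflops_per_node, nodes=NODES):
    """Total rate in Gflops for the whole cluster, ignoring communication."""
    return mflops_per_node * nodes / 1000.0


def with_communication(gflops, comm_fraction=COMM_FRACTION):
    """Assumption: communication time scales the rate by (1 - fraction)."""
    return gflops * (1.0 - comm_fraction)


if __name__ == "__main__":
    for name, rate in MFLOPS_PER_NODE.items():
        total = aggregate_gflops(rate)
        print(f"{name}: {total:.1f} Gflops without communication, "
              f"about {with_communication(total):.0f} Gflops with it")
```

The products come to about 207 Gflops for Wilson and 133 Gflops for staggered fermions, close to the 208 and 133 Gflops quoted above.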
QCD simulations with staggered fermions on GPUs
We report on our implementation of the RHMC algorithm for the simulation of
lattice QCD with two staggered flavors on Graphics Processing Units, using the
NVIDIA CUDA programming language. The main feature of our code is that the GPU
is not used just as an accelerator, but instead the whole Molecular Dynamics
trajectory is performed on it. After pointing out the main bottlenecks and how
to circumvent them, we discuss the performance obtained. We present some
preliminary results regarding OpenCL and multi-GPU extensions of our code and
discuss future perspectives.
Comment: 22 pages, 14 eps figures, final version to be published in Computer
Physics Communications
A scalable PC-based parallel computer for lattice QCD
A PC-based parallel computer for medium/large scale lattice QCD simulations
is suggested. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137
Intel P4-1.7GHz nodes. Gigabit Ethernet cards are used for nearest neighbor
communication in a two-dimensional mesh. The sustained performance for
dynamical staggered (Wilson) quarks on large lattices is around 70 (110) GFlops.
The exceptional price/performance ratio is below $1/Mflop.
Comment: 3 pages, 2 figures, Lattice2002(machines
Simulating spin models on GPU
Over the last couple of years it has been realized that the vast
computational power of graphics processing units (GPUs) could be harvested for
purposes other than the video game industry. This power, which at least
nominally exceeds that of current CPUs by large factors, results from the
relative simplicity of the GPU architectures as compared to CPUs, combined with
a large number of parallel processing units on a single chip. To benefit from
this setup for general computing purposes, the problems at hand need to be
prepared in a way to profit from the inherent parallelism and hierarchical
structure of memory accesses. In this contribution I discuss the performance
potential for simulating spin models, such as the Ising model, on GPU as
compared to conventional simulations on CPU.
Comment: 5 pages, 4 figures, elsarticl
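One common way to expose the parallelism this abstract refers to is a checkerboard decomposition of the lattice. The abstract does not say which scheme the author uses, so the sketch below is an assumption: a plain-Python Metropolis sweep for the 2D Ising model (coupling J = 1, periodic boundaries, even lattice size), with hypothetical function names. It is not GPU code; it only shows the update order that lets all sites of one color be handled independently.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of an L x L Ising lattice, one color at a time.

    For even L, sites of the same checkerboard color are never neighbors,
    so on a GPU each could be updated by its own thread. Here they are
    simply visited in a loop.
    """
    L = len(spins)
    for color in (0, 1):
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != color:
                    continue
                # Sum of the four nearest neighbors with periodic wrap-around.
                nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
                dE = 2 * spins[i][j] * nn  # energy change if this spin flips
                if dE <= 0 or rng.random() < math.exp(-beta * dE):
                    spins[i][j] = -spins[i][j]
    return spins


def magnetization(spins):
    """Mean spin per site."""
    L = len(spins)
    return sum(map(sum, spins)) / (L * L)


if __name__ == "__main__":
    rng = random.Random(0)
    lattice = [[1] * 8 for _ in range(8)]
    for _ in range(100):
        checkerboard_sweep(lattice, beta=0.3, rng=rng)
    print("magnetization per site:", magnetization(lattice))
```

Passing the random generator in explicitly keeps the sweep reproducible; a GPU version would instead need an independent random stream per thread.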