31,658 research outputs found

Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD

Author: Aglietti
Alfieri
Alfieri
Aoki
Arsenin
Avico
Bartoloni
Bartonoli
Bodin
Boyle
Chen
Christ
Csikor
Csikor
Di Pierro
Eicker
Fodor
Fodor
Gottlieb
Gottlieb
Gottlieb
Gábor Papp
Iwasaki
Iwasaki
Lüscher
Sándor D. Katz
Tripiccione
Ukawa
Yoshie
Zoltán Fodor
Publication venue: 'Elsevier BV'
Publication date: 21/05/2003

We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost effective (only 30% of the hardware cost is spent on communication). According to our benchmark measurements this type of communication results in around 40% communication time fraction for lattices up to 48^3\cdot96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size that can fit in our parallel computer. The communication software is freely available upon request for non-profit organizations. Comment: 14 pages, 3 figures, final version to appear in Comp.Phys.Com
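The aggregate figures in the abstract above can be sanity-checked with a few lines of Python. This is only a sketch using the numbers quoted in the abstract; the function names and the simple model for the communication overhead (arithmetic happens only in the non-communication share of wall time) are assumptions made for illustration, not the authors' method. The plain product for Wilson fermions comes out slightly below the quoted 208 Gflops, presumably because the per-node figure is rounded.

```python
# Figures quoted in the abstract above.
NODES = 137
MFLOPS_PER_NODE = {"wilson": 1510, "staggered": 970}  # single precision, no communication
COMM_FRACTION = 0.40  # approximate communication time fraction quoted for full QCD


def aggregate_gflops(mflops_per_node: float, nodes: int = NODES) -> float:
    """Aggregate rate in Gflops over all nodes, ignoring communication."""
    return mflops_per_node * nodes / 1000.0


def with_communication(gflops: float, comm_fraction: float = COMM_FRACTION) -> float:
    """Assumed model: only the non-communication share of wall time does arithmetic."""
    return gflops * (1.0 - comm_fraction)


for action, rate in MFLOPS_PER_NODE.items():
    total = aggregate_gflops(rate)
    print(f"{action}: {total:.1f} Gflops without communication, "
          f"about {with_communication(total):.1f} Gflops under the assumed model")
```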

arXiv.org e-Print Archive

Crossref

A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters

Author: Aidun
Bernaschi
Boehm
Chen
Christian Feichtinger
Donath
Dünweg
Feichtinger
Freudiger
Gamma
Georg Hager
Gerhard Wellein
Götz
Harald Köstler
He
Johannes Habich
Köstler
Succi
Tölke
Ulrich Rüde
van Vliet
Wellein
Yu
Zeiser
Publication venue: 'Elsevier BV'
Publication date: 08/07/2010

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail and it is demonstrated that the kernel performance can be sustained to a large extent. With our GPU implementation, we achieve nearly perfect weak scalability on InfiniBand clusters. However, in strong scaling scenarios multi-GPUs make less efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost analysis must determine the best course of action for a particular simulation task. Additionally, weak scaling results of heterogeneous simulations conducted on CPUs and GPUs simultaneously are presented using clusters equipped with varying node configurations. Comment: 20 pages, 12 figures

arXiv.org e-Print Archive

Crossref

Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

Author: Christ Norman H.
Detmold William
Edwards Robert G.
Joó Bálint
Jung Chulwoo
Savage Martin
Shanahan Phiala
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/11/2019

In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculation using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones. Comment: 44 pages. 1 of USQCD whitepapers

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

GPU peer-to-peer techniques applied to a cluster interconnect

Author: Ammendola Roberto
Bernaschi Massimo
Biagioni Andrea
Bisson Mauro
Cicero Francesca Lo
Fatica Massimiliano
Frezza Ottorino
Lonardo Alessandro
Mastrostefano Enrico
Paolucci Pier Stanislao
Rossetti Davide
Simula Francesco
Tosoratto Laura
Vicini Piero
Publication date: 31/07/2013

Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. Besides, the current software implementation, which integrates this feature by minimally extending the RDMA programming model, is discussed, as well as some issues raised while employing it in a higher level API like MPI. Finally, the current limits of the technique are studied by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they seem to benefit from the GPU peer-to-peer method. Comment: paper accepted to CASS 201

arXiv.org e-Print Archive

Crossref

Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications

Author: Biferale
Biferale
Biferale
Calore
Calore
Calore
Calore
Calore
Crimi
Dick
Etinski
Ge
Khabi
Lim
Mantovani
Mazouz
Peraza
Sbragaglia
Scagliarini
Succi
Sundriyal
Williams
Wittmann
Publication venue: 'Wiley'
Publication date: 01/01/2017

Energy efficiency is becoming increasingly important for computing systems, in particular for large scale HPC facilities. In this work we evaluate, from a user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS) techniques, assisted by the power and energy monitoring capabilities of modern processors, in order to tune applications for energy efficiency. We run selected kernels and a full HPC application on two high-end processors widely used in the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate the available trade-offs between energy-to-solution and time-to-solution, attempting a function-by-function frequency tuning. We finally estimate the benefits obtainable running the full code on an HPC multi-GPU node, with respect to default clock frequency governors. We instrument our code to accurately monitor power consumption and execution time without the need of any additional hardware, and we enable it to change CPU and GPU clock frequencies while running. We analyze our results on the different architectures using a simple energy-performance model, and derive a number of energy saving strategies which can be easily adopted on recent high-end HPC systems for generic applications.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

Marketing Percolation

Author: B Libai
Bass
Bass
Chatterjee
Coleman
D Stauffer
Evertz
Huang
J Goldenberg
Lancaster
Mahajan
Mahajan
N Jan
Parker
Rogers
Rogers
Ryan
S Solomon
Sahimi
Solomon
Stauffer
Sultan
Publication venue: 'Elsevier BV'
Publication date: 24/05/2000

A percolation model is presented, with computer simulations for illustrations, to show how the sales of a new product may penetrate the consumer market. We review the traditional approach in the marketing literature, which is based on differential or difference equations similar to the logistic equation (Bass 1969). This mean field approach is contrasted with the discrete percolation on a lattice, with simulations of "social percolation" (Solomon et al 2000) in two to five dimensions giving power laws instead of exponential growth, and strong fluctuations right at the percolation threshold. Comment: to appear in Physica

arXiv.org e-Print Archive

Crossref