GPU peer-to-peer techniques applied to a cluster interconnect

Author: Ammendola Roberto
Bernaschi Massimo
Biagioni Andrea
Bisson Mauro
Cicero Francesca Lo
Fatica Massimiliano
Frezza Ottorino
Lonardo Alessandro
Mastrostefano Enrico
Paolucci Pier Stanislao
Rossetti Davide
Simula Francesco
Tosoratto Laura
Vicini Piero
Publication date: 31/07/2013

Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, essentially by avoiding staging to host memory, they require specific hardware features that are not available on current-generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. We also discuss the current software implementation, which integrates this feature by minimally extending the RDMA programming model, as well as some issues raised while employing it in a higher-level API such as MPI. Finally, we study the current limits of the technique by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they seem to benefit from the GPU peer-to-peer method. Comment: paper accepted to CASS 201

arXiv.org e-Print Archive

Crossref
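The GPU peer-to-peer abstract rests on a cost argument: staging a GPU buffer through host memory adds a copy, while peer-to-peer access lets the network adapter read GPU memory directly. The sketch below is a first-order latency-plus-bandwidth model of that trade-off, not the authors' model. The `Link` type, the function names and every latency and bandwidth figure are hypothetical, chosen only to show that the faster path can depend on message size when the direct path has lower bandwidth than the staged one.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hop of a transfer path (hypothetical first-order model)."""
    latency_us: float     # fixed per-transfer cost, microseconds
    bandwidth_gbs: float  # sustained bandwidth, GB/s (1 GB = 1e9 bytes)

    def time_us(self, nbytes: int) -> float:
        # GB/s * 1e3 = bytes per microsecond
        return self.latency_us + nbytes / (self.bandwidth_gbs * 1e3)


def staged_time_us(nbytes: int, d2h: Link, h2nic: Link) -> float:
    # GPU -> host bounce buffer, then the NIC reads host memory (no pipelining)
    return d2h.time_us(nbytes) + h2nic.time_us(nbytes)


def p2p_time_us(nbytes: int, p2p: Link) -> float:
    # The NIC reads GPU memory directly across PCI Express
    return p2p.time_us(nbytes)


def crossover_bytes(d2h: Link, h2nic: Link, p2p: Link) -> float:
    """Message size at which both paths cost the same; math.inf if P2P never loses."""
    extra_latency = d2h.latency_us + h2nic.latency_us - p2p.latency_us
    extra_us_per_byte = (1 / p2p.bandwidth_gbs
                         - 1 / d2h.bandwidth_gbs
                         - 1 / h2nic.bandwidth_gbs) / 1e3
    if extra_us_per_byte <= 0:
        return math.inf
    return extra_latency / extra_us_per_byte


# Hypothetical parameters, for illustration only (not measured values)
D2H = Link(latency_us=10.0, bandwidth_gbs=5.0)
H2NIC = Link(latency_us=2.0, bandwidth_gbs=5.0)
P2P = Link(latency_us=2.0, bandwidth_gbs=1.5)

if __name__ == "__main__":
    for size in (1 << 10, 1 << 14, 1 << 18, 1 << 20):
        print(f"{size:>8} B  staged {staged_time_us(size, D2H, H2NIC):8.1f} us"
              f"  p2p {p2p_time_us(size, P2P):8.1f} us")
    print(f"crossover ~ {crossover_bytes(D2H, H2NIC, P2P):.0f} B")
```

With these made-up numbers the direct path wins below 37,500 bytes and staging wins above it; real crossover points depend entirely on the hardware being measured.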

A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications

Author: Nabi Syed Waqar
Vanderbauwhede Wim
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2016

Heterogeneous High-Performance Computing (HPC) platforms present a significant programming challenge, especially because the key users of HPC resources are scientists, not parallel programmers. We contend that compiler technology has to evolve to automatically create the best program variant by transforming a given original program. We have developed a novel methodology based on type transformations for generating correct-by-construction design variants, and an associated lightweight cost model for evaluating these variants for implementation on FPGAs. In this paper we present a key enabler of our approach, the cost model. We discuss how we are able to quickly derive accurate estimates of performance and resource utilization from the design’s representation in our intermediate language. We show results confirming the accuracy of our cost model by testing it on three different scientific kernels. We conclude with a case study that compares a solution generated by our framework with one from a conventional high-level synthesis tool, showing better performance and power efficiency for our cost-model-based approach.

Crossref

Enlighten
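The cost-model abstract describes estimating performance and resource use for each design variant and keeping the best one that fits the FPGA. The sketch below shows only that selection loop with a deliberately simple analytical model; it is not the authors' cost model. The idea of "lanes" as the variant parameter, the per-lane LUT/DSP costs, the pipeline depth and the device figures are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import NamedTuple

# Hypothetical per-lane costs for one kernel; a real cost model would derive
# these from the intermediate representation rather than hard-code them.
LUTS_PER_LANE = 12_000
DSPS_PER_LANE = 40
PIPELINE_DEPTH = 150  # cycles to fill the pipeline


@dataclass(frozen=True)
class Variant:
    lanes: int  # degree of parallelism produced by a (hypothetical) transformation


@dataclass(frozen=True)
class Device:
    luts: int
    dsps: int
    clock_mhz: float


class Estimate(NamedTuple):
    time_ms: float
    luts: int
    dsps: int
    fits: bool


def estimate(v: Variant, n_items: int, dev: Device) -> Estimate:
    # Lanes run in parallel, so the pipeline fill latency is paid once
    cycles = math.ceil(n_items / v.lanes) + PIPELINE_DEPTH
    time_ms = cycles / (dev.clock_mhz * 1e3)  # MHz * 1e3 = cycles per ms
    luts = v.lanes * LUTS_PER_LANE
    dsps = v.lanes * DSPS_PER_LANE
    return Estimate(time_ms, luts, dsps, luts <= dev.luts and dsps <= dev.dsps)


def best_variant(variants: list[Variant], n_items: int, dev: Device) -> Variant:
    # Keep only variants that fit on the device, then pick the fastest estimate
    feasible = [v for v in variants if estimate(v, n_items, dev).fits]
    if not feasible:
        raise ValueError("no variant fits on the device")
    return min(feasible, key=lambda v: estimate(v, n_items, dev).time_ms)


DEV = Device(luts=200_000, dsps=900, clock_mhz=200.0)  # hypothetical FPGA

if __name__ == "__main__":
    candidates = [Variant(n) for n in (1, 2, 4, 8, 16, 32)]
    for v in candidates:
        print(v.lanes, estimate(v, 1_000_000, DEV))
    print("best:", best_variant(candidates, 1_000_000, DEV).lanes)
```

With these invented budgets the 32-lane variant exceeds both the LUT and DSP limits, so the 16-lane variant is selected. Because each estimate is a handful of arithmetic operations, many variants can be ranked without running synthesis, which is the kind of speed a lightweight cost model aims for.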

MPWide: a light-weight library for efficient message passing over wide area networks

Author: Groen Derek
Rieder Steven
Zwart Simon Portegies
Publication venue: Ubiquity Press, Ltd.
Publication date: 01/12/2013

We present MPWide, a lightweight communication library that allows efficient message passing over a distributed network. MPWide has been designed to connect applications running on distributed (super)computing resources, and to maximize communication performance on wide area networks for users without administrative privileges. It can be used to provide message passing between applications, move files, and make very fast connections in client-server environments. MPWide has already been applied to enable distributed cosmological simulations across up to four supercomputers on two continents, and to couple two different bloodflow simulations to form a multiscale simulation. Comment: accepted by the Journal of Open Research Software, 13 pages, 4 figures, 1 table

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals
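The MPWide entry targets wide-area links, where a single TCP stream often cannot fill the available bandwidth. One common remedy, assumed here for illustration rather than taken from the MPWide API, is to stripe each message across several TCP streams and reassemble it at the receiver. The sketch does this with length-prefixed chunks over local `socket.socketpair()` connections standing in for wide-area streams; `split`, `send_striped` and `recv_striped` are hypothetical helpers, not MPWide functions.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!Q")  # 8-byte big-endian chunk length


def split(data: bytes, n: int) -> list[bytes]:
    """Divide data into n contiguous chunks; trailing chunks may be short or empty."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]


def recv_exact(sock: socket.socket, nbytes: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until done
    buf = bytearray()
    while len(buf) < nbytes:
        part = sock.recv(nbytes - len(buf))
        if not part:
            raise ConnectionError("stream closed before the chunk was complete")
        buf += part
    return bytes(buf)


def send_striped(streams: list[socket.socket], data: bytes) -> list[threading.Thread]:
    """Hypothetical helper (not MPWide API): send one length-prefixed chunk per stream."""
    def send_one(sock: socket.socket, chunk: bytes) -> None:
        sock.sendall(HEADER.pack(len(chunk)) + chunk)

    # One thread per stream so a full send buffer on one stream does not block the rest
    threads = [threading.Thread(target=send_one, args=(s, c))
               for s, c in zip(streams, split(data, len(streams)))]
    for t in threads:
        t.start()
    return threads


def recv_striped(streams: list[socket.socket]) -> bytes:
    """Hypothetical helper: read one chunk from each stream, in stream order, and rejoin."""
    parts = []
    for sock in streams:
        (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
        parts.append(recv_exact(sock, length))
    return b"".join(parts)


if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(4)]  # local stand-ins for WAN streams
    message = bytes(range(256)) * 1000
    senders = send_striped([a for a, _ in pairs], message)
    received = recv_striped([b for _, b in pairs])
    for t in senders:
        t.join()
    print(received == message)
    for a, b in pairs:
        a.close()
        b.close()
```

The receiver relies on stream order to rejoin the chunks, so no per-chunk sequence number is needed; a sender that reused streams for several messages would need one.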

The STAR MAPS-based PiXeL detector

Author: A. Bulatov
A.A. Bulatov
B. Larose
B. Martin
D. Meier
D.A. Cohen
F. Börner
H. Chen
J. Berman
J. Wiegold
M. Bodirsky
M. Quick
P. Idziak
P. Jeavons
P. Jonsson
P.G. Jeavons
S. Bova
T. Feder
Publication venue: Elsevier BV
Publication date: 01/01/2012

The PiXeL detector (PXL) for the Heavy Flavor Tracker (HFT) of the STAR experiment at RHIC is the first application of state-of-the-art thin Monolithic Active Pixel Sensors (MAPS) technology in a collider environment. The custom-built pixel sensors, their readout electronics and the detector mechanical structure are described in detail. Selected detector design aspects and production steps are presented. The detector operations during the three years of data taking (2014-2016) and the overall performance, which exceeded the design specifications, are discussed in the concluding sections of this paper.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Trieste

University of Liverpool Repository

HAL-IN2P3

Crossref

Strong scaling of general-purpose molecular dynamics simulations on GPUs

Author: Anderson Joshua A.
Glaser Jens
Glotzer Sharon C.
Lui Pak
Millan Jaime A.
Morse David C.
Nguyen Trung Dac
Spiga Filippo
Publication venue: Elsevier BV
Publication date: 10/12/2014

We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., J. Comp. Phys. 227, 2008). The software supports short-range pair force and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We demonstrate equivalent or superior scaling on up to 3,375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double-precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5x. Comment: 30 pages, 14 figures

arXiv.org e-Print Archive

CiteSeerX
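The HOOMD-blue entry mentions MPI domain decomposition on up to 3,375 GPUs, a count that equals a 15 × 15 × 15 grid. The sketch below illustrates the generic idea only, not HOOMD-blue's implementation, under simplifying assumptions: a cubic periodic box, a uniform cubic grid of ranks, positions already wrapped into [0, box), and a cutoff no larger than one domain width. Each particle is owned by the rank whose domain contains it, and a particle within the cutoff of a domain face is also sent as a ghost to the neighbouring ranks. The function names are hypothetical.

```python
import math
from itertools import product


def owner_cell(pos, box: float, grid: int) -> tuple:
    # Uniform cubic grid: each domain is a cube of side box / grid
    width = box / grid
    return tuple(int(math.floor(x / width)) % grid for x in pos)


def rank_of(cell, grid: int) -> int:
    # Flatten (i, j, k) into a single rank index
    i, j, k = cell
    return i + grid * (j + grid * k)


def ghost_ranks(pos, box: float, grid: int, r_cut: float) -> set:
    """Ranks that need a ghost copy of the particle at pos (assumes r_cut <= box / grid)."""
    width = box / grid
    cell = owner_cell(pos, box, grid)
    shifts_per_dim = []
    for x, c in zip(pos, cell):
        local = x - c * width  # distance from the lower face of the owning domain
        shifts = [0]
        if local < r_cut:
            shifts.append(-1)
        if local > width - r_cut:
            shifts.append(+1)
        shifts_per_dim.append(shifts)
    ranks = set()
    for shift in product(*shifts_per_dim):  # faces, edges and corners
        if shift == (0, 0, 0):
            continue
        neighbour = tuple((c + s) % grid for c, s in zip(cell, shift))  # periodic wrap
        ranks.add(rank_of(neighbour, grid))
    ranks.discard(rank_of(cell, grid))  # tiny grids can wrap back onto the owner
    return ranks


if __name__ == "__main__":
    GRID, BOX, R_CUT = 15, 150.0, 2.5  # 15**3 = 3375 ranks; box and cutoff are illustrative
    print(rank_of(owner_cell((55.0, 55.0, 55.0), BOX, GRID), GRID))
    print(sorted(ghost_ranks((50.5, 55.0, 55.0), BOX, GRID, R_CUT)))
    print(len(ghost_ranks((0.5, 0.5, 0.5), BOX, GRID, R_CUT)))
```

A particle near a corner of its domain is ghosted to up to seven neighbours, which is why the ghost volume, and hence communication, grows relative to the work per rank as the domains shrink under strong scaling.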

The End of Slow Networks: It's Time for a Redesign

Author: Binnig Carsten
Crotty Andrew
Galakatos Alex
Kraska Tim
Zamanian Erfan
Publication date: 19/12/2015

Next-generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs. These systems are commonly designed and optimized under the assumption that the network is the bottleneck: the network is slow and "thin", and thus needs to be avoided as much as possible. Yet this assumption no longer holds true. With InfiniBand FDR 4x, the bandwidth available to transfer data across the network is in the same ballpark as the bandwidth of one memory channel, and it increases even further with the most recent EDR standard. Moreover, with continuing advances in RDMA, latency is improving similarly fast. In this paper, we first argue that the "old" distributed database design is not capable of taking full advantage of the network. Second, we propose architectural redesigns for OLTP, OLAP and advanced analytical frameworks to take better advantage of the improved bandwidth, latency and RDMA capabilities. Finally, for each of the workload categories, we show that remarkable performance improvements can be achieved.

arXiv.org e-Print Archive

TUbiblio
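The last abstract compares the bandwidth of InfiniBand FDR 4x with that of one memory channel. The arithmetic can be made explicit. The sketch below uses the per-lane signalling rates of 14.0625 Gbaud for FDR and 25.78125 Gbaud for EDR, both with 64b/66b encoding, and assumes a DDR3-1600 channel with a 64-bit bus as the point of comparison; the abstract does not say which memory type it means, so that choice is an assumption.

```python
def ib_4x_data_gbytes(lane_gbaud: float) -> float:
    """Usable data rate of a 4x InfiniBand link in GB/s, assuming 64b/66b encoding."""
    lanes = 4
    # 4 lanes, minus 64b/66b encoding overhead, then bits -> bytes
    return lane_gbaud * lanes * 64 / 66 / 8


def ddr_channel_gbytes(mega_transfers: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth of one DDR channel in GB/s (64-bit bus by default)."""
    return mega_transfers * 1e6 * bus_bytes / 1e9


fdr = ib_4x_data_gbytes(14.0625)   # FDR 4x: about 6.82 GB/s
edr = ib_4x_data_gbytes(25.78125)  # EDR 4x: 12.5 GB/s
ddr3 = ddr_channel_gbytes(1600)    # assumed DDR3-1600 channel: 12.8 GB/s

if __name__ == "__main__":
    print(f"FDR 4x {fdr:.2f} GB/s, EDR 4x {edr:.2f} GB/s, DDR3-1600 {ddr3:.2f} GB/s")
    print(f"FDR/DDR3 = {fdr / ddr3:.2f}, EDR/DDR3 = {edr / ddr3:.2f}")
```

Under this assumption FDR 4x delivers roughly half of one DDR3-1600 channel (ratio about 0.53) and EDR 4x nearly all of it (about 0.98), which is consistent with the "same ballpark" comparison in the abstract. These are peak figures; sustained throughput on both sides is lower.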