2 research outputs found

Enabling Bitwise reproducibility for the unstructured computational motif

Author: Gihan R. Mudalige
István Z. Reguly
Bálint Siklósi
Publication venue: MDPI
Publication date: 11/01/2024

Warwick Research Archives Portal Repository

NIC-based Reduction Algorithms for Large-scale Clusters

Author: Adam Moody
Dhabaleswar K. Panda
Eitan Frachtenberg
Fabrizio Petrini
Juan Fernandez
Publication date: 30/07/2004

Abstract: Efficient algorithms for reduction operations across a group of processes are crucial for good performance in many large-scale, parallel scientific applications. While previous algorithms limit processing to the host CPU, we utilize the programmable processors and local memory available on modern cluster network interface cards (NICs) to explore a new dimension in the design of reduction algorithms. In this paper, we present the benefits and challenges, design issues and solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Performance and scalability evaluations were conducted on the ASCI Linux Cluster (ALC), a 960-node, 1920-processor machine at Lawrence Livermore National Laboratory, which uses the Quadrics QsNet interconnect. We find NIC-based reductions on modern interconnects to be more efficient than host-based implementations in both scalability and consistency. In particular, at large scale (1812 processes), NIC-based reductions of small integer and floating-point arrays provided respective speedups of 121% and 39% over the host-based, production-level MPI implementation. In addition, the standard deviations in timings for the NIC-based reductions were as much as two orders of magnitude smaller than for the host-based reductions.
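For readers unfamiliar with the term, a reduction combines one value from every process into a single result. The sketch below simulates a generic binomial-tree reduction inside one Python process. It is an illustration only, not the NIC-based algorithm from the paper; the function name `tree_reduce` and the list-of-ranks model are assumptions made for this example.

```python
from operator import add
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(values: Sequence[T], op: Callable[[T, T], T] = add) -> Tuple[T, int]:
    """Simulate a binomial-tree reduction to rank 0.

    values[i] stands for the contribution of (hypothetical) rank i.
    Returns the reduced value and the number of communication rounds.
    """
    if not values:
        raise ValueError("need at least one value to reduce")

    buf = list(values)   # buf[i] is the partial result held by rank i
    n = len(buf)
    step = 1             # distance between a receiver and its sender
    rounds = 0

    while step < n:
        # In a real system every pair in this loop communicates in parallel.
        for receiver in range(0, n, 2 * step):
            sender = receiver + step
            if sender < n:
                buf[receiver] = op(buf[receiver], buf[sender])
        step *= 2
        rounds += 1

    return buf[0], rounds


if __name__ == "__main__":
    total, rounds = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8])
    print(total, rounds)  # prints "36 3"
```

With 8 simulated ranks the result reaches rank 0 after 3 rounds, which is why tree-shaped reductions scale logarithmically with the number of processes. For floating-point data the grouping of operands affects rounding, so different tree shapes may give slightly different results.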

CiteSeerX

UNT Digital Library