62,879 research outputs found

Optimizing Collective Communication for Scalable Scientific Computing and Deep Learning

Author: Li Jiali
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2023

In the realm of distributed computing, collective operations involve coordinated communication and synchronization among multiple processing units, enabling efficient data exchange and collaboration. Scientific applications, such as simulations, computational fluid dynamics, and scalable deep learning, require complex computations that can be parallelized across multiple nodes in a distributed system. These applications often involve data-dependent communication patterns, where collective operations are critical for achieving high performance in data exchange. Optimizing collective operations for scientific applications and deep learning involves improving the algorithms, communication patterns, and data distribution strategies to minimize communication overhead and maximize computational efficiency. Within the context of this dissertation, the specific focus is on optimizing the alltoall operation in 3D Fast Fourier Transform (FFT) applications and the allreduce operation in parallel deep learning, particularly on High-Performance Computing (HPC) systems. Advanced communication algorithms and methods are explored and implemented to improve communication efficiency, consequently enhancing the overall performance of 3D FFT applications. Furthermore, this dissertation investigates the identification of performance bottlenecks during collective communication over Horovod on distributed systems. These bottlenecks are addressed by proposing an optimized parallel communication pattern specifically tailored to alleviate the aforementioned limitations during the training phase in distributed deep learning. The objective is to achieve faster convergence and improve the overall training efficiency. Moreover, this dissertation proposes fault tolerance and elastic scaling features for distributed deep learning by leveraging the User-Level Failure Mitigation (ULFM) from Message Passing Interface (MPI). 
By incorporating ULFM MPI, the dissertation aims to enhance the elastic capabilities of distributed deep learning systems. This approach enables graceful and lightweight handling of failures while facilitating seamless scaling in dynamic computing environments.
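As a rough illustration of what the allreduce operation mentioned in this abstract computes, the sketch below simulates a ring allreduce in plain Python: every simulated rank ends up with the element-wise sum of all ranks' buffers. The ring algorithm is chosen here only for illustration; the abstract does not state which algorithm the dissertation uses, and the function name `ring_allreduce` and the in-memory "ranks" are hypothetical stand-ins for real MPI processes.

```python
def ring_allreduce(buffers):
    """Simulate a sum-allreduce over a ring of len(buffers) ranks.

    Assumption: ranks are modeled as plain Python lists of equal length;
    no real network or MPI library is involved.
    """
    n = len(buffers)
    length = len(buffers[0])
    # Split each buffer into n contiguous chunks (some may be empty).
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
    data = [list(b) for b in buffers]  # work on copies

    # Phase 1, reduce-scatter: after n-1 steps, rank r holds the fully
    # summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so the sends are simultaneous.
        msgs = []
        for r in range(n):
            c = (r - step) % n
            lo, hi = bounds[c]
            msgs.append(((r + 1) % n, c, data[r][lo:hi]))
        for dst, c, payload in msgs:
            lo, _ = bounds[c]
            for k, v in enumerate(payload):
                data[dst][lo + k] += v

    # Phase 2, allgather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        msgs = []
        for r in range(n):
            c = (r + 1 - step) % n
            lo, hi = bounds[c]
            msgs.append(((r + 1) % n, c, data[r][lo:hi]))
        for dst, c, payload in msgs:
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload

    return data


# Three simulated ranks, each holding a 4-element "gradient".
result = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
```

In this simulation each rank sends about 2(n-1)/n of its buffer in total, which is the usual argument for ring-style allreduce on large messages; real implementations would overlap these transfers with computation.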

University of Tennessee, Knoxville: Trace

Distributed-Memory Breadth-First Search on Massive Graphs

Author: Asanovic Krste; Beamer Scott; Buluc Aydin; Madduri Kamesh; Patterson David
Publication date: 01/01/2017

This chapter studies the problem of traversing large graphs in breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm and the recently discovered direction-optimizing algorithm. We analyze the performance and scalability trade-offs of using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.
Comment: arXiv admin note: text overlap with arXiv:1104.451
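For readers unfamiliar with the terms in this abstract, the following is a minimal single-node sketch of a level-synchronous top-down BFS over a graph stored in CSR form. It is an illustration only, not the chapter's distributed implementation: the names `bfs_top_down`, `row_ptr`, and `col_idx` are assumptions, and the bottom-up phase of the direction-optimizing variant is omitted.

```python
def bfs_top_down(row_ptr, col_idx, source):
    """Level-synchronous top-down BFS on a CSR graph.

    row_ptr[u]:row_ptr[u + 1] indexes the neighbors of vertex u in col_idx.
    Returns a parent array; unreachable vertices keep parent -1.
    """
    n = len(row_ptr) - 1
    parent = [-1] * n
    parent[source] = source  # the source is its own parent
    frontier = [source]
    while frontier:
        next_frontier = []
        # Expand every vertex of the current level before moving on.
        for u in frontier:
            for e in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[e]
                if parent[v] == -1:  # first visit claims the vertex
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent


# Undirected 4-cycle with edges 0-1, 0-2, 1-3, 2-3, stored in CSR form.
row_ptr = [0, 2, 4, 6, 8]
col_idx = [1, 2, 0, 3, 0, 3, 1, 2]
parents = bfs_top_down(row_ptr, col_idx, 0)  # -> [0, 0, 0, 1]
```

A direction-optimizing BFS would switch, on large frontiers, to a bottom-up step in which each unvisited vertex scans its own neighbors for a frontier member; the switching heuristic is not described in the abstract and is left out here.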

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Collective Identities and Citizenship

Author: Nida-Rümelin Julian
Publication date: 01/01/1993

Open Access LMU

Preparing HPC Applications for the Exascale Era: A Decoupling Strategy

Author: Gioiosa Roberto; Kestor Gokcen; Laure Erwin; Markidis Stefano; Peng Ivy Bo
Publication date: 03/08/2017

Production-quality parallel applications are often a mixture of diverse operations, such as computation- and communication-intensive, regular and irregular, tightly coupled and loosely linked operations. In the conventional construction of parallel applications, each process performs all the operations, which can be inefficient and seriously limit scalability, especially at large scale. We propose a decoupling strategy to improve the scalability of applications running on large-scale systems. Our strategy separates application operations onto groups of processes and enables a dataflow processing paradigm among the groups. This mechanism is effective in reducing the impact of load imbalance and increases parallel efficiency by pipelining multiple operations. We provide a proof-of-concept implementation using MPI, the de facto programming system on current supercomputers. We demonstrate the effectiveness of this strategy by decoupling the reduce, particle communication, halo exchange, and I/O operations in a set of scientific and data-analytics applications. A performance evaluation on 8,192 processes of a Cray XC40 supercomputer shows that the proposed approach can achieve up to 4x performance improvement.
Comment: The 46th International Conference on Parallel Processing (ICPP-2017

arXiv.org e-Print Archive

Crossref

Nonorthogonal Quantum States Maximize Classical Information Capacity

Author: A. Peres; A. S. Kholevo; B. Schumacher; C. A. Fuchs; C. E. Shannon; C. H. Bennett; Christopher A. Fuchs; E. B. Davies; K. Kraus; L. B. Levitin; P. Hausladen
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/1997

I demonstrate that, rather unexpectedly, there exist noisy quantum channels for which the optimal classical information transmission rate is achieved only by signaling alphabets consisting of nonorthogonal quantum states.
Comment: 5 pages, REVTeX, mild extension of results, much improved presentation, to appear in Physical Review Letter

arXiv.org e-Print Archive

Crossref

CERN Document Server

Effects of communication efficiency and exit capacity on fundamental diagrams for pedestrian motion in an obscure tunnel: a particle system approach

Author: Cirillo Emilio Nicola Maria; Colangeli Matteo; Muntean Adrian
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2015

Fundamental diagrams describing the relation between pedestrian speed and density are key to understanding pedestrian dynamics. Experimental data show the onset of complex behaviors in which the velocity decreases with the density and different logistic regimes are identified. This paper addresses the issue of pedestrian transport and of fundamental diagrams for a scenario involving the motion of pedestrians escaping from an obscure tunnel. We capture the effects of the communication efficiency and the exit capacity by means of two thresholds controlling the rate at which particles (walkers, pedestrians) move on the lattice. Using a particle system model, we show that in the absence of limitations in communication among pedestrians we reproduce with good accuracy the standard fundamental diagrams, whose basic behaviors can be interpreted in terms of the exit capacity limitation. When the effect of a limited communication ability is considered, interesting non-intuitive phenomena occur. In particular, we shed light on the loss of monotonicity of the typical speed-density curves, revealing the existence of a pedestrian density that optimizes the escape. We study both the discrete particle dynamics and the corresponding hydrodynamic limit (a porous medium equation and a transport (continuity) equation). We also point out the dependence of the effective transport coefficients on the two thresholds, the essence of the microstructure information.

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Archivio della ricerca- Università di Roma La Sapienza