95 research outputs found

    CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training

    Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and empirically outperforms baselines, including distributed RR, on a variety of benchmark tasks.
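    For intuition, the sketch below illustrates the gradient-balancing idea described in the abstract: per-example gradients recorded in the previous epoch are centered, then greedily assigned to the front or back of the next epoch's permutation so that running prefix sums stay small. This is a hypothetical minimal sketch, not the authors' implementation; the names (grab_reorder, stale_grads) are illustrative only.

```python
# Hypothetical sketch of a GraB-style reordering pass (not the reference code).
import numpy as np

def grab_reorder(stale_grads: np.ndarray) -> np.ndarray:
    """stale_grads: (n_examples, dim) per-example gradients from the last epoch.
    Returns a permutation of example indices for the next epoch."""
    n = len(stale_grads)
    centered = stale_grads - stale_grads.mean(axis=0)  # remove the mean gradient
    running = np.zeros(stale_grads.shape[1])
    front, back = [], []
    for i in range(n):
        g = centered[i]
        # Greedy signed herding: choose the side that keeps the prefix sum smaller.
        if np.linalg.norm(running + g) <= np.linalg.norm(running - g):
            running += g
            front.append(i)   # "+1" examples go to the front of the order
        else:
            running -= g
            back.append(i)    # "-1" examples go to the back, in reverse
    return np.array(front + back[::-1])

# Usage: order = grab_reorder(grads_from_last_epoch); train on examples[order].
```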

    OCCL: a Deadlock-free Library for GPU Collective Communication

    Various distributed deep neural network (DNN) training technologies lead to increasingly complicated use of collective communications on GPU. The deadlock-prone collectives on GPU force researchers to guarantee that collectives are enqueued in a consistent order on each GPU to prevent deadlocks. In complex distributed DNN training scenarios, manual hardcoding is the only practical way to prevent deadlocks, which poses significant challenges to the development of artificial intelligence. This paper presents OCCL, which is, to the best of our knowledge, the first deadlock-free collective communication library for GPU that supports dynamic decentralized preemption and gang-scheduling of collectives. Leveraging the preemption opportunity of collectives on GPU, OCCL dynamically preempts collectives in a decentralized way via its deadlock-free collective execution framework and allows dynamic decentralized gang-scheduling via a stickiness adjustment scheme. With the help of OCCL, researchers no longer have to struggle to get all GPUs to launch collectives in a consistent order to prevent deadlocks. We implement OCCL with several optimizations and integrate it with the distributed deep learning framework OneFlow. Experimental results demonstrate that OCCL achieves comparable or better latency and bandwidth for collectives compared to the state-of-the-art NCCL. When used in distributed DNN training, OCCL improves peak training throughput by up to 78% compared to statically sequenced NCCL, while introducing overheads of less than 6.5% across various distributed DNN training approaches.
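    To illustrate the ordering constraint the abstract refers to, here is a hypothetical sketch using stock PyTorch over NCCL (this is not OCCL code): if different ranks enqueue the same collectives in different orders, the mismatched calls can block each other indefinitely, which is exactly the manual discipline OCCL aims to remove.

```python
# Hypothetical illustration of the NCCL ordering hazard (not OCCL code).
import torch
import torch.distributed as dist

def hazardous_step(rank: int, a: torch.Tensor, b: torch.Tensor) -> None:
    if rank == 0:
        dist.all_reduce(a)   # rank 0 reduces `a` first, then `b`
        dist.all_reduce(b)
    else:
        dist.all_reduce(b)   # other ranks use the opposite order -> possible deadlock
        dist.all_reduce(a)

def safe_step(a: torch.Tensor, b: torch.Tensor) -> None:
    # Manually hardcoded global order, identical on every rank.
    dist.all_reduce(a)
    dist.all_reduce(b)
```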

    Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging

    Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample lengths. State-of-the-art decentralized optimizers mitigate the problem, but require more iterations to achieve the same accuracy as their globally communicating counterparts. We present Wait-Avoiding Group Model Averaging (WAGMA) SGD, a wait-avoiding stochastic optimizer that reduces global communication via subgroup weight exchange. The key insight is a combination of algorithmic changes to the averaging scheme and the use of a group allreduce operation. We prove the convergence of WAGMA-SGD, and empirically show that it retains convergence rates similar to Allreduce-SGD. For evaluation, we train ResNet-50 on ImageNet, a Transformer for machine translation, and deep reinforcement learning for navigation at scale. Compared with state-of-the-art decentralized SGD variants, WAGMA-SGD significantly improves training throughput (e.g., 2.1x on 1,024 GPUs for reinforcement learning) and achieves the fastest time-to-solution (e.g., the highest score using the shortest training time for Transformer). Published in IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), vol. 32, no. 7, pp. 1725-1739, 1 July 2021.
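    As a rough sketch of the subgroup weight exchange described above (assumed names; the actual WAGMA-SGD uses wait-avoiding group averaging, which this simplified fixed-group version omits): each rank periodically averages its parameters only within a small subgroup via a group allreduce, instead of a global allreduce across all ranks.

```python
# Simplified, hypothetical sketch of group model averaging (not the WAGMA-SGD code).
import torch
import torch.distributed as dist

def make_groups(world_size: int, group_size: int):
    """Partition ranks into fixed subgroups of `group_size` ranks each.
    Must be called identically on every rank."""
    return [dist.new_group(list(range(s, s + group_size)))
            for s in range(0, world_size, group_size)]

def group_average(model: torch.nn.Module, group, group_size: int) -> None:
    """Allreduce-average parameters inside one subgroup (a group allreduce).
    Each rank passes the group it belongs to."""
    for p in model.parameters():
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM, group=group)
        p.data /= group_size
```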

    Spartan Daily, September 23, 1986

    Volume 87, Issue 18

    Ongoing data reduction, theoretical studies

    A nonspecific review of theory, correlative data analysis, and supporting research and technology is presented. Title pages in some of the following areas are included: (1) magnetosphere boundary observations; (2) Venus ionosphere and solar wind interaction; (3) the ISEE-C plasma wave investigation; and (4) solar system plasmas.

    The joint US/UK 1990 epoch world magnetic model

    A detailed summary of the data used, analyses performed, modeling techniques employed, and results obtained in the course of the 1990 Epoch World Magnetic Modeling effort is given. The use and limitations of the GEOMAG algorithm are also presented. Charts and tables related to the 1990 World Magnetic Model (WMM-90) for the Earth's main field and secular variation in Mercator and polar stereographic projections are presented, along with useful tables of several magnetic field components and their secular variation on a 5-degree worldwide grid.
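    For reference, models in the World Magnetic Model series, including WMM-90, represent the main field as the gradient of a scalar potential expanded in spherical harmonics, with secular variation handled by linear time derivatives of the Gauss coefficients. The sketch below uses the standard notation (a: reference radius; g_n^m, h_n^m: Gauss coefficients; P_n^m: associated Legendre functions) rather than anything specific to this report.

```latex
% Main-field potential in the standard WMM form (truncated at degree/order N)
V(r,\theta,\lambda,t) = a \sum_{n=1}^{N} \left(\frac{a}{r}\right)^{n+1}
  \sum_{m=0}^{n} \left[ g_n^m(t)\cos(m\lambda) + h_n^m(t)\sin(m\lambda) \right]
  P_n^m(\cos\theta), \qquad \mathbf{B} = -\nabla V
% Linear secular variation about the model epoch t_0 (1990.0 for WMM-90)
g_n^m(t) = g_n^m(t_0) + \dot{g}_n^m\,(t - t_0), \qquad
h_n^m(t) = h_n^m(t_0) + \dot{h}_n^m\,(t - t_0)
```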

    TorchRL: A data-driven decision-making library for PyTorch

    Striking a balance between integration and modularity is crucial for a machine learning library to be versatile and user-friendly, especially in handling decision and control tasks that involve large development teams and complex, real-world data and environments. To address this issue, we propose TorchRL, a generalistic control library for PyTorch that provides well-integrated yet standalone components. With a versatile and robust primitive design, TorchRL facilitates streamlined algorithm development across the many branches of Reinforcement Learning (RL) and control. We introduce a new PyTorch primitive, TensorDict, a flexible data carrier that empowers the integration of the library's components while preserving their modularity. Hence, replay buffers, datasets, distributed data collectors, environments, transforms, and objectives can be effortlessly used in isolation or combined. We provide a detailed description of the building blocks, supporting code examples, and an extensive overview of the library across domains and tasks. Finally, we show comparative benchmarks to demonstrate its computational efficiency. TorchRL fosters long-term support and is publicly available on GitHub for greater reproducibility and collaboration within the research community. The code is open-sourced at https://github.com/pytorch/rl.
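    A minimal usage sketch of the TensorDict primitive mentioned above, based on the publicly documented tensordict package (exact API details may vary across versions): a dict-like container with a shared batch dimension, so indexing and device moves apply to every entry at once.

```python
# Minimal TensorDict sketch (tensordict package; API may differ across versions).
import torch
from tensordict import TensorDict

batch = TensorDict(
    {"observation": torch.randn(4, 3), "action": torch.randn(4, 2)},
    batch_size=[4],
)
first_two = batch[:2]        # batch indexing applies to every entry at once
on_device = batch.to("cpu")  # likewise for device moves ("cuda" works the same way)
print(first_two["observation"].shape)  # torch.Size([2, 3])
```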

    The Use of Teams Games Tournament (TGT) to Develop Students’ Reading Skill at the First Grade of SMAN 4 Bone

    The instrument of this research was a reading test. The test results indicated a significant difference between students' post-test scores in the experimental and controlled classes: the total mean post-test score in the experimental class (72.02) was greater than in the controlled class (61.62). From the t-test, the researcher found that the computed t value for the post-test was greater than the t-table value (5.94 > 2.000).
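    The reported comparison reduces to the usual decision rule for a t-test. Below is a minimal sketch using only the values quoted in the abstract; raw scores, standard deviations, and sample sizes are not reported there, so nothing else is assumed.

```python
# Decision rule for the reported t-test, using only values quoted in the abstract.
t_computed = 5.94   # post-test t value reported in the study
t_critical = 2.000  # t-table (critical) value used by the researcher
significant = t_computed > t_critical
print(f"Reject H0 (no difference between classes): {significant}")  # True
```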