MLTCP: Congestion Control for DNN Training
We present MLTCP, a technique to augment today's congestion control
algorithms to accelerate DNN training jobs in shared GPU clusters. MLTCP
enables the communication phases of jobs that compete for network bandwidth to
interleave with each other, thereby utilizing the network efficiently. At the
heart of MLTCP lies a very simple principle based on a key conceptual insight:
DNN training flows should scale their congestion window size based on the
number of bytes sent at each training iteration. We show that integrating this
principle into today's congestion control protocols is straightforward: by
adding 30-60 lines of code to Reno, CUBIC, or DCQCN, MLTCP stabilizes flows of
different jobs into an interleaved state within a few training iterations,
regardless of the number of competing flows or the start time of each flow. Our
experiments with popular DNN training jobs demonstrate that enabling MLTCP
accelerates the average and 99th percentile training iteration time by up to 2x
and 4x, respectively.
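The core principle above — scaling congestion-window growth by the bytes a flow has sent in the current training iteration — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, the hook point, and the scaling rule are assumptions for exposition only.

```python
# Hypothetical sketch of the MLTCP scaling idea: augment a standard
# additive-increase step so that the window grows in proportion to how
# far the flow has progressed through its current training iteration.
# Flows at different points in their iterations then grow at different
# rates, which nudges competing jobs into an interleaved schedule.

def mltcp_cwnd_update(cwnd, bytes_sent_this_iter, iter_total_bytes,
                      base_increase=1.0):
    """Return the new congestion window after one ACK.

    cwnd                 -- current congestion window (segments)
    bytes_sent_this_iter -- bytes this flow has sent so far in the
                            current training iteration
    iter_total_bytes     -- total bytes the iteration will send
    base_increase        -- the protocol's normal per-ACK increase
    """
    # Fraction of the iteration's traffic already sent (0.0 to 1.0).
    progress = bytes_sent_this_iter / iter_total_bytes
    # Standard additive increase, scaled up as the iteration progresses.
    return cwnd + base_increase * (1.0 + progress)
```

For example, a flow halfway through its iteration (`progress = 0.5`) grows its window 1.5x faster than one that has just started, so two jobs that begin in lockstep drift out of phase and stop colliding on the bottleneck link.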
On a single server queue with negative arrivals and request repeated
There is a growing interest in queueing systems with negative arrivals, i.e., systems where the arrival of a negative customer has the effect of deleting some customer in the queue. Recently, Harrison and Pitel (1996) investigated the queue length distribution of a single server queue of type M/G/1 with negative arrivals. In this paper we extend the analysis to the context of queueing systems with repeated requests. We show that the limiting distribution of the system state can still be reduced to a Fredholm integral equation. We solve such an equation numerically by introducing an auxiliary 'truncated' system which can easily be evaluated with the help of a regenerative approach.
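For context, a Fredholm integral equation of the second kind — the class the abstract refers to — has the general form (the specific kernel and free term in the paper depend on the service-time and retrial distributions, which are not given here):

```latex
\varphi(x) \;=\; f(x) + \lambda \int_a^b K(x, y)\, \varphi(y)\, dy,
\qquad x \in [a, b],
```

where \(\varphi\) is the unknown function, \(f\) and the kernel \(K\) are known, and \(\lambda\) is a parameter. Equations of this form generally have no closed-form solution, which is why the authors resort to a numerical scheme built on a truncated auxiliary system.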