DynaComm: Accelerating Distributed CNN Training between Edges and Clouds through Dynamic Communication Scheduling

Author: Cai Shangming
Lyu Yongqiang
Vasilakos Athanasios V.
Wang Dongsheng
Wang Haixia
Xu Guangquan
Zheng Xi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/10/2021

To reduce upload bandwidth and address privacy concerns, deep learning at the network edge has become an emerging topic. Typically, edge devices collaboratively train a shared model on data generated in real time, using the Parameter Server framework. Although the edge devices share the computing workload, distributed training over edge networks is still time-consuming because parameters and gradients must be transmitted between parameter servers and edge devices. Focusing on accelerating distributed Convolutional Neural Network (CNN) training at the network edge, we present DynaComm, a novel scheduler that dynamically decomposes each transmission procedure into several segments to achieve optimal layer-wise overlapping of communication and computation at run time. Through experiments, we verify that DynaComm achieves optimal layer-wise scheduling in all cases compared to competing strategies, while model accuracy remains unaffected.

Comment: 16 pages, 12 figures. IEEE Journal on Selected Areas in Communications
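The benefit of overlapping layer-wise communication with computation can be illustrated with a toy timing model. The sketch below is a generic illustration and not DynaComm's scheduling algorithm: it assumes a single link that sends one layer's gradients at a time, in the order the layers finish computing, and the function names and per-layer timings are hypothetical.

```python
from typing import Sequence


def sequential_time(compute: Sequence[float], comm: Sequence[float]) -> float:
    """Total time when all communication waits until all computation is done."""
    return sum(compute) + sum(comm)


def overlapped_time(compute: Sequence[float], comm: Sequence[float]) -> float:
    """Total time when each layer's transfer starts as soon as that layer's
    computation has finished and the (single, assumed) link is free."""
    compute_done = 0.0  # time at which the current layer finishes computing
    link_free = 0.0     # time at which the link finishes its current transfer
    for layer_compute, layer_comm in zip(compute, comm):
        compute_done += layer_compute
        start = max(compute_done, link_free)
        link_free = start + layer_comm
    return max(compute_done, link_free)


# Hypothetical per-layer timings (arbitrary units), in processing order.
compute_times = [2.0, 2.0, 2.0]
comm_times = [1.0, 1.0, 1.0]
print(sequential_time(compute_times, comm_times))  # 9.0
print(overlapped_time(compute_times, comm_times))  # 7.0
```

In this toy model, only the last layer's transfer is left exposed when computation dominates; when communication dominates, the link becomes the bottleneck and the saving shrinks to the computation time hidden behind transfers.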

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

Towards GPU Utilization Prediction for Cloud Deep Learning

Author: Borowiec Damian
Friday Adrian
Garraghan Peter
Harper R.H.R.
Yeung Ging-Fung
Publication venue: USENIX Association
Publication date: 01/05/2020

Understanding the GPU utilization of Deep Learning (DL) workloads is important for enhancing resource efficiency and cost-benefit decision making for DL frameworks in the cloud. Current approaches to determining DL workload GPU utilization rely on online profiling within isolated GPU devices, which must be performed for every unique DL workload submission, resulting in resource under-utilization and reduced service availability. In this paper, we propose a prediction engine to proactively determine the GPU utilization of heterogeneous DL workloads without the need for in-depth or isolated online profiling. We demonstrate that it is possible to predict DL workload GPU utilization by extracting information from the workload's model computation graph. Our experiments show that the prediction engine achieves an RMSLE of 0.154 and can be exploited by DL schedulers to achieve up to a 61.5% improvement in GPU cluster utilization.
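For readers unfamiliar with the metric, RMSLE (root mean squared logarithmic error) can be computed as in the minimal sketch below. It uses the common log(1 + x) form of the definition; the abstract does not say which variant or which utilization scale the authors used, and the function name and sample values here are hypothetical.

```python
import math
from typing import Sequence


def rmsle(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared logarithmic error, using log(1 + x) on both inputs.

    Assumes non-negative values and two sequences of equal, non-zero length.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Hypothetical GPU utilization values as fractions in [0, 1].
predicted_util = [0.62, 0.35, 0.90]
actual_util = [0.70, 0.30, 0.85]
print(rmsle(predicted_util, actual_util))
```

Because the error is taken on a logarithmic scale, RMSLE penalizes relative rather than absolute differences, so a lower value means the predictions are proportionally closer to the measured values.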

Lancaster E-Prints