DynaComm: Accelerating Distributed CNN Training between Edges and Clouds through Dynamic Communication Scheduling
To reduce upload bandwidth and address privacy concerns, deep learning at
the network edge has emerged as a prominent research topic. Typically, edge
devices collaboratively train a shared model on data generated in real time
through the Parameter Server framework. Although the edge devices can share
the computing workload, distributed training over edge networks remains
time-consuming because of the parameter and gradient transmissions between
parameter servers and edge devices. To accelerate distributed training of
Convolutional Neural Networks (CNNs) at the network edge, we present
DynaComm, a novel scheduler that dynamically decomposes each transmission
into several segments to achieve optimal layer-wise overlapping of
communication and computation at run time. Through experiments, we verify
that DynaComm achieves optimal layer-wise scheduling in all cases compared
to competing strategies, while model accuracy remains unaffected.

Comment: 16 pages, 12 figures. IEEE Journal on Selected Areas in
Communications
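
The core idea described in the abstract, decomposing each gradient transmission into segments so that uploading one layer's segments overlaps with computing the next layer's gradients, can be sketched as below. This is a minimal illustration, not the authors' implementation: the layer sizes, the segment size, and the `send_segment`/`compute_layer_backward` helpers are hypothetical stand-ins for real network and compute calls.

```python
import threading
from queue import Queue

# Hypothetical per-layer gradient sizes (MB) and a fixed segment size.
LAYER_GRAD_MB = [4.0, 8.0, 2.0]   # one entry per CNN layer, illustrative only
SEGMENT_MB = 2.0                  # transmission granularity chosen by the scheduler

def compute_layer_backward(layer_id):
    """Stand-in for the backward pass of one layer."""
    print(f"computed gradients for layer {layer_id}")

def send_segment(layer_id, seg_id, size_mb):
    """Stand-in for pushing one gradient segment to the parameter server."""
    print(f"sent layer {layer_id} segment {seg_id} ({size_mb} MB)")

def sender(queue):
    # Drains gradient segments as they become available, so transmission
    # of one layer overlaps with the backward computation of the next.
    while True:
        item = queue.get()
        if item is None:          # sentinel: no more segments this step
            break
        send_segment(*item)

def train_step():
    queue = Queue()
    tx = threading.Thread(target=sender, args=(queue,))
    tx.start()
    # Backward pass runs from the last layer to the first; each layer's
    # gradients are decomposed into segments and enqueued immediately,
    # instead of waiting for the whole gradient tensor to be ready.
    for layer_id in reversed(range(len(LAYER_GRAD_MB))):
        compute_layer_backward(layer_id)
        remaining, seg_id = LAYER_GRAD_MB[layer_id], 0
        while remaining > 0:
            size = min(SEGMENT_MB, remaining)
            queue.put((layer_id, seg_id, size))
            remaining -= size
            seg_id += 1
    queue.put(None)
    tx.join()

if __name__ == "__main__":
    train_step()
```

In this sketch the segment size is fixed; per the abstract, DynaComm instead chooses the decomposition dynamically at run time to make the overlap optimal for each layer.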