It is usually infeasible to fit and train an entire large deep neural network (DNN) model on a single edge device due to limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models and deploying each sub-model to a different edge device so that the devices collaboratively train the DNN model.
However, training can be significantly slowed down by two factors: the communication overhead caused by the large amount of data transmitted from one device to another during training, and sub-optimal partition points resulting from inaccurate prediction of the computation latency at each edge device. In this paper, we propose AccEPT, an acceleration scheme for edge collaborative pipeline-parallel training. In particular, we propose a
lightweight adaptive latency predictor that accurately estimates the computation latency of each layer on different devices and adapts to unseen devices through continuous learning. The proposed latency predictor therefore leads to better model partitioning that balances the computation load across participating devices.
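To make the idea concrete, a minimal sketch of such an online-adapting per-layer latency predictor is shown below. It is an illustration only, not the predictor used in AccEPT: the linear model, the layer features (FLOPs, parameter count, activation size), and the SGD update rule are all assumptions.

```python
# Illustrative sketch only: the feature set, model form, and update
# rule below are assumptions, not the paper's actual design.
import numpy as np

class AdaptiveLatencyPredictor:
    """Predicts a layer's computation latency on a device from simple
    layer features and keeps adapting as measured latencies arrive."""

    def __init__(self, num_features: int, lr: float = 1e-3):
        self.w = np.zeros(num_features)  # linear weights
        self.b = 0.0                     # bias
        self.lr = lr                     # SGD step size

    def predict(self, features: np.ndarray) -> float:
        return float(self.w @ features + self.b)

    def update(self, features: np.ndarray, measured_latency: float) -> None:
        # One SGD step on squared error: the predictor adapts to an
        # unseen device from its own measurements (continuous learning).
        error = self.predict(features) - measured_latency
        self.w -= self.lr * error * features
        self.b -= self.lr * error

# Example: hypothetical features (GFLOPs, params in millions, output MB).
predictor = AdaptiveLatencyPredictor(num_features=3)
layer_feats = np.array([1.2, 0.5, 4.0])
predictor.update(layer_feats, measured_latency=8.3)  # profile, then adapt
print(predictor.predict(layer_feats))
```

Given per-layer latency estimates from such a predictor for every participating device, a partitioner can then search for split points that best balance the pipeline stages.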
Moreover, we propose a bit-level computation-efficient data compression scheme to compress the data transmitted between devices during training.
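Again purely as an illustration, the sketch below shows the general shape of compressing a tensor to fewer bits per element before transmission; the uniform k-bit quantizer here is a stand-in assumption, not the paper's bit-level scheme.

```python
# Illustrative stand-in for bit-level compression of inter-device
# transfers (not the paper's actual scheme): uniform k-bit quantization.
import numpy as np

def compress(tensor: np.ndarray, bits: int = 8):
    """Uniformly quantize a float tensor to `bits` bits per element."""
    lo, hi = float(tensor.min()), float(tensor.max())
    scale = (hi - lo) / (2**bits - 1) or 1.0  # guard against zero range
    q = np.round((tensor - lo) / scale).astype(np.uint8)
    return q, lo, scale                       # transmit q plus two scalars

def decompress(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Reconstruct an approximation of the original tensor."""
    return q.astype(np.float32) * scale + lo

activations = np.random.randn(64, 128).astype(np.float32)
q, lo, scale = compress(activations)          # 4x smaller than float32
restored = decompress(q, lo, scale)
print(np.abs(restored - activations).max())   # bounded quantization error
```

At 8 bits per element, such a scheme cuts the transmitted volume of float32 activations by a factor of 4, at the cost of a bounded quantization error.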
Our numerical results demonstrate that the proposed acceleration approach speeds up edge pipeline-parallel training by up to 3 times in the considered experimental settings.