Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
Existing approaches that partition a convolutional neural network (CNN) onto multiple
accelerators and pipeline the training computations avoid or limit the use of stale weights,
and they either underutilize accelerators or increase the memory footprint. We explore the
impact of stale weights on the statistical efficiency and performance in a pipelined backpropagation scheme that maximizes accelerator utilization and keeps memory overhead
modest. Using LeNet-5, AlexNet, VGG, and ResNet, we show that when pipelining
is limited to the early layers of a network, training with stale weights converges and yields
models with inference accuracies comparable to those of non-pipelined
training. However, when pipelining extends deeper into the network, inference accuracies drop
significantly. We propose a hybrid training scheme to address this drop. We demonstrate
the performance of our pipelined backpropagation on 2 GPUs achieving speedups of up
to 1.8X over a 1-GPU baseline, with a small inference accuracy drop.
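
To make the stale-weight mechanism concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a two-stage pipelined backpropagation loop in PyTorch. The early stage's forward pass for the current micro-batch runs before the update from the previous micro-batch's backward pass is applied, so its forward uses weights that are stale by one update. The toy model, the cuda:0/cuda:1 placement, and the dummy loader are illustrative assumptions; on real hardware the two stages would execute concurrently rather than sequentially.

    import torch
    import torch.nn as nn

    dev0 = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    dev1 = torch.device("cuda:1" if torch.cuda.device_count() > 1 else dev0)

    # Illustrative two-stage partition: early layers on dev0, later layers on dev1.
    stage0 = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU()).to(dev0)
    stage1 = nn.Sequential(nn.Flatten(), nn.Linear(8 * 28 * 28, 10)).to(dev1)
    opt0 = torch.optim.SGD(stage0.parameters(), lr=0.01)
    opt1 = torch.optim.SGD(stage1.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Dummy micro-batches standing in for a real data loader.
    loader = [(torch.randn(4, 1, 28, 28), torch.randint(0, 10, (4,))) for _ in range(8)]

    in_flight = None  # activations of the previous micro-batch, still awaiting backward

    for x, y in loader:
        x = x.to(dev0)
        # Stage-0 forward for micro-batch t uses the current (not-yet-updated) weights...
        act = stage0(x)
        act_remote = act.detach().to(dev1).requires_grad_(True)

        if in_flight is not None:
            # ...while the backward pass for micro-batch t-1 only now updates the
            # weights that were already used for the newer forward: the staleness.
            prev_act, prev_remote, prev_y = in_flight
            out = stage1(prev_remote)
            loss = loss_fn(out, prev_y.to(dev1))
            loss.backward()                                # grads for stage1 and prev_remote
            prev_act.backward(prev_remote.grad.to(dev0))   # push gradients back into stage0
            opt1.step(); opt1.zero_grad()
            opt0.step(); opt0.zero_grad()

        in_flight = (act, act_remote, y)

In a real two-GPU pipeline the stage-0 forward and the stage-1 backward in this loop would overlap in time, which is what keeps both accelerators busy at the cost of the one-update weight staleness sketched above.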