Deep neural networks are typically trained with global error signals that are backpropagated (BP) end-to-end, a scheme that is not only biologically implausible but also suffers from the update-locking problem and incurs substantial memory consumption. Local learning, which updates each layer independently through a gradient-isolated auxiliary network, offers a promising alternative that addresses these problems. However, existing local learning methods exhibit a large accuracy gap relative to their BP counterparts, particularly for large-scale networks. The gap arises from the weak coupling between local layers and the layers that follow them, since no gradients are communicated across layers.
To tackle this issue, we put forward an augmented local learning method, dubbed AugLocal. AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of the layers that follow it, thereby strengthening the coupling between a hidden layer and the rest of the network. We further propose to linearly reduce the depth of the auxiliary networks as hidden layers go deeper, preserving sufficient network capacity while lowering the computational cost of the auxiliary networks.
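As a rough illustration of these two design choices, the following minimal Python sketch (not the authors' implementation; the exact depth schedule, its endpoints, and the rounding are assumptions, and the official repository should be consulted for the real method) computes, for each hidden layer, a linearly decaying auxiliary depth and a uniformly spaced subset of its subsequent layers:

```python
# Minimal sketch of AugLocal's two ideas as described in the abstract:
# (1) uniformly sample a small subset of a hidden layer's subsequent layers
#     to build its auxiliary network, and (2) linearly shrink the auxiliary
#     depth for deeper hidden layers. Schedule endpoints and rounding below
#     are assumptions, not the paper's exact formulas.

def auxiliary_depth(layer_idx: int, num_layers: int, max_depth: int) -> int:
    """Linearly reduce auxiliary depth from max_depth (first hidden layer)
    down to 1 (last hidden layer). The endpoints are an assumption."""
    frac = layer_idx / max(num_layers - 1, 1)       # 0.0 at the first layer, 1.0 at the last
    return max(1, round(max_depth * (1.0 - frac)))

def uniform_subset(layer_idx: int, num_layers: int, depth: int) -> list[int]:
    """Pick `depth` indices evenly spaced over the subsequent layers
    layer_idx+1 .. num_layers-1 (endpoints included)."""
    start, end = layer_idx + 1, num_layers - 1
    if depth >= end - start + 1:                    # fewer subsequent layers than requested
        return list(range(start, end + 1))
    step = (end - start) / (depth - 1) if depth > 1 else 0.0
    return sorted({round(start + i * step) for i in range(depth)})

num_layers, max_depth = 16, 6
for l in range(num_layers - 1):                     # the last layer is trained on the output loss directly
    d = auxiliary_depth(l, num_layers, max_depth)
    print(f"hidden layer {l:2d}: aux depth {d}, selected layers {uniform_subset(l, num_layers, d)}")
```

Under this schedule, early hidden layers see deeper auxiliary networks drawn from many subsequent layers, while late layers get shallow ones, which is what keeps the overall auxiliary cost modest.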
Our extensive experiments on four image classification datasets (CIFAR-10, SVHN, STL-10, and ImageNet) demonstrate that AugLocal scales effectively to tens of local layers, achieving accuracy comparable to BP-trained networks while reducing GPU memory usage by around 40%. The proposed AugLocal method therefore opens up new opportunities for training high-performance deep neural networks on resource-constrained platforms. Code is available at https://github.com/ChenxiangMA/AugLocal.