Residual networks have shown great success and become indispensable in
today's deep models. In this work, we aim to re-investigate the training
process of residual networks from a novel social psychology perspective of
loafing, and further propose a new training strategy to strengthen the
performance of residual networks. As residual networks can be viewed as
ensembles of relatively shallow networks (i.e., \textit{unraveled view}) in
prior works, we also start from such view and consider that the final
performance of a residual network is co-determined by a group of sub-networks.
Inspired by the social loafing problem of social psychology, we find that
residual networks invariably suffer from similar problem, where sub-networks in
a residual network are prone to exert less effort when working as part of the
group compared to working alone. We define this previously overlooked problem
as \textit{network loafing}. As social loafing will ultimately cause the low
individual productivity and the reduced overall performance, network loafing
will also hinder the performance of a given residual network and its
sub-networks. Referring to the solutions of social psychology, we propose
\textit{stimulative training}, which randomly samples a residual sub-network
and calculates the KL-divergence loss between the sampled sub-network and the
given residual network, to act as extra supervision for sub-networks and make
the overall goal consistent. Comprehensive empirical results and theoretical
analyses verify that stimulative training can well handle the loafing problem,
and improve the performance of a residual network by improving the performance
of its sub-networks. The code is available at
https://github.com/Sunshine-Ye/NIPS22-ST .Comment: NIPS2022 accep