Recent works have examined how deep neural networks, which can solve a
variety of difficult problems, incorporate the statistics of training data to
achieve their success. However, existing results have been established only in
limited settings. In this work, we derive the layerwise weight dynamics of
infinite-width neural networks with nonlinear activations trained by gradient
descent. We show theoretically that weight updates are aligned with input
correlations from intermediate layers weighted by error, and demonstrate
empirically that the result also holds in wide networks of finite width. The
alignment result allows us to formulate backpropagation-free learning rules,
named Align-zero and Align-ada, that theoretically achieve the same alignment
as backpropagation. Finally, we test these learning rules on benchmark problems
in feedforward and recurrent neural networks and demonstrate, in wide networks,
comparable performance to backpropagation.
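
To make the alignment claim concrete, the display below restates it schematically in standard backpropagation notation; the symbols ($h^{(\ell-1)}$ for the layer input, $\delta^{(\ell)}$ for the backpropagated error, $\phi$ for the activation) are our own illustrative notation rather than the paper's, and the expression is simply the familiar gradient-descent update for a feedforward layer, whose outer-product form is what "input correlations weighted by error" refers to:
\[
\Delta W^{(\ell)} \;\propto\; -\,\delta^{(\ell)}\,\big(h^{(\ell-1)}\big)^{\top},
\qquad
\delta^{(\ell)} \;=\; \big(W^{(\ell+1)}\big)^{\top}\delta^{(\ell+1)} \odot \phi'\!\big(z^{(\ell)}\big).
\]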