We investigate the optimization of multilayer perceptrons on symmetric data.
We compare the strategy of constraining the architecture to be equivariant to
that of training with data augmentation. We show that, under natural assumptions on the loss
and non-linearities, the sets of equivariant stationary points are identical
for the two strategies, and that the set of equivariant layers is invariant
under the gradient flow for augmented models. Finally, we show that stationary
points may be unstable for augmented training although they are stable for the
equivariant models.

Comment: v2: Revised manuscript. Mostly small edits, apart from new
experiments (see Appendix E
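
The invariance claim in the abstract can be illustrated numerically in a toy setting. The sketch below is not the paper's setup: it assumes a single linear layer, the cyclic group acting on inputs and outputs by coordinate shifts, a squared-error loss, and random data. The helper names `circulant` and `augmented_grad` are hypothetical. Under these assumptions, the gradient of the augmented loss at a shift-equivariant (circulant) weight matrix is again circulant, so a gradient step does not leave the set of equivariant layers, whereas at a generic weight matrix the gradient need not be equivariant.

```python
import numpy as np

def circulant(c):
    """Build the circulant matrix W[i, j] = c[(i - j) mod n]; it commutes with cyclic shifts."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

def augmented_grad(W, X, Y):
    """Gradient w.r.t. W of the squared error averaged over all cyclic shifts of the data.

    X and Y have shape (n, N): one column per sample. The augmented loss is
    (1 / (n * N)) * sum_k ||W @ shift_k(X) - shift_k(Y)||_F^2.
    """
    n, N = X.shape
    G = np.zeros_like(W)
    for k in range(n):
        Xk = np.roll(X, k, axis=0)  # shift every input by k coordinates
        Yk = np.roll(Y, k, axis=0)  # shift every target the same way
        G += 2.0 * (W @ Xk - Yk) @ Xk.T
    return G / (n * N)

rng = np.random.default_rng(0)
n, N = 5, 8
X = rng.normal(size=(n, N))
Y = rng.normal(size=(n, N))

# Matrix of the generator of the cyclic group: (P @ x)[i] = x[i - 1].
P = np.roll(np.eye(n), 1, axis=0)

# Equivariant starting point: a circulant weight matrix.
W_equiv = circulant(rng.normal(size=n))
G_equiv = augmented_grad(W_equiv, X, Y)
grad_is_equivariant = np.allclose(G_equiv @ P, P @ G_equiv)

# Generic starting point: an unconstrained weight matrix.
W_generic = rng.normal(size=(n, n))
G_generic = augmented_grad(W_generic, X, Y)
generic_grad_is_equivariant = np.allclose(G_generic @ P, P @ G_generic)

print("gradient equivariant at circulant W:", grad_is_equivariant)
print("gradient equivariant at generic W:", generic_grad_is_equivariant)
```

The check works because, for an orthogonal group action, the augmented loss is unchanged when the weights are conjugated by a shift, which makes its gradient commute with that conjugation. This toy example says nothing about the stability of stationary points, which is the subject of the final claim above.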