Understanding the internal representations learned by neural networks is a
cornerstone challenge in the science of machine learning. While recent work has
made significant strides toward understanding how neural networks implement
specific target functions in certain cases, this paper explores a
complementary question -- why do networks arrive at particular computational
strategies? Our inquiry focuses on the algebraic learning tasks of modular
addition, sparse parities, and finite group operations. Our primary theoretical
findings analytically characterize the features learned by stylized neural
networks for these algebraic tasks. Notably, our main technique demonstrates
how the principle of margin maximization alone can be used to fully specify the
features learned by these networks. Specifically, we prove that the trained
networks utilize Fourier features to perform modular addition and employ
features corresponding to irreducible group-theoretic representations to
perform compositions in general groups, aligning closely with the empirical
observations of Nanda et al. and Chughtai et al. More generally, we hope our
techniques can help foster a deeper understanding of why neural networks
adopt specific computational strategies.
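
To make the Fourier-features claim concrete, below is a minimal sketch (not drawn from the paper; the matrix and all names are hypothetical stand-ins) of how one might test whether a modular-addition network's first-layer weights concentrate on single Fourier frequencies over Z_p:

```python
import numpy as np

p, hidden = 97, 128  # modulus and hidden width (illustrative choices)
rng = np.random.default_rng(0)

# Stand-in for a trained embedding matrix W[a, i] = u_i(a): here we plant
# the single-frequency cosines that margin maximization predicts, plus noise.
freqs = rng.integers(1, p // 2, size=hidden)
phases = rng.uniform(0.0, 2 * np.pi, size=hidden)
a = np.arange(p)
W = np.cos(2 * np.pi * np.outer(a, freqs) / p + phases)
W += 0.01 * rng.standard_normal(W.shape)

# Fourier-feature check: DFT each neuron's weight vector over Z_p and
# measure how much spectral mass its single largest frequency carries.
spec = np.abs(np.fft.fft(W, axis=0))[1 : p // 2 + 1]  # drop the DC bin
mass = spec.max(axis=0) ** 2 / (spec ** 2).sum(axis=0)
print(f"median fraction of mass on top frequency: {np.median(mass):.3f}")
```

For an actual trained network, W would be replaced by the learned first-layer weights; per-neuron mass near 1 is the signature of the single-frequency Fourier features described above.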