In this work, we explore the maximum-margin bias of quasi-homogeneous neural
networks trained with gradient flow on an exponential loss and past a point of
separability. We introduce the class of quasi-homogeneous models, which is
expressive enough to describe nearly all neural networks with homogeneous
activations, even those with biases, residual connections, and normalization
layers, while structured enough to enable geometric analysis of its gradient
dynamics. Using this analysis, we generalize existing results on the
maximum-margin bias of homogeneous networks to this richer class of models. We
find that gradient flow implicitly favors a subset of the parameters, unlike in
the case of a homogeneous model where all parameters are treated equally. We
demonstrate through simple examples how this strong favoritism toward
minimizing an asymmetric norm can degrade the robustness of quasi-homogeneous
models. On the other hand, we conjecture that this norm-minimization discards,
when possible, unnecessary higher-order parameters, reducing the model to a
sparser parameterization. Lastly, by applying our theorem to sufficiently
expressive neural networks with normalization layers, we reveal a universal
mechanism behind the empirical phenomenon of Neural Collapse.
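
As a rough illustration of the central definition, a model $f(\theta; x)$ with parameters $\theta = (\theta_1, \dots, \theta_p)$ is quasi-homogeneous in the standard weighted-homogeneous sense if there exist exponents $a_1, \dots, a_p \ge 0$ and a degree $d$ such that
\[
  f(\lambda^{a_1}\theta_1, \dots, \lambda^{a_p}\theta_p;\, x) \;=\; \lambda^{d}\, f(\theta_1, \dots, \theta_p;\, x)
  \qquad \text{for all } \lambda > 0.
\]
The notation $a_i$ and $d$ and this scaling form are illustrative only and are not taken from the abstract; the paper's formal definition may differ in detail. Under this form, the homogeneous setting corresponds to all exponents being equal, which is consistent with the remark above that a homogeneous model treats all parameters equally.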