Learning Linear Groups in Neural Networks
Employing equivariance in neural networks leads to greater parameter
efficiency and improved generalization performance through the encoding of
domain knowledge in the architecture; however, the majority of existing
approaches require an a priori specification of the desired symmetries. We
present a neural network architecture, Linear Group Networks (LGNs), for
learning linear groups acting on the weight space of neural networks. Linear
groups are desirable due to their inherent interpretability, as they can be
represented as finite matrices. LGNs learn groups without any supervision or
knowledge of the hidden symmetries in the data, and the learned groups can be
mapped to well-known operations in machine learning. We use LGNs to learn groups on
multiple datasets while considering different downstream tasks; we demonstrate
that the linear group structure depends on both the data distribution and the
considered task.
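The interpretability claim above rests on linear groups being realizable as explicit finite sets of matrices. A minimal illustration of this (not the authors' code) is the cyclic group C4, generated by a 90-degree planar rotation matrix, whose closure and inverses can be checked directly:

```python
import numpy as np

# Generator of the cyclic group C4: a 90-degree rotation of the plane.
g = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# Enumerate the group by repeated multiplication: {I, g, g^2, g^3}.
elements = [np.linalg.matrix_power(g, k) for k in range(4)]

# Closure: g^4 returns to the identity, so the set is a finite group.
assert np.allclose(np.linalg.matrix_power(g, 4), np.eye(2))

# Each element's inverse is also in the set (the inverse of g^k is g^(4-k)).
for e in elements:
    inv = np.linalg.inv(e)
    assert any(np.allclose(inv, f) for f in elements)
```

Because every element is a small explicit matrix, inspecting a learned group of this kind amounts to reading off its generators.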
Using and Abusing Equivariance
In this paper we show how Group Equivariant Convolutional Neural Networks use
subsampling to learn to break equivariance to their symmetries. We focus on 2D
rotations and reflections and investigate the impact of broken equivariance on
network performance. We show that a change in the input dimension of a network
as small as a single pixel can be enough for commonly used architectures to
become approximately equivariant, rather than exactly. We investigate the
impact of networks not being exactly equivariant and find that approximately
equivariant networks generalise significantly worse to unseen symmetries
compared to their exactly equivariant counterparts. However, when the
symmetries in the training data are not identical to the symmetries of the
network, we find that approximately equivariant networks are able to relax
their own equivariant constraints, causing them to match or outperform exactly
equivariant networks on common benchmark datasets.
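The single-pixel effect described above can be reproduced without any trained network: 2x2 average pooling with stride 2 commutes exactly with a 90-degree rotation on even-sized inputs, but on odd-sized inputs the rotation changes which trailing row and column are discarded, so the operations no longer commute. A minimal numpy sketch (illustrative, not the paper's code):

```python
import numpy as np

def avgpool2(x):
    """2x2 average pooling with stride 2; a trailing odd row/col is dropped."""
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)

# Even input: pooling commutes with a 90-degree rotation (exact equivariance).
x = rng.standard_normal((8, 8))
assert np.allclose(avgpool2(np.rot90(x)), np.rot90(avgpool2(x)))

# One pixel larger: the rotation changes which row/column gets discarded,
# so pooling and rotation no longer commute (only approximate equivariance).
x = rng.standard_normal((9, 9))
assert not np.allclose(avgpool2(np.rot90(x)), np.rot90(avgpool2(x)))
```

This is the same mechanism at work when a strided architecture is fed an input whose size is off by one pixel.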
A Foliated View of Transfer Learning
Transfer learning considers a learning process where a new task is solved by
transferring relevant knowledge from known solutions to related tasks. While
this has been studied experimentally, a foundational description of the
transfer learning problem, one that exposes what related tasks are and how they
can be exploited, is still lacking. In this work, we present a definition of relatedness between
tasks and identify foliations as a mathematical framework to represent such
relationships.
Comment: 14 pages, 6 figures
Magnitude Invariant Parametrizations Improve Hypernetwork Learning
Hypernetworks, neural networks that predict the parameters of another neural
network, are powerful models that have been successfully used in diverse
applications from image generation to multi-task learning. Unfortunately,
existing hypernetworks are often challenging to train. Training typically
converges far more slowly than for non-hypernetwork models, and the rate of
convergence can be very sensitive to hyperparameter choices. In this work, we
identify a fundamental and previously unidentified problem that contributes to
the challenge of training hypernetworks: a magnitude proportionality between
the inputs and outputs of the hypernetwork. We demonstrate both analytically
and empirically that this can lead to unstable optimization, thereby slowing
down convergence, and sometimes even preventing any learning. We present a
simple solution to this problem using a revised hypernetwork formulation that
we call Magnitude Invariant Parametrizations (MIP). We demonstrate the proposed
solution on several hypernetwork tasks, where it consistently stabilizes
training and achieves faster convergence. Furthermore, we perform a
comprehensive ablation study including choices of activation function,
normalization strategies, input dimensionality, and hypernetwork architecture;
and find that MIP improves training in all scenarios. We provide easy-to-use
code that can turn existing networks into MIP-based hypernetworks.
Comment: Source code at https://github.com/JJGO/hyperligh
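The magnitude proportionality described above is easiest to see in the simplest case: a bias-free linear hypernetwork scales its predicted parameters linearly with its input, tying the main network's weight magnitudes to the magnitude of the conditioning input. A minimal sketch of this failure mode (illustrative only; the MIP remedy itself is described in the paper and repository):

```python
import numpy as np

rng = np.random.default_rng(0)

# A bias-free linear hypernetwork: maps a conditioning input z to the
# flattened parameters of a (hypothetical) main network.
W = rng.standard_normal((32, 4))   # hypernetwork weights

def hypernet(z):
    return W @ z                   # predicted main-network parameters

z = rng.standard_normal(4)
base_norm = np.linalg.norm(hypernet(z))

# Magnitude proportionality: scaling the input by c scales the predicted
# parameters, and hence the main network's weight magnitudes, by c.
for c in (0.1, 1.0, 10.0):
    assert np.isclose(np.linalg.norm(hypernet(c * z)), c * base_norm)
```

Nonlinearities and biases soften but do not remove this coupling, which is consistent with the optimization instabilities the abstract reports.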