Learning Layer-wise Equivariances Automatically using Gradients
Convolutions encode equivariance symmetries into neural networks leading to
better generalisation performance. However, symmetries provide fixed hard
constraints on the functions a network can represent, need to be specified in
advance, and cannot be adapted. Our goal is to allow flexible symmetry
constraints that can automatically be learned from data using gradients.
Learning symmetry and associated weight connectivity structures from scratch is
difficult for two reasons. First, it requires efficient and flexible
parameterisations of layer-wise equivariances. Second, symmetries act as
constraints and are therefore not encouraged by training losses measuring data
fit. To overcome these challenges, we improve parameterisations of soft
equivariance and learn the amount of equivariance in layers by optimising the
marginal likelihood, estimated using differentiable Laplace approximations. The
objective balances data fit and model complexity, enabling layer-wise symmetry
discovery in deep networks. We demonstrate the ability to automatically learn
layer-wise equivariances on image classification tasks, achieving equivalent or
improved performance over baselines with hard-coded symmetry.
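The marginal-likelihood objective described above can be made concrete with a toy sketch. This is not the paper's differentiable implementation for deep networks; it is a minimal illustration, with hypothetical function names, of a Laplace approximation to the log marginal likelihood for 1-D Bayesian linear regression, where the approximation happens to be exact and can be checked against the closed-form evidence.

```python
import numpy as np

def laplace_log_evidence(x, y, sigma2, alpha2):
    """Laplace approximation to log p(y) for 1-D Bayesian linear regression.

    Model: y = w * x + eps, eps ~ N(0, sigma2), prior w ~ N(0, alpha2).
    The posterior over w is Gaussian, so here the approximation is exact.
    """
    # Hessian of the negative log posterior (a scalar precision) and the MAP.
    h = x @ x / sigma2 + 1.0 / alpha2
    w_map = (x @ y / sigma2) / h
    # Log-likelihood and log-prior evaluated at the MAP estimate.
    n = len(y)
    resid = y - w_map * x
    log_lik = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * resid @ resid / sigma2
    log_prior = -0.5 * np.log(2 * np.pi * alpha2) - 0.5 * w_map**2 / alpha2
    # Laplace correction: + (d/2) log(2 pi) - (1/2) log det H, with d = 1.
    return log_lik + log_prior + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(h)

def exact_log_evidence(x, y, sigma2, alpha2):
    """Closed-form log p(y): y ~ N(0, alpha2 * x x^T + sigma2 * I)."""
    n = len(y)
    K = alpha2 * np.outer(x, x) + sigma2 * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(K, y))
```

Because every term is a differentiable function of the hyper-parameters (here `sigma2` and `alpha2`), the same quantity can be optimised with gradients, which is the role the Laplace-estimated marginal likelihood plays in the paper's symmetry-discovery objective.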
Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models
Gaussian processes (GPs) are a powerful tool for probabilistic inference over
functions. They have been applied to both regression and non-linear
dimensionality reduction, and offer desirable properties such as uncertainty
estimates, robustness to over-fitting, and principled ways for tuning
hyper-parameters. However, the scalability of these models to big datasets
remains an active topic of research. We introduce a novel re-parametrisation of
variational inference for sparse GP regression and latent variable models that
allows for an efficient distributed algorithm. This is done by exploiting the
decoupling of the data given the inducing points to re-formulate the evidence
lower bound in a Map-Reduce setting. We show that the inference scales well
with data and computational resources, while preserving a balanced distribution
of the load among the nodes. We further demonstrate the utility in scaling
Gaussian processes to big data. We show that GP performance improves with
increasing amounts of data in regression (on flight data with 2 million
records) and latent variable modelling (on MNIST). The results show that GPs
perform better than many common models often used for big data.
Comment: 9 pages, 8 figures
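The Map-Reduce reformulation can be sketched in a few lines. This is a toy illustration under assumed names (an RBF kernel, hypothetical `chunk_stats`/`titsias_bound` helpers), not the paper's distributed code: the collapsed evidence lower bound for sparse GP regression depends on the data only through sums of per-point statistics, so each node computes partial sums over its shard (map) and the bound is evaluated from their totals (reduce).

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel matrix between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def chunk_stats(x_chunk, y_chunk, z):
    """Map step: sufficient statistics of one data shard, given inducing points z."""
    Kmn = rbf(z, x_chunk)          # m x n_j cross-covariance
    return (Kmn @ Kmn.T,           # Phi_j = Kmn Knm
            Kmn @ y_chunk,         # c_j   = Kmn y_j
            y_chunk @ y_chunk,     # s_j   = y_j^T y_j
            len(x_chunk))          # n_j (also sum_i k(x_i, x_i), since RBF diag = 1)

def titsias_bound(stats, z, sigma2, jitter=1e-8):
    """Reduce step: combine partial statistics and evaluate the collapsed bound."""
    Phi = sum(s[0] for s in stats)
    c = sum(s[1] for s in stats)
    yy = sum(s[2] for s in stats)
    n = sum(s[3] for s in stats)
    Kmm = rbf(z, z) + jitter * np.eye(len(z))
    A = Kmm + Phi / sigma2
    _, logdet_a = np.linalg.slogdet(A)
    _, logdet_m = np.linalg.slogdet(Kmm)
    # log N(y | 0, Qnn + sigma2 I) via the matrix determinant/inversion lemmas,
    # where Qnn = Knm Kmm^{-1} Kmn never needs to be formed explicitly.
    logdet = logdet_a - logdet_m + n * np.log(sigma2)
    quad = yy / sigma2 - c @ np.linalg.solve(A, c) / sigma2**2
    log_gauss = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
    # Trace correction: -(1 / (2 sigma2)) * (tr Knn - tr Qnn).
    tr_qnn = np.trace(np.linalg.solve(Kmm, Phi))
    return log_gauss - 0.5 * (n - tr_qnn) / sigma2
```

Because the reduce step only adds the per-shard matrices and vectors, the bound computed from any partition of the data matches the single-machine value exactly, which is what makes the load balance across nodes straightforward.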
- …