Learning Equivariant Representations
State-of-the-art deep learning systems often require large amounts of data
and computation. For this reason, leveraging known or unknown structure of the
data is paramount. Convolutional neural networks (CNNs) are successful examples
of this principle, their defining characteristic being shift equivariance.
Because the filter slides over the input, when the input shifts the response shifts
by the same amount, exploiting the structure of natural images where semantic
content is independent of absolute pixel positions. This property is essential
to the success of CNNs in audio, image and video recognition tasks. In this
thesis, we extend equivariance to other kinds of transformations, such as
rotation and scaling. We propose equivariant models for different
transformations defined by groups of symmetries. The main contributions are (i)
polar transformer networks, achieving equivariance to the group of similarities
on the plane, (ii) equivariant multi-view networks, achieving equivariance to
the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving
equivariance to the continuous 3D rotation group, (iv) cross-domain image
embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v)
spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving
equivariance to 3D rotations for spherical vector fields. Applications include
image classification, 3D shape classification and retrieval, panoramic image
classification and segmentation, shape alignment and pose estimation. What
these models have in common is that they leverage symmetries in the data to
reduce sample and model complexity and improve generalization performance. The
advantages are more significant on (but not limited to) challenging tasks where
data is limited or input perturbations such as arbitrary rotations are present.
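
The shift equivariance described in this abstract can be checked numerically. The following is a minimal sketch, not code from the thesis: it assumes a 1D signal with wrap-around (circular) borders so that the property holds exactly, and the helper names `circular_correlate` and `shift` are hypothetical.

```python
def circular_correlate(signal, kernel):
    """Slide `kernel` over `signal`, wrapping around at the borders."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Cyclically shift `signal` by k positions to the right."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


# Illustrative values only; any signal and kernel would do.
signal = [0.0, 1.0, 3.0, -2.0, 0.5, 4.0, 0.0, -1.0]
kernel = [1.0, -2.0, 0.5]

# Equivariance: shifting the input and then filtering gives the same
# result as filtering first and then shifting the response.
shifted_then_filtered = circular_correlate(shift(signal, 3), kernel)
filtered_then_shifted = shift(circular_correlate(signal, kernel), 3)

print(shifted_then_filtered == filtered_then_shifted)
```

With zero padding instead of circular borders, the equality would hold only away from the edges of the signal.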
Self-supervised learning of a facial attribute embedding from video
We propose a self-supervised framework for learning facial attributes by
simply watching videos of a human face speaking, laughing, and moving over
time. To perform this task, we introduce a network, Facial Attributes-Net
(FAb-Net), that is trained to embed multiple frames from the same video
face-track into a common low-dimensional space. With this approach, we make
three contributions: first, we show that the network can leverage information
from multiple source frames by predicting confidence/attention masks for each
frame; second, we demonstrate that using a curriculum learning regime improves
the learned embedding; finally, we demonstrate that the network learns a
meaningful face embedding that encodes information about head pose, facial
landmarks and facial expression, i.e. facial attributes, without having been
supervised with any labelled data. We are comparable or superior to
state-of-the-art self-supervised methods on these tasks and approach the
performance of supervised methods.
Comment: To appear in BMVC 2018. Supplementary material can be found at
http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/fabnet.htm
Affine Self Convolution
Attention mechanisms, and most prominently self-attention, are a powerful
building block for processing not only text but also images. These provide a
parameter-efficient method for aggregating inputs. We focus on self-attention
in vision models and combine it with convolution, which, as far as we know, we
are the first to do. What emerges is a convolution with data-dependent filters.
We call this an Affine Self Convolution. While this is applied differently at
each spatial location, we show that it is translation equivariant. We also
modify the Squeeze and Excitation variant of attention, extending both variants
of attention to the roto-translation group. We evaluate these new models on
CIFAR10 and CIFAR100 and show an improvement in the number of parameters, while
reaching comparable or higher accuracy at test time against self-trained
baselines.
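
The claim that a filter can differ at every location and still be translation equivariant can be illustrated with a toy example. This is a sketch under strong assumptions and is not the Affine Self Convolution from the paper: it uses a generic 1D local self-attention with scalar features, identity query/key/value maps, and circular borders. The names `local_attention_1d` and `shift` are hypothetical.

```python
import math


def shift(signal, k):
    """Cyclically shift `signal` by k positions to the right."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def local_attention_1d(x, radius=1):
    """Weighted average over a local window, with weights computed from the data."""
    n = len(x)
    out = []
    for i in range(n):
        neighbours = [x[(i + d) % n] for d in range(-radius, radius + 1)]
        # Attention scores: product of the centre value with each neighbour.
        scores = [x[i] * v for v in neighbours]
        # Numerically stable softmax over the window.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, neighbours)) / total)
    return out


# Illustrative values only.
x = [0.2, -1.0, 0.7, 1.5, -0.3, 0.0, 2.0, -0.8]

# The weights depend only on the contents of each window, so shifting the
# input moves the windows and their weights along with it.
shifted_then_attended = local_attention_1d(shift(x, 2))
attended_then_shifted = shift(local_attention_1d(x), 2)

print(all(math.isclose(a, b) for a, b in zip(shifted_then_attended, attended_then_shifted)))
```

The weights vary from position to position, yet the operation commutes with shifts because the rule that produces the weights is the same everywhere.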
Isotopic tiling theory for hyperbolic surfaces
In this paper, we develop the mathematical tools needed to explore isotopy
classes of tilings on hyperbolic surfaces of finite genus, possibly
nonorientable, with boundary, and punctured. More specifically, we generalize
results on Delaney-Dress combinatorial tiling theory using an extension of
mapping class groups to orbifolds, in turn using this to study tilings of
covering spaces of orbifolds. Moreover, we study finite subgroups of these
mapping class groups. Our results can be used to extend the Delaney-Dress
combinatorial encoding of a tiling to yield a finite symbol encoding the
complexity of an isotopy class of tilings. The results of this paper provide
the basis for a complete and unambiguous enumeration of isotopically distinct
tilings of hyperbolic surfaces.