
    Learning Equivariant Representations

    State-of-the-art deep learning systems often require large amounts of data and computation. For this reason, leveraging known or unknown structure of the data is paramount. Convolutional neural networks (CNNs) are successful examples of this principle, their defining characteristic being shift equivariance. Because a filter slides over the input, the response shifts by the same amount when the input shifts, which exploits the structure of natural images, where semantic content is independent of absolute pixel position. This property is essential to the success of CNNs in audio, image and video recognition tasks. In this thesis, we extend equivariance to other kinds of transformations, such as rotation and scaling. We propose equivariant models for different transformations defined by groups of symmetries. The main contributions are (i) polar transformer networks, achieving equivariance to the group of similarities on the plane, (ii) equivariant multi-view networks, achieving equivariance to the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving equivariance to the continuous 3D rotation group, (iv) cross-domain image embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v) spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving equivariance to 3D rotations for spherical vector fields. Applications include image classification, 3D shape classification and retrieval, panoramic image classification and segmentation, shape alignment and pose estimation. What these models have in common is that they leverage symmetries in the data to reduce sample and model complexity and improve generalization performance. The advantages are more significant on (but not limited to) challenging tasks where data is limited or input perturbations such as arbitrary rotations are present.
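
    The shift equivariance described in this abstract can be checked numerically. The following is a minimal sketch, not code from the thesis: it assumes a 1D signal with circular (wrap-around) boundaries, and the helper names `circular_correlate` and `shift`, the sample signal and the filter values are all hypothetical, chosen only for illustration.

```python
def circular_correlate(signal, kernel):
    """Slide `kernel` over `signal`, wrapping around at the boundaries."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Cyclically shift `signal` to the right by `k` positions."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


# Illustrative data only (assumed, not taken from the thesis).
signal = [0, 1, 3, 2, 0, 0, 5, 1]
kernel = [1, -2, 1]

# Equivariance: filtering a shifted input gives the shifted filter response.
lhs = circular_correlate(shift(signal, 3), kernel)
rhs = shift(circular_correlate(signal, kernel), 3)
print(lhs == rhs)
```

    Circular boundaries are used here so that the identity holds exactly; with zero padding, as is common in practice, it holds only away from the borders.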

    Self-supervised learning of a facial attribute embedding from video

    We propose a self-supervised framework for learning facial attributes by simply watching videos of a human face speaking, laughing, and moving over time. To perform this task, we introduce a network, Facial Attributes-Net (FAb-Net), that is trained to embed multiple frames from the same video face-track into a common low-dimensional space. With this approach, we make three contributions: first, we show that the network can leverage information from multiple source frames by predicting confidence/attention masks for each frame; second, we demonstrate that using a curriculum learning regime improves the learned embedding; finally, we demonstrate that the network learns a meaningful face embedding that encodes information about head pose, facial landmarks and facial expression, i.e. facial attributes, without having been supervised with any labelled data. Our results are comparable or superior to state-of-the-art self-supervised methods on these tasks and approach the performance of supervised methods. Comment: To appear in BMVC 2018. Supplementary material can be found at http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/fabnet.htm

    Affine Self Convolution

    Attention mechanisms, and most prominently self-attention, are a powerful building block for processing not only text but also images. They provide a parameter-efficient method for aggregating inputs. We focus on self-attention in vision models and combine it with convolution, which, as far as we know, we are the first to do. What emerges is a convolution with data-dependent filters. We call this an Affine Self Convolution. Although it is applied differently at each spatial location, we show that it is translation equivariant. We also modify the Squeeze-and-Excitation variant of attention, extending both variants of attention to the roto-translation group. We evaluate these new models on CIFAR10 and CIFAR100 and show a reduction in the number of parameters while reaching comparable or higher test accuracy than self-trained baselines.

    Isotopic tiling theory for hyperbolic surfaces

    In this paper, we develop the mathematical tools needed to explore isotopy classes of tilings on hyperbolic surfaces of finite genus, possibly nonorientable, with boundary, and punctured. More specifically, we generalize results on Delaney-Dress combinatorial tiling theory using an extension of mapping class groups to orbifolds, and in turn use this to study tilings of covering spaces of orbifolds. Moreover, we study finite subgroups of these mapping class groups. Our results can be used to extend the Delaney-Dress combinatorial encoding of a tiling to yield a finite symbol encoding the complexity of an isotopy class of tilings. The results of this paper provide the basis for a complete and unambiguous enumeration of isotopically distinct tilings of hyperbolic surfaces.