945 research outputs found
Learning SO(3) Equivariant Representations with Spherical CNNs
We address the problem of 3D rotation equivariance in convolutional neural
networks. 3D rotations have been a challenging nuisance in 3D classification
tasks requiring higher capacity and extended data augmentation in order to
tackle it. We model 3D data with multi-valued spherical functions and we
propose a novel spherical convolutional network that implements exact
convolutions on the sphere by realizing them in the spherical harmonic domain.
Resulting filters have local symmetry and are localized by enforcing smooth
spectra. We apply a novel pooling on the spectral domain and our operations are
independent of the underlying spherical resolution throughout the network. We
show that networks with much lower capacity and without requiring data
augmentation can exhibit performance comparable to the state of the art in
standard retrieval and classification benchmarks.Comment: Camera-ready. Accepted to ECCV'18 as oral presentatio
Multi-view Convolutional Neural Networks for 3D Shape Recognition
A longstanding question in computer vision concerns the representation of 3D
shapes for recognition: should 3D shapes be represented with descriptors
operating on their native 3D formats, such as voxel grid or polygon mesh, or
can they be effectively represented with view-based descriptors? We address
this question in the context of learning to recognize 3D shapes from a
collection of their rendered views on 2D images. We first present a standard
CNN architecture trained to recognize the shapes' rendered views independently
of each other, and show that a 3D shape can be recognized even from a single
view at an accuracy far higher than using state-of-the-art 3D shape
descriptors. Recognition rates further increase when multiple views of the
shapes are provided. In addition, we present a novel CNN architecture that
combines information from multiple views of a 3D shape into a single and
compact shape descriptor offering even better recognition performance. The same
architecture can be applied to accurately recognize human hand-drawn sketches
of shapes. We conclude that a collection of 2D views can be highly informative
for 3D shape recognition and is amenable to emerging CNN architectures and
their derivatives.Comment: v1: Initial version. v2: An updated ModelNet40 training/test split is
used; results with low-rank Mahalanobis metric learning are added. v3 (ICCV
2015): A second camera setup without the upright orientation assumption is
added; some accuracy and mAP numbers are changed slightly because a small
issue in mesh rendering related to specularities is fixe
Learning Equivariant Representations
State-of-the-art deep learning systems often require large amounts of data
and computation. For this reason, leveraging known or unknown structure of the
data is paramount. Convolutional neural networks (CNNs) are successful examples
of this principle, their defining characteristic being the shift-equivariance.
By sliding a filter over the input, when the input shifts, the response shifts
by the same amount, exploiting the structure of natural images where semantic
content is independent of absolute pixel positions. This property is essential
to the success of CNNs in audio, image and video recognition tasks. In this
thesis, we extend equivariance to other kinds of transformations, such as
rotation and scaling. We propose equivariant models for different
transformations defined by groups of symmetries. The main contributions are (i)
polar transformer networks, achieving equivariance to the group of similarities
on the plane, (ii) equivariant multi-view networks, achieving equivariance to
the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving
equivariance to the continuous 3D rotation group, (iv) cross-domain image
embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v)
spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving
equivariance to 3D rotations for spherical vector fields. Applications include
image classification, 3D shape classification and retrieval, panoramic image
classification and segmentation, shape alignment and pose estimation. What
these models have in common is that they leverage symmetries in the data to
reduce sample and model complexity and improve generalization performance. The
advantages are more significant on (but not limited to) challenging tasks where
data is limited or input perturbations such as arbitrary rotations are present
- …