Equivariant Light Field Convolution and Transformer
3D reconstruction and novel view rendering can greatly benefit from geometric
priors when the input views are not sufficient in terms of coverage and
inter-view baselines. Deep learning of geometric priors from 2D images often
requires each image to be represented in a canonical frame and the prior
to be learned in a given or learned canonical frame. In this paper, given
only the relative poses of the cameras, we show how to learn priors from
multiple views equivariant to coordinate frame transformations by proposing an
SE(3)-equivariant convolution and transformer in the space of rays in 3D.
This enables the creation of a light field that remains equivariant to the
choice of coordinate frame. The light field, as defined in our work, refers both
to the radiance field and the feature field defined on the ray space. We model
the ray space, the domain of the light field, as a homogeneous space of SE(3)
and introduce the SE(3)-equivariant convolution in ray space. Depending on
the output domain of the convolution, we present convolution-based
SE(3)-equivariant maps from ray space to ray space and to R^3. Our
mathematical framework allows us to go beyond convolution to
SE(3)-equivariant attention in the ray space. We demonstrate how to tailor
and adapt the equivariant convolution and transformer in the tasks of
equivariant neural rendering and reconstruction from multiple views. We
demonstrate SE(3)-equivariance by obtaining robust results on roto-translated
datasets without performing transformation augmentation. Comment: 46 pages
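The ray space acted on by rigid motions can be made concrete with Plücker coordinates. The sketch below (an illustration, not the authors' implementation) represents a ray by its direction and moment and shows how an SE(3) element (R, t) acts on it; the function name and setup are hypothetical.

```python
import numpy as np

def transform_ray(R, t, d, m):
    """Apply a rigid motion (R, t) to a ray in Pluecker coordinates.

    A ray through point p with unit direction d has moment m = cross(p, d).
    Under x -> R x + t, the direction rotates and the moment picks up
    a cross-product term from the translation:
        d' = R d,  m' = R m + cross(t, R d).
    """
    d_new = R @ d
    m_new = R @ m + np.cross(t, d_new)
    return d_new, m_new

# Example: a ray through p with direction d, moved by a rigid motion.
p = np.array([1.0, 0.0, 0.0])
d = np.array([0.0, 0.0, 1.0])
m = np.cross(p, d)

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.5, -0.2, 0.1])

d_new, m_new = transform_ray(R, t, d, m)

# The transformed point R @ p + t must lie on the transformed ray,
# i.e. its moment against d_new equals m_new.
p_new = R @ p + t
assert np.allclose(np.cross(p_new, d_new), m_new)
```

An equivariant map on rays is then one that commutes with this group action; the assertion above checks that the action is consistent with transforming points directly.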
Interpretable Transformations with Encoder-Decoder Networks
Deep feature spaces have the capacity to encode complex transformations of
their input data. However, understanding the relative feature-space
relationship between two transformed encoded images is difficult. For instance,
what is the relative feature space relationship between two rotated images?
What is decoded when we interpolate in feature space? Ideally, we want to
disentangle confounding factors, such as pose, appearance, and illumination,
from object identity. Disentangling these is difficult because they interact in
very nonlinear ways. We propose a simple method to construct a deep feature
space, with explicitly disentangled representations of several known
transformations. A person or algorithm can then manipulate the disentangled
representation, for example, to re-render an image with explicit control over
parameterized degrees of freedom. The feature space is constructed using a
transforming encoder-decoder network with a custom feature transform layer,
acting on the hidden representations. We demonstrate the advantages of explicit
disentangling on a variety of datasets and transformations, and as an aid for
traditional tasks, such as classification. Comment: Accepted at ICCV 2017
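One common way to make a latent transformation explicit, in the spirit of the feature transform layer described above, is to let a pose parameter act linearly on pairs of hidden channels. The sketch below (an assumed minimal variant, not the paper's code) rotates consecutive channel pairs by an angle, so composing two transforms equals a single transform by the summed angle:

```python
import numpy as np

def feature_transform(z, theta):
    """Rotate consecutive channel pairs of a hidden code z by angle theta.

    Treating channels (2i, 2i+1) as 2D vectors lets a transformation
    parameter act linearly and interpretably on the feature space.
    """
    z = np.asarray(z, dtype=float)
    pairs = z.reshape(-1, 2)               # (num_pairs, 2)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return (pairs @ rot.T).reshape(z.shape)

z = np.array([1.0, 0.0, 0.0, 1.0])
out = feature_transform(feature_transform(z, 0.4), 0.6)

# Composing rotations by 0.4 and 0.6 matches one rotation by 1.0,
# so interpolating theta gives a predictable path in feature space.
assert np.allclose(out, feature_transform(z, 1.0))
```

Because the action is norm-preserving and composes additively, interpolating the parameter yields an interpretable trajectory in feature space that a decoder can turn back into a transformed image.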
Spherical Transformer: Adapting Spherical Signal to CNNs
Convolutional neural networks (CNNs) have been widely used in various vision
tasks, e.g. image classification and semantic segmentation. Unfortunately,
standard 2D CNNs are not well suited to spherical signals such as panorama
images or spherical projections, as samples on a sphere form an unstructured
grid. In this
paper, we present Spherical Transformer which can transform spherical signals
into vectors that can be directly processed by standard CNNs such that many
well-designed CNNs architectures can be reused across tasks and datasets by
pretraining. To this end, the proposed method first uses locally structured
sampling methods such as HEALPix to construct a transformer grid from the
positions of spherical points and their adjacent points, and then transforms
the spherical signals into vectors through the grid. By building the
Spherical Transformer module, we can use multiple CNN architectures directly.
We evaluate our approach on the tasks of spherical MNIST recognition, 3D object
classification and omnidirectional image semantic segmentation. For 3D object
classification, we further propose a rendering-based projection method to
improve performance and a rotation-equivariant model to improve robustness
to rotations. Experimental results on three tasks show that our
approach achieves superior performance over state-of-the-art methods.
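The core gather step of such a spherical-to-vector transformation can be sketched as follows; this is an assumed simplification (toy neighbor indices in place of a real HEALPix neighborhood query), showing how a structured sampling turns a spherical signal into fixed-size vectors a standard CNN can consume:

```python
import numpy as np

def spherical_to_vectors(signal, neighbor_idx):
    """Gather each spherical sample and its neighbors into a flat vector.

    signal:       (N, C) values at N points on the sphere
    neighbor_idx: (N, K) indices of each point's K structured neighbors
                  (e.g. from a HEALPix-style sampling)
    returns:      (N, K * C) vectors for a standard CNN/MLP
    """
    N, C = signal.shape
    K = neighbor_idx.shape[1]
    return signal[neighbor_idx].reshape(N, K * C)

# Toy example: 4 points, 1 channel, each point grouped with 3 neighbors.
signal = np.array([[0.0], [1.0], [2.0], [3.0]])
neighbor_idx = np.array([[0, 1, 2],
                         [1, 2, 3],
                         [2, 3, 0],
                         [3, 0, 1]])
vectors = spherical_to_vectors(signal, neighbor_idx)
assert vectors.shape == (4, 3)
```

Because the output is an ordinary dense tensor, pretrained CNN backbones can be applied without modifying their convolution layers.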