Representing images of a rotating object with cyclic permutation for view-based pose estimation
In this paper, we propose a novel approach using a cyclic group to model the appearance change in an image sequence of an object rotated about an arbitrary axis (1-DOF out-of-plane rotation). In the sequence, an image x_j is followed by an image x_{j+1}. We represent the relationship between images by a cyclic group as x_{j+1} = G x_j, and obtain the matrix G by real block diagonalization. Then, G raised to a real power is used to represent the image sequence and also for pose estimation. Two estimation methods are proposed and evaluated with real image sequences from the COIL-20, COIL-100, and ALOI datasets, and also compared to the Parametric Eigenspace method. Additionally, we discuss the relationship of the proposed approach to the pixel-wise Discrete Fourier Transform (DFT) and to linear regression, and also outline several extensions.
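As a toy illustration of the cyclic relation above, the sketch below builds a synthetic rotating sequence, estimates G by least squares, and uses a fractional matrix power of G for intermediate poses. The 2-D rotating features and the least-squares estimator are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

# Toy stand-in for the image sequence: 2-D features that rotate with the
# object, so that x_{j+1} = G x_j holds exactly for some matrix G.
N = 8                                          # views over one revolution
angles = 2 * np.pi * np.arange(N) / N
X = np.stack([np.cos(angles), np.sin(angles)])  # columns are x_j, shape (2, N)
X_next = np.roll(X, -1, axis=1)                 # columns are x_{j+1}

# Least-squares estimate of G from the cyclic relation x_{j+1} = G x_j.
G = X_next @ np.linalg.pinv(X)

# Applying G N times returns every view to itself (cyclic group of order N).
G_N = np.linalg.matrix_power(G, N)
print(np.allclose(G_N, np.eye(2)))              # True

# A real (fractional) power of G interpolates pose between views:
# half a step corresponds to a rotation by pi/N.
G_half = fractional_matrix_power(G, 0.5).real
x_half = G_half @ X[:, 0]
print(np.allclose(x_half, [np.cos(np.pi / N), np.sin(np.pi / N)]))  # True
```

The same least-squares-then-matrix-power recipe extends to flattened image vectors, which is where the block-diagonalization mentioned in the abstract becomes necessary.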
Leveraging Symmetries in Pick and Place
Robotic pick and place tasks are symmetric under translations and rotations
of both the object to be picked and the desired place pose. For example, if the
pick object is rotated or translated, then the optimal pick action should also
rotate or translate. The same is true for the place pose; if the desired place
pose changes, then the place action should also transform accordingly. A
recently proposed pick and place framework known as Transporter Net captures
some of these symmetries, but not all. This paper analytically studies the
symmetries present in planar robotic pick and place and proposes a method of
incorporating equivariant neural models into Transporter Net in a way that
captures all symmetries. The new model, which we call Equivariant Transporter
Net, is equivariant to both pick and place symmetries and can immediately
generalize pick and place knowledge to different pick and place poses. We
evaluate the new model empirically and show that it is much more sample
efficient than the non-symmetric version, resulting in a system that can
imitate demonstrated pick and place behavior using very few human
demonstrations on a variety of imitation learning tasks.
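The pick symmetry described above can be checked numerically in a toy, translation-only setting. The correlation-based pick scorer below is a hypothetical stand-in for illustration, not Transporter Net itself: because the scorer is shift-equivariant, translating the object translates the optimal pick by the same amount.

```python
import numpy as np

# A toy scene: a 4x4 square "object" on a 32x32 grid, and a matched
# filter used as a pick-quality scorer.
scene = np.zeros((32, 32))
scene[10:14, 20:24] = 1.0
template = np.ones((4, 4))

def pick_scores(obs, tmpl):
    # Circular cross-correlation via FFT (periodic boundary for simplicity).
    T = np.zeros_like(obs)
    T[:tmpl.shape[0], :tmpl.shape[1]] = tmpl
    return np.fft.ifft2(np.fft.fft2(obs) * np.conj(np.fft.fft2(T))).real

def best_pick(obs):
    s = pick_scores(obs, template)
    return np.unravel_index(np.argmax(s), s.shape)

pick0 = best_pick(scene)
shifted = np.roll(scene, (5, -3), axis=(0, 1))   # translate the object
pick1 = best_pick(shifted)
# The optimal pick translates with the object (mod the grid size).
print((pick0[0] + 5) % 32 == pick1[0], (pick0[1] - 3) % 32 == pick1[1])
```

Equivariant Transporter Net extends this kind of guarantee from translations to the rotations of both the pick object and the place pose.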
CubeNet: Equivariance to 3D Rotation and Translation
3D Convolutional Neural Networks are sensitive to transformations applied to
their input. This is a problem because a voxelized version of a 3D object, and
its rotated clone, will look unrelated to each other after passing through
the last layer of a network. Instead, an idealized model would preserve a
meaningful representation of the voxelized object, while explaining the
pose-difference between the two inputs. An equivariant representation vector
has two components: the invariant identity part, and a discernable encoding of
the transformation. Models that can't explain pose-differences risk "diluting"
the representation, in pursuit of optimizing a classification or regression
loss function.
We introduce a Group Convolutional Neural Network with linear equivariance to
translations and right angle rotations in three dimensions. We call this
network CubeNet, reflecting its cube-like symmetry. By construction, this
network helps preserve a 3D shape's global and local signature, as it is
transformed through successive layers. We apply this network to a variety of 3D
inference problems, achieving state-of-the-art on the ModelNet10 classification
challenge, and comparable performance on the ISBI 2012 Connectome Segmentation
Benchmark. To the best of our knowledge, this is the first 3D rotation
equivariant CNN for voxel representations.
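A minimal sketch of the group-convolution idea follows, restricted for brevity to the 4 right-angle rotations about the z-axis (a subgroup of the cube's 24 rotations). The toy voxel data and the single lifting layer are assumptions for illustration, not CubeNet itself:

```python
import numpy as np
from scipy.ndimage import correlate

# Random voxel grid and filter; periodic boundary keeps the check exact.
rng = np.random.default_rng(1)
vox = rng.random((8, 8, 8))
filt = rng.random((3, 3, 3))

def rot_z(a, k):
    # Right-angle rotation about the z-axis (in the plane of axes 0 and 1).
    return np.rot90(a, k, axes=(0, 1))

def lift(x):
    # Group-lifting layer: one output channel per rotation of the filter.
    return np.stack([correlate(x, rot_z(filt, k), mode='wrap')
                     for k in range(4)])

y = lift(vox)
y_rot = lift(rot_z(vox, 1))
# Equivariance: rotating the input rotates each channel spatially and
# cyclically shifts the channel index.
ok = all(np.allclose(y_rot[k], rot_z(y[(k - 1) % 4], 1)) for k in range(4))
print(ok)   # True
```

The invariant identity part mentioned in the abstract can then be read off by pooling over the channel (group) axis, while the channel index carries the pose information.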
Yet another representation of SO(3) by spherical functions for pose estimation
We propose a novel representation of SO(3) pose in 3 degrees of freedom (DOF) for view-based pose estimation. First we show that a conventional representation of pose in 1 DOF is a Fourier basis, and extend the observation to 2 DOF with spherical harmonics. Then we represent 3-DOF pose with spherical functions that form a continuous orthonormal basis on SO(3), and give transformations from the spherical-function representation to a quaternion and a rotation matrix.
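For the 1-DOF case described above, a short sketch shows the Fourier-basis representation of a pose angle and its recovery from the first harmonic. The truncation level K is an illustrative choice; the 2-DOF and 3-DOF cases replace this basis with spherical harmonics and spherical functions on SO(3), which are not shown here:

```python
import numpy as np

def fourier_pose(theta, K=3):
    # Feature vector [cos(k*theta) for k=1..K] + [sin(k*theta) for k=1..K].
    k = np.arange(1, K + 1)
    return np.concatenate([np.cos(k * theta), np.sin(k * theta)])

def pose_from_fourier(feat, K=3):
    # The first harmonic (cos theta, sin theta) alone fixes the angle
    # unambiguously on the circle.
    return np.arctan2(feat[K], feat[0]) % (2 * np.pi)

theta = 2.0
feat = fourier_pose(theta)
print(np.isclose(pose_from_fourier(feat), theta))   # True
```

Higher harmonics add redundancy that can stabilize regression from images, which is one motivation for using a truncated basis rather than the raw angle.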
Learning Equivariant Representations
State-of-the-art deep learning systems often require large amounts of data
and computation. For this reason, leveraging known or unknown structure of the
data is paramount. Convolutional neural networks (CNNs) are successful examples
of this principle, their defining characteristic being the shift-equivariance.
By sliding a filter over the input, when the input shifts, the response shifts
by the same amount, exploiting the structure of natural images where semantic
content is independent of absolute pixel positions. This property is essential
to the success of CNNs in audio, image and video recognition tasks. In this
thesis, we extend equivariance to other kinds of transformations, such as
rotation and scaling. We propose equivariant models for different
transformations defined by groups of symmetries. The main contributions are (i)
polar transformer networks, achieving equivariance to the group of similarities
on the plane, (ii) equivariant multi-view networks, achieving equivariance to
the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving
equivariance to the continuous 3D rotation group, (iv) cross-domain image
embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v)
spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving
equivariance to 3D rotations for spherical vector fields. Applications include
image classification, 3D shape classification and retrieval, panoramic image
classification and segmentation, shape alignment and pose estimation. What
these models have in common is that they leverage symmetries in the data to
reduce sample and model complexity and improve generalization performance. The
advantages are more significant on (but not limited to) challenging tasks where
data is limited or input perturbations such as arbitrary rotations are present.
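The shift-equivariance property described above can be checked numerically. The sketch below uses a hand-rolled circular 1-D convolution as an illustrative stand-in for a CNN layer: shifting the input and then convolving gives the same result as convolving and then shifting.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(16)     # a toy 1-D signal
w = rng.random(3)      # a length-3 filter

def conv1d_circular(x, w):
    # Centered correlation with wrap-around (periodic) padding.
    xp = np.concatenate([x[-1:], x, x[:1]])
    return np.array([xp[i:i + 3] @ w for i in range(len(x))])

shift = 5
lhs = conv1d_circular(np.roll(x, shift), w)   # shift, then convolve
rhs = np.roll(conv1d_circular(x, w), shift)   # convolve, then shift
print(np.allclose(lhs, rhs))                  # True
```

The thesis generalizes exactly this commutation property from shifts to rotations, scalings, and the symmetry groups of the sphere and icosahedron.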
Deep representations of structures in the 3D-world
This thesis demonstrates a collection of neural network tools that leverage the structures and symmetries of the 3D-world. We have explored various aspects of a vision system ranging from relative pose estimation to 3D-part decomposition from 2D images. For any vision system, it is crucially important to understand and to resolve visual ambiguities in 3D arising from imaging methods. This thesis has shown that leveraging prior knowledge about the structures and the symmetries of the 3D-world in neural network architectures brings about better representations for ambiguous situations. It helps solve problems which are inherently ill-posed.
On discrete symmetries of robotics systems: A group-theoretic and data-driven analysis
We present a comprehensive study on discrete morphological symmetries of
dynamical systems, which are commonly observed in biological and artificial
locomoting systems, such as legged, swimming, and flying animals/robots/virtual
characters. These symmetries arise from the presence of one or more planes/axes
of symmetry in the system's morphology, resulting in harmonious duplication and
distribution of body parts. Significantly, we characterize how morphological
symmetries extend to symmetries in the system's dynamics, optimal control
policies, and in all proprioceptive and exteroceptive measurements related to
the system's dynamics evolution. In the context of data-driven methods,
symmetry represents an inductive bias that justifies the use of data
augmentation or symmetric function approximators. To tackle this, we present a
theoretical and practical framework for identifying the system's morphological
symmetry group G and characterizing the symmetries in proprioceptive and
exteroceptive data measurements. We then exploit these symmetries using data
augmentation and G-equivariant neural networks. Our experiments on both
synthetic and real-world applications provide empirical evidence of the
advantageous outcomes resulting from the exploitation of these symmetries,
including improved sample efficiency, enhanced generalization, and reduction of
trainable parameters.
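A minimal sketch of the symmetrization idea follows. The 4-joint toy robot and the reflection action are hypothetical, not the paper's morphology: averaging any policy over the group {identity, reflection} yields a G-equivariant policy satisfying pi(g.s) = g.pi(s).

```python
import numpy as np

# Toy state/action layout: [left_hip, left_knee, right_hip, right_knee].
# The sagittal reflection swaps the left and right legs.
SWAP = np.array([2, 3, 0, 1])

def reflect(v):
    return v[SWAP]

rng = np.random.default_rng(0)
W = rng.random((4, 4))                     # an arbitrary, non-symmetric policy
raw_policy = lambda s: np.tanh(W @ s)

def equivariant_policy(s):
    # Symmetrization: average g^{-1}.pi(g.s) over the two group elements.
    return 0.5 * (raw_policy(s) + reflect(raw_policy(reflect(s))))

s = rng.random(4)
lhs = equivariant_policy(reflect(s))       # pi(g.s)
rhs = reflect(equivariant_policy(s))       # g.pi(s)
print(np.allclose(lhs, rhs))               # True
```

Data augmentation applies the same group action to recorded (state, action) pairs instead of to the function, and achieves the symmetry in expectation rather than by construction.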
Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning
The success of deep learning in computer vision is rooted in the ability of
deep networks to scale up model complexity as demanded by challenging visual
tasks. As complexity is increased, so is the need for large amounts of labeled
data to train the model. This is associated with a costly human annotation
effort. To address this concern, with the long-term goal of leveraging the
abundance of cheap unlabeled data, we explore methods of unsupervised
"pre-training." In particular, we propose to use self-supervised automatic
image colorization.
We show that traditional methods for unsupervised learning, such as
layer-wise clustering or autoencoders, remain inferior to supervised
pre-training. In search for an alternative, we develop a fully automatic image
colorization method. Our method sets a new state-of-the-art in revitalizing old
black-and-white photography, without requiring human effort or expertise.
Additionally, it gives us a method for self-supervised representation learning.
In order for the model to appropriately re-color a grayscale object, it must
first be able to identify it. This ability, learned entirely self-supervised,
can be used to improve other visual tasks, such as classification and semantic
segmentation. As a future direction for self-supervision, we investigate if
multiple proxy tasks can be combined to improve generalization. This turns out
to be a challenging open problem. We hope that our contributions to this
endeavor will provide a foundation for future efforts in making
self-supervision compete with supervised pre-training.
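The colorization pretext task can be sketched in a few lines: every unlabeled color image yields an (input, target) pair for free. The random arrays and the grayscale/chromatic-residual split below are illustrative assumptions, not the thesis's actual color space or model:

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((100, 8, 8, 3))           # stand-in for unlabeled photos

def make_pretext_pair(rgb):
    gray = rgb.mean(axis=-1, keepdims=True)   # input: luminance only
    target = rgb - gray                       # target: the chromatic residual
    return gray, target

# A full training set built without a single human label.
inputs, targets = zip(*(make_pretext_pair(im) for im in images))
print(len(inputs), inputs[0].shape, targets[0].shape)
```

Training a network to regress the target from the input forces it to recognize what is depicted, which is exactly the representation reused for classification and segmentation.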
Visual Perception of Garments for their Robotic Manipulation
The presented work addresses the visual perception of garments applied for their robotic manipulation. Various types of garments are considered in the typical perception and manipulation tasks, including their classification, folding or unfolding. Our work is motivated by the possibility of having humanoid household robots performing these tasks for us in the future, as well as by the industrial applications. The main challenge is the high deformability of garments, which can be posed in infinitely many configurations with a significantly varying appearance.