Estimating Small Differences in Car-Pose from Orbits
Distinguishing among nearby poses and among symmetries of an object is
challenging. In this paper, we propose a unified, group-theoretic approach to
tackle both. Unlike existing works that directly predict absolute
pose, our method measures the pose of an object relative to another pose, i.e.,
the pose difference. The proposed method generates the complete orbit of an
object from a single view of the object with respect to the subgroup of SO(3)
of rotations around the z-axis, and compares the orbit of the object with
another orbit using a novel orbit metric to estimate the pose difference. The
generated orbit in the latent space records all the differences in pose in the
original observational space, and as a result, the method is capable of finding
subtle differences in pose. We demonstrate the effectiveness of the proposed
method on cars, where identifying the subtle pose differences is vital.
Comment: to appear in BMVC201
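The orbit comparison described in the abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the abstract does not specify the orbit metric or the latent representation, so `make_orbit` fakes a latent orbit as points on a circle, and `orbit_distance` uses a plain mean squared distance under a cyclic shift rather than the paper's novel metric. The function names and the `n_steps` parameter are hypothetical.

```python
import math


def make_orbit(phase_deg, n_steps=36):
    """Toy stand-in for a generated orbit (assumption, not the paper's model).

    Returns n_steps 2-D latent points, one per z-axis rotation step,
    starting from the object's own pose `phase_deg`.
    """
    step = 360.0 / n_steps
    return [
        (math.cos(math.radians(phase_deg + k * step)),
         math.sin(math.radians(phase_deg + k * step)))
        for k in range(n_steps)
    ]


def orbit_distance(orbit_a, orbit_b, shift):
    """Mean squared distance between orbit_a and orbit_b shifted cyclically.

    This simple metric is a placeholder for the unspecified orbit metric.
    """
    n = len(orbit_a)
    total = 0.0
    for k in range(n):
        ax, ay = orbit_a[k]
        bx, by = orbit_b[(k + shift) % n]
        total += (ax - bx) ** 2 + (ay - by) ** 2
    return total / n


def estimate_pose_difference(orbit_a, orbit_b):
    """Return the z-rotation in degrees that best aligns orbit_b with orbit_a."""
    n = len(orbit_a)
    best_shift = min(range(n), key=lambda s: orbit_distance(orbit_a, orbit_b, s))
    return best_shift * 360.0 / n


if __name__ == "__main__":
    # Two views of the same toy object, 20 degrees apart.
    print(estimate_pose_difference(make_orbit(30), make_orbit(10)))
```

In this sketch the estimate is only as fine as the sampling step (10 degrees for `n_steps=36`). The point is the structure of the idea: the pose difference is read off from how two whole orbits align, not from two absolute pose predictions.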
EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes
Multimodal large-scale datasets for outdoor scenes are mostly designed for
urban driving problems. The scenes are highly structured and semantically
different from scenarios seen in nature-centered scenes such as gardens or
parks. To promote machine learning methods for nature-oriented applications,
such as agriculture and gardening, we propose the multimodal synthetic dataset
for Enclosed garDEN scenes (EDEN). The dataset features more than 300K images
captured from more than 100 garden models. Each image is annotated with various
low/high-level vision modalities, including semantic segmentation, depth,
surface normals, intrinsic colors, and optical flow. Experimental results with
state-of-the-art methods for semantic segmentation and monocular depth
prediction, two important tasks in computer vision, show a positive impact of
pre-training deep networks on our dataset for unstructured natural scenes. The
dataset and related materials will be available at
https://lhoangan.github.io/eden.
Comment: Accepted for publishing at WACV 202
Deep representations of structures in the 3D-world
This thesis demonstrates a collection of neural network tools that leverage the structures and symmetries of the 3D-world. We have explored various aspects of a vision system ranging from relative pose estimation to 3D-part decomposition from 2D images. For any vision system, it is crucially important to understand and to resolve visual ambiguities in 3D arising from imaging methods. This thesis has shown that leveraging prior knowledge about the structures and the symmetries of the 3D-world in neural network architectures brings about better representations for ambiguous situations. It helps solve problems which are inherently ill-posed