Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation
Due to large variations in shape, appearance, and viewing conditions, object
recognition is a key challenge for object manipulation and for robotic/AI
visual reasoning in general. Recognizing object categories, particular object
instances, and object viewpoints/poses are three critical subproblems that
robots must solve in order to accurately grasp and manipulate
objects and reason about their environments. Multi-view images of the same
object lie on intrinsic low-dimensional manifolds in descriptor spaces (e.g.
visual/depth descriptor spaces). These object manifolds share the same topology
despite being geometrically different. Each object manifold can be represented
as a deformed version of a unified manifold; each object manifold can thus be
parameterized by its homeomorphic mapping/reconstruction from the unified
manifold. In this work, we develop a novel framework that jointly solves the
three challenging recognition subproblems by explicitly modeling the
deformations of object manifolds and factorizing them in a view-invariant
space for recognition. We perform extensive experiments on several challenging
datasets and achieve state-of-the-art results.
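The core idea above — each object's view manifold as a deformed copy of one
unified manifold — can be illustrated with a minimal sketch. This is not the
paper's implementation; it assumes a 1D view circle as the unified manifold,
synthetic 2D "descriptors", and a ridge-regularized RBF mapping whose
coefficients serve as a view-invariant object signature, with pose read off as
the nearest point on the unified manifold:

```python
import numpy as np

def unified_manifold(n=36):
    # Unified view manifold: a unit circle, one point per viewing angle
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return np.stack([np.cos(t), np.sin(t)], axis=1), t

def rbf_features(x, centers, sigma=0.5):
    # Gaussian RBF kernel map from points on the unified manifold
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_mapping(descriptors, circle, centers, lam=1e-3):
    # Ridge-regularized least squares: Phi @ C ~ descriptors
    phi = rbf_features(circle, centers)
    A = phi.T @ phi + lam * np.eye(phi.shape[1])
    return np.linalg.solve(A, phi.T @ descriptors)  # coefficient matrix C

# Synthetic multi-view descriptors: each object is a deformed circle
circle, angles = unified_manifold()
rng = np.random.default_rng(0)
centers = circle[::4]  # RBF centers sampled on the unified manifold
obj_a = circle * 2.0 + 0.1 * rng.standard_normal(circle.shape)
obj_b = circle @ np.array([[1.5, 0.4], [-0.4, 0.8]]) \
        + 0.1 * rng.standard_normal(circle.shape)

C_a = fit_mapping(obj_a, circle, centers)
C_b = fit_mapping(obj_b, circle, centers)
# Flattened mapping coefficients: a view-invariant signature per object
sig_a, sig_b = C_a.ravel(), C_b.ravel()

def estimate_pose(descriptor, C, circle, centers):
    # Pose = index on the unified manifold whose reconstruction is closest
    recon = rbf_features(circle, centers) @ C
    return np.argmin(((recon - descriptor) ** 2).sum(-1))

# A view of object A taken at index 10 should map back near index 10
idx = estimate_pose(obj_a[10], C_a, circle, centers)
```

Because the mapping coefficients do not depend on which view is observed, they
separate object identity from pose: identity lives in the coefficients, pose
lives in the position on the shared circle.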
Digging Deep into the layers of CNNs: In Search of How CNNs Achieve View Invariance
This paper is focused on studying the view-manifold structure in the feature
spaces implied by the different layers of Convolutional Neural Networks (CNN).
There are several questions that this paper aims to answer: Does the learned
CNN representation achieve viewpoint invariance? How does it achieve viewpoint
invariance? Is it achieved by collapsing the view manifolds, or separating them
while preserving them? At which layer is view invariance achieved? How can the
structure of the view manifold at each layer of a deep convolutional neural
network be quantified experimentally? How does fine-tuning of a pre-trained CNN
on a multi-view dataset affect the representation at each layer of the network?
In order to answer these questions we propose a methodology to quantify the
deformation and degeneracy of view manifolds in CNN layers. We apply this
methodology and report results that answer the aforementioned questions.

Comment: This paper was accepted at the ICLR 2016 main conference.
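The two quantities the abstract names — deformation and degeneracy of a view
manifold — can be given simple proxy metrics. The sketch below is not the
paper's methodology; it assumes synthetic stand-ins for two layers' features
and measures (a) degeneracy as the mean pairwise feature distance (near zero
when the manifold collapses to a point, i.e. full view invariance) and (b)
deformation via the rank correlation between feature distance and angular
distance on the view circle (near one when the view ordering is preserved):

```python
import numpy as np

def spearman(a, b):
    # Rank (Spearman) correlation without scipy
    ra = a.argsort().argsort().astype(float)
    rb = b.argsort().argsort().astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))

def view_manifold_stats(features, angles):
    # features: (n_views, d) one layer's activations for views of one object
    # angles:   (n_views,) viewing angles in radians
    diff = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices_from(dists, k=1)

    # Degeneracy: mean pairwise distance; ~0 means the view manifold
    # has collapsed to a point (fully view-invariant representation)
    degeneracy = dists[iu].mean()

    # Preservation: rank correlation between feature distance and
    # circular angular distance; 1 means view order is fully preserved
    ang = np.abs(angles[:, None] - angles[None, :])
    ang = np.minimum(ang, 2 * np.pi - ang)
    preservation = spearman(dists[iu], ang[iu])
    return degeneracy, preservation

# Synthetic stand-ins for two layers' features over 24 views of one object
rng = np.random.default_rng(0)
angles = np.linspace(0, 2 * np.pi, 24, endpoint=False)
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
Q, _ = np.linalg.qr(rng.standard_normal((8, 2)))  # isometric 2D -> 8D embed
early = circle @ Q.T                    # "early layer": manifold preserved
late = 0.01 * rng.standard_normal((24, 8))  # "late layer": manifold collapsed

deg_early, pres_early = view_manifold_stats(early, angles)
deg_late, pres_late = view_manifold_stats(late, angles)
```

Comparing these two numbers layer by layer distinguishes the two hypotheses in
the abstract: invariance by collapsing the manifolds (degeneracy falls toward
zero) versus invariance by separating the manifolds while preserving their
structure (degeneracy stays large while preservation stays high).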