When Regression Meets Manifold Learning for Object Recognition and Pose Estimation
In this work, we propose a method for object recognition and pose estimation
from depth images using convolutional neural networks. Previous methods
addressing this problem rely on manifold learning to learn low-dimensional
viewpoint descriptors and employ them in a nearest-neighbor search over an
estimated descriptor space. In contrast, we create an efficient multi-task
learning framework combining manifold descriptor learning and pose regression.
By combining the strengths of manifold learning, via a triplet loss, with
pose regression, we can either estimate the pose directly, reducing the
complexity compared to a nearest-neighbor search, or use the learned
descriptors for nearest-neighbor descriptor matching. In an in-depth
experimental evaluation of the novel loss function, we observe that the view
descriptors learned by the network are much more discriminative, yielding an
almost 30% increase in relative pose accuracy compared to related work. For
directly regressed poses, we likewise obtain a substantial improvement over
simple pose regression. By leveraging the advantages of both the manifold
learning and regression tasks, we improve the current state of the art for
object recognition and pose retrieval, as we demonstrate through an in-depth
experimental evaluation.
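
To make the combined objective concrete, the following is a minimal PyTorch
sketch of a two-head network trained with a triplet loss on view descriptors
plus a pose-regression term. The backbone, dimensions, loss weight, and the
names DescriptorPoseNet and multitask_loss are illustrative assumptions, not
the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptorPoseNet(nn.Module):
    """Hypothetical CNN: a shared backbone feeds a descriptor branch
    (for nearest-neighbor matching) and a pose-regression branch."""
    def __init__(self, backbone_dim=256, desc_dim=64, pose_dim=4):
        super().__init__()
        self.backbone = nn.Sequential(      # stand-in for a depth-image CNN
            nn.Conv2d(1, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, backbone_dim), nn.ReLU(),
        )
        self.desc_head = nn.Linear(backbone_dim, desc_dim)
        self.pose_head = nn.Linear(backbone_dim, pose_dim)

    def forward(self, x):
        feat = self.backbone(x)
        desc = F.normalize(self.desc_head(feat), dim=1)   # unit descriptor
        pose = F.normalize(self.pose_head(feat), dim=1)   # unit quaternion
        return desc, pose

def multitask_loss(anchor, positive, negative, pose_pred, pose_gt,
                   margin=0.2, w_pose=1.0):
    """Triplet loss on descriptors plus a simple stand-in pose term."""
    triplet = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    pose = F.mse_loss(pose_pred, pose_gt)   # a rotation-aware loss may be used
    return triplet + w_pose * pose
```

At inference such a model can either read the pose off the regression head
directly or use the descriptor head for nearest-neighbor matching against a
database of views, mirroring the two options described above.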
The RGB-D Triathlon: Towards Agile Visual Toolboxes for Robots
Deep networks have brought significant advances in robot perception, making it
possible to improve the capabilities of robots in several visual tasks, ranging
from object detection and recognition to pose estimation, semantic scene
segmentation, and many others. Still, most approaches typically address visual
tasks in isolation, resulting in overspecialized models which achieve strong
performance in specific applications but work poorly on other (often related)
tasks. This is clearly sub-optimal for a robot, which is often required to
perform multiple visual recognition tasks simultaneously in order to properly
act and interact with its environment. This problem is exacerbated by the
limited computational and memory resources typically available onboard a
robotic platform. The problem of learning flexible models which can handle
multiple tasks in a lightweight manner has recently gained attention in the
computer vision community, and benchmarks supporting this research have been
proposed. In this work we study this problem in the robot vision context,
proposing a new benchmark, the RGB-D Triathlon, and evaluating state-of-the-art
algorithms in this novel, challenging scenario. We also define a new evaluation
protocol, better suited to the robot vision setting. Results shed light on the
strengths and weaknesses of existing approaches and on open issues, suggesting
directions for future research.
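
For context, a common recipe in this line of work keeps most parameters shared
across tasks and adds small task-specific modules (for example, residual
adapters). The PyTorch sketch below is a generic illustration of that idea
under assumed names and dimensions, not one of the models evaluated in the
benchmark.

```python
import torch
import torch.nn as nn

class AdapterConv(nn.Module):
    """A shared 3x3 convolution plus a tiny per-task 1x1 residual adapter."""
    def __init__(self, channels, num_tasks):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)
        self.adapters = nn.ModuleList(
            nn.Conv2d(channels, channels, 1) for _ in range(num_tasks)
        )

    def forward(self, x, task_id):
        h = self.shared(x)
        return h + self.adapters[task_id](h)   # task-specific residual path

class MultiTaskNet(nn.Module):
    """One shared trunk with per-task heads: adding a task costs only a
    small adapter and a classifier head, keeping the model lightweight."""
    def __init__(self, channels=64, num_classes=(10, 5, 2)):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.block = AdapterConv(channels, num_tasks=len(num_classes))
        self.heads = nn.ModuleList(
            nn.Linear(channels, c) for c in num_classes
        )

    def forward(self, x, task_id):
        h = torch.relu(self.stem(x))
        h = torch.relu(self.block(h, task_id))
        h = h.mean(dim=(2, 3))                 # global average pooling
        return self.heads[task_id](h)
```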
Identifying First-person Camera Wearers in Third-person Videos
We consider scenarios in which we wish to perform joint scene understanding,
object tracking, activity recognition, and other tasks in environments in which
multiple people are wearing body-worn cameras while a third-person static
camera also captures the scene. To do this, we need to establish person-level
correspondences across first- and third-person videos, which is challenging
because the camera wearer is not visible from his/her own egocentric video,
preventing the use of direct feature matching. In this paper, we propose a new
semi-Siamese Convolutional Neural Network architecture to address this novel
challenge. We formulate the problem as learning a joint embedding space for
first- and third-person videos that considers both spatial- and motion-domain
cues. A new triplet loss function is designed to minimize the distance between
correct first- and third-person matches while maximizing the distance between
incorrect ones. This end-to-end approach performs significantly better than
several baselines, in part by learning first- and third-person features that
are optimized for matching jointly with the distance measure itself.
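
As a rough illustration of the matching idea (not the paper's exact
architecture), the sketch below pairs view-specific encoders with shared top
layers that map first- and third-person inputs into one embedding space, and
trains them with a triplet loss over cross-view matches. Input features,
dimensions, and all names here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemiSiameseEmbedding(nn.Module):
    """Separate encoders for first-person (ego) and third-person (exo)
    features, followed by shared layers producing a joint embedding."""
    def __init__(self, in_dim=512, hidden=256, embed_dim=128):
        super().__init__()
        self.ego_encoder = nn.Linear(in_dim, hidden)   # first-person branch
        self.exo_encoder = nn.Linear(in_dim, hidden)   # third-person branch
        self.shared = nn.Sequential(nn.ReLU(), nn.Linear(hidden, embed_dim))

    def embed_ego(self, x):
        return F.normalize(self.shared(self.ego_encoder(x)), dim=1)

    def embed_exo(self, x):
        return F.normalize(self.shared(self.exo_encoder(x)), dim=1)

def cross_view_triplet_loss(ego, exo_pos, exo_neg, margin=0.5):
    """Pull a first-person embedding toward its matching third-person
    track and push it away from a non-matching one."""
    d_pos = (ego - exo_pos).pow(2).sum(dim=1)
    d_neg = (ego - exo_neg).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```

In practice the inputs would be spatial- and motion-domain features extracted
from the video streams, and the embedding distance then serves directly as the
matching score between camera wearers and the people visible in the
third-person view.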