When Regression Meets Manifold Learning for Object Recognition and Pose Estimation
In this work, we propose a method for object recognition and pose estimation
from depth images using convolutional neural networks. Previous methods
addressing this problem rely on manifold learning to learn low dimensional
viewpoint descriptors and employ them in a nearest neighbor search on an
estimated descriptor space. In contrast, we create an efficient multi-task
learning framework combining manifold descriptor learning and pose regression.
By combining the strengths of manifold learning using triplet loss and pose
regression, we can either estimate the pose directly, which reduces complexity
compared to nearest neighbor (NN) search, or use the learned descriptor for NN
descriptor matching. Through in-depth experimental evaluation of the novel loss
function, we observed that the view descriptors learned by the network are much
more discriminative, resulting in an almost 30% increase in relative pose
accuracy compared to related works. For directly regressed poses, we also
obtained a substantial improvement over simple pose regression. By leveraging
the advantages of both manifold learning and regression tasks, we are able to
improve the current state-of-the-art for object recognition and pose retrieval,
as we demonstrate through in-depth experimental evaluation.
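The abstract above does not state the loss formula, so the following is only a minimal sketch of how a triplet term and a pose regression term might be summed into one multi-task objective. The hinge form with a margin, the Euclidean pose error, the `weight` parameter, and all function names are assumptions made for illustration, not the authors' formulation.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet term (assumed form): zero once the negative
    descriptor is farther from the anchor than the positive by `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def pose_regression_loss(predicted_pose, true_pose):
    """Euclidean distance between predicted and ground-truth pose vectors."""
    return math.sqrt(squared_distance(predicted_pose, true_pose))


def multitask_loss(anchor, positive, negative, predicted_pose, true_pose,
                   margin=1.0, weight=1.0):
    """Sum of the descriptor (triplet) term and the weighted pose term.
    `weight` is a hypothetical balancing factor, not taken from the paper."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * pose_regression_loss(predicted_pose, true_pose))
```

In a setup like this, the descriptors produced under the triplet term could still be used for nearest neighbor matching, while the regression output gives a direct pose estimate without a search.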
The RGB-D Triathlon: Towards Agile Visual Toolboxes for Robots
Deep networks have brought significant advances in robot perception, improving
the capabilities of robots in several visual tasks, ranging from
object detection and recognition to pose estimation, semantic scene
segmentation and many others. Still, most approaches typically address visual
tasks in isolation, resulting in overspecialized models which achieve strong
performance in specific applications but work poorly in other (often related)
tasks. This is clearly sub-optimal for a robot, which is often required to
perform multiple visual recognition tasks simultaneously in order to properly
act and interact with the environment. This problem is exacerbated by the
limited computational and memory resources typically available onboard a
robotic platform. The problem of learning flexible models which can handle
multiple tasks in a lightweight manner has recently gained attention in the
computer vision community and benchmarks supporting this research have been
proposed. In this work we study this problem in the robot vision context,
proposing a new benchmark, the RGB-D Triathlon, and evaluating state-of-the-art
algorithms in this novel, challenging scenario. We also define a new evaluation
protocol, better suited to the robot vision setting. Results shed light on the
strengths and weaknesses of existing approaches and on open issues, suggesting
directions for future research.
Comment: This work has been submitted to IROS/RAL 201
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays an important role in video surveillance. Many
algorithms have been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attribute
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and the corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criteria. Thirdly, we
analyse the concepts of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures which
have been widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attribute group-based and part-based
approaches, etc. Fifthly, we show some applications which take pedestrian
attributes into consideration and achieve better performance. Finally, we
summarize this paper and give several possible research directions for
pedestrian attribute recognition. The project page of this paper can be found
at the following website:
https://sites.google.com/view/ahu-pedestrianattributes/.
Comment: Check our project page for High Resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints
We propose a Convolutional Neural Network (CNN)-based model "RotationNet,"
which takes multi-view images of an object as input and jointly estimates its
pose and object category. Unlike previous approaches that use known viewpoint
labels for training, our method treats the viewpoint labels as latent
variables, which are learned in an unsupervised manner during training
using an unaligned object dataset. RotationNet is designed to use only a
partial set of multi-view images for inference, and this property makes it
useful in practical scenarios where only partial views are available. Moreover,
our pose alignment strategy enables one to obtain view-specific feature
representations shared across classes, which is important to maintain high
accuracy in both object categorization and pose estimation. The effectiveness of
RotationNet is demonstrated by its superior performance over state-of-the-art
methods for 3D object classification on the 10- and 40-class ModelNet datasets. We
also show that RotationNet, even when trained without known poses, achieves
state-of-the-art performance on an object pose estimation dataset. The code is
available at https://github.com/kanezaki/rotationnet
Comment: 24 pages, 23 figures. Accepted to CVPR 201