3,509 research outputs found
Recognizing Objects In-the-wild: Where Do We Stand?
The ability to recognize objects is an essential skill for a robotic system
acting in human-populated environments. Despite decades of effort from the
robotic and vision research communities, robots are still missing good visual
perceptual systems, preventing the use of autonomous agents for real-world
applications. The progress is slowed down by the lack of a testbed able to
accurately represent the world perceived by the robot in-the-wild. In order to
fill this gap, we introduce a large-scale, multi-view object dataset collected
with an RGB-D camera mounted on a mobile robot. The dataset embeds the
challenges faced by a robot in a real-life application and provides a useful
tool for validating object recognition algorithms. Besides describing the
characteristics of the dataset, the paper evaluates the performance of a
collection of well-established deep convolutional networks on the new dataset
and analyzes the transferability of deep representations from Web images to
robotic data. Despite the promising results obtained with such representations,
the experiments demonstrate that object classification with real-life robotic
data is far from being solved. Finally, we provide a comparative study to
analyze and highlight the open challenges in robot vision, explaining the
discrepancies in the performance
Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work
Deep networks thrive when trained on large scale data collections. This has
given ImageNet a central role in the development of deep architectures for
visual object classification. However, ImageNet was created during a specific
period in time, and as such it is prone to aging, as well as dataset bias
issues. Moving beyond fixed training datasets will lead to more robust visual
systems, especially when deployed on robots in new environments which must
train on the objects they encounter there. To make this possible, it is
important to break free from the need for manual annotators. Recent work has
begun to investigate how to use the massive amount of images available on the
Web in place of manual image annotations. We contribute to this research thread
with two findings: (1) a study correlating a given level of noisily labels to
the expected drop in accuracy, for two deep architectures, on two different
types of noise, that clearly identifies GoogLeNet as a suitable architecture
for learning from Web data; (2) a recipe for the creation of Web datasets with
minimal noise and maximum visual variability, based on a visual and natural
language processing concept expansion strategy. By combining these two results,
we obtain a method for learning powerful deep object models automatically from
the Web. We confirm the effectiveness of our approach through object
categorization experiments using our Web-derived version of ImageNet on a
popular robot vision benchmark database, and on a lifelong object discovery
task on a mobile robot.Comment: 8 pages, 7 figures, 3 table
- …