17,140 research outputs found
Object Recognition in Noisy RGB-D Data
The object recognition task on 3D scenes is a growing research field that faces some problems relative to the use of 3D point clouds. In this work, we focus on dealing with noisy clouds through the use of the Growing Neural Gas (GNG) network filtering algorithm. Another challenge is the selection of the right keypoints detection method, that allows to identify a model into a scene cloud. The GNG method is able to represent the input data with a desired resolution while preserving the topology of the input space. Experiments show how the introduction of the GNG method yields better recognitions results than others filtering algorithms when noise is present.The object recognition task on 3D scenes is a growing research field that faces some problems relative to the use of 3D point clouds. In this work, we focus on dealing with noisy clouds through the use of the Growing Neural Gas (GNG) network filtering algorithm. Another challenge is the selection of the right keypoints detection method, that allows to identify a model into a scene cloud. The GNG method is able to represent the input data with a desired resolution while preserving the topology of the input space. Experiments show how the introduction of the GNG method yields better recognitions results than others filtering algorithms when noise is present
Multimodal Deep Learning for Robust RGB-D Object Recognition
Robust object recognition is a crucial ingredient of many, if not all,
real-world robotics applications. This paper leverages recent progress on
Convolutional Neural Networks (CNNs) and proposes a novel RGB-D architecture
for object recognition. Our architecture is composed of two separate CNN
processing streams - one for each modality - which are consecutively combined
with a late fusion network. We focus on learning with imperfect sensor data, a
typical problem in real-world robotics tasks. For accurate learning, we
introduce a multi-stage training methodology and two crucial ingredients for
handling depth data with CNNs. The first, an effective encoding of depth
information for CNNs that enables learning without the need for large depth
datasets. The second, a data augmentation scheme for robust learning with depth
images by corrupting them with realistic noise patterns. We present
state-of-the-art results on the RGB-D object dataset and show recognition in
challenging RGB-D real-world noisy settings.Comment: Final version submitted to IROS'2015, results unchanged,
reformulation of some text passages in abstract and introductio
Object recognition in noisy RGB-D data using GNG
Object recognition in 3D scenes is a research field in which there is intense activity guided by the problems related to the use of 3D point clouds. Some of these problems are influenced by the presence of noise in the cloud that reduces the effectiveness of a recognition process. This work proposes a method for dealing with the noise present in point clouds by applying the growing neural gas (GNG) network filtering algorithm. This method is able to represent the input data with the desired number of neurons while preserving the topology of the input space. The GNG obtained results which were compared with a Voxel grid filter to determine the efficacy of our approach. Moreover, since a stage of the recognition process includes the detection of keypoints in a cloud, we evaluated different keypoint detectors to determine which one produces the best results in the selected pipeline. Experiments show how the GNG method yields better recognition results than other filtering algorithms when noise is present
Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition
Webly-supervised learning has recently emerged as an alternative paradigm to
traditional supervised learning based on large-scale datasets with manual
annotations. The key idea is that models such as CNNs can be learned from the
noisy visual data available on the web. In this work we aim to exploit web data
for video understanding tasks such as action recognition and detection. One of
the main problems in webly-supervised learning is cleaning the noisy labeled
data from the web. The state-of-the-art paradigm relies on training a first
classifier on noisy data that is then used to clean the remaining dataset. Our
key insight is that this procedure biases the second classifier towards samples
that the first one understands. Here we train two independent CNNs, a RGB
network on web images and video frames and a second network using temporal
information from optical flow. We show that training the networks independently
is vastly superior to selecting the frames for the flow classifier by using our
RGB network. Moreover, we show benefits in enriching the training set with
different data sources from heterogeneous public web databases. We demonstrate
that our framework outperforms all other webly-supervised methods on two public
benchmarks, UCF-101 and Thumos'14.Comment: Submitted to CVIU SI: Computer Vision and the We
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification
Zero-Shot Deep Domain Adaptation
Domain adaptation is an important tool to transfer knowledge about a task
(e.g. classification) learned in a source domain to a second, or target domain.
Current approaches assume that task-relevant target-domain data is available
during training. We demonstrate how to perform domain adaptation when no such
task-relevant target-domain data is available. To tackle this issue, we propose
zero-shot deep domain adaptation (ZDDA), which uses privileged information from
task-irrelevant dual-domain pairs. ZDDA learns a source-domain representation
which is not only tailored for the task of interest but also close to the
target-domain representation. Therefore, the source-domain task of interest
solution (e.g. a classifier for classification tasks) which is jointly trained
with the source-domain representation can be applicable to both the source and
target representations. Using the MNIST, Fashion-MNIST, NIST, EMNIST, and SUN
RGB-D datasets, we show that ZDDA can perform domain adaptation in
classification tasks without access to task-relevant target-domain training
data. We also extend ZDDA to perform sensor fusion in the SUN RGB-D scene
classification task by simulating task-relevant target-domain representations
with task-relevant source-domain data. To the best of our knowledge, ZDDA is
the first domain adaptation and sensor fusion method which requires no
task-relevant target-domain data. The underlying principle is not particular to
computer vision data, but should be extensible to other domains.Comment: This paper is accepted to the European Conference on Computer Vision
(ECCV), 201
Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work
Deep networks thrive when trained on large scale data collections. This has
given ImageNet a central role in the development of deep architectures for
visual object classification. However, ImageNet was created during a specific
period in time, and as such it is prone to aging, as well as dataset bias
issues. Moving beyond fixed training datasets will lead to more robust visual
systems, especially when deployed on robots in new environments which must
train on the objects they encounter there. To make this possible, it is
important to break free from the need for manual annotators. Recent work has
begun to investigate how to use the massive amount of images available on the
Web in place of manual image annotations. We contribute to this research thread
with two findings: (1) a study correlating a given level of noisily labels to
the expected drop in accuracy, for two deep architectures, on two different
types of noise, that clearly identifies GoogLeNet as a suitable architecture
for learning from Web data; (2) a recipe for the creation of Web datasets with
minimal noise and maximum visual variability, based on a visual and natural
language processing concept expansion strategy. By combining these two results,
we obtain a method for learning powerful deep object models automatically from
the Web. We confirm the effectiveness of our approach through object
categorization experiments using our Web-derived version of ImageNet on a
popular robot vision benchmark database, and on a lifelong object discovery
task on a mobile robot.Comment: 8 pages, 7 figures, 3 table
- …