Unsupervised state representation learning with robotic priors: a robustness benchmark
Our understanding of the world depends heavily on our capacity to produce
intuitive and simplified representations which can be easily used to solve
problems. We reproduce this simplification process using a neural network to
build a low dimensional state representation of the world from images acquired
by a robot. As in Jonschkowski et al. 2015, we learn in an unsupervised way
using prior knowledge about the world in the form of loss functions called
robotic priors, and we extend this approach to higher-dimensional, richer
images to learn a 3D representation of the hand position of a robot from RGB
images. We propose a
quantitative evaluation of the learned representation using nearest neighbors
in the state space, which makes it possible to assess its quality and to show
both the potential and limitations of robotic priors in realistic environments. We
increase the image size and add distractors and domain randomization, all
crucial components for achieving transfer learning to real robots. Finally, we also
contribute a new prior to improve the robustness of the representation. The
applications of such a low-dimensional state representation range from easing
reinforcement learning (RL) and knowledge transfer across tasks to
facilitating learning from raw data with more efficient and compact high-level
representations. The results show that the robotic prior approach is able to
extract a high-level representation, such as the 3D position of an arm, and
organize it into a compact and coherent state space on a challenging dataset.

Comment: ICRA 2018 submission
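Robotic priors of the kind described above are implemented as loss functions over the learned states. As a minimal illustration, here is a numpy sketch of one such prior, the temporal-coherence term from Jonschkowski & Brock's formulation (consecutive states should change slowly); the function name and exact weighting are illustrative, not taken from the paper:

```python
import numpy as np

def temporal_coherence_loss(states):
    """Temporal-coherence robotic prior (sketch): penalize the
    mean squared magnitude of state changes between consecutive
    time steps, encouraging slowly varying state trajectories.

    states: array of shape (T, D), one learned state per time step.
    """
    deltas = states[1:] - states[:-1]          # state change per step
    return float(np.mean(np.sum(deltas ** 2, axis=1)))

# A constant trajectory incurs zero loss; a varying one does not.
constant = np.ones((5, 3))
varying = np.arange(12.0).reshape(4, 3)
```

In the full approach this term is combined with further priors (e.g., proportionality, causality, repeatability) into a single training objective for the encoder network.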
DeepPermNet: Visual Permutation Learning
We present a principled approach to uncover the structure of visual data by
solving a novel deep learning task coined visual permutation learning. The goal
of this task is to find the permutation that recovers the structure of data
from shuffled versions of it. In the case of natural images, this task boils
down to recovering the original image from patches shuffled by an unknown
permutation matrix. Unfortunately, permutation matrices are discrete, thereby
posing difficulties for gradient-based methods. To address this, we resort to a
continuous approximation of these matrices using doubly-stochastic matrices
which we generate from standard CNN predictions using Sinkhorn iterations.
Unrolling these iterations in a Sinkhorn network layer, we propose DeepPermNet,
an end-to-end CNN model for this task. The utility of DeepPermNet is
demonstrated on two challenging computer vision problems, namely, (i) relative
attributes learning and (ii) self-supervised representation learning. Our
results show state-of-the-art performance on the Public Figures and OSR
benchmarks for (i) and on the classification and segmentation tasks on the
PASCAL VOC dataset for (ii).

Comment: Accepted in IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 2017
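The Sinkhorn iterations mentioned in the abstract, which turn non-negative score matrices into approximately doubly-stochastic ones by alternating row and column normalization, can be sketched in a few lines of numpy. This is a minimal illustration only; the actual DeepPermNet layer unrolls these iterations over batched, differentiable tensors inside the CNN:

```python
import numpy as np

def sinkhorn(scores, n_iters=20):
    """Map a square matrix of real-valued scores to an approximately
    doubly-stochastic matrix (rows and columns each sum to ~1) via
    Sinkhorn iterations: exponentiate for positivity, then
    alternately normalize rows and columns."""
    P = np.exp(scores)                            # ensure strictly positive entries
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)      # normalize rows
        P = P / P.sum(axis=0, keepdims=True)      # normalize columns
    return P

rng = np.random.default_rng(0)
P = sinkhorn(rng.normal(size=(4, 4)))
```

Because the last step normalizes columns, the column sums are exactly 1 and the row sums converge to 1 as the iteration count grows; a soft-assignment matrix like this serves as the continuous surrogate for a discrete permutation matrix.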
CRAVES: Controlling Robotic Arm with a Vision-based Economic System
Training a robotic arm to accomplish real-world tasks has been attracting
increasing attention in both academia and industry. This work discusses the
role of computer vision algorithms in this field. We focus on low-cost arms
without on-board sensors, so that all decisions must be made from visual
recognition, e.g., real-time 3D pose estimation. This requires annotating a
large amount of training data, which is not only time-consuming but also laborious.
In this paper, we present an alternative solution, which uses a 3D model to
create a large amount of synthetic data, trains a vision model in this virtual
domain, and applies it to real-world images after domain adaptation. To this
end, we design a semi-supervised approach, which fully leverages the geometric
constraints among keypoints. We apply an iterative algorithm for optimization.
Without any annotations on real images, our algorithm generalizes well and
produces satisfying results on 3D pose estimation, which is evaluated on two
real-world datasets. We also construct a vision-based control system for task
accomplishment, for which we train a reinforcement learning agent in a virtual
environment and apply it to the real world. Moreover, our approach, which
requires merely a 3D model, has the potential to generalize to other types of
multi-rigid-body dynamic systems.

Comment: 10 pages, 6 figures
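As a minimal illustration of why a 3D model yields annotations for free in such a synthetic pipeline, one can project the model's known 3D keypoints through a pinhole camera to obtain exact 2D labels for each rendered image. This is a numpy sketch under simple assumptions (points already in camera coordinates, no lens distortion); the function name and camera parameters are hypothetical, not taken from the paper:

```python
import numpy as np

def project_keypoints(points_3d, f, cx, cy):
    """Project 3D keypoints (camera coordinates, z > 0) to 2D pixel
    locations with an ideal pinhole camera of focal length f and
    principal point (cx, cy). In a synthetic-data pipeline these
    projections serve as ground-truth 2D annotations."""
    K = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])                # intrinsic matrix
    uvw = (K @ points_3d.T).T                       # homogeneous image coords
    return uvw[:, :2] / uvw[:, 2:3]                 # divide by depth

# Two example keypoints one meter in front of the camera.
pts = np.array([[0.0, 0.0, 1.0],
                [0.1, 0.0, 1.0]])
uv = project_keypoints(pts, f=500.0, cx=320.0, cy=240.0)
```

Geometric constraints of the kind the paper leverages, such as fixed distances between keypoints on the same rigid link, then hold exactly in this synthetic domain and can be enforced on real images during the semi-supervised adaptation step.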