K-VIL: Keypoints-based Visual Imitation Learning
Visual imitation learning provides efficient and intuitive solutions for
robotic systems to acquire novel manipulation skills. However, simultaneously
learning geometric task constraints and control policies from visual inputs
alone remains a challenging problem. In this paper, we propose an approach for
keypoint-based visual imitation (K-VIL) that automatically extracts sparse,
object-centric, and embodiment-independent task representations from a small
number of human demonstration videos. The task representation is composed of
keypoint-based geometric constraints on principal manifolds, their associated
local frames, and the movement primitives needed for task execution. Our
approach is capable of extracting such task representations from
a single demonstration video, and of incrementally updating them when new
demonstrations become available. To reproduce manipulation skills using the
learned set of prioritized geometric constraints in novel scenes, we introduce
a novel keypoint-based admittance controller. We evaluate our approach in
several real-world applications, showcasing its ability to deal with cluttered
scenes, new instances of categorical objects, and large object pose and shape
variations, as well as its efficiency and robustness in both one-shot and
few-shot imitation learning settings. Videos and source code are available at
https://sites.google.com/view/k-vil
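For readers unfamiliar with admittance control, the sketch below shows a generic per-axis admittance law, M*a + D*v + K*(x - x_target) = f_ext, driving a single 3D point toward a target position. It is not the authors' keypoint-based controller: the mass, damping, and stiffness values, and the names `AdmittanceParams`, `admittance_step`, and `simulate`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class AdmittanceParams:
    mass: float = 1.0         # virtual inertia M (hypothetical value)
    damping: float = 20.0     # virtual damping D (critical damping for M=1, K=100)
    stiffness: float = 100.0  # virtual stiffness K pulling the point toward the target


def admittance_step(x: Sequence[float], v: Sequence[float],
                    x_target: Sequence[float], f_ext: Sequence[float],
                    p: AdmittanceParams, dt: float) -> Tuple[List[float], List[float]]:
    """One integration step of M*a + D*v + K*(x - x_target) = f_ext, applied per axis."""
    new_x, new_v = [], []
    for xi, vi, ti, fi in zip(x, v, x_target, f_ext):
        # Solve the admittance law for the acceleration on this axis.
        a = (fi - p.damping * vi - p.stiffness * (xi - ti)) / p.mass
        # Semi-implicit Euler: update velocity first, then position with the new velocity.
        vi_next = vi + a * dt
        new_v.append(vi_next)
        new_x.append(xi + vi_next * dt)
    return new_x, new_v


def simulate(x0: Sequence[float], x_target: Sequence[float], f_ext: Sequence[float],
             p: Optional[AdmittanceParams] = None, dt: float = 1e-3,
             steps: int = 2000) -> List[float]:
    """Integrate from rest at x0 for steps*dt seconds and return the final position."""
    p = p or AdmittanceParams()
    x, v = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        x, v = admittance_step(x, v, x_target, f_ext, p, dt)
    return x


if __name__ == "__main__":
    # With no external force, the point settles at the target.
    print(simulate([0.0, 0.0, 0.0], [0.1, 0.0, 0.2], [0.0, 0.0, 0.0]))
    # A constant external force shifts the equilibrium by f_ext / K per axis.
    print(simulate([0.0, 0.0, 0.0], [0.1, 0.0, 0.2], [5.0, 0.0, 0.0]))
```

The compliance comes from the stiffness term: an external push does not fight the controller indefinitely but shifts the resting point by f_ext / K, which is why admittance schemes are a common choice for contact-rich manipulation.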
Photometric Depth Super-Resolution
This study explores the use of photometric techniques (shape-from-shading and
uncalibrated photometric stereo) for upsampling the low-resolution depth map
from an RGB-D sensor to the higher resolution of the companion RGB image. A
single-shot variational approach is first put forward, which is effective as
long as the target's reflectance is piecewise-constant. It is then shown that
this dependency upon a specific reflectance model can be relaxed by focusing on
a specific class of objects (e.g., faces) and delegating reflectance estimation
to a deep neural network. Finally, a multi-shot strategy based on randomly
varying lighting conditions is discussed. It requires no training or prior
on the reflectance, yet this comes at the price of a dedicated acquisition
setup. Both quantitative and qualitative evaluations illustrate the
effectiveness of the proposed methods on synthetic and real-world scenarios.
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence
(T-PAMI), 2019. First three authors contributed equally.
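Shape-from-shading inverts an image formation model that links depth to intensity. The sketch below shows only the forward direction under simplifying assumptions of our own (orthographic projection, unit pixel spacing, Lambertian reflectance, a single distant light, and a normal sign convention of (-dz/dx, -dz/dy, 1)); the paper's actual lighting and reflectance models may differ. The albedo map is piecewise-constant, matching the condition under which the single-shot approach is stated to work. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normals_from_depth(z: Sequence[Sequence[float]]) -> List[List[Vec3]]:
    """Unit normals of a depth map z[row][col] (at least 2x2) via finite differences."""
    h, w = len(z), len(z[0])
    normals = []
    for r in range(h):
        row = []
        for c in range(w):
            # Forward difference, falling back to a backward difference on the last column/row.
            c2 = c + 1 if c + 1 < w else c - 1
            r2 = r + 1 if r + 1 < h else r - 1
            zx = (z[r][c2] - z[r][c]) / (c2 - c)
            zy = (z[r2][c] - z[r][c]) / (r2 - r)
            norm = math.sqrt(zx * zx + zy * zy + 1.0)
            row.append((-zx / norm, -zy / norm, 1.0 / norm))
        normals.append(row)
    return normals


def lambertian_shading(z: Sequence[Sequence[float]],
                       albedo: Sequence[Sequence[float]],
                       light: Vec3) -> List[List[float]]:
    """Render I = albedo * max(0, n . l) for a distant light direction l (normalized here)."""
    ln = math.sqrt(sum(l * l for l in light))
    l_unit = tuple(l / ln for l in light)
    normals = normals_from_depth(z)
    return [[albedo[r][c] * max(0.0, sum(ni * li for ni, li in zip(n, l_unit)))
             for c, n in enumerate(row)]
            for r, row in enumerate(normals)]


if __name__ == "__main__":
    # A plane tilted along x, with a piecewise-constant albedo (left half dark, right half bright).
    depth = [[float(c) for c in range(4)] for _ in range(2)]
    albedo = [[0.2, 0.2, 0.9, 0.9] for _ in range(2)]
    print(lambertian_shading(depth, albedo, (0.0, 0.0, 1.0)))
```

Because intensity depends on depth only through its gradient, shading carries high-frequency geometric detail that a low-resolution depth map lacks. That detail is what makes it useful for upsampling.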
Labeling the Features Not the Samples: Efficient Video Classification with Minimal Supervision
Feature selection is essential for effective visual recognition. We propose
an efficient joint classifier learning and feature selection method that
discovers sparse, compact representations of input features from a vast sea of
candidates, with an almost unsupervised formulation. Our method requires only
the following knowledge, which we call the feature sign: whether a particular
feature has, on average, stronger values on positive samples than on
negatives. We show how this can be estimated using as few as a single
labeled training sample per class. Then, using these feature signs, we extend
an initial supervised learning problem into an (almost) unsupervised clustering
formulation that can incorporate new data without requiring ground truth
labels. Our method works both as a feature selection mechanism and as a fully
competitive classifier. It has two important properties: low computational cost
and excellent accuracy, especially in difficult cases of very limited training
data. We experiment on large-scale recognition in video and show superior speed
and performance compared with established feature selection approaches such as
AdaBoost, Lasso, and greedy forward-backward selection, and with powerful
classifiers such as SVM.
Comment: arXiv admin note: text overlap with arXiv:1411.771
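The feature sign, as defined above, compares per-feature averages over positive and negative samples, so it can be computed from as little as one labeled sample per class. The sketch below does exactly that. The `align_features` step, which flips each feature so that larger values point toward the positive class, is an illustrative assumption and not the paper's clustering formulation. Ties are assigned -1 here, which is an arbitrary choice.

```python
from statistics import mean
from typing import List, Sequence


def estimate_feature_signs(pos: Sequence[Sequence[float]],
                           neg: Sequence[Sequence[float]]) -> List[int]:
    """Return +1 for features stronger on average over positives than negatives, else -1."""
    n_features = len(pos[0])
    signs = []
    for j in range(n_features):
        mean_pos = mean(sample[j] for sample in pos)
        mean_neg = mean(sample[j] for sample in neg)
        # Ties fall into the -1 bucket (arbitrary convention for this sketch).
        signs.append(1 if mean_pos > mean_neg else -1)
    return signs


def align_features(x: Sequence[float], signs: Sequence[int]) -> List[float]:
    """Hypothetical helper: flip each feature by its sign so larger means 'more positive'."""
    return [s * v for s, v in zip(signs, x)]


if __name__ == "__main__":
    # One labeled sample per class, three candidate features.
    positives = [[0.9, 0.1, 0.5]]
    negatives = [[0.2, 0.7, 0.4]]
    signs = estimate_feature_signs(positives, negatives)
    print(signs)                                     # [1, -1, 1]
    print(align_features([0.9, 0.1, 0.5], signs))    # [0.9, -0.1, 0.5]
```

With a single sample per class, the estimate is only as reliable as those two samples. More labeled samples make each per-feature mean, and hence each sign, more trustworthy.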