93,531 research outputs found
Fast Adaptation of Neural Networks
The ability to learn quickly from a few samples is a vital element of intelligence. Humans can reuse past knowledge and learn incredibly quickly. Also humans are able to interact with others to effectively guide their learning process. Computer vision systems for recognizing objects automatically from pixels are becoming commonplace in production systems. These modern computer vision systems use deep neural networks to automatically learn and recognize objects from data. Oftentimes, these deep neural networks used in production require a lot of data, take a long time to learn and forget old things when learning something new.
We build upon previous methods called Prototypical Networks and Model-Agnostic Meta-Learning (MAML) that enables machines to learn to recognize new objects with very little supervision from the user. We extend these methods to the semi-supervised few-shot learning scenario, where the few labeled samples are accompanied with (potentially many) unlabeled samples. Our proposed methods are able to learn better by also making use of the additional unlabeled samples. We note that in many real-world applications the adaptation performance can be significantly improved by requesting the few labels through user feedback (active adaptation). Further, our proposed methods can also adapt to new tasks without any labeled examples (unsupervised adaptation) when the new task has the same output space as the training tasks do
Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
This paper presents a robotic pick-and-place system that is capable of
grasping and recognizing both known and novel objects in cluttered
environments. The key new feature of the system is that it handles a wide range
of object categories without needing any task-specific training data for novel
objects. To achieve this, it first uses a category-agnostic affordance
prediction algorithm to select and execute among four different grasping
primitive behaviors. It then recognizes picked objects with a cross-domain
image classification framework that matches observed images to product images.
Since product images are readily available for a wide range of objects (e.g.,
from the web), the system works out-of-the-box for novel objects without
requiring any additional training data. Exhaustive experimental results
demonstrate that our multi-affordance grasping achieves high success rates for
a wide variety of objects in clutter, and our recognition algorithm achieves
high accuracy for both known and novel grasped objects. The approach was part
of the MIT-Princeton Team system that took 1st place in the stowing task at the
2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are
available online at http://arc.cs.princeton.eduComment: Project webpage: http://arc.cs.princeton.edu Summary video:
https://youtu.be/6fG7zwGfIk
Multi-task Self-Supervised Visual Learning
We investigate methods for combining multiple self-supervised tasks--i.e.,
supervised tasks where data can be collected without manual labeling--in order
to train a single visual representation. First, we provide an apples-to-apples
comparison of four different self-supervised tasks using the very deep
ResNet-101 architecture. We then combine tasks to jointly train a network. We
also explore lasso regularization to encourage the network to factorize the
information in its representation, and methods for "harmonizing" network inputs
in order to learn a more unified representation. We evaluate all methods on
ImageNet classification, PASCAL VOC detection, and NYU depth prediction. Our
results show that deeper networks work better, and that combining tasks--even
via a naive multi-head architecture--always improves performance. Our best
joint network nearly matches the PASCAL performance of a model pre-trained on
ImageNet classification, and matches the ImageNet network on NYU depth
prediction.Comment: Published at ICCV 201
Learning to Reconstruct Shapes from Unseen Classes
From a single image, humans are able to perceive the full 3D shape of an
object by exploiting learned shape priors from everyday life. Contemporary
single-image 3D reconstruction algorithms aim to solve this task in a similar
fashion, but often end up with priors that are highly biased by training
classes. Here we present an algorithm, Generalizable Reconstruction (GenRe),
designed to capture more generic, class-agnostic shape priors. We achieve this
with an inference network and training procedure that combine 2.5D
representations of visible surfaces (depth and silhouette), spherical shape
representations of both visible and non-visible surfaces, and 3D voxel-based
representations, in a principled manner that exploits the causal structure of
how 3D shapes give rise to 2D images. Experiments demonstrate that GenRe
performs well on single-view shape reconstruction, and generalizes to diverse
novel objects from categories not seen during training.Comment: NeurIPS 2018 (Oral). The first two authors contributed equally to
this paper. Project page: http://genre.csail.mit.edu
- …