Leveraging Image based Prior for Visual Place Recognition
In this study, we propose a novel scene descriptor for visual place
recognition. Unlike popular bag-of-words scene descriptors which rely on a
library of vector quantized visual features, our proposed descriptor is based
on a library of raw image data, such as publicly available photo collections
from Google StreetView and Flickr. The library images need not be associated
with spatial information regarding the viewpoint and orientation of the scene.
As a result, these images are cheaper than the database images; in addition,
they are readily available. Our proposed descriptor directly mines the image
library to discover landmarks (i.e., image patches) that suitably match an
input query/database image. The discovered landmarks are then compactly
described by their pose and shape (i.e., library image ID, bounding boxes) and
used as a compact discriminative scene descriptor for the input image. We
evaluate the effectiveness of our scene description framework by comparing its
performance to that of previous approaches.
Comment: 8 pages, 6 figures, preprint. Accepted for publication in MVA2015
(oral presentation).
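As a rough illustration of the landmark-mining idea (not the authors' exact pipeline), the following Python sketch matches a query image against a raw image library using ORB features from OpenCV and records each discovered landmark as a (library image ID, bounding box) pair; the function name and the choice of ORB are assumptions made for the example.

    # Hypothetical sketch: describe a query image by landmarks mined from a raw
    # image library. Images are expected as 8-bit grayscale numpy arrays.
    import cv2
    import numpy as np

    def build_descriptor(query_img, library_imgs, top_k=5):
        """Return a list of (library image ID, bounding box) landmarks."""
        orb = cv2.ORB_create()
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        _, des_q = orb.detectAndCompute(query_img, None)
        descriptor = []
        if des_q is None:
            return descriptor
        for img_id, lib_img in enumerate(library_imgs):
            kp_l, des_l = orb.detectAndCompute(lib_img, None)
            if des_l is None:
                continue
            matches = sorted(matcher.match(des_q, des_l),
                             key=lambda m: m.distance)[:top_k]
            if not matches:
                continue
            # Bounding box around the matched keypoints in the library image.
            pts = np.array([kp_l[m.trainIdx].pt for m in matches])
            (x0, y0), (x1, y1) = pts.min(axis=0), pts.max(axis=0)
            descriptor.append((img_id, (int(x0), int(y0), int(x1), int(y1))))
        return descriptor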
Improving Image Classification with Location Context
With the widespread availability of cellphones and cameras that have GPS
capabilities, it is common for images being uploaded to the Internet today to
have GPS coordinates associated with them. In addition to research that tries
to predict GPS coordinates from visual features, this also opens up the door to
problems that are conditioned on the availability of GPS coordinates. In this
work, we tackle the problem of performing image classification with location
context, in which we are given the GPS coordinates for images in both the train
and test phases. We explore different ways of encoding and extracting features
from the GPS coordinates, and show how to naturally incorporate these features
into a Convolutional Neural Network (CNN), the current state-of-the-art for
most image classification and recognition problems. We also show how it is
possible to simultaneously learn the optimal pooling radii for a subset of our
features within the CNN framework. To evaluate our model and to help promote
research in this area, we identify a set of location-sensitive concepts and
annotate a subset of the Yahoo Flickr Creative Commons 100M dataset that has
GPS coordinates with these concepts, which we make publicly available. By
leveraging location context, we are able to achieve almost a 7% gain in mean
average precision.
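As a minimal sketch of how GPS-derived features can be fed into a CNN alongside image features (assuming PyTorch; the raw latitude/longitude encoding, the small MLP, and the late-fusion point are illustrative choices rather than the paper's exact design, which additionally learns pooling radii for some features):

    # Hypothetical late-fusion model: CNN image features concatenated with an
    # MLP encoding of normalized (latitude, longitude) before classification.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class GeoAwareClassifier(nn.Module):
        def __init__(self, num_classes, gps_dim=2):
            super().__init__()
            backbone = models.resnet18(weights=None)
            feat_dim = backbone.fc.in_features
            backbone.fc = nn.Identity()          # keep pooled image features
            self.backbone = backbone
            self.gps_encoder = nn.Sequential(    # small MLP over (lat, lon)
                nn.Linear(gps_dim, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
            self.classifier = nn.Linear(feat_dim + 64, num_classes)

        def forward(self, images, gps):
            img_feat = self.backbone(images)     # (B, feat_dim)
            gps_feat = self.gps_encoder(gps)     # (B, 64)
            return self.classifier(torch.cat([img_feat, gps_feat], dim=1))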
Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
This paper presents a robotic pick-and-place system that is capable of
grasping and recognizing both known and novel objects in cluttered
environments. The key new feature of the system is that it handles a wide range
of object categories without needing any task-specific training data for novel
objects. To achieve this, it first uses a category-agnostic affordance
prediction algorithm to select and execute one of four different grasping
primitive behaviors. It then recognizes picked objects with a cross-domain
image classification framework that matches observed images to product images.
Since product images are readily available for a wide range of objects (e.g.,
from the web), the system works out-of-the-box for novel objects without
requiring any additional training data. Exhaustive experimental results
demonstrate that our multi-affordance grasping achieves high success rates for
a wide variety of objects in clutter, and our recognition algorithm achieves
high accuracy for both known and novel grasped objects. The approach was part
of the MIT-Princeton Team system that took 1st place in the stowing task at the
2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are
available online at http://arc.cs.princeton.edu
Comment: Project webpage: http://arc.cs.princeton.edu Summary video:
https://youtu.be/6fG7zwGfIk
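As a minimal sketch of the cross-domain retrieval step (assuming PyTorch; the embedding networks are not shown and the helper below is hypothetical), a picked object can be recognized by nearest-neighbour search of its observed-image embedding against pre-computed product-image embeddings:

    # Hypothetical retrieval step: cosine similarity between one observed-object
    # embedding and K product-image embeddings computed offline.
    import torch
    import torch.nn.functional as F

    def recognize(observed_embedding, product_embeddings, product_labels):
        """observed_embedding: (D,); product_embeddings: (K, D); labels: len K."""
        obs = F.normalize(observed_embedding, dim=0)
        prod = F.normalize(product_embeddings, dim=1)
        sims = prod @ obs                        # cosine similarities, shape (K,)
        return product_labels[int(sims.argmax())]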
Leveraging Deep Visual Descriptors for Hierarchical Efficient Localization
Many robotics applications require precise pose estimates despite operating
in large and changing environments. This can be addressed by visual
localization, using a pre-computed 3D model of the surroundings. The pose
estimation then amounts to finding correspondences between 2D keypoints in a
query image and 3D points in the model using local descriptors. However,
computational power is often limited on robotic platforms, making this task
challenging in large-scale environments. Binary feature descriptors
significantly speed up this 2D-3D matching, and have become popular in the
robotics community, but also strongly impair the robustness to perceptual
aliasing and changes in viewpoint, illumination and scene structure. In this
work, we propose to leverage recent advances in deep learning to perform an
efficient hierarchical localization. We first localize at the map level using
learned image-wide global descriptors, and subsequently estimate a precise pose
from 2D-3D matches computed in the candidate places only. This restricts the
local search and thus allows us to efficiently exploit powerful non-binary
descriptors usually dismissed on resource-constrained devices. Our approach
results in state-of-the-art localization performance while running in real-time
on a popular mobile platform, enabling new prospects for robotics research.
Comment: CoRL 2018 Camera-ready (fix typos and update citations).
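As a minimal sketch of this coarse-to-fine structure (the callables passed in, global_descriptor, match_2d3d and solve_pnp, stand in for the learned descriptors and the geometric solver and are assumptions, not the authors' API): candidate places are first retrieved with image-wide global descriptors, and 2D-3D matching with pose estimation is run only inside those candidates.

    # Hypothetical coarse-to-fine localization: global retrieval narrows the
    # search before expensive local 2D-3D matching and PnP pose estimation.
    import numpy as np

    def hierarchical_localize(query_img, db_global_desc, db_places,
                              global_descriptor, match_2d3d, solve_pnp,
                              num_candidates=5):
        q = global_descriptor(query_img)                 # image-wide descriptor
        sims = db_global_desc @ q / (np.linalg.norm(db_global_desc, axis=1)
                                     * np.linalg.norm(q) + 1e-8)
        candidates = np.argsort(-sims)[:num_candidates]  # best-matching places
        best_pose, best_inliers = None, 0
        for idx in candidates:
            matches = match_2d3d(query_img, db_places[idx])  # local 2D-3D matches
            pose, inliers = solve_pnp(matches)               # e.g. PnP + RANSAC
            if inliers > best_inliers:
                best_pose, best_inliers = pose, inliers
        return best_pose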
TossingBot: Learning to Throw Arbitrary Objects with Residual Physics
We investigate whether a robot arm can learn to pick and throw arbitrary
objects into selected boxes quickly and accurately. Throwing has the potential
to increase the physical reachability and picking speed of a robot arm.
However, precisely throwing arbitrary objects in unstructured settings presents
many challenges: from acquiring reliable pre-throw conditions (e.g. initial
pose of object in manipulator) to handling varying object-centric properties
(e.g. mass distribution, friction, shape) and dynamics (e.g. aerodynamics). In
this work, we propose an end-to-end formulation that jointly learns to infer
control parameters for grasping and throwing motion primitives from visual
observations (images of arbitrary objects in a bin) through trial and error.
Within this formulation, we investigate the synergies between grasping and
throwing (i.e., learning grasps that enable more accurate throws) and between
simulation and deep learning (i.e., using deep networks to predict residuals on
top of control parameters predicted by a physics simulator). The resulting
system, TossingBot, is able to grasp and throw arbitrary objects into boxes
located outside its maximum reach range at 500+ mean picks per hour (600+
grasps per hour with 85% throwing accuracy); and generalizes to new objects and
target locations. Videos are available at https://tossingbot.cs.princeton.edu
Comment: Summary Video: https://youtu.be/f5Zn2Up2RjQ Project webpage:
https://tossingbot.cs.princeton.edu
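As a minimal sketch of the residual-physics idea (assuming PyTorch; the ballistic model, feature sizes and network are illustrative, not TossingBot's actual controller): an analytic estimate of the throwing release speed is corrected by a residual predicted from visual features.

    # Hypothetical residual physics: learned correction on top of an analytic
    # ballistic estimate of the release speed (no drag, flat landing assumed).
    import torch
    import torch.nn as nn

    G = 9.81  # gravitational acceleration (m/s^2)

    def ballistic_velocity(horizontal_dist, release_angle=0.785):
        """Ideal release speed to land `horizontal_dist` metres away when
        thrown at `release_angle` radians (projectile range equation)."""
        angle = torch.as_tensor(release_angle)
        return torch.sqrt(torch.as_tensor(horizontal_dist) * G / torch.sin(2 * angle))

    class ResidualThrower(nn.Module):
        def __init__(self, feat_dim=128):
            super().__init__()
            self.residual_head = nn.Sequential(
                nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, visual_features, horizontal_dist):
            v_physics = ballistic_velocity(horizontal_dist)        # analytic term
            delta_v = self.residual_head(visual_features).squeeze(-1)  # residual
            return v_physics + delta_v                             # release speed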
Metric Learning for Generalizing Spatial Relations to New Objects
Human-centered environments are rich with a wide variety of spatial relations
between everyday objects. For autonomous robots to operate effectively in such
environments, they should be able to reason about these relations and
generalize them to objects with different shapes and sizes. For example, having
learned to place a toy inside a basket, a robot should be able to generalize
this concept using a spoon and a cup. This requires a robot to have the
flexibility to learn arbitrary relations in a lifelong manner, making it
challenging for an expert to pre-program it with sufficient knowledge to do so
beforehand. In this paper, we address the problem of learning spatial relations
by introducing a novel method from the perspective of distance metric learning.
Our approach enables a robot to reason about the similarity between pairwise
spatial relations, thereby enabling it to use its previous knowledge when
presented with a new relation to imitate. We show how this makes it possible to
learn arbitrary spatial relations from non-expert users using a small number of
examples and in an interactive manner. Our extensive evaluation with real-world
data demonstrates the effectiveness of our method in reasoning about a
continuous spectrum of spatial relations and generalizing them to new objects.
Comment: Accepted at the 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems. The new Freiburg Spatial Relations Dataset and a demo
video of our approach running on the PR-2 robot are available at our project
website: http://spatialrelations.cs.uni-freiburg.de
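As a minimal sketch of the metric-learning view (assuming PyTorch; the descriptor contents and layer sizes are illustrative, not the paper's formulation), spatial-relation descriptors can be embedded so that similar relations lie close together, trained here with a standard triplet loss:

    # Hypothetical metric learning over spatial-relation descriptors: an anchor
    # and a positive showing the same relation (with different objects) are
    # pulled together, a negative showing a different relation is pushed away.
    import torch
    import torch.nn as nn

    class RelationEmbedder(nn.Module):
        def __init__(self, in_dim=32, embed_dim=16):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, embed_dim))

        def forward(self, relation_features):
            return self.net(relation_features)

    embedder = RelationEmbedder()
    triplet_loss = nn.TripletMarginLoss(margin=1.0)
    anchor, positive, negative = (torch.randn(8, 32) for _ in range(3))
    loss = triplet_loss(embedder(anchor), embedder(positive), embedder(negative))
    loss.backward()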
Forecasting Human Dynamics from Static Images
This paper presents the first study on forecasting human dynamics from static
images. The problem is to input a single RGB image and generate a sequence of
upcoming human body poses in 3D. To address the problem, we propose the 3D Pose
Forecasting Network (3D-PFNet). Our 3D-PFNet integrates recent advances on
single-image human pose estimation and sequence prediction, and converts the 2D
predictions into 3D space. We train our 3D-PFNet using a three-step training
strategy to leverage a diverse source of training data, including image and
video based human pose datasets and 3D motion capture (MoCap) data. We
demonstrate competitive performance of our 3D-PFNet on 2D pose forecasting and
3D pose recovery through quantitative and qualitative results.
Comment: Accepted in CVPR 2017.
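As a minimal sketch of the single-image-to-pose-sequence structure (assuming PyTorch; the toy encoder, layer sizes, and linear 3D lifting step are illustrative, not 3D-PFNet's actual architecture):

    # Hypothetical forecaster: a CNN encodes the image, an RNN unrolls future 2D
    # poses, and a linear layer lifts each 2D pose estimate to 3D.
    import torch
    import torch.nn as nn

    class PoseForecaster(nn.Module):
        def __init__(self, num_joints=17, hidden=256, horizon=16):
            super().__init__()
            self.horizon = horizon
            self.encoder = nn.Sequential(        # toy image encoder
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, hidden))
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.to_2d = nn.Linear(hidden, num_joints * 2)
            self.lift_3d = nn.Linear(num_joints * 2, num_joints * 3)

        def forward(self, image):
            feat = self.encoder(image)                       # (B, hidden)
            seq_in = feat.unsqueeze(1).repeat(1, self.horizon, 1)
            hidden_seq, _ = self.rnn(seq_in)                 # (B, T, hidden)
            poses_2d = self.to_2d(hidden_seq)                # (B, T, J*2)
            poses_3d = self.lift_3d(poses_2d)                # (B, T, J*3)
            return poses_2d, poses_3d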