40,839 research outputs found
Multi-View Picking: Next-best-view Reaching for Improved Grasping in Clutter
Camera viewpoint selection is an important aspect of visual grasp detection,
especially in clutter where many occlusions are present. Where other approaches
use a static camera position or fixed data collection routines, our Multi-View
Picking (MVP) controller uses an active perception approach to choose
informative viewpoints based directly on a distribution of grasp pose estimates
in real time, reducing uncertainty in the grasp poses caused by clutter and
occlusions. In trials of grasping 20 objects from clutter, our MVP controller
achieves 80% grasp success, outperforming a single-viewpoint grasp detector by
12%. We also show that our approach is both more accurate and more efficient
than approaches which consider multiple fixed viewpoints.Comment: ICRA 2019 Video: https://youtu.be/Vn3vSPKlaEk Code:
https://github.com/dougsm/mvp_gras
Active SLAM for autonomous underwater exploration
Exploration of a complex underwater environment without an a priori map is beyond the state of the art for autonomous underwater vehicles (AUVs). Despite several efforts regarding simultaneous localization and mapping (SLAM) and view planning, there is no exploration framework, tailored to underwater vehicles, that faces exploration combining mapping, active localization, and view planning in a unified way. We propose an exploration framework, based on an active SLAM strategy, that combines three main elements: a view planner, an iterative closest point algorithm (ICP)-based pose-graph SLAM algorithm, and an action selection mechanism that makes use of the joint map and state entropy reduction. To demonstrate the benefits of the active SLAM strategy, several tests were conducted with the Girona 500 AUV, both in simulation and in the real world. The article shows how the proposed framework makes it possible to plan exploratory trajectories that keep the vehicle’s uncertainty bounded; thus, creating more consistent maps.Peer ReviewedPostprint (published version
Perception-aware Path Planning
In this paper, we give a double twist to the problem of planning under
uncertainty. State-of-the-art planners seek to minimize the localization
uncertainty by only considering the geometric structure of the scene. In this
paper, we argue that motion planning for vision-controlled robots should be
perception aware in that the robot should also favor texture-rich areas to
minimize the localization uncertainty during a goal-reaching task. Thus, we
describe how to optimally incorporate the photometric information (i.e.,
texture) of the scene, in addition to the the geometric one, to compute the
uncertainty of vision-based localization during path planning. To avoid the
caveats of feature-based localization systems (i.e., dependence on feature type
and user-defined thresholds), we use dense, direct methods. This allows us to
compute the localization uncertainty directly from the intensity values of
every pixel in the image. We also describe how to compute trajectories online,
considering also scenarios with no prior knowledge about the map. The proposed
framework is general and can easily be adapted to different robotic platforms
and scenarios. The effectiveness of our approach is demonstrated with extensive
experiments in both simulated and real-world environments using a
vision-controlled micro aerial vehicle.Comment: 16 pages, 20 figures, revised version. Conditionally accepted for
IEEE Transactions on Robotic
Time-Contrastive Networks: Self-Supervised Learning from Video
We propose a self-supervised approach for learning representations and
robotic behaviors entirely from unlabeled videos recorded from multiple
viewpoints, and study how this representation can be used in two robotic
imitation settings: imitating object interactions from videos of humans, and
imitating human poses. Imitation of human behavior requires a
viewpoint-invariant representation that captures the relationships between
end-effectors (hands or robot grippers) and the environment, object attributes,
and body pose. We train our representations using a metric learning loss, where
multiple simultaneous viewpoints of the same observation are attracted in the
embedding space, while being repelled from temporal neighbors which are often
visually similar but functionally different. In other words, the model
simultaneously learns to recognize what is common between different-looking
images, and what is different between similar-looking images. This signal
causes our model to discover attributes that do not change across viewpoint,
but do change across time, while ignoring nuisance variables such as
occlusions, motion blur, lighting and background. We demonstrate that this
representation can be used by a robot to directly mimic human poses without an
explicit correspondence, and that it can be used as a reward function within a
reinforcement learning algorithm. While representations are learned from an
unlabeled collection of task-related videos, robot behaviors such as pouring
are learned by watching a single 3rd-person demonstration by a human. Reward
functions obtained by following the human demonstrations under the learned
representation enable efficient reinforcement learning that is practical for
real-world robotic systems. Video results, open-source code and dataset are
available at https://sermanet.github.io/imitat
Marker based Thermal-Inertial Localization for Aerial Robots in Obscurant Filled Environments
For robotic inspection tasks in known environments fiducial markers provide a
reliable and low-cost solution for robot localization. However, detection of
such markers relies on the quality of RGB camera data, which degrades
significantly in the presence of visual obscurants such as fog and smoke. The
ability to navigate known environments in the presence of obscurants can be
critical for inspection tasks especially, in the aftermath of a disaster.
Addressing such a scenario, this work proposes a method for the design of
fiducial markers to be used with thermal cameras for the pose estimation of
aerial robots. Our low cost markers are designed to work in the long wave
infrared spectrum, which is not affected by the presence of obscurants, and can
be affixed to any object that has measurable temperature difference with
respect to its surroundings. Furthermore, the estimated pose from the fiducial
markers is fused with inertial measurements in an extended Kalman filter to
remove high frequency noise and error present in the fiducial pose estimates.
The proposed markers and the pose estimation method are experimentally
evaluated in an obscurant filled environment using an aerial robot carrying a
thermal camera.Comment: 10 pages, 5 figures, Published in International Symposium on Visual
Computing 201
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
A robot that can carry out a natural-language instruction has been a dream
since before the Jetsons cartoon series imagined a life of leisure mediated by
a fleet of attentive robot helpers. It is a dream that remains stubbornly
distant. However, recent advances in vision and language methods have made
incredible progress in closely related areas. This is significant because a
robot interpreting a natural-language navigation instruction on the basis of
what it sees is carrying out a vision and language process that is similar to
Visual Question Answering. Both tasks can be interpreted as visually grounded
sequence-to-sequence translation problems, and many of the same methods are
applicable. To enable and encourage the application of vision and language
methods to the problem of interpreting visually-grounded navigation
instructions, we present the Matterport3D Simulator -- a large-scale
reinforcement learning environment based on real imagery. Using this simulator,
which can in future support a range of embodied vision and language tasks, we
provide the first benchmark dataset for visually-grounded natural language
navigation in real buildings -- the Room-to-Room (R2R) dataset.Comment: CVPR 2018 Spotlight presentatio
- …