51,333 research outputs found
Vision and Learning for Deliberative Monocular Cluttered Flight
Cameras provide a rich source of information while being passive, cheap and
lightweight for small and medium Unmanned Aerial Vehicles (UAVs). In this work
we present the first implementation of receding horizon control, which is
widely used in ground vehicles, with monocular vision as the only sensing mode
for autonomous UAV flight in dense clutter. We make it feasible on UAVs via a
number of contributions: novel coupling of perception and control via relevant
and diverse, multiple interpretations of the scene around the robot, leveraging
recent advances in machine learning to showcase anytime budgeted cost-sensitive
feature selection, and fast non-linear regression for monocular depth
prediction. We empirically demonstrate the efficacy of our novel pipeline via
real world experiments of more than 2 kms through dense trees with a quadrotor
built from off-the-shelf parts. Moreover our pipeline is designed to combine
information from other modalities like stereo and lidar as well if available
Harvesting Multiple Views for Marker-less 3D Human Pose Annotations
Recent advances with Convolutional Networks (ConvNets) have shifted the
bottleneck for many computer vision tasks to annotated data collection. In this
paper, we present a geometry-driven approach to automatically collect
annotations for human pose prediction tasks. Starting from a generic ConvNet
for 2D human pose, and assuming a multi-view setup, we describe an automatic
way to collect accurate 3D human pose annotations. We capitalize on constraints
offered by the 3D geometry of the camera setup and the 3D structure of the
human body to probabilistically combine per view 2D ConvNet predictions into a
globally optimal 3D pose. This 3D pose is used as the basis for harvesting
annotations. The benefit of the annotations produced automatically with our
approach is demonstrated in two challenging settings: (i) fine-tuning a generic
ConvNet-based 2D pose predictor to capture the discriminative aspects of a
subject's appearance (i.e.,"personalization"), and (ii) training a ConvNet from
scratch for single view 3D human pose prediction without leveraging 3D pose
groundtruth. The proposed multi-view pose estimator achieves state-of-the-art
results on standard benchmarks, demonstrating the effectiveness of our method
in exploiting the available multi-view information.Comment: CVPR 2017 Camera Read
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics has
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.Comment: Accepted to the Thirthy-Fourth AAAI Conference On Artificial
Intelligence (AAAI), 202
- …