2,129 research outputs found
Using humanoid robots to study human behavior
Our understanding of human behavior advances as our humanoid robotics work progresses, and vice versa. This team's work focuses on trajectory formation and planning, learning from demonstration, oculomotor control and interactive behaviors. They are programming robotic behavior based on how we humans “program” (or train) each other.
The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings
We motivate and describe a new freely available human-human dialogue dataset
for interactive learning of visually grounded word meanings through ostensive
definition by a tutor to a learner. The data has been collected using a novel,
character-by-character variant of the DiET chat tool (Healey et al., 2003;
Mills and Healey, submitted) with a novel task, where a Learner needs to learn
invented visual attribute words (such as "burchak" for square) from a tutor.
As such, the text-based interactions closely resemble face-to-face conversation
and thus contain many of the linguistic phenomena encountered in natural,
spontaneous dialogue. These include self- and other-correction, mid-sentence
continuations, interruptions, overlaps, fillers, and hedges. We also present a
generic n-gram framework for building user (i.e. tutor) simulations from this
type of incremental data, which is freely available to researchers. We show
that the simulations produce outputs that are similar to the original data
(e.g. 78% turn match similarity). Finally, we train and evaluate a
Reinforcement Learning dialogue control agent for learning visually grounded
word meanings, trained from the BURCHAK corpus. The learned policy shows
comparable performance to a rule-based system built previously.
Comment: 10 pages, The 6th Workshop on Vision and Language (VL'17)
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.
Comment: Submitted to the International Journal of Robotics Research (IJRR),
37 pages
DEUX: Active Exploration for Learning Unsupervised Depth Perception
Depth perception models are typically trained on non-interactive datasets
with predefined camera trajectories. However, this often introduces systematic
biases into the learning process correlated to specific camera paths chosen
during data acquisition. In this paper, we investigate the role of how data is
collected for learning depth completion, from a robot navigation perspective,
by leveraging 3D interactive environments. First, we evaluate four depth
completion models trained on data collected using conventional navigation
techniques. Our key insight is that existing exploration paradigms do not
necessarily provide task-specific data points to achieve competent unsupervised
depth completion learning. We then find that data collected with respect to
photometric reconstruction has a direct positive influence on model
performance. As a result, we develop an active, task-informed, depth
uncertainty-based motion planning approach for learning depth completion, which
we call DEpth Uncertainty-guided eXploration (DEUX). Training with data
collected by our approach improves depth completion by more than 18% on average
across four depth completion models compared to existing exploration
methods on the MP3D test set. We show that our approach further improves
zero-shot generalization, while offering new insights into integrating robot
learning-based depth estimation
Efficient Intrinsically Motivated Robotic Grasping with Learning-Adaptive Imagination in Latent Space
Combining model-based and model-free deep reinforcement learning has shown
great promise for improving sample efficiency on complex control tasks while
still retaining high performance. Incorporating imagination is a recent effort
in this direction inspired by human mental simulation of motor behavior. We
propose a learning-adaptive imagination approach which, unlike previous
approaches, takes into account the reliability of the learned dynamics model
used for imagining the future. Our approach learns an ensemble of disjoint
local dynamics models in latent space and derives an intrinsic reward based on
learning progress, motivating the controller to take actions leading to data
that improves the models. The learned models are used to generate imagined
experiences, augmenting the training set of real experiences. We evaluate our
approach on learning vision-based robotic grasping and show that it
significantly improves sample efficiency and achieves near-optimal performance
in a sparse reward environment.
Comment: In: Proceedings of the Joint IEEE International Conference on
Development and Learning and on Epigenetic Robotics (ICDL-EpiRob), Oslo,
Norway, Aug. 19-22, 201
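The abstract above mentions an intrinsic reward derived from learning progress. One common way to express that idea, shown here purely as an assumed sketch and not as the authors' formulation, is to reward the drop in a model's prediction error between an older and a more recent window of observations. The class name, the `window` parameter, and the per-model bookkeeping are all hypothetical.

```python
from collections import deque

class LearningProgressReward:
    """Intrinsic reward = recent decrease in a model's prediction error.

    Assumption: progress is the mean error over an older window minus the
    mean error over the most recent window, clipped at zero.
    """

    def __init__(self, window=5):
        self.window = window
        self.errors = {}  # model id -> recent prediction errors

    def update(self, model_id, prediction_error):
        """Record a new error for one local model and return its reward."""
        history = self.errors.setdefault(
            model_id, deque(maxlen=2 * self.window)
        )
        history.append(float(prediction_error))
        if len(history) < 2 * self.window:
            return 0.0  # not enough data to estimate progress yet
        values = list(history)
        older = sum(values[:self.window]) / self.window
        recent = sum(values[self.window:]) / self.window
        return max(0.0, older - recent)

# Errors that shrink over time yield a positive reward once both windows fill.
tracker = LearningProgressReward(window=2)
rewards = [tracker.update("model_0", e) for e in (1.0, 0.8, 0.6, 0.4)]
```

In this sketch a model whose error has stopped improving yields zero reward, so a controller driven by it would be pushed toward regions where the dynamics model is still getting better.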
Policy Search in Continuous Action Domains: an Overview
Continuous action policy search is currently the focus of intensive research,
driven both by the recent success of deep reinforcement learning algorithms and
the emergence of competitors based on evolutionary algorithms. In this paper,
we present a broad survey of policy search methods, providing a unified
perspective on very different approaches, including Bayesian Optimization
and directed exploration methods. The main message of this overview is in the
relationship between the families of methods, but we also outline some factors
underlying sample efficiency properties of the various approaches.
Comment: Accepted in the Neural Networks Journal (Volume 113, May 2019)
Differentiable world programs
Modern artificial intelligence (AI) has created exciting new opportunities for building intelligent robots. In particular, gradient-based learning architectures (deep neural networks) have tremendously improved 3D scene understanding in terms of perception, reasoning, and action.
However, these advancements have undermined many "classical" techniques developed over the last few decades.
We postulate that a blend of "classical" and "learned" methods is the most promising path to developing flexible, interpretable, and actionable models of the world: a necessity for intelligent embodied agents.
"What is the ideal way to combine classical techniques with gradient-based learning architectures for a rich understanding of the 3D world?" is the central question in this dissertation. This understanding enables a multitude of applications that fundamentally impact how embodied agents perceive and interact with their environment. This dissertation, dubbed "differentiable world programs", unifies efforts from multiple closely-related but currently-disjoint fields including robotics, computer vision, computer graphics, and AI.
Our first contribution, gradSLAM, is a fully differentiable dense simultaneous localization and mapping (SLAM) system. By enabling gradient computation through otherwise non-differentiable components such as nonlinear least squares optimization, ray casting, visual odometry, and dense mapping, gradSLAM opens up new avenues for integrating classical 3D reconstruction and deep learning.
Our second contribution, taskography, proposes a task-conditioned sparsification of large 3D scenes encoded as 3D scene graphs. This enables classical planners to match (and surpass) state-of-the-art learning-based planners by focusing computation on task-relevant scene attributes.
Our third and final contribution, gradSim, is a fully differentiable simulator that composes differentiable physics and graphics engines to enable physical parameter estimation and visuomotor control, solely from videos or a still image.