Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction

Author: Horaud Radu
Lathuilière Stéphane
Massé Benoit
Mesejo Pablo
Publication venue: 'Elsevier BV'
Publication date: 23/04/2018

This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and to adapt its gaze control strategy for human-robot interaction without the use of external sensors or human supervision. The robot learns to focus its attention onto groups of people from its own audio-visual experiences, independently of the number of people, of their positions and of their physical appearances. In particular, we use a recurrent neural network architecture in combination with Q-learning to find an optimal action-selection policy; we pre-train the network using a simulated environment that mimics realistic scenarios that involve speaking/silent participants, thus avoiding the need for tedious sessions of a robot interacting with people. Our experimental evaluation suggests that the proposed method is robust against parameter estimation, i.e. the parameter values yielded by the method do not have a decisive impact on the performance. The best results are obtained when audio and visual information are used jointly. Experiments with the Nao robot indicate that our framework is a step forward towards the autonomous learning of socially acceptable gaze behavior. Comment: Paper submitted to Pattern Recognition Letters

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Author: Anderson Peter
Bruce Jake
Gould Stephen
Hengel Anton van den
Johnson Mark
Reid Ian
Sünderhauf Niko
Teney Damien
Wu Qi
Publication date: 01/01/2018

A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset. Comment: CVPR 2018 Spotlight presentation

arXiv.org e-Print Archive

Crossref

Adelaide Research & Scholarship

Queensland University of Technology ePrints Archive

The Australian National University

Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction

Author: Ba Silèye
Horaud Radu
Massé Benoît
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/11/2017

The visual focus of attention (VFOA) has been recognized as a prominent conversational cue. We are interested in estimating and tracking the VFOAs associated with multi-party social interactions. We note that in this type of situation the participants either look at each other or at an object of interest; therefore their eyes are not always visible. Consequently, both gaze and VFOA estimation cannot be based on eye detection and tracking. We propose a method that exploits the correlation between eye gaze and head movements. Both VFOA and gaze are modeled as latent variables in a Bayesian switching state-space model. The proposed formulation leads to a tractable learning procedure and to an efficient algorithm that simultaneously tracks gaze and visual focus. The method is tested and benchmarked using two publicly available datasets that contain typical multi-party human-robot and human-human interactions. Comment: 15 pages, 8 figures, 6 tables

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Neural Sensor Fusion for Spatial Visualization on a Mobile Robot

Author: Carpenter Gail A.
Gaudiano Paolo
Martens Siegfried
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/08/1998

An ARTMAP neural network is used to integrate visual information and ultrasonic sensory information on a B14 mobile robot. Training samples for the neural network are acquired without human intervention. Sensory snapshots are retrospectively associated with the distance to the wall, provided by on-board odometry as the robot travels in a straight line. The goal is to produce a more accurate measure of distance than is provided by the raw sensors. The neural network effectively combines sensory sources both within and between modalities. The improved distance percept is used to produce occupancy grid visualizations of the robot's environment. The maps produced point to specific problems of raw sensory information processing and demonstrate the benefits of using a neural network system for sensor fusion. Office of Naval Research and Naval Research Laboratory (00014-96-1-0772, 00014-95-1-0409, 00014-95-0657

Boston University Institutional Repository (OpenBU)

Introduction: The Third International Conference on Epigenetic Robotics

Author: Berthouze Luc
Prince Christopher G.
Publication venue: Lund University Cognitive Studies
Publication date: 01/01/2003

This paper summarizes the paper and poster contributions to the Third International Workshop on Epigenetic Robotics. The focus of this workshop is on the cross-disciplinary interaction of developmental psychology and robotics. Namely, the general goal in this area is to create robotic models of the psychological development of various behaviors. The term "epigenetic" is used in much the same sense as the term "developmental", and while we could call our topic "developmental robotics", developmental robotics can be seen as having a broader interdisciplinary emphasis. Our focus in this workshop is on the interaction of developmental psychology and robotics, and we use the phrase "epigenetic robotics" to capture this focus.

CogPrints Cognitive Sciences Eprint Archive

Human Motion Trajectory Prediction: A Survey

Author: Arras Kai O.
Gavrila Dariu M.
Herman Michael
Kitani Kris M.
Palmieri Luigi
Rudenko Andrey
Publication venue: 'SAGE Publications'
Publication date: 17/12/2019

With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting future positions of dynamic agents and planning considering such predictions are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and level of contextual information used. We provide an overview of the existing datasets and performance metrics. We discuss limitations of the state of the art and outline directions for further research. Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages

arXiv.org e-Print Archive