Deep Reinforcement Learning for Active Human Pose Estimation
Most 3d human pose estimation methods assume that input -- be it images of a
scene collected from one or several viewpoints, or from a video -- is given.
Consequently, they focus on improving estimates by leveraging prior knowledge
and measurements, fusing information spatially and/or temporally, whenever
available. In this paper we address the problem of an active observer with
freedom to move and explore the scene spatially -- in `time-freeze' mode --
and/or temporally, by selecting informative viewpoints that improve its
estimation accuracy. Towards this end, we introduce Pose-DRL, a fully trainable
deep reinforcement learning-based active pose estimation architecture which
learns to select appropriate views, in space and time, to feed an underlying
monocular pose estimator. We evaluate our model using single- and multi-target
estimators, obtaining strong results in both settings. Our system further learns
automatic stopping conditions in time and transition functions to the next
temporal processing step in videos. In extensive experiments with the Panoptic
multi-view setup, and for complex scenes containing multiple people, we show
that our model learns to select viewpoints that yield significantly more
accurate pose estimates compared to strong multi-view baselines.

Comment: Accepted to The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). Submission updated to include supplementary material.
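The core idea of active viewpoint selection can be illustrated with a toy sketch: an agent learns which camera to move to so that pose error is minimized. The camera ring, per-camera error values, and tabular Q-learning below are all hypothetical stand-ins chosen for brevity, not the actual Pose-DRL architecture, which drives a deep network over a monocular pose estimator.

```python
import random

# Toy sketch: active viewpoint selection as tabular Q-learning.
# Cameras, rewards, and the pose-error model are invented for
# illustration; they are not from the paper.

N_CAMERAS = 4
# Hypothetical per-camera pose error (lower is better); the agent
# should learn to move toward camera 2, the most informative view.
POSE_ERROR = [0.9, 0.6, 0.1, 0.7]
ACTIONS = [-1, 0, 1]  # move to previous camera, stay, move to next


def step(cam, action):
    """Move along the camera ring; reward is negative pose error."""
    nxt = (cam + action) % N_CAMERAS
    return nxt, -POSE_ERROR[nxt]


def train(episodes=2000, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_CAMERAS)]
    for _ in range(episodes):
        cam = rng.randrange(N_CAMERAS)
        for _ in range(8):  # short episode of view changes
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[cam][i])
            nxt, r = step(cam, ACTIONS[a])
            # Standard Q-learning update.
            Q[cam][a] += alpha * (r + gamma * max(Q[nxt]) - Q[cam][a])
            cam = nxt
    return Q


def greedy_rollout(Q, start=0, steps=6):
    """Follow the learned greedy policy and report the final camera."""
    cam = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: Q[cam][i])
        cam, _ = step(cam, ACTIONS[a])
    return cam


Q = train()
print(greedy_rollout(Q))  # should settle on the lowest-error camera
```

The same structure scales up in the paper's setting: the tabular state becomes deep visual features, and the reward comes from the improvement in pose estimation accuracy rather than a fixed per-camera error table.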
Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction
This paper introduces a novel neural network-based reinforcement learning
approach for robot gaze control. Our approach enables a robot to learn and to
adapt its gaze control strategy for human-robot interaction without the use of
external sensors or human supervision. The robot learns to focus
its attention onto groups of people from its own audio-visual experiences,
independently of the number of people, their positions, and their physical
appearances. In particular, we use a recurrent neural network architecture in
combination with Q-learning to find an optimal action-selection policy; we
pre-train the network using a simulated environment that mimics realistic
scenarios that involve speaking/silent participants, thus avoiding the need of
tedious sessions of a robot interacting with people. Our experimental
evaluation suggests that the proposed method is robust with respect to
parameter estimation, i.e., the estimated parameter values do not have a
decisive impact on the performance. The best results are obtained when both
audio and visual information is jointly used. Experiments with the Nao robot
indicate that our framework is a step forward towards the autonomous learning
of socially acceptable gaze behavior.

Comment: Paper submitted to Pattern Recognition Letters.
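The audio-visual gaze problem above can be sketched as Q-learning over partial observations: the robot only senses whether it sees the speaker and a coarse binaural cue, and must learn to turn toward the sound. The environment (four discrete gaze directions, one speaker, the observation model) is entirely hypothetical, and a tabular learner stands in for the paper's recurrent Q-network.

```python
import random

# Minimal sketch of gaze control as Q-learning over audio-visual cues.
# A tabular toy replaces the paper's recurrent network; the world
# model below is invented for illustration.

N_DIRS = 4            # discrete gaze directions the robot can choose
ACTIONS = [-1, 0, 1]  # turn counter-clockwise, stay, turn clockwise


def observe(gaze, speaker):
    """Partial observation: (sees speaker, hears speaker clockwise)."""
    sees = int(gaze == speaker)
    hears = int((speaker - gaze) % N_DIRS in (1, 2))
    return sees, hears


def train(episodes=3000, alpha=0.2, gamma=0.9, eps=0.1, seed=1):
    rng = random.Random(seed)
    Q = {}  # maps observation tuple -> Q-values for the 3 actions
    for _ in range(episodes):
        speaker = rng.randrange(N_DIRS)
        gaze = rng.randrange(N_DIRS)
        for _ in range(6):
            s = observe(gaze, speaker)
            q = Q.setdefault(s, [0.0, 0.0, 0.0])
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[i])
            gaze = (gaze + ACTIONS[a]) % N_DIRS
            r = 1.0 if gaze == speaker else 0.0  # reward: gazing at speaker
            q2 = Q.setdefault(observe(gaze, speaker), [0.0, 0.0, 0.0])
            q[a] += alpha * (r + gamma * max(q2) - q[a])
    return Q


def reaches_speaker(Q, gaze, speaker, steps=4):
    """Does the greedy policy bring the gaze onto the speaker?"""
    for _ in range(steps):
        if gaze == speaker:
            return True
        q = Q.get(observe(gaze, speaker), [0.0, 0.0, 0.0])
        a = max(range(len(ACTIONS)), key=lambda i: q[i])
        gaze = (gaze + ACTIONS[a]) % N_DIRS
    return gaze == speaker

Q = train()
print(all(reaches_speaker(Q, g, s) for g in range(N_DIRS) for s in range(N_DIRS)))
```

In the paper, the recurrent architecture plays the role that the observation history would here: because single observations are ambiguous, integrating them over time is what makes a good gaze policy learnable.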
Vision-based deep execution monitoring
Execution monitoring of high-level robot actions can be effectively improved by
visually monitoring the state of the world in terms of the preconditions and
postconditions that hold before and after the execution of an action.
Furthermore, a policy for deciding where to look, either to verify the
relations that specify the pre- and postconditions or to refocus after a
failure, can greatly improve robot execution in an uncharted
environment. Thanks to the remarkable results of deep learning, it is now
possible to rely strongly on visual perception and thereby assume that the
environment is observable. In this work we present visual execution monitoring
for a robot executing tasks in an uncharted Lab environment. The execution
monitor interacts with the environment via a visual stream that uses two DCNNs
for recognizing the objects the robot has to deal with and manipulate, and a
non-parametric Bayesian estimator to infer the relations from the DCNN
features. To recover from loss of focus and failures due to missed objects, we
resort to visual search policies learned via deep reinforcement learning.
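The precondition/postcondition monitoring loop can be sketched symbolically: check the preconditions, run the action, verify the postconditions, and signal a refocus or failure otherwise. The predicates, the grasp action, and the dictionary world model below are invented for illustration; the paper's monitor checks such relations from DCNN features with non-parametric Bayesian estimation rather than a symbolic table.

```python
# Hedged sketch of pre/postcondition execution monitoring.
# All predicates and effects are hypothetical stand-ins.

def monitor_action(world, action):
    """Run an action only if its preconditions hold; verify postconditions."""
    if not all(world.get(p, False) for p in action["pre"]):
        return "refocus"   # trigger a visual search for the missing relations
    action["run"](world)
    if not all(world.get(p, False) for p in action["post"]):
        return "failure"   # postcondition check failed after execution
    return "ok"


def do_grasp(world):
    # Simulated effect of a successful grasp.
    world["holding(cup)"] = True
    world["on(cup, table)"] = False


grasp = {
    "pre": ["visible(cup)", "on(cup, table)"],
    "post": ["holding(cup)"],
    "run": do_grasp,
}

world = {"visible(cup)": True, "on(cup, table)": True}
print(monitor_action(world, grasp))   # "ok": pre- and postconditions hold

world2 = {"visible(cup)": False, "on(cup, table)": True}
print(monitor_action(world2, grasp))  # "refocus": cup not yet observed
```

The "refocus" branch is where the learned visual search policy enters: instead of failing outright, the robot searches for a view in which the missing relation can be verified.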