    Deep Reinforcement Learning for Active Human Pose Estimation

    Most 3D human pose estimation methods assume that input -- be it images of a scene collected from one or several viewpoints, or a video -- is given. Consequently, they focus on estimation, leveraging prior knowledge and measurements by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with the freedom to move and explore the scene spatially -- in 'time-freeze' mode -- and/or temporally, by selecting informative viewpoints that improve its estimation accuracy. To this end, we introduce Pose-DRL, a fully trainable, deep reinforcement learning-based active pose estimation architecture that learns to select appropriate views, in space and time, to feed an underlying monocular pose estimator. We evaluate our model using single- and multi-target estimators, with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos. In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates than strong multi-view baselines.
    Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). Submission updated to include supplementary material.
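    To make the episode structure concrete, here is a minimal, self-contained sketch of active view selection as described in the abstract, with a random policy standing in for the learned Pose-DRL agent and a noisy oracle standing in for the monocular estimator. All names, the fusion rule, and the stop heuristic are illustrative assumptions, not the authors' code.

        # Sketch of an active-pose-estimation episode: pick views, fuse per-view
        # monocular estimates, score the result. Pose-DRL learns the view/stop
        # choices that are random here.
        import numpy as np

        rng = np.random.default_rng(0)
        gt_pose = rng.normal(size=(17, 3))                    # ground-truth 3D joints

        def monocular_estimate(view):
            """Stand-in estimator: a per-view noisy copy of the ground truth."""
            noise = 0.05 * (1 + view % 3)                     # some views are less informative
            return gt_pose + rng.normal(scale=noise, size=gt_pose.shape)

        def run_episode(n_cameras=10, max_views=4):
            fused, picked = None, []
            for _ in range(max_views):
                if fused is not None and rng.random() < 0.2:  # stand-in for the learned stop signal
                    break
                view = int(rng.integers(n_cameras))           # random policy; Pose-DRL learns this
                est = monocular_estimate(view)
                fused = est if fused is None else 0.5 * (fused + est)  # simple running fusion
                picked.append(view)
            error = np.linalg.norm(fused - gt_pose, axis=-1).mean()    # MPJPE-style error
            return picked, error                              # an RL reward could be -error

        print(run_episode())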

    Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction

    This paper introduces a novel neural network-based reinforcement learning approach to robot gaze control. Our approach enables a robot to learn and adapt its gaze control strategy for human-robot interaction without external sensors or human supervision. The robot learns to focus its attention on groups of people from its own audio-visual experiences, independently of the number of people, their positions, and their physical appearances. In particular, we use a recurrent neural network architecture in combination with Q-learning to find an optimal action-selection policy; we pre-train the network using a simulated environment that mimics realistic scenarios involving speaking/silent participants, thus avoiding the need for tedious sessions of a robot interacting with people. Our experimental evaluation suggests that the proposed method is robust to parameter estimation, i.e. the parameter values yielded by the method do not have a decisive impact on performance. The best results are obtained when audio and visual information are used jointly. Experiments with the Nao robot indicate that our framework is a step towards the autonomous learning of socially acceptable gaze behavior.
    Comment: Paper submitted to Pattern Recognition Letters.
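    The recurrent-network-plus-Q-learning combination can be sketched in a few lines of PyTorch: a GRU summarizes the audio-visual observation history and a linear head outputs one Q-value per discrete gaze action. The dimensions, reward, and single TD update below are illustrative assumptions, not the paper's training setup.

        # Minimal recurrent Q-network sketch for discrete gaze actions
        # (e.g. pan/tilt increments) over fused audio-visual features.
        import torch
        import torch.nn as nn

        class RecurrentQNet(nn.Module):
            def __init__(self, obs_dim=64, hidden=128, n_actions=5):
                super().__init__()
                self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_actions)   # one Q-value per gaze action

            def forward(self, obs_seq, h=None):
                out, h = self.gru(obs_seq, h)
                return self.head(out), h                   # Q-values at every time step

        q_net = RecurrentQNet()
        opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
        gamma = 0.9

        # One illustrative TD update on a random observation sequence.
        obs = torch.randn(1, 10, 64)                       # batch x time x features
        rewards = torch.randn(1, 10)                       # e.g. faces seen + speaker visible
        q, _ = q_net(obs)
        actions = q.argmax(-1)                             # greedy actions (epsilon-greedy in practice)
        q_taken = q.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
        target = rewards[:, :-1] + gamma * q_taken[:, 1:].detach()
        loss = nn.functional.mse_loss(q_taken[:, :-1], target)
        opt.zero_grad(); loss.backward(); opt.step()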

    Vision-based deep execution monitoring

    Execution monitoring of high-level robot actions can be effectively improved by visually monitoring the state of the world in terms of the preconditions and postconditions that hold before and after the execution of an action. Furthermore, a policy for deciding where to look -- either to verify the relations that specify the pre- and postconditions, or to refocus after a failure -- can greatly improve robot execution in an uncharted environment. Thanks to the remarkable results of deep learning, it is now possible to rely strongly on visual perception and assume that the environment is observable. In this work we present visual execution monitoring for a robot executing tasks in an uncharted lab environment. The execution monitor interacts with the environment via a visual stream that uses two DCNNs to recognize the objects the robot has to deal with and manipulate, and a non-parametric Bayes estimator to discover relations from the DCNN features. To recover from a lack of focus and from failures due to missed objects, we resort to visual search policies learned via deep reinforcement learning.
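    The monitoring loop itself is simple to state: check the action's preconditions against detected visual relations, execute, then check the postconditions, falling back to visual search when a required object is not in view. The sketch below uses dummy stand-ins for the paper's two DCNNs, the non-parametric Bayes relation discovery, and the deep-RL search policy; every function and relation here is a hypothetical placeholder.

        # Sketch of visual execution monitoring via pre/postcondition checks.
        def detect_relations(image):
            """Stand-in for DCNN detection + non-parametric Bayes relation discovery."""
            return {("cup", "on", "table")}                # hardcoded for illustration

        def visual_search(target, max_steps=3):
            """Placeholder for the deep-RL policy that refocuses the camera."""
            for _ in range(max_steps):
                relations = detect_relations(image=None)   # re-observe after moving
                if any(target in rel for rel in relations):
                    return True
            return False

        def monitored_execute(action, pre, post, execute):
            """Run `action` only if `pre` holds; report failure if `post` does not."""
            seen = detect_relations(image=None)
            for rel in pre:
                if rel not in seen and not visual_search(rel[0]):
                    return f"precondition {rel} not verified"
            execute(action)
            seen = detect_relations(image=None)
            return "ok" if all(rel in seen for rel in post) else "postcondition failed"

        # Illustrative only: a real pick action's postcondition would differ,
        # e.g. ("cup", "in", "gripper").
        print(monitored_execute("pick(cup)",
                                pre={("cup", "on", "table")},
                                post={("cup", "on", "table")},
                                execute=lambda a: None))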