51 research outputs found

    Knowledge- and ambiguity-aware robot learning from corrective and evaluative feedback

    To deploy robots that can be adapted by non-expert users, interactive imitation learning (IIL) methods must be flexible regarding the interaction preferences of the teacher and must avoid assuming perfect teachers (oracles), since teachers make mistakes influenced by diverse human factors. In this work, we propose an IIL method that improves the human–robot interaction for non-expert and imperfect teachers in two directions. First, uncertainty estimation is included to make the agents aware of their lack of knowledge (epistemic uncertainty) and of demonstration ambiguity (aleatoric uncertainty), so that the robot can request human input when it is deemed most necessary. Second, the proposed method enables teachers to train with the flexibility of using corrective demonstrations, evaluative reinforcements, and implicit positive feedback. The experimental results show an improvement in learning convergence with respect to other learning methods when the agent learns from highly ambiguous teachers. Additionally, a user study found that the components of the proposed method improve both the teaching experience and the data efficiency of the learning process.
    Learning & Autonomous Control
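    A minimal sketch of how such uncertainty-gated querying could work, assuming an ensemble of action predictors whose disagreement approximates epistemic uncertainty and whose predicted noise approximates aleatoric uncertainty; the thresholds and the decision rule below are illustrative assumptions, not the paper's implementation:

        # Hypothetical uncertainty-gated querying; thresholds and the
        # ensemble interface are assumptions for illustration only.
        import numpy as np

        class QueryGate:
            def __init__(self, policies, eps_thresh=0.2, alea_thresh=0.5):
                self.policies = policies        # ensemble: each maps obs -> (mean, var)
                self.eps_thresh = eps_thresh    # epistemic (disagreement) threshold
                self.alea_thresh = alea_thresh  # aleatoric (ambiguity) threshold

            def act_or_query(self, obs):
                means, variances = zip(*(p(obs) for p in self.policies))
                means, variances = np.array(means), np.array(variances)
                epistemic = means.std(axis=0).mean()   # spread across ensemble members
                aleatoric = variances.mean()           # average predicted data noise
                if epistemic > self.eps_thresh or aleatoric > self.alea_thresh:
                    return None, "query_human"         # lack of knowledge or ambiguous demos
                return means.mean(axis=0), "act"       # confident: execute fused action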

    Simultaneous learning of objective function and policy from interactive teaching with corrective feedback

    Some imitation learning approaches rely on Inverse Reinforcement Learning (IRL) methods to decode and generalize the implicit goals given by expert demonstrations. IRL normally assumes that expert demonstrations are available, which is not always possible. There are machine learning methods that allow non-expert teachers to guide robots toward learning complex policies, which can remove IRL's dependence on experts. This work introduces an approach for simultaneously teaching robot policies and objective functions from vague human corrective feedback. The main goal is to generalize the insights that a non-expert human teacher provides to the robot to unseen conditions, without further human effort in the complementary training process. We present an experimental validation of the introduced approach for transferring the learned knowledge to scenarios not considered while the non-expert was teaching. Experimental results show that the learned reward functions achieve performance in RL processes similar to that of the engineered reward functions used as a baseline, both in simulated and real environments.
    Learning & Autonomous Control
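    As a rough illustration of learning a policy and an objective function at the same time from one corrective signal, the sketch below moves the policy toward the corrected action and fits a reward model that ranks the corrected action above the original one; the loss shapes, the margin, and all function names are assumptions, not the paper's formulation:

        # Hypothetical joint update from a single human correction h.
        import torch
        import torch.nn.functional as F

        def correction_update(policy, reward_net, opt_pi, opt_r, state, h):
            """h: human correction applied to the policy's action (assumed signal)."""
            a_pred = policy(state)
            a_corr = (a_pred + h).detach()      # teacher-preferred action

            # Policy: move its output toward the corrected action.
            loss_pi = F.mse_loss(a_pred, a_corr)
            opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()

            # Objective: rank the corrected action above the original (margin loss).
            r_corr = reward_net(state, a_corr)
            r_orig = reward_net(state, a_pred.detach())
            loss_r = F.relu(1.0 - (r_corr - r_orig)).mean()
            opt_r.zero_grad(); loss_r.backward(); opt_r.step()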

    Uncertainties based queries for Interactive policy learning with evaluations and corrections

    Learning & Autonomous Control

    Learning state representation for deep actor-critic control

    Deep Neural Networks (DNNs) can be used as function approximators in Reinforcement Learning (RL). One advantage of DNNs is that they can cope with large input dimensions: instead of relying on feature engineering to lower the input dimension, DNNs can extract features from raw observations. The drawback of this end-to-end learning is that it usually requires a large amount of data, which is not always available for real-world control applications. In this paper, a new algorithm, Model Learning Deep Deterministic Policy Gradient (ML-DDPG), is proposed that combines RL with state representation learning, i.e., learning a mapping from an input vector to a state before solving the RL task. The ML-DDPG algorithm uses a concept we call predictive priors to learn a model network, which is subsequently used to pre-train the first layer of the actor and critic networks. Simulation results show that ML-DDPG can learn reasonable continuous control policies from high-dimensional observations that also contain task-irrelevant information. Furthermore, in some cases, this approach significantly improves the final performance in comparison to end-to-end learning.
    OLD Intelligent Control & Robotics
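    The pre-training idea can be sketched as follows: learn an observation encoder on a predictive task, then reuse its weights as the first layer of the actor and critic. The next-state prediction loss here is an assumption made for illustration; the exact "predictive priors" objective of ML-DDPG may differ in detail:

        # Hypothetical predictive pre-training of a shared first layer.
        import torch
        import torch.nn as nn

        obs_dim, state_dim, act_dim = 64, 16, 4      # example sizes

        encoder = nn.Linear(obs_dim, state_dim)      # candidate first layer
        predictor = nn.Linear(state_dim + act_dim, state_dim)

        def model_loss(obs, act, next_obs):
            s, s_next = encoder(obs), encoder(next_obs)
            s_pred = predictor(torch.cat([s, act], dim=-1))
            return ((s_pred - s_next.detach()) ** 2).mean()

        # After pre-training the model network, copy the encoder into the actor.
        actor = nn.Sequential(nn.Linear(obs_dim, state_dim), nn.ReLU(),
                              nn.Linear(state_dim, act_dim), nn.Tanh())
        actor[0].load_state_dict(encoder.state_dict())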

    Learning Task-Parameterized Skills From Few Demonstrations

    Moving away from repetitive tasks, robots nowadays demand versatile skills that adapt to different situations. Task-parameterized learning improves the generalization of motion policies by encoding relevant contextual information in the task parameters, enabling flexible task executions. However, training such a policy often requires collecting multiple demonstrations in different situations. Comprehensively creating those different situations is non-trivial and renders the method less applicable to real-world problems; training with fewer demonstrations/situations is therefore desirable. This paper presents a novel concept for augmenting the original training dataset with synthetic data to improve the policy, which allows task-parameterized skills to be learned from few demonstrations.
    Learning & Autonomous Control
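    One way the augmentation concept could be realized, sketched under the assumption of a planar task with a single task frame: express the demonstrated trajectory in the frame's local coordinates, sample new frame poses, and map the trajectory into them to obtain synthetic demonstrations. The 2D setting and sampling ranges are illustrative, not the paper's setup:

        # Hypothetical synthetic-demonstration generator (2D, one task frame).
        import numpy as np

        def to_frame(traj, origin, rot):
            """World-frame trajectory -> coordinates relative to a task frame."""
            return (traj - origin) @ rot              # rot: 2x2 rotation matrix

        def synthesize(demo_traj, demo_origin, demo_rot, n=10, rng=None):
            rng = rng or np.random.default_rng(0)
            local = to_frame(demo_traj, demo_origin, demo_rot)
            synthetic = []
            for _ in range(n):
                theta = rng.uniform(-np.pi, np.pi)    # sampled frame orientation
                rot = np.array([[np.cos(theta), -np.sin(theta)],
                                [np.sin(theta),  np.cos(theta)]])
                origin = rng.uniform(-1.0, 1.0, size=2)
                synthetic.append((local @ rot.T + origin, origin, rot))
            return synthetic                          # new (trajectory, frame) pairs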

    Benchmarking Behavior Prediction Models in Gap Acceptance Scenarios

    Autonomous vehicles currently suffer from a time-inefficient driving style caused by uncertainty about human behavior in traffic interactions. Accurate and reliable prediction models that enable more efficient trajectory planning could make autonomous vehicles more assertive in such interactions. However, the evaluation of such models is commonly oversimplified, ignoring the asymmetric importance of prediction errors and the heterogeneity of the datasets used for testing. We examine the potential of recasting interactions between vehicles as gap acceptance scenarios and evaluating models in this structured environment. To that end, we develop a framework designed to facilitate the evaluation of any model, by any metric, and in any scenario. We then apply this framework to state-of-the-art prediction models, all of which prove unreliable in the most safety-critical situations.
    Human-Robot Interaction
    Learning & Autonomous Control
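    The "any model, by any metric, in any scenario" structure can be pictured as a simple evaluation loop over three registries; every interface below is hypothetical and stands in for the framework's actual API:

        # Hypothetical registry-style evaluation loop.
        def benchmark(models, metrics, scenarios):
            results = {}
            for scen_name, dataset in scenarios.items():     # (input, truth) pairs
                for model_name, model in models.items():
                    preds = [model(x) for x, _ in dataset]
                    truth = [y for _, y in dataset]
                    for metric_name, metric in metrics.items():
                        results[(scen_name, model_name, metric_name)] = metric(preds, truth)
            return results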

    Head-tracked off-axis perspective projection improves gaze readability of 3D virtual avatars

    Virtual avatars have been employed in many contexts, from simple conversational agents to communicating the internal state and intentions of large robots interacting with humans. Rarely, however, are they employed in scenarios that require non-verbal communication of spatial information or dynamic interaction from a variety of perspectives. When presented on a flat screen, many illusions and visual artifacts interfere with such applications, which leads to a strong preference for physically actuated heads and faces.
    By adjusting the perspective projection used to render 3D avatars to match a viewer's physical perspective, such avatars could provide a useful middle ground between typical 2D/3D avatar representations, which are often ambiguous in their spatial relationships, and physically actuated heads/faces, which can be difficult to construct or impractical to use in some environments. A user study was conducted to determine to what extent a head-tracked perspective projection scheme mitigates the problems in reading a 3D avatar's expression or gaze target, compared to a standard perspective projection. To the authors' knowledge, this is the first user study to perform such a comparison, and the results show not only an overall improvement in viewers' accuracy when attempting to follow the avatar's gaze, but also a reduction in the spatial biases of predictions made from oblique viewing angles.
    Learning & Autonomous Control
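    Head-tracked off-axis rendering is commonly built on the generalized perspective projection: the tracked eye position defines an asymmetric view frustum through the physical screen rectangle. A sketch of that matrix is below; the screen dimensions, eye coordinates, and clip planes are example values, not the study's setup:

        # Asymmetric (off-axis) frustum from a tracked head position.
        import numpy as np

        def off_axis_projection(eye, half_w, half_h, near=0.1, far=100.0):
            """eye = (x, y, z): head position relative to the screen center,
            with z > 0 the viewer's distance from the screen plane (meters)."""
            ex, ey, ez = eye
            # Frustum bounds at the near plane, shifted by the eye offset.
            l = (-half_w - ex) * near / ez
            r = ( half_w - ex) * near / ez
            b = (-half_h - ey) * near / ez
            t = ( half_h - ey) * near / ez
            return np.array([
                [2*near/(r-l), 0,            (r+l)/(r-l),            0],
                [0,            2*near/(t-b), (t+b)/(t-b),            0],
                [0,            0,           -(far+near)/(far-near), -2*far*near/(far-near)],
                [0,            0,           -1,                      0],
            ])

        # Example: viewer 0.6 m from a 0.52 m x 0.32 m screen, head 0.1 m right.
        P = off_axis_projection(eye=(0.10, 0.0, 0.60), half_w=0.26, half_h=0.16)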

    Interactive learning of sensor policy fusion

    Teaching a robot how to navigate in a new environment from sensor input alone, in an end-to-end fashion, is still an open challenge that receives much attention from industry and academia. This paper proposes an algorithm named 'Learning Interactively to Resolve Ambiguity' (LIRA) that tackles the problem of sensor policy fusion, extending state-of-the-art methods by bringing ambiguity awareness into the decision-making and resolving ambiguity through active and interactive querying of the human expert. LIRA employs Gaussian Processes to estimate the policy's confidence and detects ambiguity caused by disagreement between the single-sensor policies about the desired action. LIRA aims to make the teaching of new policies easier by learning from human demonstrations and corrections.
    The experiments show that LIRA can be used to learn a sensor-fused policy from scratch or by leveraging the knowledge of existing single-sensor policies. The experiments focus on measuring the human interventions required to teach a successful navigation policy.
    Learning & Autonomous Control
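    The querying logic can be sketched as follows, assuming one Gaussian Process per sensor that predicts an action with a standard deviation: query the human when any GP is unconfident or when the per-sensor actions disagree. The thresholds and the averaging fusion rule are assumptions, not LIRA's exact design:

        # Hypothetical GP-confidence and disagreement gate for sensor fusion.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        class SensorFusion:
            def __init__(self, n_sensors, std_thresh=0.3, disagree_thresh=0.5):
                self.gps = [GaussianProcessRegressor() for _ in range(n_sensors)]
                self.std_thresh = std_thresh
                self.disagree_thresh = disagree_thresh

            def decide(self, features):
                """features: one feature vector per sensor for the current step."""
                means, stds = [], []
                for gp, x in zip(self.gps, features):
                    m, s = gp.predict(np.atleast_2d(x), return_std=True)
                    means.append(m[0]); stds.append(s[0])
                uncertain = max(stds) > self.std_thresh           # low GP confidence
                ambiguous = np.ptp(means) > self.disagree_thresh  # policies disagree
                if uncertain or ambiguous:
                    return None, "query_human"
                return float(np.mean(means)), "act"               # fused action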

    Learning Interactively to Resolve Ambiguity in Reference Frame Selection

    In Learning from Demonstrations, ambiguities can lead to poor generalization of the learned policy. This paper proposes a framework called Learning Interactively to Resolve Ambiguity (LIRA) that recognizes ambiguous situations, in which more than one action has a similar probability, avoids random action selection, and uses human feedback to resolve the ambiguity. The aim is to improve the user experience, the learning performance, and safety. LIRA is tested on selecting the correct goal of Movement Primitives (MP) out of a candidate list when multiple contradictory generalizations of the demonstration(s) are possible. The framework is validated on different pick-and-place operations on a Franka Emika robot. A user study showed a significant reduction in the task load of the user compared to a system that does not allow interactive resolution of ambiguities.
    Learning & Autonomous Control
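    The ambiguity test itself can be pictured as a check on the gap between the two most likely candidates: if no clear winner exists, ask the user rather than picking at random. The softmax scoring and the gap threshold below are illustrative assumptions:

        # Hypothetical ambiguity check over candidate goals/reference frames.
        import numpy as np

        def select_or_query(scores, gap_thresh=0.1):
            """scores: unnormalized preference scores, one per candidate."""
            p = np.exp(scores - np.max(scores))
            p /= p.sum()                        # softmax over candidates
            best, second = np.sort(p)[::-1][:2]
            if best - second < gap_thresh:      # ambiguous: no clear winner
                return None, "query_human"
            return int(np.argmax(p)), "execute"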
    • …