Search CORE

7 research outputs found

Correcting and improving imitation models of humans for Robosoccer agents

Author: Aler Ricardo
García Oscar
Valls José M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

Proceeding of: 2005 IEEE Congress on Evolutionary Computation (CEC'05),Edimburgo, 2-5 Sept. 2005The Robosoccer simulator is a challenging environment, where a human introduces a team of agents into a football virtual environment. Typically, agents are programmed by hand, but it would be a great advantage to transfer human experience into football agents. The first aim of this paper is to use machine learning techniques to obtain models of humans playing Robosoccer. These models can be used later to control a Robosoccer agent. However, models did not play as smoothly and optimally as the human. To solve this problem, the second goal of this paper is to incrementally correct models by means of evolutionary techniques, and to adapt them against more difficult opponents than the ones beatable by the human.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Contrastive Learning from Demonstrations

Author: Alexandre Luís A.
Correia André
Publication venue
Publication date: 07/11/2022
Field of study

This paper presents a framework for learning visual representations from unlabeled video demonstrations captured from multiple viewpoints. We show that these representations are applicable for imitating several robotic tasks, including pick and place. We optimize a recently proposed self-supervised learning algorithm by applying contrastive learning to enhance task-relevant information while suppressing irrelevant information in the feature embeddings. We validate the proposed method on the publicly available Multi-View Pouring and a custom Pick and Place data sets and compare it with the TCN triplet baseline. We evaluate the learned representations using three metrics: viewpoint alignment, stage classification and reinforcement learning, and in all cases the results improve when compared to state-of-the-art approaches, with the added benefit of reduced number of training iterations

arXiv.org e-Print Archive

Decision shaping and strategy learning in multi-robot interactions

Author: Valtazanos Aris
Publication venue: The University of Edinburgh
Publication date: 28/11/2013
Field of study

Recent developments in robot technology have contributed to the advancement of autonomous behaviours in human-robot systems; for example, in following instructions received from an interacting human partner. Nevertheless, increasingly many systems are moving towards more seamless forms of interaction, where factors such as implicit trust and persuasion between humans and robots are brought to the fore. In this context, the problem of attaining, through suitable computational models and algorithms, more complex strategic behaviours that can influence human decisions and actions during an interaction, remains largely open. To address this issue, this thesis introduces the problem of decision shaping in strategic interactions between humans and robots, where a robot seeks to lead, without however forcing, an interacting human partner to a particular state. Our approach to this problem is based on a combination of statistical modeling and synthesis of demonstrated behaviours, which enables robots to efficiently adapt to novel interacting agents. We primarily focus on interactions between autonomous and teleoperated (i.e. human-controlled) NAO humanoid robots, using the adversarial soccer penalty shooting game as an illustrative example. We begin by describing the various challenges that a robot operating in such complex interactive environments is likely to face. Then, we introduce a procedure through which composable strategy templates can be learned from provided human demonstrations of interactive behaviours. We subsequently present our primary contribution to the shaping problem, a Bayesian learning framework that empirically models and predicts the responses of an interacting agent, and computes action strategies that are likely to influence that agent towards a desired goal. We then address the related issue of factors affecting human decisions in these interactive strategic environments, such as the availability of perceptual information for the human operator. Finally, we describe an information processing algorithm, based on the Orient motion capture platform, which serves to facilitate direct (as opposed to teleoperation-mediated) strategic interactions between humans and robots. Our experiments introduce and evaluate a wide range of novel autonomous behaviours, where robots are shown to (learn to) influence a variety of interacting agents, ranging from other simple autonomous agents, to robots controlled by experienced human subjects. These results demonstrate the benefits of strategic reasoning in human-robot interaction, and constitute an important step towards realistic, practical applications, where robots are expected to be not just passive agents, but active, influencing participants

Edinburgh Research Archive

Bayesian nonparametric reward learning from demonstration

Author: Michini Bernard (Bernard J.)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2013
Field of study

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (pages 123-132).Learning from demonstration provides an attractive solution to the problem of teaching autonomous systems how to perform complex tasks. Demonstration opens autonomy development to non-experts and is an intuitive means of communication for humans, who naturally use demonstration to teach others. This thesis focuses on a specific form of learning from demonstration, namely inverse reinforcement learning, whereby the reward of the demonstrator is inferred. Formally, inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given knowledge of the transition function and a set of observed demonstrations. While reward learning is a promising method of inferring a rich and transferable representation of the demonstrator's intents, current algorithms suffer from intractability and inefficiency in large, real-world domains. This thesis presents a reward learning framework that infers multiple reward functions from a single, unsegmented demonstration, provides several key approximations which enable scalability to large real-world domains, and generalizes to fully continuous demonstration domains without the need for discretization of the state space, all of which are not handled by previous methods. In the thesis, modifications are proposed to an existing Bayesian IRL algorithm to improve its efficiency and tractability in situations where the state space is large and the demonstrations span only a small portion of it. A modified algorithm is presented and simulation results show substantially faster convergence while maintaining the solution quality of the original method. Even with the proposed efficiency improvements, a key limitation of Bayesian IRL (and most current IRL methods) is the assumption that the demonstrator is maximizing a single reward function. This presents problems when dealing with unsegmented demonstrations containing multiple distinct tasks, common in robot learning from demonstration (e.g. in large tasks that may require multiple subtasks to complete). A key contribution of this thesis is the development of a method that learns multiple reward functions from a single demonstration. The proposed method, termed Bayesian nonparametric inverse reinforcement learning (BNIRL), uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Simulation results demonstrate the ability of BNIRL to handle cyclic tasks that break existing algorithms due to the existence of multiple subgoal rewards in the demonstration. The BNIRL algorithm is easily parallelized, and several approximations to the demonstrator likelihood function are offered to further improve computational tractability in large domains. Since BNIRL is only applicable to discrete domains, the Bayesian nonparametric reward learning framework is extended to general continuous demonstration domains using Gaussian process reward representations. The resulting algorithm, termed Gaussian process subgoal reward learning (GPSRL), is the only learning from demonstration method that is able to learn multiple reward functions from unsegmented demonstration in general continuous domains. GPSRL does not require discretization of the continuous state space and focuses computation efficiently around the demonstration itself. Learned subgoal rewards are cast as Markov decision process options to enable execution of the learned behaviors by the robotic system and provide a principled basis for future learning and skill refinement. Experiments conducted in the MIT RAVEN indoor test facility demonstrate the ability of both BNIRL and GPSRL to learn challenging maneuvers from demonstration on a quadrotor helicopter and a remote-controlled car.by Bernard J. Michini.Ph. D

DSpace@MIT

BNAIC 2008:Proceedings of BNAIC 2008, the twentieth Belgian-Dutch Artificial Intelligence Conference

Author
Publication venue: 'University Library/University of Twente'
Publication date: 15/10/2008
Field of study

University of Twente Research Information

Automating abstraction for potential-based reward shaping

Author: Burden John
Publication venue
Publication date: 01/12/2020
Field of study

Within the field of Reinforcement Learning (RL) the successful application of abstraction can play a huge role in decreasing the time required for agents to learn competent policies. Many examples of this speed-up have been observed throughout the literature. Reward Shaping is one such technique for utilising abstractions in this way. This thesis focuses on how an agent can learn its own abstractions from its own experiences to be used for Potential Based Reward Shaping. As the thesis progresses, the environments for which the abstraction construction is automated grow in complexity and scope --- while also utilising less external knowledge of the domains. This culminates in the approaches \textit{Uniform Property State Abstraction} (UPSA) and \textit{Latent Property State Abstraction} (LPSA), which can both augment existing RL algorithms and allow them to construct abstractions from their own experience and then effectively make use of these abstractions to improve convergence time. Empirical results from this thesis demonstrate that this approach can outperform existing deep RL algorithms such as Deep Q-Networks over a range of domains

White Rose E-theses Online

Correcting and Improving Imitation Models of Humans for Robosoccer Agents

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

Crossref