19 research outputs found
High level coordination and decision making of a simulated robotic soccer team
Tese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201
Learning by observation using Qualitative Spatial Relations
We present an approach to the problem of learning by observation in spatially-situated tasks, whereby an agent learns to imitate the behaviour of an observed expert, with no direct interaction and limited observations. The form of knowledge representation used for these observations is crucial, and we apply Qualitative Spatial-Relational representations to compress continuous, metric state-spaces into symbolic states to maximise the generalisability of learned models and minimise knowledge engineering. Our system self-configures these representations of the world to discover configurations of features most relevant to the task, and thus build good predictive models. We then show how these models can be employed by situated agents to control their behaviour, closing the loop from observation to practical implementation. We evaluate our approach in the simulated RoboCup Soccer domain and the Real-Time Strategy game Starcraft, and successfully demonstrate how a system using our approach closely mimics the behaviour of both synthetic (AI controlled) players, and also human-controlled players through observation. We further evaluate our work in Reinforcement Learning tasks in these domains, and show that our approach improves the speed at which such models can be learned
Recommended from our members
Multilayered skill learning and movement coordination for autonomous robotic agents
With advances in technology expanding the capabilities of robots, while at the same time making robots cheaper to manufacture, robots are rapidly becoming more prevalent in both industrial and domestic settings. An increase in the number of robots, and the likely subsequent decrease in the ratio of people currently trained to directly control the robots, engenders a need for robots to be able to act autonomously. Larger numbers of robots present together provide new challenges and opportunities for developing complex autonomous robot behaviors capable of multirobot collaboration and coordination.
The focus of this thesis is twofold. The first part explores applying machine learning techniques to teach simulated humanoid robots skills such as how to move or walk and manipulate objects in their environment. Learning is performed using reinforcement learning policy search methods, and layered learning methodologies are employed during the learning process in which multiple lower level skills are incrementally learned and combined with each other to develop richer higher level skills. By incrementally learning skills in layers such that new skills are learned in the presence of previously learned skills, as opposed to individually in isolation, we ensure that the learned skills will work well together and can be combined to perform complex behaviors (e.g. playing soccer). The second part of the thesis centers on developing algorithms to coordinate the movement and efforts of multiple robots working together to quickly complete tasks. These algorithms prioritize minimizing the makespan, or time for all robots to complete a task, while also attempting to avoid interference and collisions among the robots. An underlying objective of this research is to develop techniques and methodologies that allow autonomous robots to robustly interact with their environment (through skill learning) and with each other (through movement coordination) in order to perform tasks and accomplish goals asked of them.
The work in this thesis is implemented and evaluated in the RoboCup 3D simulation soccer domain, and has been a key component of the UT Austin Villa team winning the RoboCup 3D simulation league world championship six out of the past seven years.Computer Science
Recommended from our members
Hypernetworks Analysis of RoboCup Interactions
Robotic soccer simulations are controlled environments in which the rich variety of interactions among agents make them good candidates to be studied as complex adaptive systems. The challenge is to create an autonomous team of soccer agents that can adapt and improve its behaviour as it plays other teams. By analogy with chess, the movements of the soccer agents and the ball form ever-changing networks as players in one team form structures that give their team an advantage. For example, the Defender’s Dilemma involves relationships between an attacker with the ball, a team-mate and a defender. The defender must choose between tackling the player with the ball, or taking a position to intercept a pass to the other attacker. Since these structures involve more that two interacting entities it is necessary to go beyond networks to multidimensional hypernetworks. In this context, this thesis investigates (i) is it possible to identify patterns of play, that lead a team to obtain an advantage ?, (ii) is it possible to forecast with a good degree of accuracy if a certain game action or sequence of game actions is going to be successful, before it has been completed ?, and (iii) is it possible to make behavioural patterns emerge in the game without specifying the behavioural rules in detail ? To investigate these research questions we devised two methods to analyse the interactions between robotic players, one based on traditional programming and one based on Deep Learning. The first method identified thousands of Defender’s Dilemma configurations from RoboCup 2D simulator games and found a statistically significant association between winning and the creation of the defender’s dilemma by the attackers of the winning team. The second method showed that a feedforward Artificial Neural Network trained on thousands of games can take as input the current game configuration and forecast to a high degree of accuracy if the current action will end up in a goal or not. Finally, we designed our own fast and simple robotic soccer simulator for investigating Reinforcement Learning. This showed that Reinforcement Learning using Proximal Policy Optimization could train two agents in the task of scoring a goal, using only basic actions without using pre-built hand-programmed skills. These experiments provide evidence that it is possible: to identify advantageous patterns of play; to forecast if an action or sequence of actions will be successful; and to make behavioural patterns emerge in the game without specifying the behavioural rules in detail
A Survey and Analysis of Multi-Robot Coordination
International audienceIn the field of mobile robotics, the study of multi-robot systems (MRSs) has grown significantly in size and importance in recent years. Having made great progress in the development of the basic problems concerning single-robot control, many researchers shifted their focus to the study of multi-robot coordination. This paper presents a systematic survey and analysis of the existing literature on coordination, especially in multiple mobile robot systems (MMRSs). A series of related problems have been reviewed, which include a communication mechanism, a planning strategy and a decision-making structure. A brief conclusion and further research perspectives are given at the end of the paper
Deep learning based approaches for imitation learning.
Imitation learning refers to an agent's ability to mimic a desired behaviour by learning from observations. The field is rapidly gaining attention due to recent advances in computational and communication capabilities as well as rising demand for intelligent applications. The goal of imitation learning is to describe the desired behaviour by providing demonstrations rather than instructions. This enables agents to learn complex behaviours with general learning methods that require minimal task specific information. However, imitation learning faces many challenges. The objective of this thesis is to advance the state of the art in imitation learning by adopting deep learning methods to address two major challenges of learning from demonstrations. Firstly, representing the demonstrations in a manner that is adequate for learning. We propose novel Convolutional Neural Networks (CNN) based methods to automatically extract feature representations from raw visual demonstrations and learn to replicate the demonstrated behaviour. This alleviates the need for task specific feature extraction and provides a general learning process that is adequate for multiple problems. The second challenge is generalizing a policy over unseen situations in the training demonstrations. This is a common problem because demonstrations typically show the best way to perform a task and don't offer any information about recovering from suboptimal actions. Several methods are investigated to improve the agent's generalization ability based on its initial performance. Our contributions in this area are three fold. Firstly, we propose an active data aggregation method that queries the demonstrator in situations of low confidence. Secondly, we investigate combining learning from demonstrations and reinforcement learning. A deep reward shaping method is proposed that learns a potential reward function from demonstrations. Finally, memory architectures in deep neural networks are investigated to provide context to the agent when taking actions. Using recurrent neural networks addresses the dependency between the state-action sequences taken by the agent. The experiments are conducted in simulated environments on 2D and 3D navigation tasks that are learned from raw visual data, as well as a 2D soccer simulator. The proposed methods are compared to state of the art deep reinforcement learning methods. The results show that deep learning architectures can learn suitable representations from raw visual data and effectively map them to atomic actions. The proposed methods for addressing generalization show improvements over using supervised learning and reinforcement learning alone. The results are thoroughly analysed to identify the benefits of each approach and situations in which it is most suitable
Recommended from our members
Learning from action not taken in multiagent systems
Coordination in large multiagent systems in order to achieve a system level goal is a critical challenge. Given the agents' intention to cooperate, there is no guarantee that the agent actions will lead to good system objective especially when the system becomes large. One of the primary difficulties in such
mulitagent systems is the slow learning process. Agents need to learn how to interact with other agents in a complex and dynamic system while adapting in the presence of other agents that are simultaneously learning. Presented in this thesis is a unique multiagent learning approach that significantly improves both
learning speed and system level performance in multiagent systems by having an agent update its estimate of the reward (e.g., value function in reinforcement learning) for all its available actions, not just the action that was taken. This method is based on the agent receiving the reward for the actions they do not take by estimating the counterfactual reward it would have received had it taken those actions. The experimental results illustrate that the rewards on such "actions not taken" are helpful early in the learning process. The agents then use their team members to estimate these rewards resulting in principally learning as a team. Finally, it is shown that fast learning is essential in a dynamic
environment. The ANT reward with teams presents improvement in speed that results in more stability in following the changes in such an environment