438 research outputs found

    Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies

    Full text link
    RoboCup soccer competitions are considered among the most challenging multi-robot adversarial environments, due to their high dynamism and partial observability. In this paper we introduce a method based on a combination of Monte Carlo search and data aggregation (MCSDA) to adapt discrete-action soccer policies for a defender robot to the strategy of the opponent team. By exploiting a simple representation of the domain, a supervised learning algorithm is trained on an initial collection of data consisting of several simulations of human expert policies. Monte Carlo policy rollouts are then generated and aggregated with the previous data to improve the learned policy over multiple epochs and games. The proposed approach has been extensively tested both on a soccer-dedicated simulator and on real robots. Using this method, our learning robot soccer team achieves an improvement in ball interceptions, as well as a reduction in the number of opponents' goals. Alongside this improved performance, the method also yields a more efficient positioning of the whole team within the field.
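    The aggregation loop described in this abstract is close in spirit to DAgger. The sketch below gives one plausible reading of it; the classifier choice, function names and rollout interface are illustrative assumptions, not the authors' implementation.

        from sklearn.tree import DecisionTreeClassifier  # assumed policy class

        def mcsda(simulate_game, expert_data, epochs=10, games_per_epoch=5):
            """Sketch of an MCSDA-style loop: fit a discrete-action policy on
            expert data, generate Monte Carlo rollouts with the current policy,
            aggregate them with the earlier data, and refit every epoch."""
            dataset = list(expert_data)          # (state_features, action) pairs
            X, y = zip(*dataset)
            policy = DecisionTreeClassifier().fit(X, y)
            for _ in range(epochs):
                for _ in range(games_per_epoch):
                    # Monte Carlo search labels each visited state with the
                    # action whose rollouts achieved the best return
                    rollout = simulate_game(policy)   # -> [(state, action), ...]
                    dataset.extend(rollout)           # aggregate with previous data
                X, y = zip(*dataset)
                policy = DecisionTreeClassifier().fit(X, y)
            return policy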

    High level coordination and decision making of a simulated robotic soccer team

    Get PDF
    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    Transferring knowledge as heuristics in reinforcement learning: A case-based approach

    Get PDF
    The goal of this paper is to propose and analyse a transfer learning meta-algorithm that allows the implementation of distinct methods in which heuristics obtained in one (simpler) domain (the source) accelerate a Reinforcement Learning procedure in another domain (the target). This meta-algorithm works in three stages: first, it uses a Reinforcement Learning step to learn a task on the source domain, storing the knowledge thus obtained in a case base; second, it performs an unsupervised mapping of source-domain actions to target-domain actions; and, third, the case base obtained in the first stage is used as a set of heuristics to speed up the learning process in the target domain. Empirical evaluations were conducted in two target domains: the 3D mountain car (using a case base learned from a 2D simulation) and stability learning for a humanoid robot in the RoboCup 3D Soccer Simulator (using knowledge learned from the Acrobot domain). The results attest that our transfer learning algorithm outperforms recent heuristically-accelerated reinforcement learning and transfer learning algorithms. © 2015 Elsevier B.V. Luiz Celiberto Jr. and Reinaldo Bianchi acknowledge the support of FAPESP (grants 2012/14010-5 and 2011/19280-8). Paulo E. Santos acknowledges support from FAPESP (grant 2012/04089-3) and CNPq (grant PQ2-303331/2011-9). Peer Reviewed
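    As a concrete picture of the third stage, the sketch below shows a heuristically-accelerated Q-learning update in which actions suggested by the (already action-mapped) source case base receive a bonus during action selection. The environment interface and the scalar weight xi are assumptions for illustration, not the paper's exact formulation.

        import numpy as np
        from collections import defaultdict

        def haql(env, case_base, episodes=500, alpha=0.1, gamma=0.99, xi=1.0, eps=0.1):
            """Q-learning on the target domain, biased by case-base heuristics:
            the case for a state adds a bonus xi to its suggested action, so the
            agent explores the advice first while the Q update stays unchanged."""
            rng = np.random.default_rng(0)
            Q = defaultdict(lambda: np.zeros(env.n_actions))  # states assumed hashable
            for _ in range(episodes):
                s, done = env.reset(), False          # assumed gym-like interface
                while not done:
                    H = np.zeros(env.n_actions)
                    if s in case_base:                # heuristic from the source task
                        H[case_base[s]] = xi
                    if rng.random() < eps:
                        a = int(rng.integers(env.n_actions))
                    else:
                        a = int(np.argmax(Q[s] + H))  # greedy on Q plus heuristic
                    s2, r, done = env.step(a)         # assumed (state, reward, done)
                    Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s][a])
                    s = s2
            return Q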

    Multiagent reactive plan application learning in dynamic environments

    Get PDF

    Development of behaviors for a simulated humanoid robot

    Get PDF
    Master's thesis in Computer and Telematics Engineering. Controlling a biped robot with several degrees of freedom is a challenging task that draws the attention of researchers in the fields of biology, physics, electronics, computer science and mechanics. For a humanoid robot to perform in complex environments, fast, stable and adaptable behaviors are required. This thesis is concerned with the development of robust behaviors for a simulated humanoid robot, in the scope of the RoboCup 3D Simulated Soccer Competitions, for the FCPortugal3D team. Developing such behaviors requires methods for joint trajectory planning and low-level control. PID controllers were implemented for low-level joint control. For trajectory planning, four methods were studied. The first method, implemented before this thesis, consists of a sequence of step functions that define the target angle of each joint during the movement. A new method based on the interpolation of a sine function was developed: it generates a sinusoidal trajectory over a given amount of time, leading to smooth transitions between the current angle and the target angle of each joint. Another method, based on partial Fourier series, generates a multi-frequency cyclic pattern for each joint; it is very flexible and allows complete control of the angular positions and velocities of the joints. Based on the work of Sven Behnke, a CPG for omnidirectional locomotion was studied in detail and implemented. A behavior definition language is also part of this study and aims at simplifying the definition of behaviors using the several proposed methods. By integrating the low-level control and the trajectory planning methods, several behaviors were created to allow a simulated version of the humanoid NAO to walk in different directions, turn, kick the ball, catch the ball (as goalkeeper) and get up from the ground. Furthermore, the automatic generation and optimization of behaviors, using optimization algorithms such as hill climbing and genetic algorithms, was also studied and tested. Finally, the results are compared with the state-of-the-art teams of the RoboCup 3D simulation league. The achieved results are good: the developed behaviors surpass one of the three best simulated teams of RoboCup in several aspects, such as walking velocity, turning velocity, distance of the ball when kicked, time to catch the ball and time to get up from the ground.
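    Both new trajectory-planning methods admit compact closed forms. The sketch below gives one plausible parameterization consistent with the description; the abstract does not give FCPortugal3D's exact formulas, so the half-cosine profile and the series layout are assumptions.

        import math

        def sine_interpolation(theta0, theta_target, duration, t):
            """Smooth joint transition from the current angle to the target angle
            over `duration` seconds: half a cosine period, so angular velocity is
            zero at both endpoints."""
            if t >= duration:
                return theta_target
            blend = 0.5 * (1.0 - math.cos(math.pi * t / duration))
            return theta0 + (theta_target - theta0) * blend

        def fourier_pattern(t, offset, amplitudes, phases, period):
            """Cyclic multi-frequency joint pattern from a partial Fourier series:
            theta(t) = offset + sum_k A_k * sin(2*pi*k*t/period + phi_k)."""
            return offset + sum(
                a * math.sin(2.0 * math.pi * (k + 1) * t / period + p)
                for k, (a, p) in enumerate(zip(amplitudes, phases)))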

    The StarCraft Multi-Agent Challenge

    Full text link
    In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluations. We also open-source a deep multi-agent RL framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0
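    The open-sourced environment exposes a per-unit, locally observed control loop. The snippet below follows the usage pattern from the published smac package (random valid actions on the "3m" map); treat it as a sketch rather than a version-pinned example.

        import numpy as np
        from smac.env import StarCraft2Env

        env = StarCraft2Env(map_name="3m")   # 3 Marines vs. 3 Marines micro scenario
        info = env.get_env_info()
        n_agents, n_actions = info["n_agents"], info["n_actions"]

        env.reset()
        terminated = False
        while not terminated:
            # each unit is an independent agent conditioning on local observations
            obs = env.get_obs()
            actions = []
            for agent_id in range(n_agents):
                avail = env.get_avail_agent_actions(agent_id)  # mask invalid actions
                actions.append(int(np.random.choice(np.nonzero(avail)[0])))
            reward, terminated, _ = env.step(actions)
        env.close()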

    A conceptual framework for externally-influenced agents: an assisted reinforcement learning review

    Get PDF
    A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information in order to improve the agent’s performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
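    One of the reviewed streams, interactive reinforcement learning, makes the source-to-learner relationship easy to see in code. The sketch below is an illustrative arbitration scheme (all names are hypothetical, not the paper's taxonomy labels): with some probability the learner follows an external action suggestion, otherwise it acts greedily on its own value estimates.

        import random

        def assisted_action(q_values, state, advisor=None, follow_prob=0.5):
            """Pick an action for `state`: follow external advice with probability
            `follow_prob` when an advisor is present, else act greedily on the
            agent's own Q-values."""
            if advisor is not None and random.random() < follow_prob:
                return advisor(state)          # external information source
            return max(range(len(q_values)), key=q_values.__getitem__)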