Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies
RoboCup soccer competitions are considered among the most challenging
multi-robot adversarial environments, due to their high dynamism and the
partial observability of the environment. In this paper we introduce a method
based on a combination of Monte Carlo search and data aggregation (MCSDA) to
adapt discrete-action soccer policies for a defender robot to the strategy of
the opponent team. By exploiting a simple representation of the domain, a
supervised learning algorithm is trained over an initial collection of data
consisting of several simulations of human expert policies. Monte Carlo policy
rollouts are then generated and aggregated with the previous data to improve the
learned policy over multiple epochs and games. The proposed approach has been
extensively tested both on a soccer-dedicated simulator and on real robots.
Using this method, our learning robot soccer team achieves an improvement in
ball interceptions, as well as a reduction in the number of opponents' goals.
Beyond this improved performance, an overall more efficient positioning of the
whole team within the field is achieved.
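The MCSDA loop described above (imitate an expert, then repeatedly roll out the current policy with Monte Carlo search and aggregate the new state-action pairs) can be sketched as follows. This is a minimal stdlib toy, not the authors' implementation; the 1-D interception dynamics, the 1-nearest-neighbour learner and all names are invented for illustration:

```python
import random

ACTIONS = (-1, 0, +1)  # discrete defender actions: move left, stay, move right

def step(state, action):
    # toy dynamics: state = (defender_pos, ball_pos); the ball drifts right
    d, b = state
    return (d + action, b + 1)

def expert(state):
    # hand-coded stand-in for the human expert policy: chase the ball
    d, b = state
    return 0 if d == b else (1 if b > d else -1)

def nn_policy(dataset, state):
    # 1-nearest-neighbour classifier standing in for the supervised learner
    s, a = min(dataset, key=lambda sa: abs(sa[0][0] - state[0]) + abs(sa[0][1] - state[1]))
    return a

def mc_return(state, action, dataset, horizon=5):
    # deterministic rollout scoring proximity to the ball; a stand-in for
    # averaging several Monte Carlo rollouts of the current learned policy
    s = step(state, action)
    for _ in range(horizon):
        s = step(s, nn_policy(dataset, s))
    return -abs(s[0] - s[1])

def mcsda(epochs=3, episode_len=8):
    # initial dataset: demonstrations from the expert policy
    dataset, s = [], (0, 3)
    for _ in range(10):
        a = expert(s)
        dataset.append((s, a))
        s = step(s, a)
    # each epoch: relabel visited states with the best Monte Carlo action,
    # then aggregate the new pairs with the old data before retraining
    for _ in range(epochs):
        s = (random.randint(-2, 2), random.randint(0, 4))
        new = []
        for _ in range(episode_len):
            a = max(ACTIONS, key=lambda act: mc_return(s, act, dataset))
            new.append((s, a))
            s = step(s, a)
        dataset += new  # data aggregation step
    return dataset
```

The aggregation step mirrors DAgger-style imitation learning: rollouts of the current policy supply fresh states, Monte Carlo search supplies the labels, and the growing dataset improves the learned policy over epochs.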
High level coordination and decision making of a simulated robotic soccer team
Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201
Transferring knowledge as heuristics in reinforcement learning: A case-based approach
The goal of this paper is to propose and analyse a transfer learning meta-algorithm that allows the implementation of distinct methods using heuristics, obtained from one (simpler) domain (the source), to accelerate a Reinforcement Learning procedure in another domain (the target). This meta-algorithm works in three stages: first, it uses a Reinforcement Learning step to learn a task on the source domain, storing the knowledge thus obtained in a case base; second, it performs an unsupervised mapping of the source-domain actions to the target-domain actions; and, third, the case base obtained in the first stage is used as a heuristic to speed up the learning process in the target domain. A set of empirical evaluations was conducted in two target domains: the 3D mountain car (using a case base learned from a 2D simulation) and stability learning for a humanoid robot in the RoboCup 3D Soccer Simulator (using knowledge learned from the Acrobot domain). The results attest that our transfer learning algorithm outperforms recent heuristically accelerated reinforcement learning and transfer learning algorithms. © 2015 Elsevier B.V.
Luiz Celiberto Jr. and Reinaldo Bianchi acknowledge the support of FAPESP (grants 2012/14010-5 and 2011/19280-8). Paulo E. Santos acknowledges support from FAPESP (grant 2012/04089-3) and CNPq (grant PQ2-303331/2011-9). Peer Reviewed
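The three stages can be sketched in a few lines of Python. This is a toy stdlib sketch under our own assumptions (a 1-D chain as the source domain, a hand-written action mapping, and a fixed heuristic bonus), not the authors' algorithm or domains:

```python
import random
from collections import defaultdict

def q_learning(steps, actions, step_fn, reward_fn, alpha=0.5, gamma=0.9, eps=0.5):
    # plain tabular Q-learning with epsilon-greedy exploration
    Q, s = defaultdict(float), 0
    for _ in range(steps):
        a = random.choice(actions) if random.random() < eps \
            else max(actions, key=lambda act: Q[(s, act)])
        s2 = step_fn(s, a)
        r = reward_fn(s2)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = 0 if r > 0 else s2  # restart episode on reaching the goal
    return Q

# Stage 1: learn on a simple source chain (goal at +5) and store a case base
SRC_ACTIONS = (-1, +1)
src_Q = q_learning(2000, SRC_ACTIONS,
                   step_fn=lambda s, a: max(-5, min(5, s + a)),
                   reward_fn=lambda s: 1.0 if s == 5 else 0.0)
case_base = {s: max(SRC_ACTIONS, key=lambda a: src_Q[(s, a)]) for s in range(-5, 5)}

# Stage 2: map source actions to target actions (target moves in steps of 2)
action_map = {-1: -2, +1: +2}

# Stage 3: heuristically accelerated selection in the target: argmax of Q + H,
# where H rewards the action suggested by the transferred case base
def heuristic_policy(Q, s, actions, bonus=10.0):
    def score(a):
        h = bonus if s in case_base and action_map[case_base[s]] == a else 0.0
        return Q[(s, a)] + h
    return max(actions, key=score)
```

The key point is stage 3: the case base never overwrites Q-values; it only biases action selection, so poor transferred advice can still be unlearned as the target-domain Q-values grow.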
Development of behaviors for a simulated humanoid robot
Master's in Computer and Telematics Engineering (Mestrado em Engenharia de Computadores e Telemática).
ABSTRACT: Controlling a biped robot with several degrees of freedom is a challenging task that attracts the attention of researchers in the fields of biology, physics, electronics, computer science and mechanics. For a humanoid robot to perform in complex environments, fast, stable and adaptable behaviors are required. This thesis is concerned with the development of robust behaviors
for a simulated humanoid robot, in the scope of the RoboCup 3D Simulated Soccer Competitions, for FCPortugal3D team. Developing such
robust behaviors requires the development of methods for joint trajectory planning and low-level control. PID controllers were implemented to achieve
low-level joint control. For trajectory planning, four methods were studied.
The first presented method was implemented before this thesis and consists of a sequence of step functions that define the target angle of each joint
during the movement. A new method based on the interpolation of a sine function was developed and consists of generating a sinusoidal shape during
some amount of time, leading to smooth transitions between the current angle and the target angle of each joint. Another method developed, based
on partial Fourier series, generates a multi-frequency cyclic pattern for each joint. This method is very flexible and allows complete control of the angular positions and velocities of the joints. Based on the work developed by Sven Behnke, a CPG for omnidirectional locomotion was studied in detail and implemented. A behavior definition language is also part of this study and aims at simplifying the definition of behaviors using the several proposed methods. By integrating the low-level control and the trajectory planning methods, several behaviors were created to allow a simulated version of the humanoid NAO to walk in different directions, turn, kick the ball, catch the ball (goalkeeper) and get up from the ground. Furthermore, the automatic generation of gaits, through the use of optimization algorithms
such as hill climbing and genetic algorithms, was also studied and tested.
In the end, the results are compared with the state-of-the-art teams of the RoboCup 3D simulation league. The achieved results are good: the developed behaviors were able to surpass one of the state-of-the-art simulated teams of RoboCup in several aspects, such as walking velocity, turning velocity, distance of the ball when kicked, time to catch the ball and time to get up from the ground.
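Two of the trajectory-planning methods described above lend themselves to a compact sketch: the sine-interpolation transition and the partial-Fourier-series cyclic pattern. The formulas below are our reading of the abstract, not the thesis code; parameter names are invented:

```python
import math

def fourier_trajectory(t, offset, amps, phases, period):
    """Joint angle at time t from a truncated (partial) Fourier series:
    theta(t) = offset + sum_k amps[k] * sin(2*pi*(k+1)*t/period + phases[k]).
    Multiple harmonics of the base period give a multi-frequency cyclic pattern."""
    w = 2.0 * math.pi / period
    return offset + sum(a * math.sin((k + 1) * w * t + p)
                        for k, (a, p) in enumerate(zip(amps, phases)))

def sine_interpolation(t, start, target, duration):
    """Smooth transition from the current angle to the target angle over
    `duration`, using a quarter-sine shape (one plausible reading of the
    sine-interpolation method described in the abstract)."""
    if t >= duration:
        return target
    return start + (target - start) * math.sin(0.5 * math.pi * t / duration)
```

`fourier_trajectory` produces a cyclic joint pattern that repeats every `period` seconds; `sine_interpolation` moves a joint from `start` to `target` without a step discontinuity, which is what makes the resulting behaviors smooth.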
The StarCraft Multi-Agent Challenge
In the last few years, deep multi-agent reinforcement learning (RL) has
become a highly active area of research. A particularly challenging class of
problems in this area is partially observable, cooperative, multi-agent
learning, in which teams of agents must learn to coordinate their behaviour
while conditioning only on their private observations. This is an attractive
research area since such problems are relevant to a large number of real-world
systems and are also more amenable to evaluation than general-sum problems.
Standardised environments such as the ALE and MuJoCo have allowed single-agent
RL to move beyond toy domains, such as grid worlds. However, there is no
comparable benchmark for cooperative multi-agent RL. As a result, most papers
in this field use one-off toy problems, making it difficult to measure real
progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC)
as a benchmark problem to fill this gap. SMAC is based on the popular real-time
strategy game StarCraft II and focuses on micromanagement challenges where each
unit is controlled by an independent agent that must act based on local
observations. We offer a diverse set of challenge maps and recommendations for
best practices in benchmarking and evaluations. We also open-source a deep
multi-agent RL framework that includes state-of-the-art algorithms. We
believe that SMAC can provide a standard benchmark environment for years to
come. Videos of our best agents for several SMAC scenarios are available at:
https://youtu.be/VZ7zmQ_obZ0
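Decentralised micromanagement of the kind SMAC poses (each unit conditioning only on its own local observation) can be sketched with a stub environment. The interface names below (`get_obs`, `get_avail_agent_actions`, `step`) follow our reading of the SMAC API, but the environment here is a dummy stand-in, not StarCraft II:

```python
import random

class StubEnv:
    """Minimal stand-in with a SMAC-like interface; the real benchmark is
    smac.env.StarCraft2Env. All dynamics here are dummy and illustrative."""
    def __init__(self, n_agents=3, n_actions=5, horizon=10):
        self.n_agents, self.n_actions, self.horizon = n_agents, n_actions, horizon
        self.t = 0

    def reset(self):
        self.t = 0

    def get_obs(self):
        # each agent receives only its own private, local observation
        return [[random.random() for _ in range(4)] for _ in range(self.n_agents)]

    def get_avail_agent_actions(self, agent_id):
        # mask of currently legal actions for one agent (all legal in the stub)
        return [1] * self.n_actions

    def step(self, actions):
        # one joint step; a single shared team reward comes back
        self.t += 1
        return float(len(actions)), self.t >= self.horizon, {}

def run_episode(env, policies):
    """Decentralised execution: every unit picks its action from its own
    observation and availability mask only; rewards are shared by the team."""
    env.reset()
    total, done = 0.0, False
    while not done:
        obs = env.get_obs()
        actions = [policy(obs[i], env.get_avail_agent_actions(i))
                   for i, policy in enumerate(policies)]
        reward, done, _ = env.step(actions)
        total += reward
    return total
```

This is exactly the constraint that makes SMAC hard: the per-agent policies in `policies` never see each other's observations or the global state, yet must coordinate through a single team reward.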
Keyframe Sampling, Optimization, and Behavior Integration: A New Longest Kick in the RoboCup 3D Simulation League
Even with improvements in machine learning enabling robots to
quickly optimize and perfect their skills, developing a seed skill from
which to begin an optimization remains a necessary challenge for large
action spaces. This thesis proposes a method for creating and using
such a seed by i) observing the effects of the actions of another robot,
ii) further optimizing the skill starting from this seed, and iii) embedding
the optimized skill in a full behavior. Called KSOBI, this
method is fully implemented and tested in the complex RoboCup 3D
simulation domain. The main result is a kick that, to the best of
our knowledge, kicks the ball farther in this simulator than has been
previously documented.
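The optimization stage (ii) is typically a local search over the seed keyframe's joint angles. The sketch below uses simple hill climbing against a toy surrogate objective; the objective, the target angles and all parameter names are invented for illustration and are not from the thesis:

```python
import random

def hill_climb(seed, objective, iters=200, step=0.1, rng=random):
    """Perturb one keyframe angle at a time; keep the change only if the skill
    improves. `objective` stands in for a (costly) simulator rollout that
    scores the skill, e.g. the distance the ball travels after the kick."""
    best, best_score = list(seed), objective(seed)
    for _ in range(iters):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] += rng.uniform(-step, step)
        score = objective(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

# toy surrogate objective: a smooth peak at invented "ideal" keyframe angles
TARGET = (0.5, -0.2, 0.8)
def toy_kick_distance(angles):
    return -sum((a - t) ** 2 for a, t in zip(angles, TARGET))

# the "observed" seed keyframe extracted from another robot (stage i)
seed = [0.0, 0.0, 0.0]
optimized, score = hill_climb(seed, toy_kick_distance)
```

Because every candidate is accepted only when it scores at least as well as the incumbent, the optimized skill can never be worse than the observed seed under the chosen objective, which is the property that makes a decent seed so valuable in large action spaces.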
A conceptual framework for externally-influenced agents: an assisted reinforcement learning review
A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering collaboration by classifying and comparing various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting the process of information decomposition, structure, retention, and how it can be used to influence agent learning. As well as reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information in order to improve the agent’s performance and its decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams of reinforcement learning operate with the shared objective of scaffolding the learner agent. Lastly, we discuss further possibilities for future work in the field of assisted reinforcement learning systems. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature
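Heuristic reinforcement learning, one of the streams listed above, is easy to illustrate: the external information enters as a heuristic function H that biases action selection while the value update stays standard. A minimal HAQL-style sketch (names invented, not from the paper):

```python
import random
from collections import defaultdict

def haql_action(Q, H, state, actions, eps=0.1, xi=1.0, rng=random):
    """Heuristically accelerated action selection: choose
    argmax_a [ Q(s, a) + xi * H(s, a) ] with epsilon-greedy exploration.
    H carries the external advice; because it never enters the Q update,
    wrong advice slows learning but does not corrupt the learned values."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)] + xi * H.get((state, a), 0.0))
```

Usage: with an untrained Q-table, `haql_action` simply follows the advice in H; as Q-values grow from experience, they can override a heuristic that turns out to be poor.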