
    Effective Task Transfer Through Indirect Encoding

    An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Approaches to task transfer often focus on transforming the original representation to fit the new task. Such representational transformations are necessary because the target task often requires new state information that was not included in the original representation. In RoboCup Keepaway, changing from the 3 vs. 2 variant of the task to 4 vs. 3 adds state information for each of the new players. In contrast, this dissertation explores the idea that transfer is most effective if the representation is designed to remain the same even across different tasks. To this end, (1) the bird's eye view (BEV) representation is introduced, which can represent different tasks on the same two-dimensional map. Because the BEV represents state information associated with positions instead of objects, it can be scaled to more objects without manipulation. In this way, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV, which is (2) demonstrated in this dissertation. Yet a challenge for such a representation is that a raw two-dimensional map is high-dimensional and unstructured. This dissertation demonstrates how this problem is addressed naturally by the Hypercube-based NeuroEvolution of Augmenting Topologies (HyperNEAT) approach. HyperNEAT evolves an indirect encoding, which compresses the representation by exploiting its geometry. The dissertation then explores further exploiting the power of such encoding, beginning by (3) enhancing the configuration of the BEV with a focus on modularity. The need for further nonlinearity is then (4) investigated through the addition of hidden nodes. Furthermore, (5) the size of the BEV can be manipulated because it is indirectly encoded. Thus the resolution of the BEV, which is dictated by its size, is increased in precision, culminating in a HyperNEAT extension that is expressed at effectively infinite resolution. Additionally, scaling to higher resolutions by gradually increasing the size of the BEV is explored. Finally, (6) the ambitious problem of scaling from the Keepaway task to the Half-field Offense task is investigated with the BEV. Overall, this dissertation demonstrates that advanced representations in conjunction with indirect encoding can contribute to scaling learning techniques to more challenging tasks, such as the Half-field Offense RoboCup soccer domain.
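
    To make the indirect-encoding idea concrete, here is a minimal sketch (not the dissertation's implementation) of how a HyperNEAT-style encoding assigns connection weights by querying the geometry of a substrate. The stand-in cppn function and the grid sizes are illustrative assumptions; in HyperNEAT the CPPN is itself evolved. Because weights are computed from coordinates rather than stored per connection, the same encoding can be sampled at any BEV resolution:

        import math

        def cppn(x1, y1, x2, y2):
            # Stand-in for an evolved CPPN: any smooth function of the four
            # substrate coordinates produces a geometric weight pattern.
            return math.sin(3.0 * (x1 - x2)) * math.exp(-((y1 - y2) ** 2))

        def query_substrate(resolution, threshold=0.2):
            # Sample connection weights for a resolution x resolution grid of
            # BEV positions; higher resolutions reuse the identical encoding.
            coords = [-1.0 + 2.0 * i / (resolution - 1) for i in range(resolution)]
            weights = {}
            for x1 in coords:
                for y1 in coords:
                    for x2 in coords:
                        for y2 in coords:
                            w = cppn(x1, y1, x2, y2)
                            if abs(w) > threshold:  # prune weak connections
                                weights[(x1, y1, x2, y2)] = w
            return weights

        # The same encoding sampled at two resolutions:
        print(len(query_substrate(5)), len(query_substrate(9)))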

    USING COEVOLUTION IN COMPLEX DOMAINS

    Genetic algorithms are a computational model inspired by Darwin's theory of evolution. They have a broad range of applications, from function optimization to solving robotic control problems. Coevolution is an extension of genetic algorithms in which more than one population is evolved at the same time. Coevolution can be done in two ways: cooperatively, in which the populations jointly try to solve an evolutionary problem, or competitively, in which the populations work against one another. Coevolution has been shown to be useful in solving many problems, yet its application in complex domains still needs to be demonstrated. Robotic soccer is a complex domain with a dynamic and noisy environment. Many reinforcement learning techniques have been applied to the robotic soccer domain, since it is a great test bed for many machine learning methods. However, the success of reinforcement learning methods has been limited due to the huge state space of the domain. Evolutionary algorithms have also been used to tackle this domain; nevertheless, their application has been limited to a small subset of the domain, and no attempt has been shown to solve the whole problem successfully. This thesis tries to answer the question of whether coevolution can be applied successfully to complex domains. Three techniques are introduced to tackle the robotic soccer problem. First, an incremental learning algorithm is used to achieve a desirable performance on some soccer tasks. Second, a hierarchical coevolution paradigm is introduced to allow coevolution to scale up in solving the problem. Third, an orchestration mechanism is utilized to manage the learning processes.
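
    As a hedged illustration of the cooperative setting described above (a toy sketch, not the thesis's system), the following evolves two populations that are evaluated jointly: each individual's fitness comes from pairing it with the best partner currently found in the other population. The fitness function and parameters are assumptions for demonstration only:

        import random

        POP_SIZE, GENS, GENOME_LEN = 20, 50, 8

        def joint_fitness(a, b):
            # Toy stand-in for a shared task: the populations must match.
            return -sum((x - y) ** 2 for x, y in zip(a, b))

        def mutate(genome, sigma=0.1):
            return [g + random.gauss(0, sigma) for g in genome]

        def evolve_step(pop, partner):
            # Rank against the partner, keep the elite, refill by mutation.
            scored = sorted(pop, key=lambda g: joint_fitness(g, partner), reverse=True)
            elite = scored[:POP_SIZE // 2]
            return elite + [mutate(random.choice(elite)) for _ in range(POP_SIZE - len(elite))]

        pop_a = [[random.uniform(-1, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
        pop_b = [[random.uniform(-1, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]

        for gen in range(GENS):
            pop_a = evolve_step(pop_a, pop_b[0])  # evolve A against the best of B
            pop_b = evolve_step(pop_b, pop_a[0])  # evolve B against the best of A

        print(joint_fitness(pop_a[0], pop_b[0]))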

    Applying reinforcement learning in playing Robosoccer using the AIBO

    "Robosoccer is a popular test bed for AI programs around the world in which AIBO entertainments robots take part in the middle sized soccer event. These robots need a variety of skills to perform in a semi-real environment like this. The three key challenges are manoeuvrability, image recognition and decision making skills. This research is focussed on the decision making skills ... The work focuses on whether reinforcement learning as a form of semi supervised learning can effectively contribute to the goal keeper's decision making when a shot is taken." -Master of Computing (by research

    Humanoid Robot NAO: developing behaviours for soccer humanoid robots

    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    Development of behaviors for a simulated humanoid robot

    Master's in Computer and Telematics Engineering. Controlling a biped robot with several degrees of freedom is a challenging task that attracts the attention of researchers in the fields of biology, physics, electronics, computer science and mechanics. For a humanoid robot to perform in complex environments, fast, stable and adaptable behaviors are required. This thesis is concerned with the development of robust behaviors for a simulated humanoid robot, in the scope of the RoboCup 3D Simulated Soccer Competitions, for the FCPortugal3D team. Developing such robust behaviors requires methods for joint trajectory planning and low-level control. PID controllers were implemented to achieve low-level joint control. For trajectory planning, four methods were studied. The first method presented was implemented before this thesis and consists of a sequence of step functions that define the target angle of each joint during the movement. A new method based on the interpolation of a sine function was developed; it generates a sinusoidal shape during a given amount of time, leading to smooth transitions between the current angle and the target angle of each joint. Another method, based on partial Fourier series, generates a multi-frequency cyclic pattern for each joint; this method is very flexible and allows complete control of the angular positions and velocities of the joints. Based on the work developed by Sven Behnke, a CPG for omnidirectional locomotion was studied in detail and implemented. A behavior definition language is also part of this study and aims at simplifying the definition of behaviors using the several proposed methods. By integrating the low-level control and the trajectory planning methods, several behaviors were created to allow a simulated version of the humanoid NAO to walk in different directions, turn, kick the ball, catch the ball (goalkeeper) and get up from the ground. Furthermore, the automatic generation of gaits, through the use of optimization algorithms such as hill climbing and genetic algorithms, was also studied and tested. Finally, the results are compared with the state-of-the-art teams of the RoboCup 3D simulation league. The achieved results are good and were able to overcome one of the state-of-the-art simulated teams of RoboCup in several aspects, such as walking velocity, turning velocity, distance of the ball when kicked, time to catch the ball and time to get up from the ground.
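
    As a hedged sketch of two of the trajectory-planning methods above, the functions below implement sinusoidal interpolation between the current and target joint angle, and a partial Fourier series producing a multi-frequency cyclic pattern. The exact interpolation profile, amplitudes, phases and time steps are illustrative assumptions, not the team's tuned values:

        import math

        def sine_interpolation(current, target, t, duration):
            # Sinusoidal blend from current to target angle; velocity is zero
            # at both endpoints, giving the smooth transition described above.
            phase = (1.0 - math.cos(math.pi * min(t, duration) / duration)) / 2.0
            return current + (target - current) * phase

        def fourier_trajectory(t, offset, harmonics, base_freq):
            # Partial Fourier series: a cyclic pattern for one joint, where
            # harmonics is a list of (amplitude, phase) pairs, one per
            # multiple of the base frequency.
            angle = offset
            for k, (amp, phase) in enumerate(harmonics, start=1):
                angle += amp * math.sin(2.0 * math.pi * k * base_freq * t + phase)
            return angle

        # Illustrative use: one joint blending to 30 degrees while another cycles.
        for step in range(5):
            t = step * 0.05
            print(round(sine_interpolation(0.0, 30.0, t, 0.2), 2),
                  round(fourier_trajectory(t, 5.0, [(10.0, 0.0), (3.0, 1.57)], 0.5), 2))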

    Complementary Layered Learning

    Layered learning is a machine learning paradigm used to develop autonomous robotic agents by decomposing a complex task into simpler subtasks and learning each sequentially. Although the paradigm continues to have success in multiple domains, performance can be unexpectedly unsatisfactory. Using Boolean-logic problems and autonomous agent navigation, we show that poor performance is due to the learner either forgetting earlier learned subtasks too quickly (favoring plasticity) or having difficulty learning new things (favoring stability). We demonstrate that this imbalance can hinder learning so that task performance is no better than that of a suboptimal learning technique, monolithic learning, which does not use decomposition. Through the resulting analyses, we identify factors that can lead to imbalance and their negative effects, providing a deeper understanding of stability and plasticity in decomposition-based approaches such as layered learning. To combat the negative effects of the imbalance, a complementary learning system is applied to layered learning. The new technique augments the original learning approach with dual storage region policies to prevent useful information from being removed from an agent's policy prematurely. Through multi-agent experiments, a 28% task performance increase is obtained with the proposed augmentations over the original technique.
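
    As a structural sketch of the decomposition the paper builds on (the complementary dual-storage mechanism itself is not reproduced here), layered learning trains one subtask at a time and makes each learned layer available when training the next. The subtask names and the placeholder trainer are hypothetical:

        def train_layer(subtask, lower_layers):
            # Placeholder trainer: in practice this would run e.g. genetic
            # programming or RL on the subtask, with the already-learned
            # layers available as building blocks.
            return {"subtask": subtask, "uses": [p["subtask"] for p in lower_layers]}

        # A complex task decomposed into simpler subtasks, learned sequentially.
        SUBTASKS = ["intercept_ball", "pass_ball", "keepaway"]

        layers = []
        for subtask in SUBTASKS:
            layers.append(train_layer(subtask, layers))  # each layer builds on the last

        # The stability/plasticity risk described above: later training can
        # overwrite earlier layers; the paper's dual storage regions preserve
        # useful earlier policies instead of discarding them prematurely.
        print([layer["uses"] for layer in layers])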

    Application of Fuzzy State Aggregation and Policy Hill Climbing to Multi-Agent Systems in Stochastic Environments

    Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and its ability to continue learning even as the operating environment changes. Applying this learning to multiple cooperative software agents (a multi-agent system) not only allows each individual agent to learn from its own experience, but also opens up the opportunity for the individual agents to learn from the other agents in the system, thus accelerating the rate of learning. This research presents the novel use of fuzzy state aggregation as the means of function approximation, combined with the policy hill climbing methods of Win or Lose Fast (WoLF) and policy-dynamics-based WoLF (PD-WoLF). The combination of fast policy hill climbing (PHC) and fuzzy state aggregation (FSA) function approximation is tested in two stochastic environments: Tileworld and the robot soccer domain, RoboCup. The Tileworld results demonstrate that a single agent using the combination of FSA and PHC learns more quickly and performs better than one using fuzzy state aggregation and Q-learning alone. Results from the RoboCup domain again illustrate that the policy hill climbing algorithms perform better than Q-learning alone in a multi-agent environment. The learning is further enhanced by allowing the agents to share their experience through weighted strategy sharing.
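
    For reference, here is a minimal single-state sketch of the WoLF policy hill climbing update named above, without the fuzzy state aggregation layer: the policy steps toward the greedy action slowly when winning (its expected value beats that of its historical average policy) and quickly when losing. Parameter values and the reward signal are illustrative:

        import random

        ACTIONS = [0, 1, 2]
        ALPHA, DELTA_WIN, DELTA_LOSE = 0.1, 0.01, 0.04

        Q = {a: 0.0 for a in ACTIONS}
        pi = {a: 1.0 / len(ACTIONS) for a in ACTIONS}  # current policy
        avg_pi = dict(pi)                              # running average policy
        visits = 0

        def wolf_phc_update(action, reward):
            global visits
            visits += 1
            # Q update (single state, so there is no next-state term here).
            Q[action] += ALPHA * (reward - Q[action])
            # Move the average policy toward the current policy.
            for a in ACTIONS:
                avg_pi[a] += (pi[a] - avg_pi[a]) / visits
            # Win or Lose Fast: small step when winning, large when losing.
            winning = (sum(pi[a] * Q[a] for a in ACTIONS)
                       > sum(avg_pi[a] * Q[a] for a in ACTIONS))
            delta = DELTA_WIN if winning else DELTA_LOSE
            greedy = max(ACTIONS, key=lambda a: Q[a])
            for a in ACTIONS:
                step = delta if a == greedy else -delta / (len(ACTIONS) - 1)
                pi[a] = min(1.0, max(0.0, pi[a] + step))
            total = sum(pi.values())                   # renormalise
            for a in ACTIONS:
                pi[a] /= total

        for _ in range(200):
            a = random.choices(ACTIONS, weights=[pi[x] for x in ACTIONS])[0]
            wolf_phc_update(a, 1.0 if a == 2 else 0.0)
        print(pi)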

    Predicting performance in team games: The automatic coach

    This is an electronic version of the paper presented at the 3rd International Conference on Agents and Artificial Intelligence, held in Rome in 2011. A wide range of modern videogames involves a number of players collaborating to obtain a common goal. The way the players are teamed up is usually based on a measure of performance that makes players with a similar level of performance play together. We propose a novel technique, based on clustering over observed behaviour in the game, that seeks to exploit the particular way of playing of every player in order to find other players whose gameplay, in combination, will constitute a good team, in a similar way to a human coach. This paper describes preliminary results using these techniques for the characterization of player and team behaviours. Experiments are performed in the domain of Soccerbots. This work has been partly supported by the Spanish Ministry of Science and Education under grants TIN2009-13692-C03-03 and TIN2010-19872, and by the Spanish Ministry of Industry under grant TSI-020110-2009-205.
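
    A hedged sketch of the clustering step: each player is summarised as a vector of observed gameplay statistics and grouped by style with k-means. The feature names, the data and the use of scikit-learn are assumptions for illustration; the paper does not commit to a particular implementation here:

        import numpy as np
        from sklearn.cluster import KMeans

        # Each row summarises one player's observed behaviour; the features
        # (pass rate, shot rate, mean field position) are illustrative.
        players = ["p1", "p2", "p3", "p4", "p5", "p6"]
        features = np.array([
            [0.8, 0.1, 0.3],  # frequent passer, defensive positioning
            [0.7, 0.2, 0.4],
            [0.2, 0.7, 0.9],  # frequent shooter, offensive positioning
            [0.3, 0.6, 0.8],
            [0.5, 0.4, 0.5],
            [0.4, 0.5, 0.6],
        ])

        # Group players by playing style rather than by a scalar skill rating.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

        # A simple team-assembly heuristic in the spirit of the paper: draw
        # members from different style clusters so that roles complement.
        for cluster in sorted(set(labels)):
            print(cluster, [p for p, l in zip(players, labels) if l == cluster])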

    Learning by observation using Qualitative Spatial Relations

    We present an approach to the problem of learning by observation in spatially situated tasks, whereby an agent learns to imitate the behaviour of an observed expert, with no direct interaction and limited observations. The form of knowledge representation used for these observations is crucial, and we apply qualitative spatial-relational representations to compress continuous, metric state spaces into symbolic states, to maximise the generalisability of learned models and minimise knowledge engineering. Our system self-configures these representations of the world to discover the configurations of features most relevant to the task, and thus builds good predictive models. We then show how these models can be employed by situated agents to control their behaviour, closing the loop from observation to practical implementation. We evaluate our approach in the simulated RoboCup Soccer domain and the real-time strategy game StarCraft, and successfully demonstrate how a system using our approach closely mimics the behaviour of both synthetic (AI-controlled) players and human-controlled players through observation. We further evaluate our work on reinforcement learning tasks in these domains, and show that our approach improves the speed at which such models can be learned.
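
    As a sketch of the qualitative abstraction the approach relies on, the function below compresses a continuous relative position into a symbolic (distance, direction) relation, and the observed expert's state-action pairs are then tallied over those symbols to drive imitation. The bin boundaries and the toy observations are illustrative assumptions:

        import math
        from collections import Counter

        def qualitative_relation(dx, dy):
            # Map a metric offset between two objects to a symbolic relation.
            dist = math.hypot(dx, dy)
            ring = "near" if dist < 2.0 else "medium" if dist < 8.0 else "far"
            angle = math.degrees(math.atan2(dy, dx)) % 360
            sector = ["east", "north", "west", "south"][int((angle + 45) % 360 // 90)]
            return (ring, sector)

        # Tally which action the observed expert took in each qualitative state.
        observations = [((1.0, 0.5), "dribble"), ((6.0, -1.0), "pass"),
                        ((1.5, 0.2), "dribble"), ((12.0, 3.0), "move_to_ball")]
        model = Counter((qualitative_relation(dx, dy), action)
                        for (dx, dy), action in observations)

        def predict(dx, dy):
            # Imitate the expert: most frequent action for this relation.
            state = qualitative_relation(dx, dy)
            candidates = [(count, act) for (s, act), count in model.items() if s == state]
            return max(candidates)[1] if candidates else None

        print(predict(1.2, 0.4))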