48 research outputs found

    Argumentation accelerated reinforcement learning

    Reinforcement Learning (RL) is a popular statistical Artificial Intelligence (AI) technique for building autonomous agents, but it suffers from the curse of dimensionality: the computational requirement for obtaining the optimal policies grows exponentially with the size of the state space. Integrating heuristics into RL has proven to be an effective way to combat this curse, but deriving high-quality heuristics from people’s (typically conflicting) domain knowledge is challenging, and it has received little research attention. Argumentation theory is a logic-based AI technique well known for its conflict-resolution capability and intuitive appeal. In this thesis, we investigate the integration of argumentation frameworks into RL algorithms so as to improve their convergence speed. In particular, we propose a variant of the Value-based Argumentation Framework (VAF) to represent domain knowledge and to derive heuristics from this knowledge. We prove that the heuristics derived from this framework can effectively instruct individual learning agents as well as multiple cooperative learning agents. In addition, we propose the Argumentation Accelerated RL (AARL) framework to integrate these heuristics into different RL algorithms via Potential Based Reward Shaping (PBRS) techniques: we use classical PBRS techniques for flat-RL (e.g. SARSA(λ)) based AARL, and propose a novel PBRS technique for MAXQ-0, a hierarchical RL (HRL) algorithm, so as to implement HRL-based AARL. We empirically test two AARL implementations, SARSA(λ)-based AARL and MAXQ-based AARL, in multiple application domains, including single-agent and multi-agent learning problems. Empirical results indicate that AARL can improve the convergence speed of RL and can also be easily used by people who have little background in Argumentation and RL.
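
    A minimal sketch of the Potential-Based Reward Shaping (PBRS) idea the abstract builds on, using plain SARSA rather than SARSA(λ) for brevity; the potential table phi, standing in for the argumentation-derived heuristics, and all hyperparameters are illustrative assumptions, not the thesis's implementation.

        # Hedged sketch: PBRS adds F(s, s') = gamma * Phi(s') - Phi(s) to the
        # environment reward, a form known to preserve optimal policies.
        import random
        from collections import defaultdict

        GAMMA, ALPHA, EPSILON = 0.99, 0.1, 0.1
        ACTIONS = ["up", "down", "left", "right"]

        q = defaultdict(float)    # Q(s, a) table
        phi = defaultdict(float)  # heuristic potential Phi(s); assumed to encode advice

        def choose_action(state):
            """Epsilon-greedy selection over the Q-table."""
            if random.random() < EPSILON:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: q[(state, a)])

        def sarsa_step(s, a, r, s_next, a_next, done):
            """One SARSA update on the PBRS-shaped reward."""
            shaping = GAMMA * phi[s_next] - phi[s]
            target = r + shaping + (0.0 if done else GAMMA * q[(s_next, a_next)])
            q[(s, a)] += ALPHA * (target - q[(s, a)])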

    Reinforcement Learning from Demonstration

    Off-the-shelf Reinforcement Learning (RL) algorithms suffer from slow learning performance, partly because they are expected to learn a task from scratch merely through an agent's own experience. In this thesis, we show that learning from scratch is a limiting factor for learning performance, and that when prior knowledge is available RL agents can learn a task faster. We evaluate relevant previous work and our own algorithms in various experiments. Our first contribution is the first implementation and evaluation of an existing interactive RL algorithm in a real-world domain with a humanoid robot. Interactive RL had previously been evaluated in a simulated domain, which motivated us to evaluate its practicality on a robot. Our evaluation shows that guidance reduces learning time, and that its positive effects increase with state space size. A natural follow-up question after our first evaluation was how other previous approaches compare to interactive RL. Our second contribution is an analysis of a user study in which naive human teachers demonstrated a real-world object-catching task to a humanoid robot. We present the first comparison of several previous works in a common real-world domain with a user study. One conclusion of the user study was the high potential of RL despite its poor usability due to a slow learning rate. As an effort to improve the learning efficiency of RL learners, our third contribution is a novel human-agent knowledge transfer algorithm. Using demonstrations from three teachers with varying expertise in a simulated domain, we show that regardless of skill level, human demonstrations can improve the asymptotic performance of an RL agent. As an alternative approach for encoding human knowledge in RL, we investigated the use of reward shaping. Our final contributions are the Static Inverse Reinforcement Learning Shaping and Dynamic Inverse Reinforcement Learning Shaping algorithms, which use human demonstrations to recover a shaping reward function. Our experiments in simulated domains show that our approach outperforms the state of the art in cumulative reward, learning rate, and asymptotic performance. Overall, we show that human demonstrators with varying skills can help RL agents learn tasks more efficiently.
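
    The abstract does not give the details of the Static/Dynamic Inverse Reinforcement Learning Shaping algorithms, but a hedged sketch of the general idea of turning demonstrations into a shaping reward could look as follows; the visitation-frequency potential is an illustrative stand-in for the recovered reward function.

        # Hedged sketch: states visited often by the demonstrator get a higher
        # potential Phi(s), which then feeds a standard PBRS shaping term.
        from collections import Counter

        def potential_from_demos(demonstrations, scale=1.0):
            """Phi(s) proportional to demonstration state-visitation frequency."""
            counts = Counter(s for trajectory in demonstrations for s in trajectory)
            total = sum(counts.values()) or 1
            return {s: scale * c / total for s, c in counts.items()}

        def shaping_reward(phi, s, s_next, gamma=0.99):
            """F(s, s') = gamma * Phi(s') - Phi(s), added to the environment reward."""
            return gamma * phi.get(s_next, 0.0) - phi.get(s, 0.0)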

    Learning by observation using Qualitative Spatial Relations

    We present an approach to the problem of learning by observation in spatially situated tasks, whereby an agent learns to imitate the behaviour of an observed expert with no direct interaction and limited observations. The form of knowledge representation used for these observations is crucial, and we apply Qualitative Spatial-Relational representations to compress continuous, metric state-spaces into symbolic states, to maximise the generalisability of learned models and minimise knowledge engineering. Our system self-configures these representations of the world to discover the configurations of features most relevant to the task, and thus builds good predictive models. We then show how these models can be employed by situated agents to control their behaviour, closing the loop from observation to practical implementation. We evaluate our approach in the simulated RoboCup Soccer domain and the Real-Time Strategy game Starcraft, and successfully demonstrate how a system using our approach closely mimics the behaviour of both synthetic (AI-controlled) players and human-controlled players through observation. We further evaluate our work in Reinforcement Learning tasks in these domains, and show that our approach improves the speed at which such models can be learned.
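
    To make the representational idea concrete, here is a small illustrative sketch of compressing a metric two-object scene into qualitative spatial symbols; the thresholds and relation names are assumptions, not the paper's actual calculus or its self-configuration mechanism.

        # Hedged sketch: continuous positions -> coarse symbolic relations.
        import math

        def qualitative_distance(p, q, near=5.0, mid=15.0):
            """Map a Euclidean distance onto a coarse symbol."""
            d = math.dist(p, q)
            return "near" if d < near else "medium" if d < mid else "far"

        def qualitative_direction(p, q):
            """Map the bearing from p to q onto four compass-style symbols."""
            angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360
            return ["east", "north", "west", "south"][int((angle + 45) % 360 // 90)]

        # A continuous scene becomes a small symbolic state:
        state = (qualitative_distance((0, 0), (3, 3)), qualitative_direction((0, 0), (3, 3)))
        print(state)  # -> ('near', 'north')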

    Stepwise Acquisition of Cooperative Behavior in Deep Reinforcement Learning for a Soccer Task

    In this study, we used curriculum learning, an approach that trains in stages, to acquire cooperative behaviors in a soccer task. Because rewards in the soccer task are sparse, it is difficult to identify which actions lead to rewards or penalties, and the large number of states and the complexity of actions make learning hard. A complex task such as soccer therefore requires an approach suited to it. In this study, we aimed to make the learning of cooperative behaviors more efficient with curriculum learning, which starts from simple tasks and gradually moves on to more difficult ones. Previous work on curriculum learning in soccer tasks had not addressed the acquisition of cooperative behaviors. To learn cooperative behaviors with curriculum learning, this paper mimics human soccer practice: tasks are made harder by treating cone-like obstacles as opponents and by gradually increasing the number of opponent agents. In the experiments, we examined both the effect of the Reward Shaping used to support curriculum learning and the effect of curriculum learning itself. In the former, in an environment requiring cooperative behavior between two agents, the goal achievement rate with Reward Shaping exceeded that without it. In the latter, in learning that attempted cooperative behavior between two agents at a shooting chance, curriculum learning achieved a higher goal achievement rate than learning without a curriculum.
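
    A minimal curriculum-learning loop in the spirit described above: train on progressively harder stages and carry the learned policy forward; make_stage and the train callback are hypothetical placeholders, not the study's code.

        # Hedged sketch of an easy-to-hard curriculum for a soccer-like task.
        def make_stage(n_opponents, opponents_are_cones):
            """One curriculum stage's environment configuration."""
            return {"opponents": n_opponents, "static_cones": opponents_are_cones}

        def run_curriculum(train, stages=None):
            """Train through the stages in order, warm-starting each from the last."""
            if stages is None:
                stages = [
                    make_stage(1, True),   # cone-like static obstacle as the opponent
                    make_stage(1, False),  # one moving opponent agent
                    make_stage(2, False),  # gradually more opponent agents
                ]
            policy = None
            for stage in stages:
                policy = train(stage, init_policy=policy)
            return policy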

    Deep learning for video game playing

    In this article, we review recent Deep Learning advances in the context of how they have been applied to play different types of video games, such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces, and sparse rewards.