285 research outputs found

    Deep Reinforcement Learning Approaches for the Game of Briscola

    Reinforcement learning has increasingly become one of the most interesting areas of research in recent years. It is a machine learning approach that aims to design autonomous agents capable of learning from interaction with the environment, similar to how a human does. This peculiarity makes it particularly suitable for sequential decision-making problems such as games. Indeed, games are a perfect testing ground for reinforcement learning agents, thanks to their controlled environments, challenging tasks, and clear objectives. Recent advances in deep learning have allowed reinforcement learning algorithms to exceed human-level performance in multiple games, the most famous example being AlphaGo. In this thesis we will apply deep reinforcement learning methods to Briscola, one of the most popular card games in Italy. After formalizing two-player Briscola as an RL problem, we will apply two algorithms: Deep Q-learning and Proximal Policy Optimization. The agents will be trained against a random agent and against an agent with predefined moves, and the win rate will be used as a performance measure to compare the final results.
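
    As a concrete illustration of the kind of setup the abstract describes, the sketch below trains a tabular Q-learning agent on a heavily simplified, hypothetical one-trick-at-a-time card game and evaluates it by win rate against a random opponent. The toy game, state encoding, and hyperparameters are illustrative assumptions, not the thesis's actual Briscola environment or its deep networks.

```python
# Minimal sketch: tabular Q-learning on a toy trick-taking card game,
# evaluated by win rate against a random opponent. The game below is a
# hypothetical simplification, not Briscola itself.
import random
from collections import defaultdict

CARDS = list(range(10))  # abstract card ranks 0..9; the higher rank wins a trick

def deal():
    deck = random.sample(CARDS, 6)
    return deck[:3], deck[3:]  # three cards each for the agent and the opponent

def play_episode(q, eps=0.1, alpha=0.1, learn=True):
    hand, opp = deal()
    tricks_won = 0
    for _ in range(3):
        state = tuple(sorted(hand))
        # epsilon-greedy over the indices of the cards still in hand
        if learn and random.random() < eps:
            a = random.randrange(len(hand))
        else:
            a = max(range(len(hand)), key=lambda i: q[(state, i)])
        card = hand.pop(a)
        opp_card = opp.pop(random.randrange(len(opp)))
        reward = 1 if card > opp_card else -1
        tricks_won += reward > 0
        if learn:  # one-step Q-learning update (discount factor 1)
            next_state = tuple(sorted(hand))
            best_next = max((q[(next_state, i)] for i in range(len(hand))), default=0.0)
            q[(state, a)] += alpha * (reward + best_next - q[(state, a)])
    return tricks_won >= 2  # win the hand by taking the majority of tricks

q = defaultdict(float)
for _ in range(20000):
    play_episode(q)
wins = sum(play_episode(q, learn=False) for _ in range(2000))
print(f"win rate against the random opponent: {wins / 2000:.2f}")
```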

    ANALYZING HUMAN-INDUCED PATHOLOGY IN THE TRAINING OF REINFORCEMENT LEARNING ALGORITHMS

    Modern artificial intelligence (AI) systems trained with reinforcement learning (RL) are increasingly capable, but agents training to complete tasks in safety-critical environments still require millions of trial-and-error training steps. Previous research with a Pong agent has shown that some human heuristics initially accelerate training but cause agent performance to regress to a state of performance collapse. This thesis uses the FlappyBird environment to evaluate whether the pathology is generalizable. After initially confirming a similar pathology in an unaided agent, comprehensive experimentation was performed with optimizers, weight initialization methods, activation functions, and varied hyperparameters. The pathology persisted across all experiments, and the results show that the network architecture is likely the principal cause. At a high level, this work illustrates the importance of determining the inherent capacity of an architecture to learn and model complex environments, and how more systematic methods to quantify capacity would greatly enhance RL. Outstanding Thesis. Captain, United States Marine Corps. Approved for public release; distribution is unlimited.
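
    The abstract mentions sweeping optimizers, weight initialization methods, and activation functions; the sketch below shows how such an ablation grid might be organized in PyTorch. The network shape, the particular choices in each grid, and the training placeholder are assumptions made for illustration, not the thesis's experimental setup.

```python
# Sketch of an ablation grid over optimizers, weight initializations, and
# activation functions (illustrative choices, not the thesis's exact grid).
import itertools
import torch
import torch.nn as nn

def make_net(activation, init_fn, in_dim=8, hidden=64, out_dim=2):
    net = nn.Sequential(
        nn.Linear(in_dim, hidden), activation(),
        nn.Linear(hidden, hidden), activation(),
        nn.Linear(hidden, out_dim),
    )
    for layer in net:
        if isinstance(layer, nn.Linear):
            init_fn(layer.weight)       # apply the chosen initialization scheme
            nn.init.zeros_(layer.bias)
    return net

OPTIMIZERS = {"adam": torch.optim.Adam, "sgd": torch.optim.SGD, "rmsprop": torch.optim.RMSprop}
ACTIVATIONS = {"relu": nn.ReLU, "tanh": nn.Tanh, "elu": nn.ELU}
INITS = {"xavier": nn.init.xavier_uniform_, "kaiming": nn.init.kaiming_uniform_}

for opt_name, act_name, init_name in itertools.product(OPTIMIZERS, ACTIVATIONS, INITS):
    net = make_net(ACTIVATIONS[act_name], INITS[init_name])
    optimizer = OPTIMIZERS[opt_name](net.parameters(), lr=1e-3)
    # ...train and evaluate the agent with this configuration, logging returns...
    print(f"prepared config: optimizer={opt_name} activation={act_name} init={init_name}")
```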

    Influencing Exploration in Actor-Critic Reinforcement Learning Algorithms

    Reinforcement Learning (RL) is a subset of machine learning primarily concerned with goal-directed learning and optimal decision making. RL agents learn from a reward signal discovered through trial and error in complex, uncertain environments, with the goal of maximizing positive reward signals. RL approaches need to scale up as they are applied to more complex environments with extremely large state spaces. Inefficient exploration methods cannot sufficiently explore complex environments in a reasonable amount of time, so optimal policies go unrealized and RL agents fail to solve the environment. This thesis proposes a novel variant of the Advantage Actor-Critic (A2C) algorithm. The variant is validated against two state-of-the-art RL algorithms, Deep Q-Network (DQN) and A2C, across six Atari 2600 games of varying difficulty. The experimental results are competitive with the state of the art while achieving lower variance and faster learning. Additionally, the thesis introduces a metric to objectively quantify the difficulty of any Markovian environment with respect to the exploratory capacity of RL agents.
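
    For reference, the standard A2C objective combines a policy-gradient term weighted by the advantage, a value-function loss, and an entropy bonus, where the entropy coefficient is the usual knob for influencing exploration. The sketch below implements that generic loss in PyTorch; it is a textbook baseline, not the novel variant proposed in the thesis.

```python
# Generic A2C objective with an entropy bonus (a sketch of the standard
# algorithm, not the exploration variant proposed in the thesis).
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, actions, returns, entropy_coef=0.01, value_coef=0.5):
    # logits: (T, n_actions), values: (T,), actions: (T,) int64, returns: (T,)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    advantages = returns - values.detach()                # advantage estimates A(s, a)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * advantages).mean()           # policy-gradient term
    value_loss = F.mse_loss(values, returns)              # critic regression term
    entropy = -(probs * log_probs).sum(dim=-1).mean()     # higher entropy => more exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy

# toy usage with random tensors standing in for network outputs
logits = torch.randn(5, 4, requires_grad=True)
values = torch.randn(5, requires_grad=True)
loss = a2c_loss(logits, values, torch.randint(4, (5,)), torch.randn(5))
loss.backward()
```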

    The wisdom of the crowd: reliable deep reinforcement learning through ensembles of Q-functions

    Reinforcement learning agents learn by exploring the environment and then exploiting what they have learned. This frees the human trainers from having to know the preferred action or intrinsic value of each encountered state. The cost of this freedom is that reinforcement learning can feel slow and unstable during learning, exhibiting performance like that of a randomly initialized Q-function just a few parameter updates after solving the task. We explore the possibility that ensemble methods can remedy these shortcomings, and do so by investigating a novel technique which harnesses the wisdom of the crowd by bagging Q-function approximator estimates. Our results show that the proposed approach improves performance on every task and reinforcement learning approach attempted. We demonstrate that this is a direct result of the increased stability of the action portion of the state-action-value function used by Q-learning to select actions and by policy gradient methods to train the policy. Recently developed methods attempt to solve these RL challenges at the cost of increasing the number of interactions with the environment by several orders of magnitude. In contrast, the proposed approach has little downside for inclusion: it addresses RL challenges while reducing the number of interactions with the environment.
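
    The core idea, bagging Q-function estimates and acting on the aggregate, can be sketched in a few lines: average the per-action estimates of an ensemble of independently trained Q-functions and act greedily on the mean. The toy members below are random linear functions used only to make the snippet runnable; they stand in for the separately trained approximators described in the dissertation.

```python
# Sketch of bagged Q-function action selection: average the per-action
# estimates of an ensemble and act greedily on the mean.
import numpy as np

def ensemble_greedy_action(q_functions, state):
    # q_functions: callables mapping a state to a vector of per-action Q-values
    q_estimates = np.stack([q(state) for q in q_functions])  # (n_members, n_actions)
    mean_q = q_estimates.mean(axis=0)                         # aggregate the crowd's estimates
    return int(np.argmax(mean_q))

rng = np.random.default_rng(0)
members = [lambda s, w=rng.normal(size=(4, 3)): s @ w for _ in range(5)]  # 5 toy Q-functions
state = rng.normal(size=4)
print(ensemble_greedy_action(members, state))
```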

    MARBLER: An Open Platform for Standardized Evaluation of Multi-Robot Reinforcement Learning Algorithms

    Multi-agent reinforcement learning (MARL) has enjoyed significant recent progress, thanks to deep learning. This is naturally starting to benefit multi-robot systems (MRS) in the form of multi-robot RL (MRRL). However, existing infrastructure to train and evaluate policies predominantly focuses on challenges in coordinating virtual agents, and ignores characteristics important to robotic systems. Few platforms support realistic robot dynamics, and fewer still can evaluate Sim2Real performance of learned behavior. To address these issues, we contribute MARBLER: Multi-Agent RL Benchmark and Learning Environment for the Robotarium. MARBLER offers a robust and comprehensive evaluation platform for MRRL by marrying Georgia Tech's Robotarium (which enables rapid prototyping on physical MRS) and OpenAI's Gym framework (which facilitates standardized use of modern learning algorithms). MARBLER offers a highly controllable environment with realistic dynamics, including barrier certificate-based obstacle avoidance, and allows anyone in the world to train and deploy MRRL algorithms on a physical testbed with reproducibility. Further, we introduce five novel scenarios inspired by common challenges in MRS and provide support for new custom scenarios. Finally, we use MARBLER to evaluate popular MARL algorithms and provide insights into their suitability for MRRL. In summary, MARBLER can be a valuable tool for the MRS research community by facilitating comprehensive and standardized evaluation of learning algorithms on realistic simulations and physical hardware. Links to our open-source framework and videos of real-world experiments can be found at https://shubhlohiya.github.io/MARBLER/. Comment: 7 pages, 3 figures, submitted to MRS 2023; for the associated website, see https://shubhlohiya.github.io/MARBLER
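
    The value of a Gym-style interface is that any MRRL algorithm can interact with the platform through a uniform reset/step loop. The sketch below runs such a loop against a stub two-robot environment; the stub is a stand-in written for illustration and is not MARBLER's actual API (see the linked website for that).

```python
# Sketch of the Gym-style interaction loop a platform like MARBLER standardizes
# for multi-robot RL. StubMultiRobotEnv is a toy stand-in, not MARBLER's API.
import numpy as np

class StubMultiRobotEnv:
    """Toy 2-robot environment exposing a reset/step interface in the Gym style."""
    def __init__(self, n_robots=2):
        self.n_robots = n_robots

    def reset(self):
        self.t = 0
        self.positions = np.zeros((self.n_robots, 2))
        return self.positions.copy()                    # joint observation

    def step(self, actions):
        # actions: (n_robots, 2) velocity commands; reward robots for moving right
        self.positions += np.asarray(actions)
        self.t += 1
        reward = float(self.positions[:, 0].sum())
        done = self.t >= 10
        return self.positions.copy(), reward, done, {}

env = StubMultiRobotEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    actions = np.random.uniform(-1, 1, size=(env.n_robots, 2))  # random joint policy
    obs, reward, done, info = env.step(actions)
    total += reward
print(f"episode return: {total:.2f}")
```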

    Modern applications of machine learning in quantum sciences

    In these Lecture Notes, we provide a comprehensive introduction to the most recent advances in the application of machine learning methods in quantum sciences. We cover the use of deep learning and kernel methods in supervised, unsupervised, and reinforcement learning algorithms for phase classification, representation of many-body quantum states, quantum feedback control, and quantum circuit optimization. Moreover, we introduce and discuss more specialized topics such as differentiable programming, generative models, the statistical approach to machine learning, and quantum machine learning.