Deep Reinforcement Learning Approaches for the Game of Briscola
Reinforcement learning has become one of the most active areas of machine learning research in recent years. It is a machine learning approach that aims to design autonomous agents capable of learning from interaction with the environment, similar to how a human does. This makes it particularly suitable for sequential decision-making problems such as games. Indeed, games are an ideal testing ground for reinforcement learning agents: they offer a controlled environment, challenging tasks, and a clear objective. Recent advances in deep learning have allowed reinforcement learning algorithms to exceed human-level performance in several games, the best-known example being AlphaGo. In this thesis we apply deep reinforcement learning methods to Briscola, one of the most popular card games in Italy. After formalizing two-player Briscola as an RL problem, we apply two algorithms: Deep Q-learning and Proximal Policy Optimization. The agents are trained against a random agent and an agent with predefined moves, and the win rate is used as the performance measure to compare the final results.
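To make the setup concrete, below is a minimal sketch of the kind of Deep Q-learning loop the thesis describes, assuming a Gym-style two-player card-game environment in which the opponent's move is folded into env.step; the network size, observation encoding, and win-rate proxy are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal DQN training loop sketch (PyTorch) against a fixed opponent folded
# into env.step; all shapes and hyperparameters are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def train_dqn(env, obs_dim, n_actions, episodes=5000, gamma=0.99,
              eps_start=1.0, eps_end=0.05, eps_decay=0.999,
              batch_size=64, buffer_size=50_000, lr=1e-3):
    q, target_q = QNet(obs_dim, n_actions), QNet(obs_dim, n_actions)
    target_q.load_state_dict(q.state_dict())
    opt = torch.optim.Adam(q.parameters(), lr=lr)
    buffer, eps, wins = deque(maxlen=buffer_size), eps_start, 0

    for ep in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    action = q(torch.tensor(obs, dtype=torch.float32)).argmax().item()
            next_obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            buffer.append((obs, action, reward, next_obs, done))
            obs = next_obs

            if len(buffer) >= batch_size:
                batch = random.sample(buffer, batch_size)
                o, a, r, o2, d = map(
                    lambda xs: torch.tensor(xs, dtype=torch.float32), zip(*batch))
                q_sa = q(o).gather(1, a.long().unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    target = r + gamma * (1 - d) * target_q(o2).max(dim=1).values
                loss = F.smooth_l1_loss(q_sa, target)
                opt.zero_grad()
                loss.backward()
                opt.step()

        eps = max(eps_end, eps * eps_decay)
        wins += int(reward > 0)          # win-rate proxy: positive terminal reward (assumption)
        if ep % 100 == 0:
            target_q.load_state_dict(q.state_dict())  # periodic target-network sync
    return q, wins / episodes
```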
Analyzing Human-Induced Pathology in the Training of Reinforcement Learning Algorithms
Modern artificial intelligence (AI) systems trained with reinforcement learning (RL) are increasingly capable, but agents training to complete tasks in safety-critical environments still require millions of trial-and-error training steps. Previous research with a Pong agent has shown that some human heuristics initially accelerate training but cause agent performance to regress into performance collapse. This thesis utilizes the FlappyBird environment to evaluate whether the pathology is generalizable. After initially confirming a similar pathology in an unaided agent, comprehensive experimentation was performed with optimizers, weight initialization methods, activation functions, and varied hyperparameters. The pathology persisted across all experiments, and the results show the network architecture is likely the principal cause. At a high level, this work illustrates the importance of determining the inherent capacity of an architecture to learn and model complex environments, and how more systematic methods to quantify capacity would greatly enhance RL.
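For illustration, the following sketch shows how a sweep over optimizers, weight initializations, activation functions, and learning rates might be organized for a DQN on a FlappyBird-style environment; all names, ranges, and dimensions here are assumptions, not the thesis's actual experimental code.

```python
# Configuration-sweep sketch over optimizers, initializers, activations, and
# learning rates; `train_fn` is an assumed callable that trains and scores a network.
from itertools import product

import torch.nn as nn
import torch.optim as optim

OPTIMIZERS = {"adam": optim.Adam, "rmsprop": optim.RMSprop, "sgd": optim.SGD}
ACTIVATIONS = {"relu": nn.ReLU, "tanh": nn.Tanh, "leaky_relu": nn.LeakyReLU}
INITIALIZERS = {"xavier": nn.init.xavier_uniform_, "kaiming": nn.init.kaiming_uniform_}
LEARNING_RATES = [1e-3, 1e-4]

def build_net(activation, initializer, obs_dim=8, n_actions=2):
    # obs_dim/n_actions are placeholders for a FlappyBird-style feature vector
    net = nn.Sequential(nn.Linear(obs_dim, 64), activation(),
                        nn.Linear(64, 64), activation(),
                        nn.Linear(64, n_actions))
    for layer in net:
        if isinstance(layer, nn.Linear):
            initializer(layer.weight)
    return net

def sweep(train_fn):
    """train_fn(net, optimizer) -> final score; returns a score per configuration."""
    results = {}
    for (opt_name, opt_cls), (act_name, act), (init_name, init), lr in product(
            OPTIMIZERS.items(), ACTIVATIONS.items(), INITIALIZERS.items(), LEARNING_RATES):
        net = build_net(act, init)
        results[(opt_name, act_name, init_name, lr)] = train_fn(
            net, opt_cls(net.parameters(), lr=lr))
    return results
```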
Influencing Exploration in Actor-Critic Reinforcement Learning Algorithms
Reinforcement Learning (RL) is a subset of machine learning primarily concerned with goal-directed learning and optimal decision making. RL agents learn from a reward signal discovered through trial and error in complex, uncertain environments, with the goal of maximizing cumulative reward. RL approaches need to scale up as they are applied to more complex environments with extremely large state spaces. Inefficient exploration methods cannot sufficiently explore complex environments in a reasonable amount of time, so optimal policies go unrealized and RL agents fail to solve the environment.
This thesis proposes a novel variant of the Advantage Actor-Critic (A2C) algorithm. The variant is validated against two state-of-the-art RL algorithms, Deep Q-Network (DQN) and A2C, across six Atari 2600 games of varying difficulty. The experimental results are competitive with the state of the art while achieving lower variance and faster learning. Additionally, the thesis introduces a metric to objectively quantify the difficulty of any Markovian environment with respect to the exploratory capacity of RL agents.
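As background for the exploration question above, here is the standard A2C objective with an entropy bonus, the usual knob for influencing exploration in actor-critic methods; this is a generic baseline sketch, not the thesis's proposed variant, and the coefficient values are illustrative.

```python
# Standard A2C loss: policy gradient with advantages, a value-regression term,
# and an entropy bonus that encourages exploration.
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """logits: [T, n_actions]; values, actions, returns: [T] (returns are discounted)."""
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.detach()              # advantage estimate A(s, a)
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)            # critic regression target
    entropy = dist.entropy().mean()                     # higher entropy => more exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```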
Learning from Sequential User Data: Models and Sample-efficient Algorithms
Recent advances in deep learning have made it possible to learn representations from ever-growing datasets in domains such as vision, natural language processing (NLP), and robotics. However, deep networks are notoriously data-hungry; for example, training language models with attention mechanisms sometimes requires trillions of parameters and tokens. In contrast, in many tasks we can only access a limited number of samples, so it is crucial to learn models from these 'limited' datasets. Learning with limited datasets can take several forms. In this thesis, we study how to select data samples sequentially such that downstream task performance is maximized. Moreover, we study how to introduce prior knowledge into deep networks to maximize prediction performance. We focus on four sequential tasks: computerized adaptive testing in psychometrics, sketching in recommender systems, knowledge tracing in computer-assisted education, and career path modeling in the labor market.
In the first two tasks, we devise novel sample-efficient algorithms that query a minimal number of sequential samples to improve future predictions. We propose a bilevel optimization-based framework for computerized adaptive testing that learns a data-driven question selection algorithm which improves on existing data selection policies. We also tackle the sketching problem in recommender systems, with the task of recommending the next item using a stored subset of prior data samples. In this setting, we develop a data-driven sequential selection algorithm that handles an evolving downstream task distribution. In the last two tasks, we devise novel neural models that introduce prior knowledge while exploiting limited data samples. For knowledge tracing, we propose a novel neural architecture, inspired by cognitive and psychometric models, to improve the prediction of students' future performance and utilize the labeled data samples efficiently. For career path modeling, we propose a novel and interpretable monotonic nonlinear state-space model to analyze online user professional profiles and provide actionable feedback and recommendations to users on how they can reach their career goals.
The data-driven, differentiable data selection algorithms for the first two tasks open up future directions for optimally querying (a non-differentiable operation) a minimal number of samples to maximize prediction performance. The structures introduced into the neural architectures for the last two tasks using prior knowledge open up future directions for learning deep models augmented with prior knowledge from limited data samples.
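To make the sequential selection setting concrete, the sketch below shows a naive greedy loop that queries the sample whose inclusion most improves a held-out score; it is only a baseline illustration under assumed `fit` and `score` callables, not the bilevel-optimization framework proposed in the thesis.

```python
# Greedy sequential data selection baseline: at each step, add the candidate
# sample that yields the largest held-out improvement. `fit` and `score` are
# assumed callables supplied by the caller.
def greedy_select(pool, budget, fit, score, selected=None):
    """pool: candidate samples; fit(samples) -> model; score(model) -> float (higher is better)."""
    selected = list(selected or [])
    for _ in range(budget):
        best_gain, best_x = float("-inf"), None
        for x in pool:
            if x in selected:
                continue
            gain = score(fit(selected + [x]))   # retrain with the candidate included
            if gain > best_gain:
                best_gain, best_x = gain, x
        selected.append(best_x)
    return selected
```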
The Wisdom of the Crowd: Reliable Deep Reinforcement Learning through Ensembles of Q-Functions
Reinforcement learning agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to know the preferred action or intrinsic value of each encountered state. The cost of this freedom is that reinforcement learning can feel too slow and unstable during learning, exhibiting performance like that of a randomly initialized Q-function just a few parameter updates after solving the task. We explore the possibility that ensemble methods can remedy these shortcomings, and do so by investigating a novel technique which harnesses the wisdom of the crowd by bagging Q-function approximator estimates. Our results show that the proposed approach improves performance on all tasks and reinforcement learning approaches attempted. We demonstrate that this is a direct result of the increased stability of the action portion of the state-action-value function used by Q-learning to select actions and by policy gradient methods to train the policy. Recently developed methods attempt to solve these RL challenges at the cost of increasing the number of interactions with the environment by several orders of magnitude. In contrast, the proposed approach has little downside to inclusion: it addresses RL challenges while reducing the number of interactions with the environment.
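A minimal sketch of the bagging idea described above follows: each ensemble member sees its own bootstrap-resampled minibatches, and actions are selected from the averaged Q-value estimates. The network shapes and replay interface are assumptions, not the thesis's code.

```python
# Ensemble of Q-function approximators: bootstrap-resampled training batches
# per member (bagging) and action selection from the averaged Q-values.
import random
import torch
import torch.nn as nn

def make_qnet(obs_dim, n_actions, hidden=64):
    return nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, n_actions))

class QEnsemble:
    def __init__(self, obs_dim, n_actions, n_members=5):
        self.members = [make_qnet(obs_dim, n_actions) for _ in range(n_members)]

    def act(self, obs):
        """Greedy action from the mean of the members' Q-value estimates."""
        obs_t = torch.tensor(obs, dtype=torch.float32)
        with torch.no_grad():
            q_mean = torch.stack([m(obs_t) for m in self.members]).mean(dim=0)
        return int(q_mean.argmax())

    def bootstrap_batches(self, replay, batch_size):
        """One bootstrap-resampled (with replacement) minibatch per member."""
        return [random.choices(replay, k=batch_size) for _ in self.members]
```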
MARBLER: An Open Platform for Standardized Evaluation of Multi-Robot Reinforcement Learning Algorithms
Multi-agent reinforcement learning (MARL) has enjoyed significant recent
progress, thanks to deep learning. This is naturally starting to benefit
multi-robot systems (MRS) in the form of multi-robot RL (MRRL). However,
existing infrastructure for training and evaluating policies predominantly focuses on
challenges in coordinating virtual agents and ignores characteristics important
to robotic systems. Few platforms support realistic robot dynamics, and fewer
still can evaluate Sim2Real performance of learned behavior. To address these
issues, we contribute MARBLER: Multi-Agent RL Benchmark and Learning
Environment for the Robotarium. MARBLER offers a robust and comprehensive
evaluation platform for MRRL by marrying Georgia Tech's Robotarium (which
enables rapid prototyping on physical MRS) and OpenAI's Gym framework (which
facilitates standardized use of modern learning algorithms). MARBLER offers a
highly controllable environment with realistic dynamics, including barrier
certificate-based obstacle avoidance. It allows anyone across the world to
train and deploy MRRL algorithms on a physical testbed with reproducibility.
Further, we introduce five novel scenarios inspired by common challenges in MRS
and provide support for new custom scenarios. Finally, we use MARBLER to
evaluate popular MARL algorithms and provide insights into their suitability
for MRRL. In summary, MARBLER can be a valuable tool to the MRS research
community by facilitating comprehensive and standardized evaluation of learning
algorithms on realistic simulations and physical hardware. Links to our
open-source framework and the videos of real-world experiments can be found at
https://shubhlohiya.github.io/MARBLER/.
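Since MARBLER builds on OpenAI's Gym interface, a typical evaluation loop for a trained multi-robot policy might look like the sketch below; the environment constructor, joint-action convention, and reward aggregation are placeholders, not MARBLER's documented API.

```python
# Gym-style evaluation loop for a decentralized multi-robot policy; `make_env`
# and the per-robot observation/action layout are assumptions.
import numpy as np

def evaluate(make_env, policies, episodes=20, seed=0):
    """policies: one callable per robot, each mapping its observation to an action."""
    env = make_env()
    returns = []
    for ep in range(episodes):
        obs, _ = env.reset(seed=seed + ep)
        done, total = False, 0.0
        while not done:
            actions = [pi(o) for pi, o in zip(policies, obs)]   # decentralized execution
            obs, rewards, terminated, truncated, _ = env.step(actions)
            total += float(np.sum(rewards))                     # team return
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))
```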
End-to-end deep reinforcement learning in computer systems
The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have since their inception appealed to systems developers. They promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high profile successes when utilising large neural networks (deep reinforcement learning). Newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications.
In this dissertation, I first argue that in order to design and execute deep RL algorithms efficiently, novel software abstractions are required which can accommodate the distinct computational patterns of communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics. I further present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level data flow graph through combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping.
Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and also by missing shared standards for evaluating model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation, which helps practitioners incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide the implementation of novel RL applications.
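The decoupling RLgraph argues for can be illustrated with a toy example: logical components are declared and wired first, and only a later build step binds them to execution semantics. The classes below are illustrative only and do not reflect RLgraph's actual API.

```python
# Toy logical component graph: construction is independent of execution; the
# build() step maps the graph to a (here, purely local) execution backend.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Component:
    name: str
    compute: Callable                      # pure function applied at execution time
    inputs: List[str] = field(default_factory=list)

class LogicalGraph:
    def __init__(self):
        self.components: Dict[str, Component] = {}

    def add(self, component: Component):
        self.components[component.name] = component
        return self

    def build(self, backend: str = "local"):
        """Map the logical graph to execution semantics; only 'local' is sketched."""
        if backend != "local":
            raise NotImplementedError(f"backend {backend!r} not sketched")
        def run(feeds: Dict[str, object], output: str):
            cache = dict(feeds)
            def resolve(name):
                if name not in cache:
                    c = self.components[name]
                    cache[name] = c.compute(*[resolve(i) for i in c.inputs])
                return cache[name]
            return resolve(output)
        return run

# Example: graph = LogicalGraph().add(
#     Component("loss", compute=lambda y, t: (y - t) ** 2, inputs=["pred", "target"]))
# run = graph.build("local"); run({"pred": 0.7, "target": 1.0}, "loss")
```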
Modern applications of machine learning in quantum sciences
In these Lecture Notes, we provide a comprehensive introduction to the most recent advances in the application of machine learning methods in quantum sciences. We cover the use of deep learning and kernel methods in supervised, unsupervised, and reinforcement learning algorithms for phase classification, representation of many-body quantum states, quantum feedback control, and quantum circuit optimization. Moreover, we introduce and discuss more specialized topics such as differentiable programming, generative models, the statistical approach to machine learning, and quantum machine learning.