Deep Reinforcement Learning Approaches for the Game of Briscola
Reinforcement learning has become one of the most active areas of machine learning research in recent years. It is a machine learning approach that aims to design autonomous agents capable of learning from interaction with the environment, similar to how a human does. This makes it particularly suitable for sequential decision-making problems such as games. Indeed, games are an ideal testing ground for reinforcement learning agents: they offer a controlled environment, challenging tasks, and a clear objective. Recent advances in deep learning have allowed reinforcement learning algorithms to exceed human-level performance in several games, the best-known example being AlphaGo. In this thesis we apply deep reinforcement learning methods to Briscola, one of the most popular card games in Italy. After formalizing two-player Briscola as an RL problem, we apply two algorithms: Deep Q-learning and Proximal Policy Optimization. The agents are trained against a random agent and an agent with predefined moves, and the win rate is used as the performance measure to compare the final results.
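To make the setup concrete, below is a minimal sketch of the kind of Deep Q-learning loop the thesis describes, assuming a Gym-style two-player card-game environment in which the opponent's move is folded into env.step; the network size, observation encoding, and win-rate proxy are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal DQN training loop sketch (PyTorch) against a fixed opponent folded
# into env.step; all shapes and hyperparameters are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def train_dqn(env, obs_dim, n_actions, episodes=5000, gamma=0.99,
              eps_start=1.0, eps_end=0.05, eps_decay=0.999,
              batch_size=64, buffer_size=50_000, lr=1e-3):
    q, target_q = QNet(obs_dim, n_actions), QNet(obs_dim, n_actions)
    target_q.load_state_dict(q.state_dict())
    opt = torch.optim.Adam(q.parameters(), lr=lr)
    buffer, eps, wins = deque(maxlen=buffer_size), eps_start, 0

    for ep in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    action = q(torch.tensor(obs, dtype=torch.float32)).argmax().item()
            next_obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            buffer.append((obs, action, reward, next_obs, done))
            obs = next_obs

            if len(buffer) >= batch_size:
                batch = random.sample(buffer, batch_size)
                o, a, r, o2, d = map(
                    lambda xs: torch.tensor(xs, dtype=torch.float32), zip(*batch))
                q_sa = q(o).gather(1, a.long().unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    target = r + gamma * (1 - d) * target_q(o2).max(dim=1).values
                loss = F.smooth_l1_loss(q_sa, target)
                opt.zero_grad()
                loss.backward()
                opt.step()

        eps = max(eps_end, eps * eps_decay)
        wins += int(reward > 0)          # win-rate proxy: positive terminal reward (assumption)
        if ep % 100 == 0:
            target_q.load_state_dict(q.state_dict())  # periodic target-network sync
    return q, wins / episodes
```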
Analyzing Human-Induced Pathology in the Training of Reinforcement Learning Algorithms
Modern artificial intelligence (AI) systems trained with reinforcement learning (RL) are increasingly capable, but agents training to complete tasks in safety-critical environments still require millions of trial-and-error training steps. Previous research with a Pong agent has shown that some human heuristics initially accelerate training but cause agent performance to regress into performance collapse. This thesis utilizes the FlappyBird environment to evaluate whether the pathology is generalizable. After initially confirming a similar pathology in an unaided agent, comprehensive experimentation was performed with optimizers, weight initialization methods, activation functions, and varied hyperparameters. The pathology persisted across all experiments, and the results show the network architecture is likely the principal cause. At a high level, this work illustrates the importance of determining the inherent capacity of an architecture to learn and model complex environments, and how more systematic methods to quantify capacity would greatly enhance RL.
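For illustration, the following sketch shows how a sweep over optimizers, weight initializations, activation functions, and learning rates might be organized for a DQN on a FlappyBird-style environment; all names, ranges, and dimensions here are assumptions, not the thesis's actual experimental code.

```python
# Configuration-sweep sketch over optimizers, initializers, activations, and
# learning rates; `train_fn` is an assumed callable that trains and scores a network.
from itertools import product

import torch.nn as nn
import torch.optim as optim

OPTIMIZERS = {"adam": optim.Adam, "rmsprop": optim.RMSprop, "sgd": optim.SGD}
ACTIVATIONS = {"relu": nn.ReLU, "tanh": nn.Tanh, "leaky_relu": nn.LeakyReLU}
INITIALIZERS = {"xavier": nn.init.xavier_uniform_, "kaiming": nn.init.kaiming_uniform_}
LEARNING_RATES = [1e-3, 1e-4]

def build_net(activation, initializer, obs_dim=8, n_actions=2):
    # obs_dim/n_actions are placeholders for a FlappyBird-style feature vector
    net = nn.Sequential(nn.Linear(obs_dim, 64), activation(),
                        nn.Linear(64, 64), activation(),
                        nn.Linear(64, n_actions))
    for layer in net:
        if isinstance(layer, nn.Linear):
            initializer(layer.weight)
    return net

def sweep(train_fn):
    """train_fn(net, optimizer) -> final score; returns a score per configuration."""
    results = {}
    for (opt_name, opt_cls), (act_name, act), (init_name, init), lr in product(
            OPTIMIZERS.items(), ACTIVATIONS.items(), INITIALIZERS.items(), LEARNING_RATES):
        net = build_net(act, init)
        results[(opt_name, act_name, init_name, lr)] = train_fn(
            net, opt_cls(net.parameters(), lr=lr))
    return results
```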
Influencing Exploration in Actor-Critic Reinforcement Learning Algorithms
Reinforcement Learning (RL) is a subset of machine learning primarily concerned with goal-directed learning and optimal decision making. RL agents learn from a reward signal discovered through trial and error in complex, uncertain environments, with the goal of maximizing cumulative reward. RL approaches need to scale up as they are applied to more complex environments with extremely large state spaces. Inefficient exploration methods cannot sufficiently explore complex environments in a reasonable amount of time, so optimal policies go unrealized and RL agents fail to solve the environment.
This thesis proposes a novel variant of the Advantage Actor-Critic (A2C) algorithm. The variant is validated against two state-of-the-art RL algorithms, Deep Q-Network (DQN) and A2C, across six Atari 2600 games of varying difficulty. The experimental results are competitive with the state of the art while achieving lower variance and faster learning. Additionally, the thesis introduces a metric to objectively quantify the difficulty of any Markovian environment with respect to the exploratory capacity of RL agents.
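As background for the exploration question above, here is the standard A2C objective with an entropy bonus, the usual knob for influencing exploration in actor-critic methods; this is a generic baseline sketch, not the thesis's proposed variant, and the coefficient values are illustrative.

```python
# Standard A2C loss: policy gradient with advantages, a value-regression term,
# and an entropy bonus that encourages exploration.
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """logits: [T, n_actions]; values, actions, returns: [T] (returns are discounted)."""
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.detach()              # advantage estimate A(s, a)
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)            # critic regression target
    entropy = dist.entropy().mean()                     # higher entropy => more exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```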
Learning from Sequential User Data: Models and Sample-efficient Algorithms
Recent advances in deep learning have made it possible to learn representations from ever-growing datasets in domains such as vision, natural language processing (NLP), and robotics. However, deep networks are notoriously data-hungry; for example, training language models with attention mechanisms sometimes requires trillions of parameters and tokens. In contrast, in many tasks we can only access a limited number of samples, so it is crucial to learn models from these 'limited' datasets. Learning with limited datasets can take several forms. In this thesis, we study how to select data samples sequentially such that downstream task performance is maximized. Moreover, we study how to introduce prior knowledge into deep networks to maximize prediction performance. We focus on four sequential tasks: computerized adaptive testing in psychometrics, sketching in recommender systems, knowledge tracing in computer-assisted education, and career path modeling in the labor market.
In the first two tasks, we devise novel sample-efficient algorithms that query a minimal number of sequential samples to improve future predictions. We propose a bilevel optimization-based framework for computerized adaptive testing that learns a data-driven question selection algorithm which improves on existing data selection policies. We also tackle the sketching problem in recommender systems, with the task of recommending the next item using a stored subset of prior data samples. In this setting, we develop a data-driven sequential selection algorithm that handles an evolving downstream task distribution. In the last two tasks, we devise novel neural models that introduce prior knowledge while exploiting limited data samples. For knowledge tracing, we propose a novel neural architecture, inspired by cognitive and psychometric models, to improve the prediction of students' future performance and utilize the labeled data samples efficiently. For career path modeling, we propose a novel and interpretable monotonic nonlinear state-space model to analyze online user professional profiles and provide actionable feedback and recommendations to users on how they can reach their career goals.
The data-driven, differentiable data selection algorithms for the first two tasks open up future directions for optimally querying (a non-differentiable operation) a minimal number of samples to maximize prediction performance. The structures introduced into the neural architectures for the last two tasks using prior knowledge open up future directions for learning deep models augmented with prior knowledge from limited data samples.
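To make the sequential selection setting concrete, the sketch below shows a naive greedy loop that queries the sample whose inclusion most improves a held-out score; it is only a baseline illustration under assumed `fit` and `score` callables, not the bilevel-optimization framework proposed in the thesis.

```python
# Greedy sequential data selection baseline: at each step, add the candidate
# sample that yields the largest held-out improvement. `fit` and `score` are
# assumed callables supplied by the caller.
def greedy_select(pool, budget, fit, score, selected=None):
    """pool: candidate samples; fit(samples) -> model; score(model) -> float (higher is better)."""
    selected = list(selected or [])
    for _ in range(budget):
        best_gain, best_x = float("-inf"), None
        for x in pool:
            if x in selected:
                continue
            gain = score(fit(selected + [x]))   # retrain with the candidate included
            if gain > best_gain:
                best_gain, best_x = gain, x
        selected.append(best_x)
    return selected
```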
The Wisdom of the Crowd: Reliable Deep Reinforcement Learning through Ensembles of Q-Functions
Reinforcement learning agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to know the preferred action or intrinsic value of each encountered state. The cost of this freedom is that reinforcement learning can feel too slow and unstable during learning, exhibiting performance like that of a randomly initialized Q-function just a few parameter updates after solving the task. We explore the possibility that ensemble methods can remedy these shortcomings, and do so by investigating a novel technique which harnesses the wisdom of the crowd by bagging Q-function approximator estimates. Our results show that the proposed approach improves performance on all tasks and reinforcement learning approaches attempted. We demonstrate that this is a direct result of the increased stability of the action portion of the state-action-value function used by Q-learning to select actions and by policy gradient methods to train the policy. Recently developed methods attempt to solve these RL challenges at the cost of increasing the number of interactions with the environment by several orders of magnitude. In contrast, the proposed approach has little downside to inclusion: it addresses RL challenges while reducing the number of interactions with the environment.
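A minimal sketch of the bagging idea described above follows: each ensemble member sees its own bootstrap-resampled minibatches, and actions are selected from the averaged Q-value estimates. The network shapes and replay interface are assumptions, not the thesis's code.

```python
# Ensemble of Q-function approximators: bootstrap-resampled training batches
# per member (bagging) and action selection from the averaged Q-values.
import random
import torch
import torch.nn as nn

def make_qnet(obs_dim, n_actions, hidden=64):
    return nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, n_actions))

class QEnsemble:
    def __init__(self, obs_dim, n_actions, n_members=5):
        self.members = [make_qnet(obs_dim, n_actions) for _ in range(n_members)]

    def act(self, obs):
        """Greedy action from the mean of the members' Q-value estimates."""
        obs_t = torch.tensor(obs, dtype=torch.float32)
        with torch.no_grad():
            q_mean = torch.stack([m(obs_t) for m in self.members]).mean(dim=0)
        return int(q_mean.argmax())

    def bootstrap_batches(self, replay, batch_size):
        """One bootstrap-resampled (with replacement) minibatch per member."""
        return [random.choices(replay, k=batch_size) for _ in self.members]
```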
MARBLER: An Open Platform for Standardized Evaluation of Multi-Robot Reinforcement Learning Algorithms
Multi-agent reinforcement learning (MARL) has enjoyed significant recent
progress, thanks to deep learning. This is naturally starting to benefit
multi-robot systems (MRS) in the form of multi-robot RL (MRRL). However,
existing infrastructure for training and evaluating policies predominantly focuses on
challenges in coordinating virtual agents and ignores characteristics important
to robotic systems. Few platforms support realistic robot dynamics, and fewer
still can evaluate Sim2Real performance of learned behavior. To address these
issues, we contribute MARBLER: Multi-Agent RL Benchmark and Learning
Environment for the Robotarium. MARBLER offers a robust and comprehensive
evaluation platform for MRRL by marrying Georgia Tech's Robotarium (which
enables rapid prototyping on physical MRS) and OpenAI's Gym framework (which
facilitates standardized use of modern learning algorithms). MARBLER offers a
highly controllable environment with realistic dynamics, including barrier
certificate-based obstacle avoidance. It allows anyone across the world to
train and deploy MRRL algorithms on a physical testbed with reproducibility.
Further, we introduce five novel scenarios inspired by common challenges in MRS
and provide support for new custom scenarios. Finally, we use MARBLER to
evaluate popular MARL algorithms and provide insights into their suitability
for MRRL. In summary, MARBLER can be a valuable tool to the MRS research
community by facilitating comprehensive and standardized evaluation of learning
algorithms on realistic simulations and physical hardware. Links to our
open-source framework and the videos of real-world experiments can be found at
https://shubhlohiya.github.io/MARBLER/.
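Since MARBLER builds on OpenAI's Gym interface, a typical evaluation loop for a trained multi-robot policy might look like the sketch below; the environment constructor, joint-action convention, and reward aggregation are placeholders, not MARBLER's documented API.

```python
# Gym-style evaluation loop for a decentralized multi-robot policy; `make_env`
# and the per-robot observation/action layout are assumptions.
import numpy as np

def evaluate(make_env, policies, episodes=20, seed=0):
    """policies: one callable per robot, each mapping its observation to an action."""
    env = make_env()
    returns = []
    for ep in range(episodes):
        obs, _ = env.reset(seed=seed + ep)
        done, total = False, 0.0
        while not done:
            actions = [pi(o) for pi, o in zip(policies, obs)]   # decentralized execution
            obs, rewards, terminated, truncated, _ = env.step(actions)
            total += float(np.sum(rewards))                     # team return
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))
```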
End-to-end deep reinforcement learning in computer systems
The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have since their inception appealed to systems developers. They promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high profile successes when utilising large neural networks (deep reinforcement learning). Newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications.
In this dissertation, I first argue that in order to design and execute deep RL algorithms efficiently, novel software abstractions are required which can accommodate the distinct computational patterns of communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics. I further present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level data flow graph through combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping.
Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and also by missing shared standards for evaluating model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation, which helps practitioners incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide the implementation of novel RL applications.
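The decoupling RLgraph argues for can be illustrated with a toy example: logical components are declared and wired first, and only a later build step binds them to execution semantics. The classes below are illustrative only and do not reflect RLgraph's actual API.

```python
# Toy logical component graph: construction is independent of execution; the
# build() step maps the graph to a (here, purely local) execution backend.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Component:
    name: str
    compute: Callable                      # pure function applied at execution time
    inputs: List[str] = field(default_factory=list)

class LogicalGraph:
    def __init__(self):
        self.components: Dict[str, Component] = {}

    def add(self, component: Component):
        self.components[component.name] = component
        return self

    def build(self, backend: str = "local"):
        """Map the logical graph to execution semantics; only 'local' is sketched."""
        if backend != "local":
            raise NotImplementedError(f"backend {backend!r} not sketched")
        def run(feeds: Dict[str, object], output: str):
            cache = dict(feeds)
            def resolve(name):
                if name not in cache:
                    c = self.components[name]
                    cache[name] = c.compute(*[resolve(i) for i in c.inputs])
                return cache[name]
            return resolve(output)
        return run

# Example: graph = LogicalGraph().add(
#     Component("loss", compute=lambda y, t: (y - t) ** 2, inputs=["pred", "target"]))
# run = graph.build("local"); run({"pred": 0.7, "target": 1.0}, "loss")
```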
Modern applications of machine learning in quantum sciences
In these Lecture Notes, we provide a comprehensive introduction to the most recent advances in the application of machine learning methods in quantum sciences. We cover the use of deep learning and kernel methods in supervised, unsupervised, and reinforcement learning algorithms for phase classification, representation of many-body quantum states, quantum feedback control, and quantum circuit optimization. Moreover, we introduce and discuss more specialized topics such as differentiable programming, generative models, the statistical approach to machine learning, and quantum machine learning.