    Towards Cooperative MARL in Industrial Domains

    The Dormant Neuron Phenomenon in Deep Reinforcement Learning

    In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby reducing network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons, resulting in improved performance. (Comment: Oral at ICML 2023)
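
    The recycling step lends itself to a short illustration. Below is a minimal PyTorch-style sketch of the idea as the abstract describes it: score each neuron by its normalised average activation, re-initialise the incoming weights of neurons whose score falls below a threshold, and zero their outgoing weights. The threshold value, the Kaiming initialiser, and the two-layer interface are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def redo_recycle(layer: nn.Linear, next_layer: nn.Linear,
                 activations: torch.Tensor, tau: float = 0.025) -> int:
    """Sketch of a ReDo-style recycling step (illustrative, not the paper's code).

    `activations`: post-activation outputs of `layer` over a batch,
    shape (batch, out_features). A neuron is treated as dormant when its
    mean absolute activation, normalised by the layer average, is <= tau.
    """
    score = activations.abs().mean(dim=0)       # per-neuron activity
    score = score / (score.mean() + 1e-8)       # normalise by layer mean
    dormant = score <= tau                      # boolean mask of dormant neurons

    # Re-initialise the incoming weights (and bias) of dormant neurons.
    fresh = torch.empty_like(layer.weight)
    nn.init.kaiming_uniform_(fresh)
    layer.weight[dormant] = fresh[dormant]
    layer.bias[dormant] = 0.0

    # Zero their outgoing weights so the recycled neurons do not
    # perturb the rest of the network immediately after recycling.
    next_layer.weight[:, dormant] = 0.0
    return int(dormant.sum())
```

    In practice this would be called periodically during training, with `activations` collected from a recent batch of agent experience.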

    Reinforcement Learning for Argumentation

    Argumentation, as a logical reasoning approach, plays an important role in improving communication, increasing agreeability, and resolving conflicts in multi-agent systems (MAS). The present research explores the effectiveness of argumentation in reinforcement learning of intelligent agents in terms of outperforming baseline agents, transferring learning between argument graphs, and improving the relevance and coherence of dialogue. This research developed 'ARGUMENTO+' to encourage a reinforcement learning (RL) agent playing an abstract argument game to improve performance against different baseline agents by using a newly proposed state representation that makes each state unique. When attempting to generalise this approach to other argumentation graphs, the RL agent was unable to effectively identify argument patterns transferable to other domains. To improve the RL agent's ability to recognise argument patterns, this research adopted a logic-based dialogue game approach with richer argument representations. In the DE dialogue game, the RL agent played against hard-coded heuristic agents and outperformed the baseline agents by using a reward function that encourages the RL agent to win the game in a minimum number of moves. This also allowed the RL agent to develop its own strategy, make moves, and learn to argue. This thesis also presents a new reward function that makes the RL agent's dialogue more coherent and relevant than its opponents'. The RL agent was designed to recognise argument patterns, i.e. argumentation schemes and evidence support sources, which can be related to different domains. The RL agent used a transfer learning method to generalise and transfer experiences and speed up learning.
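
    The abstract's two reward ideas (win in as few moves as possible; reward coherent, relevant dialogue) can be combined in a simple shaped reward. The sketch below is a hypothetical illustration only; the weights, the coherence score, and the function signature are assumptions, not the thesis's actual ARGUMENTO+ reward.

```python
def dialogue_reward(won: bool, num_moves: int, coherence: float = 0.0,
                    win_bonus: float = 1.0, move_penalty: float = 0.01,
                    coherence_weight: float = 0.1) -> float:
    """Hypothetical shaped reward for an argumentation dialogue game.

    Rewards winning, penalises long games (so the agent learns to win
    with as few moves as possible), and adds a bonus proportional to a
    measured coherence/relevance score for the agent's moves.
    """
    reward = win_bonus if won else -win_bonus   # terminal win/loss signal
    reward -= move_penalty * num_moves          # prefer shorter winning games
    reward += coherence_weight * coherence      # encourage coherent dialogue
    return reward
```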

    A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

    The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel, unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real-world scenarios, where the environment will be diverse, dynamic and unpredictable. This survey is an overview of this nascent field. Building upon previous works, we rely on a unifying formalism and terminology for discussing different ZSG problems. We go on to categorise existing benchmarks for ZSG, as well as current methods for tackling these problems. Finally, we provide a critical discussion of the current state of the field, including recommendations for future work. Among other conclusions, we argue that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG, we suggest fast online adaptation and tackling RL-specific problems as areas for future work on methods for ZSG, and we recommend building benchmarks in underexplored problem settings such as offline RL ZSG and reward-function variation.
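
    The ZSG problem setting itself is simple to state in code. The sketch below shows the evaluation protocol this kind of survey formalises: contexts (e.g. procedural level seeds) are split into disjoint training and testing sets, and the policy is evaluated zero-shot, with no further updates, on the held-out contexts. The function name and the 80/20 split are illustrative choices, not taken from the survey.

```python
import random

def make_zsg_split(num_contexts: int = 200, train_frac: float = 0.8,
                   seed: int = 0):
    """Split context IDs (e.g. procedural level seeds) into disjoint
    train/test sets. Under the zero-shot protocol, the policy is
    trained only on the train contexts and then evaluated, with no
    further updates, on the unseen test contexts."""
    rng = random.Random(seed)
    contexts = list(range(num_contexts))
    rng.shuffle(contexts)
    cut = int(train_frac * num_contexts)
    return contexts[:cut], contexts[cut:]

train_contexts, test_contexts = make_zsg_split()
```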

    BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits

    We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings. The BOF-UCB algorithm utilizes sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, and subsequently employs a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees of BOF-UCB's performance and demonstrate its effectiveness in balancing exploration and exploitation on synthetic datasets and classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.
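
    The update-then-optimism loop is easy to sketch. The NumPy code below shows the generic pattern the abstract describes: a sequential Bayesian (Kalman-style) update of the posterior over the regression parameter, followed by a UCB score built from the posterior mean and covariance. The class name, the fixed `beta` width, and the omission of any non-stationarity handling (e.g. discounting old data) are simplifying assumptions; this is not the authors' exact BOF-UCB.

```python
import numpy as np

class BayesLinUCB:
    """Sketch of a Bayesian-update / UCB-selection contextual bandit."""

    def __init__(self, dim: int, noise_var: float = 1.0,
                 prior_var: float = 1.0, beta: float = 1.0):
        self.noise_var = noise_var
        self.beta = beta                      # confidence-width parameter
        self.Sigma = prior_var * np.eye(dim)  # posterior covariance
        self.mu = np.zeros(dim)               # posterior mean

    def update(self, x: np.ndarray, reward: float) -> None:
        """Sequential Bayesian update of the regression posterior."""
        Sigma_x = self.Sigma @ x
        denom = self.noise_var + x @ Sigma_x
        self.mu = self.mu + Sigma_x * (reward - x @ self.mu) / denom
        self.Sigma = self.Sigma - np.outer(Sigma_x, Sigma_x) / denom

    def ucb(self, x: np.ndarray) -> float:
        """Optimistic score: posterior-mean reward plus a confidence
        bonus derived from the posterior covariance."""
        return x @ self.mu + self.beta * np.sqrt(x @ self.Sigma @ x)

    def select(self, arms: np.ndarray) -> int:
        """Pick the arm (row of `arms`) with the highest UCB score."""
        return int(np.argmax([self.ucb(x) for x in arms]))
```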

    The Role of Diverse Replay for Generalisation in Reinforcement Learning

    In reinforcement learning (RL), key components of many algorithms are the exploration strategy and replay buffer. These components regulate what environment data is collected and trained on, and have been extensively studied in the RL literature. In this paper, we investigate their impact in the context of generalisation in multi-task RL. We investigate the hypothesis that collecting and training on more diverse data from the training environment will improve zero-shot generalisation to new environments/tasks. We motivate mathematically and show empirically that generalisation to states that are "reachable" during training is improved by increasing the diversity of transitions in the replay buffer. Furthermore, we show empirically that this same strategy also improves generalisation to similar but "unreachable" states, which could be due to improved generalisation of latent representations. (Comment: 14 pages, 8 figures)
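
    As an illustration of one cheap way to raise replay diversity (an assumption made for illustration, not the paper's method), the sketch below replaces FIFO eviction with reservoir sampling, so the buffer holds a uniform sample over all transitions seen so far rather than only the most recent ones.

```python
import random

class ReservoirReplayBuffer:
    """Replay buffer using reservoir sampling: every transition ever
    seen is equally likely to be stored, which keeps older, rarer
    transitions represented and so tends to hold a more diverse
    sample than a same-sized FIFO buffer."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, transition) -> None:
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            # Keep the new transition with probability capacity / n_seen,
            # overwriting a uniformly random slot.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = transition

    def sample(self, batch_size: int):
        return self.rng.sample(self.buffer, batch_size)
```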

    Forecasting bitcoin's volatility: Exploring the potential of deep-learning

    The choice of statistical, mathematical and computational tools can strongly influence the decision-making process. With recent computational progress, Deep Learning methodologies based on Artificial Intelligence appear to be a promising tool for studying financial time series characterised by out-of-the-ordinary patterns. Cryptocurrencies are a new asset class with several especially interesting characteristics that still lack deep study and differ from traditional time series. Bitcoin in particular is characterised by extraordinarily high volatility, a high number of structural breaks, and other identified characteristics that may further complicate the study and forecasting of the time series with classical models. The goal of this study is to critically compare the forecasting properties of classic methodologies (ARCH and GARCH) with Deep Learning techniques (MLP, RNN and LSTM architectures) when forecasting Bitcoin's volatility. The empirical study focuses on forecasting Bitcoin's volatility with these models and comparing their forecasting quality using MAE and MAPE over one-, three- and seven-day forecasting horizons. The Deep Learning methodologies show advantages in terms of forecasting quality (when MAPE is taken into consideration) but also carry substantially higher computational costs. Diebold-Mariano tests were also performed to compare the forecasts, confirming the superiority of the Deep Learning methodologies.
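
    The evaluation pipeline the abstract describes reduces to a few lines. Below is a minimal sketch of the MAE and MAPE error metrics and a one-step Diebold-Mariano test on absolute-error losses; the function names are illustrative, and horizons beyond one day would call for a HAC (Newey-West) variance estimate in the DM denominator rather than the simple one used here.

```python
import numpy as np
from scipy import stats

def mae(y: np.ndarray, yhat: np.ndarray) -> float:
    return float(np.mean(np.abs(y - yhat)))

def mape(y: np.ndarray, yhat: np.ndarray) -> float:
    # Assumes all targets are non-zero (volatility is positive).
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

def diebold_mariano(y: np.ndarray, yhat1: np.ndarray, yhat2: np.ndarray):
    """One-step-ahead Diebold-Mariano test on absolute-error losses.

    Returns the DM statistic and its two-sided p-value; a significantly
    negative statistic means forecast 1 beats forecast 2.
    """
    d = np.abs(y - yhat1) - np.abs(y - yhat2)   # loss differential series
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)  # simple variance, h = 1
    p = 2 * (1 - stats.norm.cdf(abs(dm)))
    return dm, p
```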