The Dormant Neuron Phenomenon in Deep Reinforcement Learning
In this work we identify the dormant neuron phenomenon in deep reinforcement
learning, where an agent's network suffers from an increasing number of
inactive neurons, thereby affecting network expressivity. We demonstrate the
presence of this phenomenon across a variety of algorithms and environments,
and highlight its effect on learning. To address this issue, we propose a
simple and effective method (ReDo) that Recycles Dormant neurons throughout
training. Our experiments demonstrate that ReDo maintains the expressive power
of networks by reducing the number of dormant neurons and results in improved
performance.
Comment: Oral at ICML 202
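The dormant-neuron criterion and recycling step described above can be sketched as follows. This is a minimal NumPy illustration: the threshold `tau`, the normalization by the layer mean, and the function names are assumptions, and ReDo as published operates on the layers of a deep-RL network during training.

```python
import numpy as np

def dormant_mask(activations, tau=0.025):
    """Flag dormant neurons in one layer.

    activations: array of shape (batch, n_neurons) from a forward pass.
    A neuron is considered dormant when its mean absolute activation,
    normalized by the layer-wide mean, falls below the threshold tau.
    """
    score = np.abs(activations).mean(axis=0)   # per-neuron activity
    score = score / (score.mean() + 1e-8)      # normalize by layer mean
    return score <= tau                        # True -> dormant

def redo_recycle(weights_in, weights_out, mask, rng):
    """Recycle dormant neurons: re-initialize their incoming weights and
    zero their outgoing weights, so the network's output is unchanged."""
    weights_in, weights_out = weights_in.copy(), weights_out.copy()
    n_in = weights_in.shape[0]
    scale = 1.0 / np.sqrt(n_in)
    weights_in[:, mask] = rng.uniform(-scale, scale, size=(n_in, mask.sum()))
    weights_out[mask, :] = 0.0
    return weights_in, weights_out
```

Zeroing the outgoing weights is what keeps the recycling function-preserving: the re-initialized neuron contributes nothing until gradient updates revive it.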
Reinforcement Learning for Argumentation
Argumentation as a logical reasoning approach plays an important role in improving communication, increasing agreeability, and resolving conflicts in multi-agent systems (MAS). The present research aims to explore the effectiveness of argumentation in reinforcement learning of intelligent agents in terms of outperforming baseline agents, transferring learning between argument graphs, and improving the relevance and coherence of dialogue.
This research developed `ARGUMENTO+' to enable a reinforcement learning agent (RL agent) playing an abstract argument game to improve performance against different baseline agents, using a newly proposed state representation that makes each state unique. When attempting to generalise this approach to other argumentation graphs, the RL agent was not able to effectively identify argument patterns that are transferable to other domains.
In order to improve the RL agent's ability to recognise argument patterns, this research adopted a logic-based dialogue game approach with richer argument representations. In the DE dialogue game, the RL agent played against hard-coded heuristic agents and showed improved performance compared to the baseline agents by using a reward function that encourages the RL agent to win the game in a minimum number of moves. This also allowed the RL agent to develop its own strategy, make moves, and learn to argue.
This thesis also presents a new reward function that makes the RL agent's dialogue more coherent and relevant than its opponents'. The RL agent was designed to recognise argument patterns, i.e. argumentation schemes and evidence support sources, which can be related to different domains. The RL agent used a transfer learning method to generalise and transfer experiences and speed up learning.
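A reward of the kind described above, which pays more for winning the game in fewer moves, might be sketched as follows. The function name and constants are hypothetical illustrations, not taken from the thesis.

```python
def dialogue_reward(won: bool, num_moves: int,
                    win_bonus: float = 1.0,
                    move_penalty: float = 0.01) -> float:
    """Terminal reward for a dialogue game episode.

    A win pays a bonus minus a small per-move cost, so shorter winning
    dialogues score strictly higher; a loss pays a symmetric penalty.
    """
    if won:
        return win_bonus - move_penalty * num_moves
    return -win_bonus
```

The move penalty must stay small enough that winning in many moves still beats losing, otherwise the agent could prefer quick losses.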
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning
The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning
(RL) aims to produce RL algorithms whose policies generalise well to novel
unseen situations at deployment time, avoiding overfitting to their training
environments. Tackling this is vital if we are to deploy reinforcement learning
algorithms in real world scenarios, where the environment will be diverse,
dynamic and unpredictable. This survey is an overview of this nascent field. We
rely on a unifying formalism and terminology for discussing different ZSG
problems, building upon previous works. We go on to categorise existing
benchmarks for ZSG, as well as current methods for tackling these problems.
Finally, we provide a critical discussion of the current state of the field,
including recommendations for future work. Among other conclusions, we argue
that taking a purely procedural content generation approach to benchmark design
is not conducive to progress in ZSG, we suggest fast online adaptation and
tackling RL-specific problems as some areas for future work on methods for ZSG,
and we recommend building benchmarks in underexplored problem settings such as
offline RL ZSG and reward-function variation.
BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits
We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound
(BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary
environments. This unique combination of Bayesian and frequentist principles
enhances adaptability and performance in dynamic settings. The BOF-UCB
algorithm utilizes sequential Bayesian updates to infer the posterior
distribution of the unknown regression parameter, and subsequently employs a
frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing
the expected reward over the posterior distribution. We provide theoretical
guarantees of BOF-UCB's performance and demonstrate its effectiveness in
balancing exploration and exploitation on synthetic datasets and classical
control tasks in a reinforcement learning setting. Our results show that
BOF-UCB outperforms existing methods, making it a promising solution for
sequential decision-making in non-stationary environments.
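The update-then-optimism recipe described above can be sketched in a simplified fully Gaussian form. This is a hypothetical illustration only: the paper's frequentist confidence width, its handling of non-stationarity, and the exact posterior construction differ in detail.

```python
import numpy as np

class GaussianPosteriorUCB:
    """Sequential Bayesian updates for a linear reward model
    r = x^T theta + noise, with a UCB index built from the posterior.
    Simplified stand-in for the BOF-UCB idea: mean reward under the
    posterior plus a scaled posterior-uncertainty width."""

    def __init__(self, dim, prior_var=1.0, noise_var=1.0):
        self.Sigma_inv = np.eye(dim) / prior_var  # posterior precision
        self.b = np.zeros(dim)                    # precision-weighted mean
        self.noise_var = noise_var

    def update(self, x, reward):
        # Conjugate Gaussian update after observing (context x, reward)
        self.Sigma_inv += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var

    def ucb(self, x, beta=2.0):
        Sigma = np.linalg.inv(self.Sigma_inv)
        mu = Sigma @ self.b                 # posterior mean of theta
        mean = x @ mu                       # expected reward for context x
        width = np.sqrt(x @ Sigma @ x)      # posterior std of the reward
        return mean + beta * width
```

As data for a context accumulates, the width term shrinks and the index converges to the posterior-mean reward, which is the usual exploration-to-exploitation transition.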
The Role of Diverse Replay for Generalisation in Reinforcement Learning
In reinforcement learning (RL), key components of many algorithms are the
exploration strategy and replay buffer. These strategies regulate what
environment data is collected and trained on and have been extensively studied
in the RL literature. In this paper, we investigate the impact of these
components in the context of generalisation in multi-task RL. We investigate
the hypothesis that collecting and training on more diverse data from the
training environment will improve zero-shot generalisation to new
environments/tasks. We motivate mathematically and show empirically that
generalisation to states that are "reachable" during training is improved by
increasing the diversity of transitions in the replay buffer. Furthermore, we
show empirically that this same strategy also improves generalisation to
similar but "unreachable" states, which may be due to improved
generalisation of latent representations.
Comment: 14 pages, 8 figures
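One hypothetical way to increase the diversity of transitions sampled from a replay buffer, in the spirit of the hypothesis above, is to down-weight over-represented states at sampling time. The paper's actual exploration and replay strategies differ; this class and its weighting scheme are illustrative assumptions.

```python
import numpy as np
from collections import Counter

class DiversityReplayBuffer:
    """Replay buffer that samples under-represented states more often."""

    def __init__(self, rng=None):
        self.transitions = []    # (state_key, action, reward, next_state_key)
        self.counts = Counter()  # occurrences of each state in the buffer
        self.rng = rng or np.random.default_rng()

    def add(self, state_key, action, reward, next_state_key):
        self.transitions.append((state_key, action, reward, next_state_key))
        self.counts[state_key] += 1

    def sample(self, batch_size):
        # Weight each transition by 1 / count(state): transitions from
        # rarer states are sampled proportionally more often, raising
        # the diversity of states seen in each training batch.
        w = np.array([1.0 / self.counts[t[0]] for t in self.transitions])
        idx = self.rng.choice(len(self.transitions), size=batch_size,
                              p=w / w.sum())
        return [self.transitions[i] for i in idx]
```

Under this weighting, a state appearing once receives the same total sampling mass as a state appearing a hundred times.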
Forecasting bitcoin's volatility: Exploring the potential of deep-learning
The choice of the right statistical, mathematical and computational tools can strongly influence the decision-making process. With recent computational progress, Deep Learning methodologies based on Artificial Intelligence appear to be a promising tool for studying financial time series characterised by out-of-the-ordinary patterns. Cryptocurrencies are a new asset class with several especially interesting characteristics that still lack in-depth study and that differ from traditional time series. Bitcoin in particular is characterised by extraordinarily high volatility, a high number of structural breaks, and other identified characteristics that may further complicate the study and forecasting of the time series using classical models.
The goal of this study is to critically compare the forecasting properties of classic methodologies (ARCH and GARCH) with Deep Learning techniques (MLP, RNN and LSTM architectures) when forecasting Bitcoin’s volatility. The empirical study focuses on forecasting Bitcoin’s volatility with these models and comparing their forecasting quality using MAE and MAPE over one-, three- and seven-day forecasting horizons.
The Deep Learning methodologies show advantages in forecasting quality (when MAPE is taken into consideration) but also carry a much higher computational cost. Diebold-Mariano tests were also performed to compare the forecasts, confirming the superiority of the Deep Learning methodologies.
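The error measures and forecast-comparison test named above can be sketched as follows. This is a minimal illustration: the Diebold-Mariano statistic here uses an absolute-error loss and omits the autocorrelation and small-sample corrections applied in practice for multi-step horizons.

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error."""
    return np.mean(np.abs(actual - forecast))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (requires actual != 0)."""
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def diebold_mariano(actual, f1, f2):
    """Simplified DM statistic on absolute-error loss differentials.

    Negative values favour the first forecast; under the null of equal
    accuracy the statistic is approximately standard normal.
    """
    d = np.abs(actual - f1) - np.abs(actual - f2)
    return d.mean() / np.sqrt(d.var(ddof=1) / len(d))
```

MAPE is scale-free, which is why studies of highly volatile series often report it alongside MAE; the DM test then asks whether the observed accuracy gap is statistically meaningful rather than noise.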