
    Q-learning with online trees

    Reinforcement learning is one of the major areas of artificial intelligence and has been studied rigorously in recent years. Among numerous methodologies, Q-learning is one of the most fundamental model-free reinforcement learning algorithms, and it has inspired many researchers. Several studies have shown great results by approximating the action-value function, one of the essential elements of Q-learning, with non-linear supervised learning models such as deep neural networks. This combination has led to surpassing human-level performance on complex problems such as the Atari games and Go, which had been difficult to solve with standard tabular Q-learning. However, both Q-learning and the deep neural networks typically used as function approximators require very large computational resources to train. To mitigate this, we propose using online random forests as the function approximator for the action-value function. We grow one online random forest for each possible action in a Markov decision process (MDP) environment. Each forest approximates the action-value function for its action, and the agent chooses the action in the succeeding state according to the resulting approximate action-value functions. When the agent executes an action, an observation consisting of the state, action, reward, and subsequent state is stored in an experience replay buffer. Observations are then randomly sampled from the buffer to grow the online random forests. For each sampled observation, the terminal nodes of the corresponding trees randomly generate candidate tests for the decision-tree splits, and the test that gives the lowest residual sum of squares after splitting is selected. Trees grown in this way age each time they take in a sample observation; a tree older than a certain age may then be selected at random and replaced by a new tree, depending on its out-of-bag error. Our algorithm constitutes an adaptation of previously developed Online Random Forests to reinforcement learning, and forest size plays an important role in it: to reduce computational costs, we first grow small forests and then expand them after a certain number of episodes. In our experiments, this forest-size expansion yielded better performance in later episodes. Furthermore, we found that our method outperformed some deep neural networks in simple MDP environments. We hope that this study will serve as a medium to promote research on the combination of reinforcement learning and tree-based methods.
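
    As a rough, non-authoritative sketch of how these pieces fit together, the Python below pairs one forest per action with an experience replay buffer, epsilon-greedy action selection, and a mid-training forest-size expansion. scikit-learn's batch RandomForestRegressor, refit on replay samples after each episode, stands in for the paper's incrementally grown Online Random Forests (with their RSS-based split tests and out-of-bag tree replacement); the toy chain MDP, the epsilon and expansion schedules, and all other parameter values are illustrative assumptions, not the authors' setup.

```python
import numpy as np
from collections import deque
from sklearn.ensemble import RandomForestRegressor

# Sketch: one forest per action, experience replay, epsilon-greedy control,
# and a forest-size expansion partway through training. The batch refit
# below is a stand-in for the paper's incremental Online Random Forests.

N_STATES, N_ACTIONS, GAMMA, EPS = 5, 2, 0.99, 0.2
rng = np.random.default_rng(0)

def step(s, a):
    """Toy chain MDP (an assumption): action 1 moves right, 0 moves left;
    reaching the rightmost state pays reward 1 and ends the episode."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_STATES - 1), s2 == N_STATES - 1

def q_values(s):
    x = np.array([[s]], dtype=float)
    return np.array([models[a].predict(x)[0] if fitted[a] else 0.0
                     for a in range(N_ACTIONS)])

models = [RandomForestRegressor(n_estimators=10, random_state=0)
          for _ in range(N_ACTIONS)]
fitted = [False] * N_ACTIONS
replay = deque(maxlen=5000)

for episode in range(200):
    if episode == 100:                      # expand the forests later on
        for m in models:
            m.set_params(n_estimators=50)
    s = int(rng.integers(N_STATES - 1))     # random start helps the toy demo
    for _ in range(100):                    # step cap keeps episodes finite
        a = (int(rng.integers(N_ACTIONS)) if rng.random() < EPS
             else int(np.argmax(q_values(s))))
        s2, r, done = step(s, a)
        replay.append((s, a, r, s2, done))
        s = s2
        if done:
            break
    # Refit each action's forest on a random replay sample, regressing the
    # one-step Q-learning target r + gamma * max_a' Q(s', a').
    idx = rng.choice(len(replay), size=min(256, len(replay)), replace=False)
    batch = [replay[i] for i in idx]
    for a in range(N_ACTIONS):
        data = [(s_, r_ + GAMMA * (0.0 if d_ else q_values(s2_).max()))
                for (s_, a_, r_, s2_, d_) in batch if a_ == a]
        if data:
            X = np.array([[s_] for s_, _ in data], dtype=float)
            models[a].fit(X, np.array([t for _, t in data]))
            fitted[a] = True
```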

    Tree-Based On-Line Reinforcement Learning

    Fitted Q-iteration (FQI) stands out among reinforcement learning algorithms for its flexibility and ease of use. FQI can be combined with any regression method, and this choice determines the algorithm's statistical and computational properties. The combination of FQI with an ensemble of regression trees gives rise to an algorithm, FQIT, that is computationally efficient, scalable to high-dimensional spaces, and robust to noise. Despite these properties and its good performance in practice, FQIT also has a notable limitation: because an ensemble of trees must be constructed (or updated) at each iteration, the algorithm is confined to the batch scenario. This paper aims to address this specific issue. Based on a strategy recently proposed in the literature, called the stochastic-factorization trick, we propose a modification of FQIT that makes it fully incremental, and thus suitable for on-line learning. We call the resulting method tree-based stochastic factorization (TBSF). We derive upper bounds for the difference between the value functions computed by FQIT and TBSF, and also show in which circumstances the approximations coincide. A series of computational experiments illustrates the properties of TBSF and shows its usefulness in practice, including a medical problem involving the treatment of patients infected with HIV.
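
    For orientation, here is a minimal batch fitted Q-iteration loop with an extra-trees ensemble, in the spirit of the FQIT baseline this abstract describes; the incremental TBSF variant built on the stochastic-factorization trick is not reproduced here, and the toy dataset and all parameter values are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Batch FQI with regression trees: each iteration rebuilds the ensemble on
# fresh Bellman targets, which is exactly what confines FQIT to batch use.

GAMMA, N_ACTIONS = 0.95, 2
rng = np.random.default_rng(0)

# Fixed batch of transitions (s, a, r, s') from a 1-D toy problem where
# action 1 nudges the state toward 1.0 and states above 0.9 pay reward 1.
S = rng.uniform(0.0, 1.0, size=(1000, 1))
A = rng.integers(N_ACTIONS, size=1000)
S2 = np.clip(S + np.where(A == 1, 0.1, -0.1)[:, None], 0.0, 1.0)
R = (S2[:, 0] > 0.9).astype(float)

X = np.hstack([S, A[:, None]])      # regress Q on (state, action) pairs
y = R.copy()                        # iteration 1: Q_1 = immediate reward

for _ in range(20):
    model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
    # Bellman backup: y = r + gamma * max_a Q_k(s', a)
    q_next = np.column_stack([
        model.predict(np.hstack([S2, np.full((len(S2), 1), a_)]))
        for a_ in range(N_ACTIONS)])
    y = R + GAMMA * q_next.max(axis=1)

def act(state):
    """Greedy policy from the final ensemble."""
    return int(np.argmax([model.predict([[state, a_]])[0]
                          for a_ in range(N_ACTIONS)]))

print(act(0.5))   # should favor action 1 (move toward the rewarding end)
```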