6 research outputs found

    Deep Learning and Reward Design for Reinforcement Learning

    One of the fundamental problems in Artificial Intelligence is sequential decision making in a flexible environment. Reinforcement Learning (RL) provides a set of tools for solving sequential decision problems. Although the theory of RL addresses a general class of learning problems with a constructive mathematical formulation, the challenges posed by the interaction of rich perception and delayed rewards in many domains remain a significant barrier to the widespread applicability of RL methods. The rich perception problem has two components: 1) the sensors at any time step do not capture all the information in the history of observations, leading to partial observability, and 2) the sensors provide very high-dimensional observations, such as images and natural language, that introduce computational and sample-complexity challenges for representation and generalization in policy selection. The delayed reward problem, namely that the effect of actions on future rewards is delayed in time, makes it hard to determine how to credit action sequences for reward outcomes. This dissertation offers a set of contributions that adapt the hierarchical representation learning power of deep learning to address rich perception in vision and text domains, and that develop new reward design algorithms to address delayed rewards. The first contribution is a new learning method for deep neural networks in vision-based real-time control. The method distills slow policies of Monte Carlo Tree Search (MCTS) into fast convolutional neural networks, which outperform the conventional Deep Q-Network. The second contribution is a new end-to-end reward design algorithm that mitigates delayed rewards for the state-of-the-art MCTS method. The algorithm converts visual perceptions into reward bonuses via deep neural networks, and optimizes the network weights end-to-end via policy gradient to improve the performance of MCTS. The third contribution extends the existing policy-gradient reward design method from a single task to multiple tasks: reward bonuses learned from old tasks are transferred to new tasks to facilitate learning. The final contribution is an application of deep reinforcement learning to another type of rich perception, ambiguous text. A synthetic data set is proposed to evaluate the querying, reasoning and question-answering abilities of RL agents, and a deep memory network architecture is applied to solve these challenging problems to a substantial degree.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/136931/1/guoxiao_1.pd
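    The distillation idea in the first contribution can be pictured with a short, hedged sketch: a convolutional network is trained to imitate the action distribution that MCTS produces for each game frame by minimizing a cross-entropy loss against the MCTS visit-count policy. The PyTorch code below is a minimal illustration under assumed shapes and layer sizes, with random tensors standing in for real frames and MCTS targets; it is not the dissertation's actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PolicyCNN(nn.Module):
        """Small convolutional policy network (illustrative architecture only)."""
        def __init__(self, num_actions: int):
            super().__init__()
            self.conv1 = nn.Conv2d(4, 16, kernel_size=8, stride=4)  # 4 stacked 84x84 frames
            self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)
            self.fc1 = nn.Linear(32 * 9 * 9, 256)
            self.fc2 = nn.Linear(256, num_actions)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
            x = x.flatten(start_dim=1)
            x = F.relu(self.fc1(x))
            return self.fc2(x)  # action logits

    num_actions = 6
    net = PolicyCNN(num_actions)
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)

    # Placeholder batch: stacked frames and MCTS visit-count distributions
    # (in a real setup these would come from the emulator and the MCTS planner).
    frames = torch.randn(32, 4, 84, 84)
    mcts_policy = torch.softmax(torch.randn(32, num_actions), dim=1)

    # Distillation step: cross-entropy between the CNN's policy and the MCTS policy.
    opt.zero_grad()
    logits = net(frames)
    loss = -(mcts_policy * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss.backward()
    opt.step()

    At deployment time only the fast forward pass of the network is needed, which is what makes a distilled policy suitable for real-time control.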

    Enhancing natural language understanding using meaning representation and deep learning

    Natural Language Understanding (NLU) is one of the complex tasks in artificial intelligence. Machine learning was introduced to address the complex and dynamic nature of natural language, and deep learning gained popularity within the NLU community because it learns features directly from data and adapts to that dynamic nature. Deep learning has also been shown to learn hidden features automatically and to outperform most other machine learning approaches for NLU. Deep learning models require natural language inputs to be converted to vectors (word embeddings). Word2Vec and GloVe are word embeddings designed to capture analogy and context-based statistics and to provide lexical relations between words. A purely context-based statistical approach, however, does not capture the prior knowledge required to understand the language that words convey. Moreover, although a deep learning model receives word embeddings, language understanding requires Reasoning, Attention and Memory (RAM); these are key factors in understanding language, yet current deep learning models focus on only one of them, whereas proper language understanding needs all three. In addition, language normally forms long sequences, and these long sequences create dependencies that must be captured to understand the language; current deep learning models developed to handle longer sequences either forget or suffer from vanishing or exploding gradients. This thesis focuses on these three areas. First, a word embedding technique is introduced that integrates analogy and context-based statistics with semantic relationships extracted from a knowledge base, yielding an enhanced meaning representation. Second, a Long Short-Term Reinforced Memory (LSTRM) network is introduced; it addresses RAM and is validated on question-answering data sets that require RAM. Finally, a Long Term Memory Network (LTM) is introduced to address language modelling, since good language modelling requires learning from long sequences. The thesis thus demonstrates that integrating semantic knowledge from a knowledge base generates enhanced meaning representations, and that deep learning models can achieve RAM and long-term dependencies so as to improve the capability of NLU.
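    The enhanced meaning representation can be illustrated with a small, hedged sketch: context-based vectors (of the Word2Vec/GloVe kind) are blended with the vectors of their knowledge-base neighbours so that semantic relations from the knowledge base are reflected in the embedding. The toy vocabulary, vectors, relations and the enrich function in the Python code below are illustrative assumptions in the spirit of retrofitting, not the thesis's actual embedding technique.

    import numpy as np

    # Toy stand-ins for pretrained context-based vectors (e.g., GloVe or Word2Vec).
    distributional = {
        "king":  np.array([0.9, 0.1, 0.3]),
        "queen": np.array([0.8, 0.2, 0.4]),
        "royal": np.array([0.2, 0.9, 0.1]),
    }

    # Toy knowledge-base links (e.g., synonym or related-to edges).
    kb_neighbours = {
        "king":  ["royal", "queen"],
        "queen": ["royal", "king"],
        "royal": ["king", "queen"],
    }

    def enrich(vectors, neighbours, iterations=10, alpha=1.0, beta=1.0):
        """Blend each vector with the mean of its knowledge-base neighbours."""
        enriched = {w: v.copy() for w, v in vectors.items()}
        for _ in range(iterations):
            for word, links in neighbours.items():
                links = [n for n in links if n in enriched]
                if not links:
                    continue
                neighbour_mean = np.mean([enriched[n] for n in links], axis=0)
                # alpha keeps the original context-based statistics;
                # beta pulls the vector toward its knowledge-base neighbours.
                enriched[word] = (alpha * vectors[word] + beta * neighbour_mean) / (alpha + beta)
        return enriched

    print(enrich(distributional, kb_neighbours)["king"])

    In practice the vectors would be loaded from a pretrained embedding file and the relations drawn from a real knowledge base such as WordNet, but the blending step conveys how statistical and semantic information can be combined in a single representation.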