
    Elastic Monte Carlo Tree Search


    Learning Path Constraints for UAV Autonomous Navigation Under Uncertain GNSS Availability

    This paper addresses a safe path planning problem for UAV urban navigation under uncertain GNSS availability. The problem can be modeled as a POMDP and solved with sampling-based algorithms. However, such a complex domain incurs a high computational cost and yields poor results under real-time constraints. Recent research seeks to integrate offline learning in order to guide online planning efficiently. Inspired by the state-of-the-art CAMP (Context-specific Abstract Markov decision Process) formalization, this paper proposes an offline process that learns the path constraint to impose during online POMDP solving in order to reduce the policy search space. More precisely, the offline-learnt constraint selector returns the best path constraint according to the GNSS availability probability in the environment. Experiments carried out in three environments show that the proposed approach improves the quality of the solution reached by an online planner within a fixed decision-making timeframe, particularly when the GNSS availability probability is low.
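    The constraint-selection idea above can be pictured as a small lookup component trained offline and queried once before online planning starts. The sketch below is a minimal illustration under assumed names (ConstraintSelector, PathConstraint, the bucketing of GNSS availability probability); it is not the interface used in the paper.

```python
# Minimal sketch of an offline-learnt constraint selector (hypothetical interface).
# The selector maps the environment's GNSS availability probability to the path
# constraint that the online POMDP planner must respect.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class PathConstraint:
    name: str                              # e.g. "stay-over-streets", "max-altitude-120m"
    is_allowed: Callable[[Tuple], bool]    # predicate over candidate waypoints

class ConstraintSelector:
    def __init__(self, candidates: List[PathConstraint]):
        self.candidates = candidates
        self.value_table = {}              # (constraint name, availability bucket) -> (count, mean return)

    def fit(self, offline_rollouts):
        # offline_rollouts: iterable of (constraint_name, p_gnss, discounted_return)
        for name, p_gnss, ret in offline_rollouts:
            key = (name, round(p_gnss, 1))
            n, mean = self.value_table.get(key, (0, 0.0))
            self.value_table[key] = (n + 1, mean + (ret - mean) / (n + 1))

    def select(self, p_gnss: float) -> PathConstraint:
        bucket = round(p_gnss, 1)
        return max(self.candidates,
                   key=lambda c: self.value_table.get((c.name, bucket), (0, float("-inf")))[1])
```

    An online planner would then expand only actions or waypoints for which the selected constraint's is_allowed predicate holds, shrinking the policy search space before planning begins.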

    Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction

    Monte Carlo Tree Search (MCTS) algorithms such as AlphaGo and MuZero have achieved superhuman performance in many challenging tasks. However, the computational complexity of MCTS-based algorithms is influenced by the size of the search space. To address this issue, we propose a novel probability tree state abstraction (PTSA) algorithm to improve the search efficiency of MCTS. A general tree state abstraction with path transitivity is defined. In addition, the probability tree state abstraction is proposed to reduce mistakes during the aggregation step. Furthermore, theoretical guarantees on transitivity and the aggregation error bound are established. To evaluate the effectiveness of the PTSA algorithm, we integrate it with state-of-the-art MCTS-based algorithms such as Sampled MuZero and Gumbel MuZero. Experimental results on different tasks demonstrate that our method can accelerate the training of state-of-the-art algorithms with a 10%-45% reduction of the search space.
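    The abstract does not spell out the aggregation rule, but the general idea of tree state abstraction is to merge search-tree nodes whose statistics are similar so that their visit counts and value estimates are shared. The sketch below illustrates one plausible reading, grouping nodes whose normalized visit-count distributions are close in L1 distance; the node structure and the threshold are assumptions, not the PTSA algorithm itself.

```python
# Illustrative sketch of tree state abstraction in MCTS: nodes whose action
# (visit-count) distributions are close are aggregated so their statistics can
# be shared. Node fields and the distance threshold are assumptions.
import numpy as np

def policy_distance(node_a, node_b):
    """L1 distance between the normalized visit-count distributions of two nodes.
    Each node is assumed to carry a NumPy array `visit_counts` over the same action set."""
    pa = node_a.visit_counts / node_a.visit_counts.sum()
    pb = node_b.visit_counts / node_b.visit_counts.sum()
    return np.abs(pa - pb).sum()

def aggregate_nodes(nodes, threshold=0.1):
    """Greedily group nodes whose policy distributions differ by less than `threshold`."""
    clusters = []
    for node in nodes:
        for cluster in clusters:
            if policy_distance(node, cluster[0]) < threshold:
                cluster.append(node)
                break
        else:
            clusters.append([node])
    return clusters
```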

    A Theory of Model Selection in Reinforcement Learning

    Reinforcement Learning (RL) is a machine learning paradigm where an agent learns to accomplish sequential decision-making tasks from experience. Applications of RL are found in robotics and control, dialog systems, medical treatment, etc. Despite the generality of the framework, most empirical successes of RL to date are restricted to simulated environments, where hyperparameters are tuned by trial and error using large amounts of data. In contrast, collecting data with active intervention in the real world can be costly, time-consuming, and sometimes unsafe. Choosing the hyperparameters and understanding their effects in the face of these data limitations, i.e., model selection, is an important yet open direction that we need to study to enable such applications of RL, and it is the main theme of this thesis. More concretely, this thesis presents theoretical results that improve our understanding of three hyperparameters in RL: planning horizon, state representation (abstraction), and reward function. The first part of the thesis focuses on the interplay between the planning horizon and a limited amount of data, and establishes a formal explanation for how a long planning horizon can cause overfitting. The second part considers the problem of choosing the right state abstraction using limited batch data; I show that cross-validation-type methods require importance sampling and suffer from exponential variance, while a novel regularization-based algorithm enjoys an oracle-like property. The third part investigates reward misspecification and tries to resolve it by leveraging expert demonstrations, which is inspired by AI safety concerns and bears close connections to inverse reinforcement learning. A recurring theme of the thesis is the deployment of formulations and techniques from other areas of machine learning theory (mostly statistical learning theory): the planning horizon work explains the overfitting phenomenon by making a formal analogy to empirical risk minimization and by proving planning loss bounds that are similar to generalization error bounds; the main result in the abstraction selection work takes the form of an oracle inequality, a concept from structural risk minimization for model selection in supervised learning; and the inverse RL work provides a mistake-bound type analysis under arbitrarily chosen environments, which can be viewed as a form of no-regret learning. Overall, by borrowing ideas from mature theories of machine learning, we can develop analogies for RL that allow us to better understand the impact of hyperparameters, and develop algorithms that set them automatically in an effective manner.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/138518/1/nanjiang_1.pd
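    One concrete point from the second part is that cross-validation-style selection from batch data relies on importance sampling, whose trajectory-level weight is a product of per-step likelihood ratios and can therefore have variance that grows exponentially with the horizon. The sketch below shows the standard trajectory-wise importance sampling estimator that exhibits this behavior; it is illustrative, not code from the thesis.

```python
# Sketch of trajectory-wise importance sampling for off-policy evaluation,
# illustrating why variance can grow exponentially with the horizon: the
# weight is a product of one likelihood ratio per time step.
def importance_sampling_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """trajectories: list of [(state, action, reward), ...] collected under pi_b.
    pi_e, pi_b: functions (state, action) -> action probability for the
    evaluation and behavior policies respectively."""
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(s, a) / pi_b(s, a)   # one ratio per step -> product over the horizon
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)
```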

    Planning in hybrid relational MDPs

    We study planning in relational Markov decision processes involving discrete and continuous states and actions, and an unknown number of objects. This combination of hybrid relational domains has so far not received much attention. While both relational and hybrid approaches have been studied separately, planning in such domains is still challenging and often requires restrictive assumptions and approximations. We propose HYPE: a sample-based planner for hybrid relational domains that combines model-based approaches with state abstraction. HYPE samples episodes and uses the previous episodes as well as the model to approximate the Q-function. In addition, abstraction is performed for each sampled episode, which removes the complexity of symbolic approaches for hybrid relational domains. In our empirical evaluations, we show that HYPE is a general and widely applicable planner in domains ranging from strictly discrete to strictly continuous to hybrid ones, and that it handles intricacies such as unknown objects and relational models. Moreover, the empirical results show that abstraction provides significant improvements.
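    At a high level, HYPE is described as sampling episodes and reusing them, together with the model, to approximate the Q-function. The sketch below shows that generic sample-based planning pattern with Monte Carlo backups; the model interface and the exploration scheme are assumptions, and the relational/hybrid state abstraction that is central to HYPE is omitted.

```python
# Generic sketch of a sample-based planner that reuses sampled episodes to
# estimate Q-values. This mirrors the high-level description of HYPE but is
# not its actual implementation; states are assumed hashable.
import random
from collections import defaultdict

def plan(model, start_state, actions, n_episodes=200, horizon=20, gamma=0.95):
    q_sum, q_count = defaultdict(float), defaultdict(int)

    def q(s, a):
        key = (s, a)
        return q_sum[key] / q_count[key] if q_count[key] else 0.0

    for _ in range(n_episodes):
        s, episode = start_state, []
        for _ in range(horizon):
            # epsilon-greedy over the current Q estimates to balance exploration
            a = random.choice(actions) if random.random() < 0.2 else max(actions, key=lambda act: q(s, act))
            s_next, r = model.sample(s, a)      # assumed model interface: returns (next_state, reward)
            episode.append((s, a, r))
            s = s_next
        # Monte Carlo backup of discounted returns along the sampled episode
        g = 0.0
        for s, a, r in reversed(episode):
            g = r + gamma * g
            q_sum[(s, a)] += g
            q_count[(s, a)] += 1

    return max(actions, key=lambda act: q(start_state, act))
```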

    VI Workshop on Computational Data Analysis and Numerical Methods: Book of Abstracts

    The VI Workshop on Computational Data Analysis and Numerical Methods (WCDANM) will be held on June 27-29, 2019, in the Department of Mathematics of the University of Beira Interior (UBI), Covilhã, Portugal. It is a unique opportunity to disseminate scientific research related to the areas of Mathematics in general, with particular relevance to Computational Data Analysis and Numerical Methods in theoretical and/or practical work, using new techniques, with special emphasis on applications in Medicine, Biology, Biotechnology, Engineering, Industry, Environmental Sciences, Finance, Insurance, Management and Administration. The meeting will provide a forum for discussion and debate of ideas of interest to the scientific community at large. New scientific collaborations among colleagues, namely collaborations in Masters and PhD projects, are expected. The event is open to the entire scientific community (with or without a communication/poster).

    Deep Learning and Reward Design for Reinforcement Learning

    One of the fundamental problems in Artificial Intelligence is sequential decision making in a flexible environment. Reinforcement Learning (RL) gives a set of tools for solving sequential decision problems. Although the theory of RL addresses a general class of learning problems with a constructive mathematical formulation, the challenges posed by the interaction of rich perception and delayed rewards in many domains remain a significant barrier to the widespread applicability of RL methods. The rich perception problem itself has two components: 1) the sensors at any time step do not capture all the information in the history of observations, leading to partial observability, and 2) the sensors provide very high-dimensional observations, such as images and natural language, that introduce computational and sample-complexity challenges for the representation and generalization problems in policy selection. The delayed reward problem, namely that the effect of actions in terms of future rewards is delayed in time, makes it hard to determine how to credit action sequences for reward outcomes. This dissertation offers a set of contributions that adapt the hierarchical representation learning power of deep learning to address rich perception in vision and text domains, and develops new reward design algorithms to address delayed rewards. The first contribution is a new learning method for deep neural networks in vision-based real-time control. The learning method distills slow policies of Monte Carlo Tree Search (MCTS) into fast convolutional neural networks, which outperforms the conventional Deep Q-Network. The second contribution is a new end-to-end reward design algorithm that mitigates delayed rewards for the state-of-the-art MCTS method. The reward design algorithm converts visual perceptions into reward bonuses via deep neural networks, and optimizes the network weights to improve the performance of MCTS end-to-end via policy gradient. The third contribution extends the existing policy-gradient reward design method from a single task to multiple tasks: reward bonuses learned from old tasks are transferred to new tasks to facilitate learning. The final contribution is an application of deep reinforcement learning to another type of rich perception, ambiguous text. A synthetic data set is proposed to evaluate the querying, reasoning, and question-answering abilities of RL agents, and a deep memory network architecture is applied to solve these challenging problems to substantial degrees.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/136931/1/guoxiao_1.pd
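    The first contribution, distilling slow MCTS policies into a fast convolutional network, amounts to supervised training of the network against the MCTS visit-count distribution at each visited state. The sketch below shows one common way to implement such distillation in PyTorch; the architecture, input shape, and loss details are illustrative assumptions rather than the dissertation's exact setup.

```python
# Hedged sketch of policy distillation from MCTS into a small convolutional
# network: the network is trained with cross-entropy against the MCTS
# visit-count distribution. Assumes stacked 4x84x84 grayscale frames as input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.head = nn.Linear(64 * 9 * 9, n_actions)   # 9x9 feature map for 84x84 inputs

    def forward(self, frames):                          # frames: (B, 4, 84, 84)
        z = self.conv(frames).flatten(1)
        return self.head(z)                             # unnormalized action logits

def distillation_step(net, optimizer, frames, mcts_visit_counts):
    """One gradient step matching the network policy to the MCTS visit distribution."""
    target = mcts_visit_counts / mcts_visit_counts.sum(dim=1, keepdim=True)
    log_probs = F.log_softmax(net(frames), dim=1)
    loss = -(target * log_probs).sum(dim=1).mean()      # cross-entropy with soft targets
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```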

    Simple Partial Models for Complex Dynamical Systems.

    An agent in an unknown environment may wish to learn a model that allows it to make predictions about future events and anticipate the consequences of its actions. Such a model can greatly enhance the agent's ability to make good decisions. However, in environments like the one in which we live, which is stochastic, partially observable, and high dimensional, learning a model is a challenge. One approach when faced with a difficult model learning problem is not to model the entire system. Instead, one might focus on the most important aspects of the environment and give up on modeling complicated, irrelevant phenomena. This intuition can be formalized using partial models, which are models that make only a restricted set of predictions in only a restricted set of circumstances. Because a partial model has limited prediction responsibilities, it may be significantly simpler than a complete model. Partial models have been studied in many contexts, mostly under the Markov assumption, where the agent is assumed to have access to the full state of the world. In this setting, predictions can be learned directly as functions of state and the process of learning a partial model is often as simple as estimating only the desired predictions and omitting the rest from the model. As such, much of the relevant work has focused on the challenging question of which partial models should be learned (rather than how to learn them). In the partially observable case, however, where state is assumed to be hidden from the agent, the basic problem of how to learn a partial model poses significant challenges. The goal of this thesis is to provide general results and methods for learning partial models in partially observable systems. The main challenges posed by partial observability are formalized and learning methods are developed to address these issues. The methods presented are demonstrated empirically to learn partial models in systems that are too complex for standard, complete model learning methods. Finally, many partial models are learned and composed to form complete models that are used for model-based planning in high dimensional arcade game examples.
    Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/78893/1/etalviti_1.pd
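    In the Markov setting described above, learning a partial model can be as simple as estimating a single prediction of interest, and only in the states where that prediction is the model's responsibility. The toy sketch below makes this concrete; the names and the tabular counting scheme are illustrative assumptions, not the methods developed in the thesis.

```python
# Toy illustration of a partial model in the Markov setting: rather than
# modeling the full next-state distribution, the model predicts only one
# chosen feature of the next state, and only in states where that prediction
# is within its scope. Names and interfaces are illustrative.
from collections import defaultdict

class PartialModel:
    def __init__(self, feature_of_interest, is_relevant):
        self.feature = feature_of_interest      # function: next_state -> prediction target
        self.is_relevant = is_relevant          # function: state -> bool (scope of the model)
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, action, next_state):
        if self.is_relevant(state):
            self.counts[(state, action)][self.feature(next_state)] += 1

    def predict(self, state, action):
        if not self.is_relevant(state):
            return None                         # outside the model's responsibilities
        outcomes = self.counts[(state, action)]
        total = sum(outcomes.values())
        return {o: c / total for o, c in outcomes.items()} if total else None
```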