
    Distributional Reinforcement Learning for Efficient Exploration

    In distributional reinforcement learning (RL), the estimated distribution of the value function models both parametric and intrinsic uncertainty. We propose a novel and efficient exploration method for deep RL with two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. On Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain in cumulative rewards over QR-DQN across 49 games, with a big win in Venture). We also compare our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
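
    A minimal sketch of the general idea (not the authors' code): pick actions greedily with respect to the mean of a quantile value distribution plus an optimism bonus taken from its upper quantiles, scaled by a decaying schedule. The parameter names (num_upper, c0, decay) are illustrative assumptions.

```python
# Hypothetical sketch: exploration bonus from the upper quantiles of a QR-DQN-style
# value distribution, scaled by a decaying schedule c_t to suppress intrinsic uncertainty.
import numpy as np

def select_action(quantiles, step, num_upper=10, c0=1.0, decay=1e-4):
    """quantiles: array of shape (num_actions, num_quantiles), sorted per action."""
    mean_value = quantiles.mean(axis=1)                                 # estimate of Q(s, a)
    upper_bonus = quantiles[:, -num_upper:].mean(axis=1) - mean_value   # optimism from the upper tail
    c_t = c0 / (1.0 + decay * step)                                     # decaying schedule
    return int(np.argmax(mean_value + c_t * upper_bonus))

# Usage: q = np.sort(np.random.randn(4, 51), axis=1); a = select_action(q, step=1000)
```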

    Variational Inference with Tail-adaptive f-Divergence

    Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to the Kullback-Leibler (KL) divergence, a major advantage of using α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires importance sampling, which can have extremely large or even infinite variance due to the heavy tails of the importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments while simultaneously achieving the mass-covering property. We test our methods on Bayesian neural networks, as well as on deep reinforcement learning, where our method is applied to improve a recent soft actor-critic (SAC) algorithm. Our results show that our approach yields significant advantages compared with existing methods based on classical KL and α-divergences. Comment: NeurIPS 201
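
    A hedged sketch of the underlying idea, under the assumption that the adaptive change of f can be illustrated by a rank-based transformation of the raw importance weights (the paper's exact transform may differ): heavy-tailed weights are replaced by bounded surrogates before forming the gradient estimate.

```python
# Illustration only: replace heavy-tailed importance weights w_i with bounded, rank-based
# weights so the reweighted gradient estimate cannot have exploding variance.
import numpy as np

def tail_adaptive_weights(log_w):
    """Map raw log importance weights to bounded, rank-based weights summing to 1."""
    ranks = np.argsort(np.argsort(log_w)) + 1   # 1 = smallest weight, n = largest
    return ranks / ranks.sum()                  # bounded surrogate for the normalized w_i

log_w = np.random.standard_cauchy(1000)         # deliberately heavy-tailed log-weights
rho = tail_adaptive_weights(log_w)
# The gradient estimate then weights per-sample terms by rho_i instead of w_i.
```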

    Efficient exploration with Double Uncertain Value Networks

    This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We present methods to learn these distributions with deep neural networks, where we estimate parametric uncertainty with Bayesian dropout, while return uncertainty is propagated through the Bellman equation as a Gaussian distribution. We then show that both can be jointly estimated in one network, which we call the Double Uncertain Value Network. The policy is directly derived from the learned distributions via Thompson sampling. Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge. Comment: Deep Reinforcement Learning Symposium @ Conference on Neural Information Processing Systems (NIPS) 201
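
    An illustrative sketch (not the authors' implementation) of Thompson-sampling action selection when a network outputs a Gaussian return distribution per action and parametric uncertainty is captured by keeping dropout active at inference; the stub network below is a placeholder assumption.

```python
# Sample once from parametric uncertainty (a dropout forward pass) and once from each
# action's Gaussian return distribution, then act greedily on the sampled returns.
import numpy as np

def thompson_action(forward_with_dropout, state, rng):
    """forward_with_dropout(state) -> (mu, sigma), each of shape (num_actions,),
    with dropout active so each call is one posterior sample of the parameters."""
    mu, sigma = forward_with_dropout(state)     # one parametric sample
    sampled_returns = rng.normal(mu, sigma)     # one draw from each return distribution
    return int(np.argmax(sampled_returns))

# Usage with a stub network:
rng = np.random.default_rng(0)
stub_net = lambda s: (np.array([1.0, 1.2, 0.8]), np.array([0.5, 0.1, 0.9]))
a = thompson_action(stub_net, state=None, rng=rng)
```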

    Sampling-based Incremental Information Gathering with Applications to Robotic Exploration and Environmental Monitoring

    In this article, we propose a sampling-based motion planning algorithm equipped with an information-theoretic convergence criterion for incremental informative motion planning. The proposed approach allows dense map representations and incorporates the full state uncertainty into the planning process. The problem is formulated as a constrained maximization problem. Our approach builds on rapidly-exploring information gathering algorithms and benefits from the advantages of sampling-based optimal motion planning algorithms. We propose two information functions and their variants for fast and online computation. We prove information-theoretic convergence for an entire exploration and information gathering mission based on the least upper bound of the average map entropy. A natural automatic stopping criterion for information-driven motion control results from the convergence analysis. We demonstrate the performance of the proposed algorithms in three scenarios: comparison of the proposed information functions and sensor configuration selection, robotic exploration in unknown environments, and a wireless signal strength monitoring task on a lake, using a publicly available dataset collected by an autonomous surface vehicle. Comment: Revision submitted to IJRR, 49 page
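
    A hedged sketch of an entropy-based stopping rule in the spirit of the criterion described above: track the average entropy of an occupancy-grid map and stop exploring once it has plateaued. The tolerance and window below are illustrative assumptions, not the article's exact bound.

```python
# Average Bernoulli entropy of an occupancy grid, plus a simple plateau-based stopping rule.
import numpy as np

def average_map_entropy(occ_probs):
    """Mean cell entropy (bits) of an occupancy grid with probabilities in (0, 1)."""
    p = np.clip(occ_probs, 1e-6, 1 - 1e-6)
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return float(h.mean())

def should_stop(entropy_history, tol=1e-3, window=5):
    """Stop when average map entropy has decreased by less than tol over the last window rounds."""
    if len(entropy_history) < window + 1:
        return False
    return entropy_history[-window - 1] - entropy_history[-1] < tol
```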

    Robustness and macroeconomic policy

    This paper considers the design of macroeconomic policies in the face of uncertainty. In recent years, several economists have advocated that when policymakers are uncertain about the environment they face and find it difficult to assign precise probabilities to the alternative scenarios that may characterize this environment, they should design policies to be robust in the sense that they minimize the worst-case loss these policies could ever impose. I review and evaluate the objections cited by critics of this approach. I further argue that, contrary to what some have inferred, concern about worst-case scenarios does not always lead to policies that respond more aggressively to incoming news than the optimal policy would absent any uncertainty. Macroeconomics - Econometric models
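
    A toy numerical illustration of the minimax idea, entirely an assumption and not drawn from the paper: a quadratic loss, a few candidate model slopes, and a grid search comparing the worst-case-minimizing policy with the single-model optimum.

```python
# Minimax policy choice over a set of model scenarios versus the baseline-model optimum.
import numpy as np

policies = np.linspace(-2.0, 2.0, 401)     # candidate policy responses
scenarios = [0.5, 1.0, 1.8]                # alternative values of an uncertain model slope

def loss(policy, slope, shock=1.0):
    """Quadratic stabilization loss under a given model slope."""
    return (shock - slope * policy) ** 2 + 0.1 * policy ** 2

worst_case = np.array([max(loss(p, s) for s in scenarios) for p in policies])
robust_policy = policies[int(np.argmin(worst_case))]                       # minimax choice
baseline_policy = policies[int(np.argmin([loss(p, 1.0) for p in policies]))]
# In this toy example the minimax choice is slightly *less* aggressive than the
# baseline optimum, illustrating that worst-case concern need not imply overreaction.
```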

    Reinforcement Learning under Model Mismatch

    We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation of it. We address this problem by extending the framework of robust MDPs to the model-free reinforcement learning setting, where we do not have access to the model parameters but can only sample states from it. We define robust versions of Q-learning, SARSA, and TD-learning and prove convergence to an approximately optimal robust policy and approximate value function, respectively. We scale the robust algorithms up to large MDPs via function approximation and prove convergence under two different settings. We prove convergence of robust approximate policy iteration and robust approximate value iteration for linear architectures (under mild assumptions). We also define a robust loss function, the mean squared robust projected Bellman error, and give stochastic gradient descent algorithms that are guaranteed to converge to a local minimum. Comment: To appear in Proceedings of NIPS 201
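
    A hedged sketch of a robust tabular Q-learning update, where the worst case over the transition uncertainty set is crudely approximated by taking the minimum bootstrap value over a small set of plausible next states; the paper's exact robust operator may differ.

```python
# Robust Q-learning step: bootstrap from the worst plausible next state instead of the
# single observed one.
import numpy as np

def robust_q_update(Q, s, a, r, candidate_next_states, alpha=0.1, gamma=0.99):
    """Q: dict mapping state -> array of action values; candidate_next_states: states
    considered plausible under the misspecified model (e.g., observed s' plus perturbations)."""
    worst_value = min(np.max(Q[s_next]) for s_next in candidate_next_states)
    target = r + gamma * worst_value
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```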

    Robust Analysis in Stochastic Simulation: Computation and Performance Guarantees

    Any performance analysis based on stochastic simulation is subject to the errors inherent in misspecifying the modeling assumptions, particularly the input distributions. In situations with little support from data, we investigate the use of worst-case analysis to analyze these errors, by representing the partial, nonparametric knowledge of the input models via optimization constraints. We study the performance and robustness guarantees of this approach. We design and analyze a numerical scheme for solving a general class of simulation objectives and uncertainty specifications. The key steps involve a randomized discretization of the probability spaces, a simulable unbiased gradient estimator using a nonparametric analog of the likelihood ratio method, and a Frank-Wolfe (FW) variant of the stochastic approximation (SA) method (which we call FWSA) run on the space of input probability distributions. A convergence analysis for FWSA on non-convex problems is provided. We test the performance of our approach via several numerical examples.
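
    A minimal sketch of one FWSA-style iteration over a discretized input distribution, assuming a noisy gradient of the simulation objective with respect to the input probabilities is available; how that gradient is estimated (the likelihood-ratio analog above) is omitted here.

```python
# One Frank-Wolfe step on the probability simplex with a diminishing step size.
import numpy as np

def fwsa_step(p, grad_estimate, t):
    """p: current input distribution; grad_estimate: noisy gradient, same length; t: iteration."""
    vertex = np.zeros_like(p)
    vertex[np.argmin(grad_estimate)] = 1.0   # linear subproblem over the simplex (for minimization)
    step = 2.0 / (t + 2.0)                   # standard diminishing step size
    return (1 - step) * p + step * vertex
```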

    Risk Sensitive Rendezvous Algorithm for Heterogeneous Agents in Urban Environments

    Demand for fast and inexpensive parcel deliveries in urban environments has risen considerably in recent years. We envision a framework that enables efficient last-mile delivery in urban environments by leveraging a network of ride-sharing vehicles, where Unmanned Aerial Systems (UASs) drop packages on said vehicles, which then cover the majority of the distance before the packages are picked up by another UAS for final delivery. This approach presents many engineering challenges, including the safe rendezvous of both agents: the UAS and the human-operated ground vehicle. In this paper, we introduce a framework to minimize the risk of failure while allowing for optimal usage of the controlled agent. We formulate a compact, fast planner to drive a UAS to a passive ground vehicle with inexact behavior, while providing intuitive and meaningful procedures to guarantee safety with minimal sacrifice of optimality. The resulting algorithm is shown to be fast and implementable in real time via numerical tests. Comment: Full version of the same-titled paper accepted to ACC 202
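
    An illustrative sketch, not the paper's planner: choose a rendezvous time that minimizes the UAS cost while keeping the probability of missing the inexactly behaving ground vehicle below a risk budget. Modeling the vehicle's position error as a 1-D Gaussian whose spread grows with the horizon is an assumption made only for this example.

```python
# Risk-bounded rendezvous time selection under growing ground-vehicle position uncertainty.
from math import erf, sqrt

def choose_rendezvous(times, uas_cost, gv_sigma_per_s, capture_radius, risk=0.05):
    """times, uas_cost: candidate rendezvous times and the UAS cost of reaching each one."""
    best_t, best_cost = None, float("inf")
    for t, cost in zip(times, uas_cost):
        sigma = max(gv_sigma_per_s * t, 1e-9)                  # position spread grows with time
        p_capture = erf(capture_radius / (sigma * sqrt(2)))    # P(|error| < radius), 1-D Gaussian
        if 1.0 - p_capture <= risk and cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Usage: choose_rendezvous([30, 60, 120], [5.0, 3.0, 2.0], gv_sigma_per_s=0.2, capture_radius=15.0)
```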