
    Distributional Reinforcement Learning for Efficient Exploration

    In distributional reinforcement learning (RL), the estimated distribution of the value function models both parametric and intrinsic uncertainty. We propose a novel and efficient exploration method for deep RL with two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. On Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain in cumulative rewards over QR-DQN across 49 games, with a big win in Venture). We also compare our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
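
    A minimal sketch of the general idea (not the authors' code): pick actions greedily with respect to the mean of a quantile value distribution plus an optimism bonus taken from its upper quantiles, scaled by a decaying schedule. The parameter names (num_upper, c0, decay) are illustrative assumptions.

```python
# Hypothetical sketch: exploration bonus from the upper quantiles of a QR-DQN-style
# value distribution, scaled by a decaying schedule c_t to suppress intrinsic uncertainty.
import numpy as np

def select_action(quantiles, step, num_upper=10, c0=1.0, decay=1e-4):
    """quantiles: array of shape (num_actions, num_quantiles), sorted per action."""
    mean_value = quantiles.mean(axis=1)                                 # estimate of Q(s, a)
    upper_bonus = quantiles[:, -num_upper:].mean(axis=1) - mean_value   # optimism from the upper tail
    c_t = c0 / (1.0 + decay * step)                                     # decaying schedule
    return int(np.argmax(mean_value + c_t * upper_bonus))

# Usage: q = np.sort(np.random.randn(4, 51), axis=1); a = select_action(q, step=1000)
```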

    Variational Inference with Tail-adaptive f-Divergence

    Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to the Kullback-Leibler (KL) divergence, a major advantage of using α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires importance sampling, which can have extremely large or even infinite variance due to the heavy tails of the importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments while simultaneously achieving the mass-covering property. We test our methods on Bayesian neural networks, as well as on deep reinforcement learning, where our method is applied to improve a recent soft actor-critic (SAC) algorithm. Our results show that our approach yields significant advantages compared with existing methods based on classical KL and α-divergences. Comment: NeurIPS 201
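
    A hedged sketch of the underlying idea, under the assumption that the adaptive change of f can be illustrated by a rank-based transformation of the raw importance weights (the paper's exact transform may differ): heavy-tailed weights are replaced by bounded surrogates before forming the gradient estimate.

```python
# Illustration only: replace heavy-tailed importance weights w_i with bounded, rank-based
# weights so the reweighted gradient estimate cannot have exploding variance.
import numpy as np

def tail_adaptive_weights(log_w):
    """Map raw log importance weights to bounded, rank-based weights summing to 1."""
    ranks = np.argsort(np.argsort(log_w)) + 1   # 1 = smallest weight, n = largest
    return ranks / ranks.sum()                  # bounded surrogate for the normalized w_i

log_w = np.random.standard_cauchy(1000)         # deliberately heavy-tailed log-weights
rho = tail_adaptive_weights(log_w)
# The gradient estimate then weights per-sample terms by rho_i instead of w_i.
```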

    Efficient exploration with Double Uncertain Value Networks

    This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We present methods to learn these distributions with deep neural networks, where we estimate parametric uncertainty with Bayesian dropout, while return uncertainty is propagated through the Bellman equation as a Gaussian distribution. We then show that both can be jointly estimated in one network, which we call the Double Uncertain Value Network. The policy is directly derived from the learned distributions via Thompson sampling. Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge. Comment: Deep Reinforcement Learning Symposium @ Conference on Neural Information Processing Systems (NIPS) 201
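
    An illustrative sketch (not the authors' implementation) of Thompson-sampling action selection when a network outputs a Gaussian return distribution per action and parametric uncertainty is captured by keeping dropout active at inference; the stub network below is a placeholder assumption.

```python
# Sample once from parametric uncertainty (a dropout forward pass) and once from each
# action's Gaussian return distribution, then act greedily on the sampled returns.
import numpy as np

def thompson_action(forward_with_dropout, state, rng):
    """forward_with_dropout(state) -> (mu, sigma), each of shape (num_actions,),
    with dropout active so each call is one posterior sample of the parameters."""
    mu, sigma = forward_with_dropout(state)     # one parametric sample
    sampled_returns = rng.normal(mu, sigma)     # one draw from each return distribution
    return int(np.argmax(sampled_returns))

# Usage with a stub network:
rng = np.random.default_rng(0)
stub_net = lambda s: (np.array([1.0, 1.2, 0.8]), np.array([0.5, 0.1, 0.9]))
a = thompson_action(stub_net, state=None, rng=rng)
```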

    Sampling-based Incremental Information Gathering with Applications to Robotic Exploration and Environmental Monitoring

    In this article, we propose a sampling-based motion planning algorithm equipped with an information-theoretic convergence criterion for incremental informative motion planning. The proposed approach allows dense map representations and incorporates the full state uncertainty into the planning process. The problem is formulated as a constrained maximization problem. Our approach builds on rapidly-exploring information gathering algorithms and benefits from the advantages of sampling-based optimal motion planning algorithms. We propose two information functions and their variants for fast and online computation. We prove information-theoretic convergence for an entire exploration and information gathering mission based on the least upper bound of the average map entropy. A natural automatic stopping criterion for information-driven motion control results from the convergence analysis. We demonstrate the performance of the proposed algorithms in three scenarios: comparison of the proposed information functions and sensor configuration selection, robotic exploration in unknown environments, and a wireless signal strength monitoring task on a lake, using a publicly available dataset collected by an autonomous surface vehicle. Comment: Revision submitted to IJRR, 49 page
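
    A hedged sketch of an entropy-based stopping rule in the spirit of the criterion described above: track the average entropy of an occupancy-grid map and stop exploring once it has plateaued. The tolerance and window below are illustrative assumptions, not the article's exact bound.

```python
# Average Bernoulli entropy of an occupancy grid, plus a simple plateau-based stopping rule.
import numpy as np

def average_map_entropy(occ_probs):
    """Mean cell entropy (bits) of an occupancy grid with probabilities in (0, 1)."""
    p = np.clip(occ_probs, 1e-6, 1 - 1e-6)
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return float(h.mean())

def should_stop(entropy_history, tol=1e-3, window=5):
    """Stop when average map entropy has decreased by less than tol over the last window rounds."""
    if len(entropy_history) < window + 1:
        return False
    return entropy_history[-window - 1] - entropy_history[-1] < tol
```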

    Robustness and macroeconomic policy

    This paper considers the design of macroeconomic policies in the face of uncertainty. In recent years, several economists have advocated that when policymakers are uncertain about the environment they face and find it difficult to assign precise probabilities to the alternative scenarios that may characterize this environment, they should design policies to be robust in the sense that they minimize the worst-case loss these policies could ever impose. I review and evaluate the objections cited by critics of this approach. I further argue that, contrary to what some have inferred, concern about worst-case scenarios does not always lead to policies that respond more aggressively to incoming news than the optimal policy would absent any uncertainty. Macroeconomics - Econometric models
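
    A toy numerical illustration of the minimax idea, entirely an assumption and not drawn from the paper: a quadratic loss, a few candidate model slopes, and a grid search comparing the worst-case-minimizing policy with the single-model optimum.

```python
# Minimax policy choice over a set of model scenarios versus the baseline-model optimum.
import numpy as np

policies = np.linspace(-2.0, 2.0, 401)     # candidate policy responses
scenarios = [0.5, 1.0, 1.8]                # alternative values of an uncertain model slope

def loss(policy, slope, shock=1.0):
    """Quadratic stabilization loss under a given model slope."""
    return (shock - slope * policy) ** 2 + 0.1 * policy ** 2

worst_case = np.array([max(loss(p, s) for s in scenarios) for p in policies])
robust_policy = policies[int(np.argmin(worst_case))]                       # minimax choice
baseline_policy = policies[int(np.argmin([loss(p, 1.0) for p in policies]))]
# In this toy example the minimax choice is slightly *less* aggressive than the
# baseline optimum, illustrating that worst-case concern need not imply overreaction.
```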

    Reinforcement Learning under Model Mismatch

    We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation of it. We address this problem by extending the framework of robust MDPs to the model-free reinforcement learning setting, where we do not have access to the model parameters but can only sample states from it. We define robust versions of Q-learning, SARSA, and TD-learning and prove convergence to an approximately optimal robust policy and approximate value function, respectively. We scale the robust algorithms up to large MDPs via function approximation and prove convergence under two different settings. We prove convergence of robust approximate policy iteration and robust approximate value iteration for linear architectures (under mild assumptions). We also define a robust loss function, the mean squared robust projected Bellman error, and give stochastic gradient descent algorithms that are guaranteed to converge to a local minimum. Comment: To appear in Proceedings of NIPS 201
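
    A hedged sketch of a robust tabular Q-learning update, where the worst case over the transition uncertainty set is crudely approximated by taking the minimum bootstrap value over a small set of plausible next states; the paper's exact robust operator may differ.

```python
# Robust Q-learning step: bootstrap from the worst plausible next state instead of the
# single observed one.
import numpy as np

def robust_q_update(Q, s, a, r, candidate_next_states, alpha=0.1, gamma=0.99):
    """Q: dict mapping state -> array of action values; candidate_next_states: states
    considered plausible under the misspecified model (e.g., observed s' plus perturbations)."""
    worst_value = min(np.max(Q[s_next]) for s_next in candidate_next_states)
    target = r + gamma * worst_value
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```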

    Robust Analysis in Stochastic Simulation: Computation and Performance Guarantees

    Any performance analysis based on stochastic simulation is subject to the errors inherent in misspecifying the modeling assumptions, particularly the input distributions. In situations with little support from data, we investigate the use of worst-case analysis to analyze these errors, by representing the partial, nonparametric knowledge of the input models via optimization constraints. We study the performance and robustness guarantees of this approach. We design and analyze a numerical scheme for solving a general class of simulation objectives and uncertainty specifications. The key steps involve a randomized discretization of the probability spaces, a simulable unbiased gradient estimator using a nonparametric analog of the likelihood ratio method, and a Frank-Wolfe (FW) variant of the stochastic approximation (SA) method (which we call FWSA) run on the space of input probability distributions. A convergence analysis for FWSA on non-convex problems is provided. We test the performance of our approach via several numerical examples.
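
    A minimal sketch of one FWSA-style iteration over a discretized input distribution, assuming a noisy gradient of the simulation objective with respect to the input probabilities is available; how that gradient is estimated (the likelihood-ratio analog above) is omitted here.

```python
# One Frank-Wolfe step on the probability simplex with a diminishing step size.
import numpy as np

def fwsa_step(p, grad_estimate, t):
    """p: current input distribution; grad_estimate: noisy gradient, same length; t: iteration."""
    vertex = np.zeros_like(p)
    vertex[np.argmin(grad_estimate)] = 1.0   # linear subproblem over the simplex (for minimization)
    step = 2.0 / (t + 2.0)                   # standard diminishing step size
    return (1 - step) * p + step * vertex
```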

    Risk Sensitive Rendezvous Algorithm for Heterogeneous Agents in Urban Environments

    Demand for fast and inexpensive parcel deliveries in urban environments has risen considerably in recent years. We envision a framework that enables efficient last-mile delivery in urban environments by leveraging a network of ride-sharing vehicles, where Unmanned Aerial Systems (UASs) drop packages on said vehicles, which then cover the majority of the distance before the packages are picked up by another UAS for final delivery. This approach presents many engineering challenges, including the safe rendezvous of both agents: the UAS and the human-operated ground vehicle. In this paper, we introduce a framework to minimize the risk of failure while allowing for optimal usage of the controlled agent. We formulate a compact, fast planner to drive a UAS to a passive ground vehicle with inexact behavior, while providing intuitive and meaningful procedures to guarantee safety with minimal sacrifice of optimality. The resulting algorithm is shown to be fast and implementable in real time via numerical tests. Comment: Full version of the same-titled paper accepted to ACC 202
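
    An illustrative sketch, not the paper's planner: choose a rendezvous time that minimizes the UAS cost while keeping the probability of missing the inexactly behaving ground vehicle below a risk budget. Modeling the vehicle's position error as a 1-D Gaussian whose spread grows with the horizon is an assumption made only for this example.

```python
# Risk-bounded rendezvous time selection under growing ground-vehicle position uncertainty.
from math import erf, sqrt

def choose_rendezvous(times, uas_cost, gv_sigma_per_s, capture_radius, risk=0.05):
    """times, uas_cost: candidate rendezvous times and the UAS cost of reaching each one."""
    best_t, best_cost = None, float("inf")
    for t, cost in zip(times, uas_cost):
        sigma = max(gv_sigma_per_s * t, 1e-9)                  # position spread grows with time
        p_capture = erf(capture_radius / (sigma * sqrt(2)))    # P(|error| < radius), 1-D Gaussian
        if 1.0 - p_capture <= risk and cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Usage: choose_rendezvous([30, 60, 120], [5.0, 3.0, 2.0], gv_sigma_per_s=0.2, capture_radius=15.0)
```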