Distributional Reinforcement Learning for Efficient Exploration
In distributional reinforcement learning (RL), the estimated distribution of
the value function models both the parametric and intrinsic uncertainties. We
propose a novel and efficient exploration method for deep RL that has two
components. The first is a decaying schedule to suppress the intrinsic
uncertainty. The second is an exploration bonus calculated from the upper
quantiles of the learned distribution. In Atari 2600 games, our method
outperforms QR-DQN in 12 out of 14 hard games (achieving 483% average gain
across 49 games in cumulative rewards over QR-DQN with a big win in Venture).
We also compared our algorithm with QR-DQN in a challenging 3D driving
simulator (CARLA). Results show that our algorithm achieves near-optimal safety
rewards twice as fast as QR-DQN.
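The two components described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption: the bonus is taken as the root-mean-square spread of the upper-half quantiles around the median, the schedule is a hypothetical `c * sqrt(log t / t)`, and the function names are illustrative only.

```python
import numpy as np


def upper_quantile_bonus(quantiles):
    """Spread of the upper-half quantiles around the median, per action.

    quantiles: array of shape (n_actions, n_quantiles).
    The exact bonus used in the paper is not stated in the abstract;
    this RMS spread is an assumption.
    """
    q = np.sort(np.asarray(quantiles, dtype=float), axis=1)
    median = np.median(q, axis=1, keepdims=True)
    upper = q[:, q.shape[1] // 2:]  # quantiles from the middle index upward
    return np.sqrt(np.mean((upper - median) ** 2, axis=1))


def decay(t, c=1.0):
    """Hypothetical schedule that shrinks the bonus as step t grows (t >= 1)."""
    return c * np.sqrt(np.log(t) / t)


def select_action(quantiles, t, c=1.0):
    """Pick the action maximizing mean value plus the decayed bonus."""
    quantiles = np.asarray(quantiles, dtype=float)
    scores = quantiles.mean(axis=1) + decay(t, c) * upper_quantile_bonus(quantiles)
    return int(np.argmax(scores))
```

Early in training the bonus can favor an action with a heavy upper tail; as `t` grows the schedule suppresses it and the choice falls back to the mean value.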
Variational Inference with Tail-adaptive f-Divergence
Variational inference with α-divergences has been widely used in
modern probabilistic machine learning. Compared to Kullback-Leibler (KL)
divergence, a major advantage of using α-divergences (with positive
α values) is their mass-covering property. However, estimating and
optimizing α-divergences requires importance sampling, which can
have extremely large or infinite variance due to heavy tails of importance
weights. In this paper, we propose a new class of tail-adaptive f-divergences
that adaptively change the convex function f with the tail of the importance
weights, in a way that theoretically guarantees finite moments, while
simultaneously achieving mass-covering properties. We test our methods on
Bayesian neural networks, as well as deep reinforcement learning in which our
method is applied to improve a recent soft actor-critic (SAC) algorithm. Our
results show that our approach yields significant advantages compared with
existing methods based on classical KL and α-divergences.
Comment: NeurIPS 201
Efficient exploration with Double Uncertain Value Networks
This paper studies directed exploration for reinforcement learning agents by
tracking uncertainty about the value of each available action. We identify two
sources of uncertainty that are relevant for exploration. The first originates
from limited data (parametric uncertainty), while the second originates from
the distribution of the returns (return uncertainty). We identify methods to
learn these distributions with deep neural networks, where we estimate
parametric uncertainty with Bayesian drop-out, while return uncertainty is
propagated through the Bellman equation as a Gaussian distribution. Then, we
identify that both can be jointly estimated in one network, which we call the
Double Uncertain Value Network. The policy is directly derived from the learned
distributions based on Thompson sampling. Experimental results show that both
types of uncertainty may vastly improve learning in domains with a strong
exploration challenge.
Comment: Deep Reinforcement Learning Symposium @ Conference on Neural
Information Processing Systems (NIPS) 201
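A minimal sketch of Thompson sampling over the two uncertainty sources named in this abstract follows. It is not the authors' network: the Bayesian dropout passes are replaced by precomputed arrays `mu_samples` and `sigma_samples` (hypothetical names), where each row stands for one stochastic forward pass that outputs a Gaussian return mean and standard deviation per action.

```python
import numpy as np


def thompson_action(mu_samples, sigma_samples, rng):
    """Sample once from each uncertainty source, then act greedily.

    mu_samples, sigma_samples: arrays of shape (n_param_draws, n_actions).
    Each row is assumed to come from one stochastic forward pass
    (for example, one dropout mask).
    """
    # Parametric uncertainty: pick one set of network outputs at random.
    k = rng.integers(mu_samples.shape[0])
    # Return uncertainty: sample a return from the Gaussian for each action.
    sampled_returns = rng.normal(mu_samples[k], sigma_samples[k])
    return int(np.argmax(sampled_returns))
```

With zero standard deviation and identical rows the choice is deterministic; wider distributions make the agent try actions whose mean estimate is currently lower.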
Sampling-based Incremental Information Gathering with Applications to Robotic Exploration and Environmental Monitoring
In this article, we propose a sampling-based motion planning algorithm
equipped with an information-theoretic convergence criterion for incremental
informative motion planning. The proposed approach allows dense map
representations and incorporates the full state uncertainty into the planning
process. The problem is formulated as a constrained maximization problem. Our
approach is built on rapidly-exploring information gathering algorithms and
benefits from advantages of sampling-based optimal motion planning algorithms.
We propose two information functions and their variants for fast and online
computations. We prove an information-theoretic convergence for an entire
exploration and information gathering mission based on the least upper bound of
the average map entropy. A natural automatic stopping criterion for
information-driven motion control results from the convergence analysis. We
demonstrate the performance of the proposed algorithms using three scenarios:
comparison of the proposed information functions and sensor configuration
selection, robotic exploration in unknown environments, and a wireless signal
strength monitoring task in a lake from a publicly available dataset collected
using an autonomous surface vehicle.
Comment: Revision submitted to IJRR, 49 page
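The idea of a stopping criterion driven by average map entropy can be sketched as follows. This is only an illustration under stated assumptions: the map is taken to be an occupancy grid with independent cells, entropy is the per-cell Bernoulli entropy in nats, and the `threshold` value is hypothetical rather than the bound derived in the article.

```python
import numpy as np


def average_map_entropy(p_occ):
    """Mean Bernoulli entropy (in nats) over occupancy probabilities."""
    p = np.clip(np.asarray(p_occ, dtype=float), 1e-12, 1.0 - 1e-12)
    cell_entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return float(cell_entropy.mean())


def should_stop(p_occ, threshold=0.1):
    """Stop exploring once the average cell entropy drops below the threshold."""
    return average_map_entropy(p_occ) < threshold
```

An unexplored map with every cell at probability 0.5 has average entropy log 2, so the planner keeps going; a map whose cells are nearly certain falls below the threshold and the mission ends.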
Robustness and macroeconomic policy
This paper considers the design of macroeconomic policies in the face of uncertainty. In recent years, several economists have advocated that when policymakers are uncertain about the environment they face and find it difficult to assign precise probabilities to the alternative scenarios that may characterize this environment, they should design policies to be robust in the sense that they minimize the worst-case loss these policies could ever impose. I review and evaluate the objections cited by critics of this approach. I further argue that, contrary to what some have inferred, concern about worst-case scenarios does not always lead to policies that respond more aggressively to incoming news than the optimal policy would respond absent any uncertainty.
Macroeconomics - Econometric models
A unified framework for resource-bounded autonomous agents interacting with unknown environments
The aim of this thesis is to present a mathematical framework for conceptualizing and constructing adaptive autonomous systems under resource constraints. The first part of this thesis contains a concise presentation of the foundations of classical agency: namely the formalizations of decision making and learning. Decision making includes: (a) subjective expected utility (SEU) theory, the framework of decision making under uncertainty; (b) the maximum SEU principle to choose the optimal solution; and (c) its application to the design of autonomous systems, culminating in the Bellman optimality equations. Learning includes: (a) Bayesian probability theory, the theory for reasoning under uncertainty that extends logic; and (b) Bayes-Optimal agents, the application of Bayesian probability theory to the design of optimal adaptive agents. Then, two major problems of the maximum SEU principle are highlighted: (a) the prohibitive computational costs and (b) the need for the causal precedence of the choice of the policy. The second part of this thesis tackles the two aforementioned problems. First, an information-theoretic notion of resources in autonomous systems is established. Second, a framework for resource-bounded agency is introduced. This includes: (a) a maximum bounded SEU principle that is derived from a set of axioms of utility; (b) an axiomatic model of probabilistic causality, which is applied for the formalization of autonomous systems having uncertainty over their policy and environment; and (c) the Bayesian control rule, which is derived from the maximum bounded SEU principle and the model of causality, implementing a stochastic adaptive control law that deals with the case where autonomous agents are uncertain about their policy and environment
Reinforcement Learning under Model Mismatch
We study reinforcement learning under model misspecification, where we do not
have access to the true environment but only to a reasonably close
approximation to it. We address this problem by extending the framework of
robust MDPs to the model-free Reinforcement Learning setting, where we do not
have access to the model parameters, but can only sample states from it. We
define robust versions of Q-learning, SARSA, and TD-learning and prove
convergence to an approximately optimal robust policy and approximate value
function respectively. We scale up the robust algorithms to large MDPs via
function approximation and prove convergence under two different settings. We
prove convergence of robust approximate policy iteration and robust approximate
value iteration for linear architectures (under mild assumptions). We also
define a robust loss function, the mean squared robust projected Bellman error
and give stochastic gradient descent algorithms that are guaranteed to converge
to a local minimum.
Comment: To appear in Proceedings of NIPS 201
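A robust variant of the tabular Q-learning update can be sketched as below. The abstract does not specify the uncertainty set, so this sketch assumes one: perturbations u of the next-state distribution with Euclidean norm at most `rho` that sum to zero, for which the worst-case shift of the expected value is -rho * ||v - mean(v)||. The names `worst_case_shift` and `robust_q_update` are illustrative.

```python
import numpy as np


def worst_case_shift(v, rho):
    """inf of u @ v over ||u||_2 <= rho with sum(u) == 0 (assumed uncertainty set)."""
    return -rho * np.linalg.norm(v - v.mean())


def robust_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, rho=0.1):
    """One Q-learning step with a pessimistic correction to the TD target."""
    v = Q.max(axis=1)  # greedy state values under the current table
    target = r + gamma * (v[s_next] + worst_case_shift(v, rho))
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Setting `rho=0` recovers the standard Q-learning update; larger values lower the target in proportion to how much the state values vary.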
Robust Analysis in Stochastic Simulation: Computation and Performance Guarantees
Any performance analysis based on stochastic simulation is subject to the
errors inherent in misspecifying the modeling assumptions, particularly the
input distributions. In situations with little support from data, we
investigate the use of worst-case analysis to analyze these errors, by
representing the partial, nonparametric knowledge of the input models via
optimization constraints. We study the performance and robustness guarantees of
this approach. We design and analyze a numerical scheme for solving a general
class of simulation objectives and uncertainty specifications. The key steps
involve a randomized discretization of the probability spaces, a simulable
unbiased gradient estimator using a nonparametric analog of the likelihood
ratio method, and a Frank-Wolfe (FW) variant of the stochastic approximation
(SA) method (which we call FWSA) run on the space of input probability
distributions. A convergence analysis for FWSA on non-convex problems is
provided. We test the performance of our approach via several numerical
examples.
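The Frank-Wolfe step on a space of probability distributions can be illustrated on a discretized support. This is a deterministic sketch, not FWSA itself: the simulated gradient estimator from the abstract is replaced by an exact gradient callable `grad` (an assumption), and the classical step size 2 / (k + 2) is used.

```python
import numpy as np


def frank_wolfe_simplex(grad, p0, n_iters=2000):
    """Minimize a smooth objective over the probability simplex.

    grad: callable returning the gradient at p (exact here; the method in
          the abstract would use a simulation-based estimate instead).
    p0:   starting probability vector.
    """
    p = np.asarray(p0, dtype=float).copy()
    for k in range(n_iters):
        g = grad(p)
        # Linear minimization over the simplex puts all mass on one support point.
        vertex = np.zeros_like(p)
        vertex[np.argmin(g)] = 1.0
        step = 2.0 / (k + 2.0)
        # The convex combination keeps p a valid probability vector.
        p = (1.0 - step) * p + step * vertex
    return p
```

Because each iterate is a convex combination of simplex points, no projection step is needed, which is the usual reason for preferring Frank-Wolfe over projected gradient methods on this domain.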
Risk Sensitive Rendezvous Algorithm for Heterogeneous Agents in Urban Environments
Demand for fast and inexpensive parcel deliveries in urban environments has
risen considerably in recent years. A framework is envisioned to enforce
efficient last mile delivery in urban environments by leveraging a network of
ride-sharing vehicles, where Unmanned Aerial Systems (UASs) drop packages on
said vehicles which then cover the majority of the distance to finally be
picked up by another UAS for delivery. This approach presents many engineering
challenges, including the safe rendezvous of both agents: the UAS and the
human-operated ground vehicle. In this paper, we introduce a framework to
minimize the risk of failure, while allowing for optimal usage of the
controlled agent. We formulate a compact fast planner to drive a UAS to a
passive ground vehicle with inexact behavior, while providing intuitive and
meaningful procedures to guarantee safety with minimal sacrifice of optimality.
The resulting algorithm is shown to be fast and implementable in real-time via
numerical tests.
Comment: Full version of the same-titled paper accepted to ACC 202