Optimistic Robust Optimization With Applications To Machine Learning
Robust Optimization has traditionally taken a pessimistic, or worst-case
viewpoint of uncertainty which is motivated by a desire to find sets of optimal
policies that maintain feasibility under a variety of operating conditions. In
this paper, we explore an optimistic, or best-case view of uncertainty and show
that it can be a fruitful approach. We show that these techniques can be used
to address a wide variety of problems. First, we apply our methods in the
context of robust linear programming, providing a method for reducing
conservatism in intuitive ways that encode economically realistic modeling
assumptions. Second, we look at problems in machine learning and find that this
approach is strongly connected to the existing literature. Specifically, we
provide a new interpretation for popular sparsity inducing non-convex
regularization schemes. Additionally, we show that successful approaches for
dealing with outliers and noise can be interpreted as optimistic robust
optimization problems. Although many of the problems resulting from our
approach are non-convex, we find that DCA or DCA-like optimization approaches
can be intuitive and efficient.
Bayesian Policy Gradients via Alpha Divergence Dropout Inference
Policy gradient methods have had great success in solving continuous control
tasks, yet the stochastic nature of such problems makes deterministic value
estimation difficult. We propose an approach which instead estimates a
distribution by fitting the value function with a Bayesian Neural Network. We
optimize an α-divergence objective with Bayesian dropout approximation
to learn and estimate this distribution. We show that using the Monte Carlo
posterior mean of the Bayesian value function distribution, rather than a
deterministic network, improves stability and performance of policy gradient
methods in continuous control MuJoCo simulations.
Comment: Accepted to Bayesian Deep Learning Workshop at NIPS 201
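The prediction step described in the abstract above (averaging stochastic forward passes of a dropout network to get a Monte Carlo posterior mean) can be illustrated with a minimal sketch. It covers only that averaging step, not the α-divergence training objective. The tiny one-hidden-layer network, its weights, and the names `dropout_forward` and `mc_value_estimate` are hypothetical and are not taken from the paper.

```python
import random
import statistics


def dropout_forward(x, w_hidden, w_out, keep_prob, rng):
    """One stochastic forward pass of a tiny ReLU network with dropout."""
    hidden = []
    for w in w_hidden:
        h = max(0.0, w * x)  # ReLU activation of a scalar input
        keep = rng.random() < keep_prob  # sample a Bernoulli dropout mask
        # Inverted dropout: rescale kept units so the expectation is unchanged.
        hidden.append(h / keep_prob if keep else 0.0)
    return sum(wo * h for wo, h in zip(w_out, hidden))


def mc_value_estimate(x, w_hidden, w_out, keep_prob=0.8, n_samples=200, seed=0):
    """Average n_samples dropout passes; return (mean, sample std)."""
    rng = random.Random(seed)
    samples = [
        dropout_forward(x, w_hidden, w_out, keep_prob, rng)
        for _ in range(n_samples)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical fixed weights standing in for a trained value network.
W_HIDDEN = [0.5, -1.0, 2.0]
W_OUT = [1.0, 0.3, -0.2]

mean_value, spread = mc_value_estimate(1.0, W_HIDDEN, W_OUT)
print(mean_value, spread)
```

In this sketch the mean would play the role of the value estimate used by the policy gradient, while the spread gives a rough indication of uncertainty in that estimate.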
A Modern Introduction to Online Learning
In this monograph, I introduce the basic concepts of Online Learning through
a modern view of Online Convex Optimization. Here, online learning refers to
the framework of regret minimization under worst-case assumptions. I present
first-order and second-order algorithms for online learning with convex losses,
in Euclidean and non-Euclidean settings. All the algorithms are clearly
presented as instantiations of Online Mirror Descent or
Follow-The-Regularized-Leader and their variants. Particular attention is given
to the issue of tuning the parameters of the algorithms and learning in
unbounded domains, through adaptive and parameter-free online learning
algorithms. Non-convex losses are dealt with through convex surrogate losses and
through randomization. The bandit setting is also briefly discussed, touching
on the problem of adversarial and stochastic multi-armed bandits. These notes
do not require prior knowledge of convex analysis and all the required
mathematical tools are rigorously explained. Moreover, all the proofs have been
carefully chosen to be as simple and as short as possible.
Comment: Fixed more typos, added more history bits, added local norms bounds
for OMD and FTR
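As a small illustration of the regret-minimization framework this abstract refers to, the sketch below runs projected online gradient descent, which is Online Mirror Descent with the squared Euclidean norm as regularizer, on one-dimensional squared losses. The loss family, the domain [-1, 1], the step-size schedule, and all function names are assumptions chosen for the example and are not taken from the monograph.

```python
import math
import random


def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def online_gradient_descent(targets, diameter=2.0, grad_bound=4.0):
    """Play x_t, observe loss (x - z_t)^2, then take a projected gradient step."""
    x = 0.0
    iterates = []
    for t, z in enumerate(targets, start=1):
        iterates.append(x)  # prediction committed before z is revealed
        grad = 2.0 * (x - z)  # gradient of the squared loss at x
        eta = diameter / (grad_bound * math.sqrt(t))  # decaying step size
        x = project(x - eta * grad)
    return iterates


def regret(targets, iterates):
    """Cumulative loss minus that of the best fixed point in hindsight."""
    # For squared losses on an interval, the best fixed point is the clipped mean.
    best = project(sum(targets) / len(targets))
    incurred = sum((x - z) ** 2 for x, z in zip(iterates, targets))
    comparator = sum((best - z) ** 2 for z in targets)
    return incurred - comparator


random.seed(0)
targets = [random.uniform(-1.0, 1.0) for _ in range(1000)]
iterates = online_gradient_descent(targets)
print(regret(targets, iterates))
```

The targets here are random only for convenience; the regret framework itself makes no stochastic assumption about how the sequence is generated.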
Acceleration in Policy Optimization
We work towards a unifying paradigm for accelerating policy optimization
methods in reinforcement learning (RL) by integrating foresight in the policy
improvement step via optimistic and adaptive updates. Leveraging the connection
between policy iteration and policy gradient methods, we view policy
optimization algorithms as iteratively solving a sequence of surrogate
objectives, local lower bounds on the original objective. We define optimism as
predictive modelling of the future behavior of a policy, and adaptivity as
taking immediate and anticipatory corrective actions to mitigate accumulating
errors from overshooting predictions or delayed responses to change. We use
this shared lens to jointly express other well-known algorithms, including
model-based policy improvement based on forward search, and optimistic
meta-learning algorithms. We analyze properties of this formulation, and show
connections to other accelerated optimization algorithms. Then, we design an
optimistic policy gradient algorithm, adaptive via meta-gradient learning, and
empirically highlight several design choices pertaining to acceleration, in an
illustrative task.
Scalable First-Order Methods for Robust MDPs
Robust Markov Decision Processes (MDPs) are a powerful framework for modeling
sequential decision-making problems with model uncertainty. This paper proposes
the first first-order framework for solving robust MDPs. Our algorithm
interleaves primal-dual first-order updates with approximate Value Iteration
updates. By carefully controlling the tradeoff between the accuracy and cost of
Value Iteration updates, we achieve an ergodic convergence rate of for the best
choice of parameters on ellipsoidal and Kullback-Leibler -rectangular
uncertainty sets, where and is the number of states and actions,
respectively. Our dependence on the number of states and actions is
significantly better (by a factor of ) than that of pure
Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty
sets we show that our algorithm is significantly more scalable than
state-of-the-art approaches. Our framework is also the first one to solve
robust MDPs with -rectangular KL uncertainty sets