Search CORE

2,665 research outputs found

Optimistic Robust Optimization With Applications To Machine Learning

Author: Mafusalov Alexander
Norton Matthew
Takeda Akiko
Publication venue
Publication date: 20/11/2017
Field of study

Robust Optimization has traditionally taken a pessimistic, or worst-case viewpoint of uncertainty which is motivated by a desire to find sets of optimal policies that maintain feasibility under a variety of operating conditions. In this paper, we explore an optimistic, or best-case view of uncertainty and show that it can be a fruitful approach. We show that these techniques can be used to address a wide variety of problems. First, we apply our methods in the context of robust linear programming, providing a method for reducing conservatism in intuitive ways that encode economically realistic modeling assumptions. Second, we look at problems in machine learning and find that this approach is strongly connected to the existing literature. Specifically, we provide a new interpretation for popular sparsity inducing non-convex regularization schemes. Additionally, we show that successful approaches for dealing with outliers and noise can be interpreted as optimistic robust optimization problems. Although many of the problems resulting from our approach are non-convex, we find that DCA or DCA-like optimization approaches can be intuitive and efficient

arXiv.org e-Print Archive

Calhoun, Institutional Archive of the Naval Postgraduate School

Bayesian Policy Gradients via Alpha Divergence Dropout Inference

Author: Henderson Peter
Doan Thang
Islam Riashat
Meger David
Publication venue
Publication date: 05/12/2017
Field of study

Policy gradient methods have had great success in solving continuous control tasks, yet the stochastic nature of such problems makes deterministic value estimation difficult. We propose an approach which instead estimates a distribution by fitting the value function with a Bayesian Neural Network. We optimize an

\alpha

-divergence objective with Bayesian dropout approximation to learn and estimate this distribution. We show that using the Monte Carlo posterior mean of the Bayesian value function distribution, rather than a deterministic network, improves stability and performance of policy gradient methods in continuous control MuJoCo simulations.Comment: Accepted to Bayesian Deep Learning Workshop at NIPS 201

arXiv.org e-Print Archive

FigShare

A Modern Introduction to Online Learning

Author: Orabona Francesco
Publication venue
Publication date: 29/12/2020
Field of study

In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiation of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issue of tuning the parameters of the algorithms and learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problem of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis and all the required mathematical tools are rigorously explained. Moreover, all the proofs have been carefully chosen to be as simple and as short as possible.Comment: Fixed more typos, added more history bits, added local norms bounds for OMD and FTR

arXiv.org e-Print Archive

Acceleration in Policy Optimization

Author: Chelu Veronica
Flennerhag Sebastian
Guez Arthur
Precup Doina
Zahavy Tom
Publication venue
Publication date: 05/09/2023
Field of study

We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates. Leveraging the connection between policy iteration and policy gradient methods, we view policy optimization algorithms as iteratively solving a sequence of surrogate objectives, local lower bounds on the original objective. We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate accumulating errors from overshooting predictions or delayed responses to change. We use this shared lens to jointly express other well-known algorithms, including model-based policy improvement based on forward search, and optimistic meta-learning algorithms. We analyze properties of this formulation, and show connections to other accelerated optimization algorithms. Then, we design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task

arXiv.org e-Print Archive

Scalable First-Order Methods for Robust MDPs

Author: Grand-Clément Julien
Kroer Christian
Publication venue
Publication date: 14/01/2021
Field of study

Robust Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision-making problems with model uncertainty. This paper proposes the first first-order framework for solving robust MDPs. Our algorithm interleaves primal-dual first-order updates with approximate Value Iteration updates. By carefully controlling the tradeoff between the accuracy and cost of Value Iteration updates, we achieve an ergodic convergence rate of

O \left( A^{2} S^{3}\log(S)\log(\epsilon^{-1}) \epsilon^{-1} \right)

for the best choice of parameters on ellipsoidal and Kullback-Leibler

s

-rectangular uncertainty sets, where

S

and

A

is the number of states and actions, respectively. Our dependence on the number of states and actions is significantly better (by a factor of

O(A^{1.5}S^{1.5})

) than that of pure Value Iteration algorithms. In numerical experiments on ellipsoidal uncertainty sets we show that our algorithm is significantly more scalable than state-of-the-art approaches. Our framework is also the first one to solve robust MDPs with

s

-rectangular KL uncertainty sets

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications