A Boosting Approach to Reinforcement Learning
Reducing reinforcement learning to supervised learning is a well-studied and
effective approach that leverages the benefits of compact function
approximation to deal with large-scale Markov decision processes.
Independently, the boosting methodology (e.g. AdaBoost) has proven to be
indispensable in designing efficient and accurate classification algorithms by
combining inaccurate rules-of-thumb.
In this paper, we take a further step: we reduce reinforcement learning to a
sequence of weak learning problems. Since weak learners perform only marginally
better than random guesses, such subroutines constitute a weaker assumption
than the availability of an accurate supervised learning oracle. We prove that
the sample complexity and running time bounds of the proposed method do not
explicitly depend on the number of states.
While existing results on boosting operate on convex losses, the value
function over policies is non-convex. We show how to use a non-convex variant
of the Frank-Wolfe method for boosting, which additionally improves upon the
known sample complexity and running time even for reductions to supervised
learning.
Comment: Now in sync with camera ready for NeurIPS 202
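To give a flavor of the Frank-Wolfe step the abstract alludes to, here is a minimal sketch over the probability simplex (weights over weak hypotheses), using a convex quadratic stand-in loss. The matrix `A`, vector `b`, and the loss itself are hypothetical; the paper's non-convex variant is more involved.

```python
import numpy as np

# Illustrative Frank-Wolfe on the probability simplex (convex stand-in
# objective, not the paper's non-convex RL setting).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))    # 5 hypothetical weak hypotheses, 20 examples
b = rng.normal(size=20)

def f(w):                       # stand-in smooth loss of the aggregate
    return float(np.sum((A @ w - b) ** 2))

def grad(w):
    return 2 * A.T @ (A @ w - b)

w = np.ones(5) / 5              # start from the uniform mixture
f0 = f(w)
for _ in range(200):
    g = grad(w)
    s = np.zeros(5)
    s[np.argmin(g)] = 1.0       # linear minimization oracle: a simplex vertex
    d = s - w
    denom = 2.0 * np.sum((A @ d) ** 2)
    # Exact line search for the quadratic, clipped to [0, 1], so the
    # objective never increases and iterates stay convex combinations.
    gamma = 0.0 if denom == 0.0 else min(1.0, max(0.0, -(g @ d) / denom))
    w = w + gamma * d
```

The key property for the boosting view is that each iterate is a convex combination of vertices, i.e. a weighted mixture of weak hypotheses built one oracle call at a time.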
Boosting for Control of Dynamical Systems
We study the question of how to aggregate controllers for dynamical systems
in order to improve their performance. To this end, we propose a framework of
boosting for online control. Our main result is an efficient boosting algorithm
that combines weak controllers into a provably more accurate one. Empirical
evaluation on a host of control settings supports our theoretical findings.
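To illustrate the general idea of aggregating controllers (this is a toy sketch, not the paper's algorithm), consider an unstable scalar system stabilized by a weighted mixture of linear feedback controllers, with the weights adjusted online by a multiplicative-weights update on each controller's counterfactual one-step cost. The system parameter and gains below are hypothetical.

```python
import numpy as np

# Toy aggregation of weak controllers for x_{t+1} = a*x_t + u_t.
a = 1.2                              # open-loop unstable
gains = np.array([0.5, 1.0, 2.0])    # weak controllers u_i = -k_i * x
w = np.ones(3) / 3                   # aggregation weights
eta = 1.0
x = 1.0
for _ in range(10):
    u = -(w @ gains) * x             # weighted aggregate control
    losses = ((a - gains) * x) ** 2  # counterfactual cost of each controller
    x = a * x + u
    w *= np.exp(-eta * losses)       # multiplicative-weights update
    w /= w.sum()
```

No single hard-coded gain needs to be exactly right; the online reweighting drives the closed loop toward the better controllers, and the state contracts.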
Online Agnostic Boosting via Regret Minimization
Boosting is a widely used machine learning approach based on the idea of
aggregating weak learning rules. While in statistical learning numerous
boosting methods exist both in the realizable and agnostic settings, in online
learning they exist only in the realizable case. In this work we provide the
first agnostic online boosting algorithm; that is, given a weak learner with
only marginally-better-than-trivial regret guarantees, our algorithm boosts it
to a strong learner with sublinear regret.
Our algorithm is based on an abstract (and simple) reduction to online convex
optimization, which efficiently converts an arbitrary online convex optimizer
to an online booster.
Moreover, this reduction extends to the statistical as well as the online
realizable settings, thus unifying the four cases of statistical/online and
agnostic/realizable boosting.
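A toy sketch of the reduction's flavor: an online convex optimizer over the simplex (here, multiplicative weights / Hedge) aggregates several weak online predictors into a single predictor. The learners, their bias parameters, and the 0/1 losses below are hypothetical stand-ins, not the paper's construction.

```python
import numpy as np

# Hedge over N weak online predictors of a random +/-1 label.
rng = np.random.default_rng(1)
T, N = 2000, 3
eta = 0.1
biases = [0.55, 0.60, 0.52]          # each learner barely beats chance
w = np.ones(N) / N
correct = 0
for _ in range(T):
    y = rng.choice([-1.0, 1.0])
    preds = np.array([y if rng.random() < biases[i] else -y
                      for i in range(N)])
    agg = 1.0 if w @ preds >= 0 else -1.0   # weighted-majority prediction
    correct += (agg == y)
    losses = (preds != y).astype(float)     # 0/1 loss of each weak learner
    w *= np.exp(-eta * losses)              # online convex optimizer step
    w /= w.sum()
acc = correct / T                    # typically better than chance
```

The booster itself never inspects the weak learners' internals; it only feeds losses to the online convex optimizer and plays its weights, which is the abstract shape of the reduction described above.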