
    Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

    We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs (CMDPs). The algorithm operates under the minimal assumptions of a realizable function class and access to online least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient online regression oracles), simple, and robust to approximation errors. It enjoys an $\widetilde{O}(H^{2.5}\sqrt{T|S||A|(\mathcal{R}(\mathcal{O}) + H\log(\delta^{-1}))})$ regret guarantee, with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, and $\mathcal{R}(\mathcal{O}) = \mathcal{R}(\mathcal{O}_{\mathrm{sq}}^{\mathcal{F}}) + \mathcal{R}(\mathcal{O}_{\mathrm{log}}^{\mathcal{P}})$ the sum of the regrets of the regression oracles used to approximate the context-dependent rewards and dynamics, respectively. To the best of our knowledge, our algorithm is the first efficient rate-optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.
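    For intuition, here is a minimal sketch of the kind of online least squares regression oracle the abstract assumes, instantiated as recursive ridge regression over a linear function class; the name OnlineRidgeOracle and the linear instantiation are illustrative assumptions, not details from the paper.

        import numpy as np

        class OnlineRidgeOracle:
            # A concrete online least-squares oracle: ridge regression over
            # linear predictors f(x) = w @ x, updated one observation at a
            # time. Against a bounded linear class its cumulative square-loss
            # regret grows only logarithmically in T, which is the kind of
            # quantity the R(O_sq) term in the bound measures.
            def __init__(self, dim, lam=1.0):
                self.A = lam * np.eye(dim)   # regularized Gram matrix
                self.b = np.zeros(dim)       # accumulated y * x sums

            def predict(self, x):
                # current ridge estimate w_t = A^{-1} b, evaluated at x
                w = np.linalg.solve(self.A, self.b)
                return float(w @ x)

            def update(self, x, y):
                # fold the observed pair (x, y) into the sufficient statistics
                self.A += np.outer(x, x)
                self.b += y * x

    In the algorithm's setting, such an oracle would be queried to predict context-dependent rewards and then updated with each realized outcome.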

    Counterfactual Optimism: Rate Optimal Regret for Stochastic Contextual MDPs

    We present the $\mathrm{UC}^3\mathrm{RL}$ algorithm for regret minimization in Stochastic Contextual MDPs (CMDPs). The algorithm operates under the minimal assumptions of a realizable function class and access to offline least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys an $\widetilde{O}(H^3\sqrt{T|S||A|(\log(|\mathcal{F}|/\delta) + \log(|\mathcal{P}|/\delta))})$ regret guarantee, with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, and $\mathcal{P}$ and $\mathcal{F}$ finite function classes used to approximate the context-dependent dynamics and rewards, respectively. To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting.
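    For intuition, a minimal sketch of the two offline oracles the abstract assumes, instantiated for finite classes by empirical risk minimization; the function names are hypothetical, not from the paper.

        import numpy as np

        def offline_least_squares(F, data):
            # Offline square-loss oracle over a finite class F of predictors:
            # return the empirical risk minimizer
            # argmin_{f in F} sum_i (f(x_i) - y_i)^2.
            return min(F, key=lambda f: sum((f(x) - y) ** 2 for x, y in data))

        def offline_log_loss(P, data):
            # Offline log-loss oracle over a finite class P of conditional
            # distributions (p(x) is an indexable distribution over outcomes):
            # maximum likelihood over the class.
            return min(P, key=lambda p: -sum(np.log(p(x)[y]) for x, y in data))

    For finite classes, the $\log(|\mathcal{F}|/\delta)$ and $\log(|\mathcal{P}|/\delta)$ terms in the bound are the usual price of uniform convergence over the class.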

    Rate-Optimal Online Convex Optimization in Adaptive Linear Control

    We consider the problem of controlling an unknown linear dynamical system under adversarially changing convex costs, with full feedback of both the state and the cost function. We present the first computationally efficient algorithm that attains an optimal $\sqrt{T}$-regret rate compared to the best stabilizing linear controller in hindsight, while avoiding stringent assumptions on the costs such as strong convexity. Our approach is based on a careful design of non-convex lower confidence bounds for the online costs, and uses a novel technique for computationally efficient regret minimization of these bounds that leverages their particular non-convex structure.
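    Spelled out, the benchmark in this guarantee is the standard policy-regret comparison against stabilizing linear state-feedback controllers; the following formulation (the notation $\mathcal{K}$ and $x_t^K$ is ours, not the paper's) is a sketch of that notion:

        \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{K \in \mathcal{K}} \sum_{t=1}^{T} c_t\!\left(x_t^K, K x_t^K\right)

    where $\mathcal{K}$ is a set of stabilizing linear controllers $u_t = K x_t$, $x_t^K$ is the state sequence induced by playing $K$ throughout, and the guarantee is that this quantity grows as $\sqrt{T}$ up to problem-dependent factors.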