Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation
We present the OMG-CMDP! algorithm for regret minimization in adversarial
Contextual MDPs. The algorithm operates under the minimal assumptions of a
realizable function class and access to online least squares and log loss
regression oracles. Our algorithm is efficient (assuming efficient online
regression oracles), simple and robust to approximation errors. It enjoys an
$\widetilde{O}(\sqrt{T})$ regret guarantee (with polynomial dependence on $S$, $A$, $H$,
and on the sum of the regrets of the regression oracles used to approximate
the context-dependent rewards and dynamics, respectively), with $T$ being the
number of episodes, $S$ the state space, $A$ the action space, and $H$ the
horizon. To the best of our knowledge, our algorithm is the
first efficient rate optimal regret minimization algorithm for adversarial
CMDPs that operates under the minimal standard assumption of online function
approximation.
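The online least-squares regression oracle assumed above can be instantiated, for example, by online ridge regression. The sketch below is illustrative only (the function name and the ridge instantiation are our own, not the paper's construction); it shows an online oracle whose cumulative regret, measured against the best fixed predictor in hindsight, is the kind of quantity that enters such bounds.

```python
import numpy as np

def online_ridge_oracle(stream, dim, lam=1.0):
    """Illustrative online least-squares regression oracle (ridge-based).

    Processes (x, y) pairs one at a time: predicts with the current
    ridge estimate, suffers squared loss, then updates.  Returns the
    oracle's regret: cumulative online loss minus the loss of the best
    (regularized) fixed predictor in hindsight.
    """
    A = lam * np.eye(dim)  # regularized Gram matrix
    b = np.zeros(dim)
    online_loss = 0.0
    xs, ys = [], []
    for x, y in stream:
        w = np.linalg.solve(A, b)        # current ridge estimate
        online_loss += (w @ x - y) ** 2  # loss suffered before seeing y
        A += np.outer(x, x)              # rank-one update of the Gram matrix
        b += y * x
        xs.append(x)
        ys.append(y)
    X, Y = np.array(xs), np.array(ys)
    w_star = np.linalg.solve(lam * np.eye(dim) + X.T @ X, X.T @ Y)
    hindsight_loss = float(np.sum((X @ w_star - Y) ** 2))
    return float(online_loss - hindsight_loss)
```

For online ridge regression this regret is small (logarithmic in the number of rounds under standard boundedness assumptions), which is what makes the oracle-regret term in bounds of this form benign.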
Counterfactual Optimism: Rate Optimal Regret for Stochastic Contextual MDPs
We present the UCRL algorithm for regret minimization in Stochastic
Contextual MDPs (CMDPs). The algorithm operates under the minimal assumptions
of a realizable function class and access to offline least squares and log loss
regression oracles. Our algorithm is efficient (assuming efficient offline
regression oracles) and enjoys an $\widetilde{O}(\sqrt{T})$ regret guarantee
(with polynomial dependence on $S$, $A$, $H$ and logarithmic dependence on the
sizes of the function classes), with $T$ being the number of episodes, $S$ the
state space, $A$ the action space, $H$ the horizon, and $\mathcal{P}$ and
$\mathcal{R}$ finite function classes, used to approximate the
context-dependent dynamics and rewards, respectively. To the best of our
knowledge, our algorithm is the first efficient and rate-optimal regret
minimization algorithm for CMDPs, which operates under the general offline
function approximation setting.
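The optimism-over-a-finite-function-class idea can be caricatured as follows. This is a generic optimism-in-the-face-of-uncertainty sketch with invented names and a toy binary-outcome model class, not the paper's algorithm: each candidate model is scored by offline log loss, a confidence set of statistically plausible models is formed, and the most optimistic member is selected.

```python
import numpy as np

def optimistic_model_selection(models, data, values, slack=0.1):
    """Toy optimism over a finite model class.

    models -- candidate probability tables p, where p[s] is the model's
              probability of outcome 1 in state s (hypothetical
              binary-outcome setting, chosen for brevity)
    data   -- offline samples as (state, outcome) pairs
    values -- hypothetical optimistic value of planning under each model
    Returns the index of the highest-value model among those whose
    empirical log loss is within `slack` of the best fit.
    """
    def log_loss(p):
        eps = 1e-12  # guard against log(0)
        return -np.mean([np.log(max(p[s] if o else 1 - p[s], eps))
                         for s, o in data])

    losses = np.array([log_loss(m) for m in models])
    confidence_set = np.flatnonzero(losses <= losses.min() + slack)
    # act optimistically within the set of statistically plausible models
    return int(confidence_set[np.argmax(np.asarray(values)[confidence_set])])
```

Tightening `slack` shrinks the confidence set toward the empirical log-loss minimizer; loosening it admits more models and makes the selection more optimistic.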
Rate-Optimal Online Convex Optimization in Adaptive Linear Control
We consider the problem of controlling an unknown linear dynamical system
under adversarially changing convex costs and full feedback of both the state
and cost function. We present the first computationally-efficient algorithm
that attains an optimal $\sqrt{T}$-regret rate compared to the best
stabilizing linear controller in hindsight, while avoiding stringent
assumptions on the costs such as strong convexity. Our approach is based on a
careful design of non-convex lower confidence bounds for the online costs, and
uses a novel technique for computationally-efficient regret minimization of
these bounds that leverages their particular non-convex structure.
Comment: arXiv admin note: text overlap with arXiv:2203.0117
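For context on the benchmark, the optimal $\sqrt{T}$ rate is the classic online convex optimization rate. Below is a minimal sketch of plain projected online gradient descent against linear costs that attains it; this is illustrative only, not the paper's control algorithm, which must additionally handle unknown dynamics and the non-convex confidence bounds described above.

```python
import numpy as np

def ogd_linear(cost_vectors, radius=1.0):
    """Projected online gradient descent against linear costs c_t . x.

    Plays points in the Euclidean ball of the given radius with step
    size radius / (G * sqrt(t)), and returns the regret against the
    best fixed point in the ball in hindsight.
    """
    cost_vectors = np.asarray(cost_vectors, dtype=float)
    T, d = cost_vectors.shape
    G = max(float(np.linalg.norm(cost_vectors, axis=1).max()), 1e-12)
    x = np.zeros(d)
    total = 0.0
    for t, c in enumerate(cost_vectors, start=1):
        total += float(c @ x)                 # cost suffered this round
        x -= (radius / (G * np.sqrt(t))) * c  # gradient step (grad of c.x is c)
        norm = np.linalg.norm(x)
        if norm > radius:                     # project back onto the ball
            x *= radius / norm
    g = cost_vectors.sum(axis=0)
    best = -radius * float(np.linalg.norm(g))  # best fixed point's total cost
    return total - best
```

The standard OGD analysis bounds this regret by $3 G R \sqrt{T}$ for costs with gradient norm at most $G$ over a ball of radius $R$, matching the $\sqrt{T}$ rate up to constants.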