Online Control with Adversarial Disturbances
We study the control of a linear dynamical system with adversarial
disturbances (as opposed to statistical noise). The objective we consider is
one of regret: we desire an online control procedure that can do nearly as well
as that of a procedure that has full knowledge of the disturbances in
hindsight. Our main result is an efficient algorithm that provides nearly tight
regret bounds for this problem. From a technical standpoint, this work
generalizes previous work in two main aspects: our model allows for
adversarial noise in the dynamics, and allows for general convex costs.
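The regret objective described in this abstract can be written out as a short formula. The notation below is an assumption for illustration and does not appear in the listing itself: $x_t$ is the state, $u_t$ the control, $w_t$ the adversarial disturbance, $c_t$ the convex cost at step $t$, and $\Pi$ a comparator class of control policies evaluated with full knowledge of the disturbance sequence.

```latex
% Linear dynamics with an adversarially chosen disturbance w_t (assumed notation)
x_{t+1} = A x_t + B u_t + w_t

% Regret over T steps against the best policy in hindsight from the class \Pi
\mathrm{Regret}_T
  = \sum_{t=1}^{T} c_t(x_t, u_t)
  - \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\!\left(x_t^{\pi}, u_t^{\pi}\right)
```

Here $x_t^{\pi}$ and $u_t^{\pi}$ denote the states and controls that policy $\pi$ would have produced under the same disturbances. A regret bound that grows sublinearly in $T$ means the average cost of the online procedure approaches that of the best comparator.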
Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
We derive sublinear regret bounds for undiscounted reinforcement learning in
continuous state space. The proposed algorithm combines state aggregation with
the use of upper confidence bounds for implementing optimism in the face of
uncertainty. Besides the existence of an optimal policy which satisfies the
Poisson equation, the only assumptions made are Hölder continuity of rewards
and transition probabilities.
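The two ingredients named in this abstract, state aggregation and upper confidence bounds, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the state space is assumed to be the interval [0, 1], rewards are assumed to lie in [0, max_reward], and the bonus is a generic Hoeffding-style term rather than the confidence interval used in the paper. The function names `aggregate` and `optimistic_reward` are hypothetical.

```python
import math


def aggregate(state, n_intervals):
    """Map a continuous state in [0, 1] to the index of its interval."""
    if not 0.0 <= state <= 1.0:
        raise ValueError("state must lie in [0, 1]")
    # The right endpoint 1.0 is assigned to the last interval.
    return min(int(state * n_intervals), n_intervals - 1)


def optimistic_reward(reward_sum, count, t, max_reward=1.0):
    """Upper confidence bound on the mean reward of one aggregated state.

    Assumption: a generic Hoeffding-style bonus, not the paper's interval.
    """
    if count == 0:
        # Unvisited aggregates receive the most optimistic value possible.
        return max_reward
    mean = reward_sum / count
    bonus = math.sqrt(2.0 * math.log(max(t, 2)) / count)
    return min(max_reward, mean + bonus)


# Example: 4 intervals, and an aggregate visited 100 times by step 100.
index = aggregate(0.5, 4)
ucb = optimistic_reward(5.0, 100, 100)
```

The bonus shrinks as an aggregated state is visited more often, so rarely visited regions look more attractive and get explored first.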
Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration
In this paper, we present a robotic model-based reinforcement learning method
that combines ideas from model identification and model predictive control. We
use a feature-based representation of the dynamics that allows the dynamics
model to be fitted with a simple least squares procedure, and the features are
identified from a high-level specification of the robot's morphology,
consisting of the number and connectivity structure of its links. Model
predictive control is then used to choose the actions under an optimistic model
of the dynamics, which produces an efficient and goal-directed exploration
strategy. We present real-time experimental results on standard benchmark
problems involving the pendulum, cartpole, and double pendulum systems.
Experiments indicate that our method is able to learn a range of benchmark
tasks substantially faster than the previous best methods. To evaluate our
approach on a realistic robotic control task, we also demonstrate real-time
control of a simulated 7-degree-of-freedom arm.
Comment: 8 page
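The idea of fitting a feature-based dynamics model by least squares can be illustrated with a small sketch. Everything specific here is an assumption for illustration and not taken from the paper: the system is a pendulum whose angular acceleration is assumed linear in the two hypothetical features sin(theta) and the applied torque u, the data are synthetic and noise-free, and the helper names are invented. The optimistic model and the model predictive control step are not shown.

```python
import math
import random


def features(theta, u):
    """Hypothetical feature map for a pendulum: (sin(theta), torque)."""
    return (math.sin(theta), u)


def fit_least_squares(samples):
    """Fit acc ~ w1 * sin(theta) + w2 * u via the 2x2 normal equations.

    samples is a list of (theta, u, acc) tuples.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for theta, u, acc in samples:
        f1, f2 = features(theta, u)
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        b1 += f1 * acc
        b2 += f2 * acc
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("features are not linearly independent on this data")
    # Cramer's rule for the symmetric 2x2 system.
    w1 = (b1 * s22 - s12 * b2) / det
    w2 = (s11 * b2 - s12 * b1) / det
    return w1, w2


# Synthetic, noise-free data from assumed "true" weights.
rng = random.Random(0)
w_true = (-9.81, 2.0)
data = []
for _ in range(200):
    theta = rng.uniform(-math.pi, math.pi)
    u = rng.uniform(-1.0, 1.0)
    f1, f2 = features(theta, u)
    data.append((theta, u, w_true[0] * f1 + w_true[1] * f2))

w_hat = fit_least_squares(data)
```

Because the model is linear in its weights once the features are fixed, the fit reduces to solving a small linear system, which is what makes this kind of model cheap to re-estimate as new data arrive.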