17 research outputs found
Efficient Exploration in Continuous-time Model-based Reinforcement Learning
Reinforcement learning algorithms typically consider discrete-time dynamics,
even though the underlying systems are often continuous in time. In this paper,
we introduce a model-based reinforcement learning algorithm that represents
continuous-time dynamics using nonlinear ordinary differential equations
(ODEs). We capture epistemic uncertainty using well-calibrated probabilistic
models, and use the optimistic principle for exploration. Our regret bounds
surface the importance of the measurement selection strategy (MSS), since in
continuous time we must decide not only how to explore, but also when to
observe the underlying system. Our analysis demonstrates that the regret is
sublinear when modeling ODEs with Gaussian Processes (GP) for common choices of
MSS, such as equidistant sampling. Additionally, we propose an adaptive,
data-dependent, practical MSS that, when combined with GP dynamics, also
achieves sublinear regret with significantly fewer samples. We showcase the
benefits of continuous-time modeling over its discrete-time counterpart, as
well as our proposed adaptive MSS over standard baselines, on several
applications.
Tuning Legged Locomotion Controllers via Safe Bayesian Optimization
This paper presents a data-driven strategy to streamline the deployment of
model-based controllers in legged robotic hardware platforms. Our approach
leverages a model-free safe learning algorithm to automate the tuning of
control gains, addressing the mismatch between the simplified model used in the
control formulation and the real system. This method substantially mitigates
the risk of hazardous interactions with the robot by sample-efficiently
optimizing parameters within a probably safe region. Additionally, we extend
the applicability of our approach to incorporate the different gait parameters
as contexts, leading to a safe, sample-efficient exploration algorithm capable
of tuning a motion controller for diverse gait patterns. We validate our method
through simulation and hardware experiments, where we demonstrate that the
algorithm obtains superior performance on tuning a model-based motion
controller for multiple gaits safely.
Comment: This paper has been accepted to the 2023 Conference on Robot Learning
(CoRL 2023). The first two authors contributed equally. The supplementary
video is available at https://youtu.be/zDBouUgegrU and the code
implementation is available at https://github.com/lasgroup/gosafeop
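The safe tuning loop can be caricatured as a SafeOpt-style sketch. This is a stand-in for the paper's actual method (available at the linked repository), not a reproduction of it: a GP models controller performance over a single gain, and only gains whose conservative lower confidence bound clears a safety threshold are ever evaluated on the "hardware". The objective, threshold, and hyperparameters are invented for illustration.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # squared-exponential kernel between 1-D gain arrays a and b
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_predict(x_train, y_train, x_query, noise=1e-4):
    # GP posterior mean and standard deviation at x_query
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_query, x_train)
    mu = Ks @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * v.T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

# hypothetical performance landscape over one control gain; values below
# the threshold stand for hazardous behaviour we must never sample
f = lambda g: np.exp(-(g - 0.6) ** 2 / 0.05)
threshold = 0.2
grid = np.linspace(0.0, 1.0, 201)

x = np.array([0.5])   # a known-safe initial gain is assumed given
y = f(x)
beta = 2.0            # confidence-bound width
for _ in range(8):
    mu, sd = gp_predict(x, y, grid)
    safe = mu - beta * sd > threshold      # conservative safe set
    # among safe gains, evaluate the most uncertain one (safe expansion)
    idx = np.flatnonzero(safe)[np.argmax(sd[safe])]
    x = np.append(x, grid[idx])
    y = np.append(y, f(grid[idx]))

print(float(x[np.argmax(y)]), float(y.max()))
```

The key design choice mirrored here is that exploration is restricted to the lower-confidence-bound safe set, so the safe region grows sample by sample instead of being probed blindly; the paper's contextual extension would additionally condition the GP on gait parameters.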
Dissertatio Inavgvralis De Obligatione Socii Innocentis In Delictis
Helmstedt, Univ., Jur. Diss., 1796. Qvam Avctoritate Illvstris Ivreconsvltorvm Ordinis In Academia Ivlia Carolina Pro Svmmis In Vtroqve Ivre Honoribvs Rite Obtinendis Die XII Aprilis MDCCLXXXXVI Proposvit Ioannes Henricvs Hübotter Hildesiensis. Imprint as given in the source: Helmstadii, Typis C. G. Fleckeisen, Acad. Typogr.
Learning policies for continuous control via transition models
It is doubtful that animals have perfect inverse models of their limbs (e.g., knowing exactly what muscle contraction must be applied at every joint to reach a particular location in space). In robot control, however, moving an arm's end-effector to a target position or along a target trajectory requires accurate forward and inverse models. Here we show that by learning the transition (forward) model from interaction, we can use it to drive the learning of an amortized policy. Hence, we revisit policy optimization in relation to the deep active inference framework and describe a modular neural network architecture that simultaneously learns the system dynamics from prediction errors and the stochastic policy that generates suitable continuous control commands to reach a desired reference position. We evaluate the model against a linear quadratic regulator baseline and conclude with additional steps to take toward human-like motor control.
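The two-stage idea, first learning a transition model from interaction and then training a policy through it, can be sketched on a toy linear plant. This is not the paper's architecture: the deep networks and active inference machinery are replaced by least squares and a scalar feedback gain, purely to show the control flow of "learn forward model, then amortize a policy against it".

```python
import numpy as np

rng = np.random.default_rng(0)

# unknown discrete-time plant x' = A x + B u (toy stand-in for a limb)
A_true, B_true = 0.9, 0.5

# 1) learn the transition (forward) model from random interaction data
xs = rng.normal(size=200)
us = rng.normal(size=200)
nxt = A_true * xs + B_true * us           # observed next states
Phi = np.stack([xs, us], axis=1)
A_hat, B_hat = np.linalg.lstsq(Phi, nxt, rcond=None)[0]

# 2) amortize a policy u = -k x by descending the rollout cost computed
#    through the *learned* model (numerical gradient keeps this short)
def rollout_cost(k, x0=1.0, T=20):
    x, c = x0, 0.0
    for _ in range(T):
        u = -k * x
        x = A_hat * x + B_hat * u         # imagined transition
        c += x * x + 0.01 * u * u         # drive state to 0, cheaply
    return c

k = 0.0
for _ in range(200):
    eps = 1e-4
    grad = (rollout_cost(k + eps) - rollout_cost(k - eps)) / (2 * eps)
    k -= 0.05 * grad

print(A_hat, B_hat, k)
```

The point of the sketch is that the policy never touches the true plant during optimization: all gradients come from rollouts of the learned forward model, which is the role the transition model plays in the paper's setup.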