17 research outputs found

    Efficient Exploration in Continuous-time Model-based Reinforcement Learning

    Full text link
    Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics using nonlinear ordinary differential equations (ODEs). We capture epistemic uncertainty using well-calibrated probabilistic models and use the optimistic principle for exploration. Our regret bounds highlight the importance of the measurement selection strategy (MSS), since in continuous time we must decide not only how to explore but also when to observe the underlying system. Our analysis demonstrates that the regret is sublinear when modeling ODEs with Gaussian processes (GPs) for common choices of MSS, such as equidistant sampling. Additionally, we propose an adaptive, data-dependent, practical MSS that, when combined with GP dynamics, also achieves sublinear regret with significantly fewer samples. We showcase the benefits of continuous-time modeling over its discrete-time counterpart, as well as of our proposed adaptive MSS over standard baselines, on several applications.
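    A minimal, hypothetical sketch of the adaptive measurement-selection idea described in this abstract (not the paper's implementation): given a GP model of the ODE right-hand side, the next observation time is chosen where the model's epistemic uncertainty along a predicted trajectory is largest. The toy dynamics dx/dt = -x, the candidate time grid, and the function name next_measurement_time are assumptions for illustration only.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # Toy 1-D system dx/dt = f(x) = -x, with derivative observations at a few times.
    f_true = lambda x: -x
    t_obs = np.array([0.0, 0.5, 1.0])
    x_obs = np.exp(-t_obs)                        # analytic state x(t) = exp(-t)

    # GP model of the ODE vector field f, fitted on (state, derivative) pairs.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
    gp.fit(x_obs.reshape(-1, 1), f_true(x_obs))

    def next_measurement_time(candidate_times):
        """Pick the candidate time whose predicted state the GP is most uncertain about."""
        x_pred = np.exp(-candidate_times)         # stand-in for a model-based rollout
        _, std = gp.predict(x_pred.reshape(-1, 1), return_std=True)
        return candidate_times[np.argmax(std)]

    print(next_measurement_time(np.linspace(0.0, 3.0, 50)))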

    Tuning Legged Locomotion Controllers via Safe Bayesian Optimization

    Full text link
    This paper presents a data-driven strategy to streamline the deployment of model-based controllers on legged robotic hardware platforms. Our approach leverages a model-free safe learning algorithm to automate the tuning of control gains, addressing the mismatch between the simplified model used in the control formulation and the real system. This method substantially mitigates the risk of hazardous interactions with the robot by sample-efficiently optimizing parameters within a probably safe region. Additionally, we extend the applicability of our approach by incorporating different gait parameters as contexts, leading to a safe, sample-efficient exploration algorithm capable of tuning a motion controller for diverse gait patterns. We validate our method through simulation and hardware experiments, demonstrating that the algorithm achieves superior performance when safely tuning a model-based motion controller for multiple gaits. Comment: This paper has been accepted to the 2023 Conference on Robot Learning (CoRL 2023). The first two authors contributed equally. The supplementary video is available at https://youtu.be/zDBouUgegrU and the code implementation is available at https://github.com/lasgroup/gosafeop
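    A minimal sketch, under assumed toy objective and safety functions, of the safe-tuning idea described above: Bayesian optimization over a control gain is restricted to a high-confidence safe set estimated from GP models, and the next gain to evaluate is chosen optimistically inside that set. This is not the implementation from the linked repository; objective(), safety(), the candidate grid, and the confidence parameter beta are placeholders.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    objective = lambda g: -(g - 0.6) ** 2             # tracking performance (toy)
    safety = lambda g: 1.0 - 2.0 * np.abs(g - 0.5)    # >= 0 means safe (toy)

    gains = np.array([0.45, 0.5, 0.55])               # initial safe seed evaluations
    gp_perf = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-6).fit(gains[:, None], objective(gains))
    gp_safe = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-6).fit(gains[:, None], safety(gains))

    candidates = np.linspace(0.0, 1.0, 101)[:, None]
    mu_s, sd_s = gp_safe.predict(candidates, return_std=True)
    mu_p, sd_p = gp_perf.predict(candidates, return_std=True)

    beta = 2.0
    safe_set = mu_s - beta * sd_s >= 0.0              # gains safe with high confidence
    ucb = mu_p + beta * sd_p                          # optimistic performance estimate
    ucb[~safe_set] = -np.inf                          # never pick unsafe candidates
    next_gain = candidates[np.argmax(ucb), 0]         # next gain to try on hardware
    print(next_gain)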

    Ein Nervenfall aus der Praxis eines chinesischen Arztes vor mehr als 2000 Jahren

    No full text

    Ein Fall von Hirnabsceß unklarer Genese

    No full text

    Dissertatio Inavgvralis De Obligatione Socii Innocentis In Delictis

    No full text
    Helmstedt, Univ., Jur. Diss., 1796. Qvam Avctoritate Illvstris Ivreconsvltorvm Ordinis In Academia Ivlia Carolina Pro Svmmis In Vtroqve Ivre Honoribvs Rite Obtinendis Die XII Aprilis MDCCLXXXXVI Proposvit Ioannes Henricvs Hübotter Hildesiensis. Imprint statement as given in the source: Helmstadii Typis C. G. Fleckeisen Acad. Typogr

    Learning policies for continuous control via transition models

    No full text
    It is doubtful that animals have perfect inverse models of their limbs (e.g., knowing exactly what muscle contraction to apply at every joint to reach a particular location in space). However, in robot control, moving an arm's end-effector to a target position or along a target trajectory requires accurate forward and inverse models. Here we show that by learning the transition (forward) model from interaction, we can use it to drive the learning of an amortized policy. Hence, we revisit policy optimization in relation to the deep active inference framework and describe a modular neural network architecture that simultaneously learns the system dynamics from prediction errors and the stochastic policy that generates suitable continuous control commands to reach a desired reference position. We evaluate the model against a linear quadratic regulator baseline and conclude with additional steps to take toward human-like motor control.
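    As a rough, assumed illustration of the idea (not the paper's neural-network architecture), the sketch below fits a linear transition model from random interaction data and then improves a linear feedback policy by descending the cost of rollouts predicted by that learned model toward a reference state. The toy double-integrator dynamics, the quadratic cost, and the function rollout_cost are hypothetical stand-ins.

    import numpy as np

    rng = np.random.default_rng(0)
    A_true = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double-integrator "real" system
    B_true = np.array([[0.0], [0.1]])

    # 1) Learn the transition (forward) model from random interaction data.
    X = rng.normal(size=(500, 2))
    U = rng.normal(size=(500, 1))
    X_next = X @ A_true.T + U @ B_true.T + 0.01 * rng.normal(size=(500, 2))
    Theta, *_ = np.linalg.lstsq(np.hstack([X, U]), X_next, rcond=None)
    A_hat, B_hat = Theta[:2].T, Theta[2:].T       # estimated dynamics matrices

    # 2) Improve a linear policy u = -K x using rollouts through the learned model.
    def rollout_cost(K, horizon=30):
        """Cost of a model-predicted rollout toward the reference state (the origin)."""
        x, x_ref, cost = np.array([1.0, 0.0]), np.zeros(2), 0.0
        for _ in range(horizon):
            u = -K @ x
            x = A_hat @ x + B_hat @ u
            cost += np.sum((x - x_ref) ** 2) + 1e-3 * np.sum(u ** 2)
        return cost

    K = np.zeros((1, 2))
    for _ in range(150):                           # crude policy improvement
        grad = np.zeros_like(K)
        for idx in np.ndindex(*K.shape):           # finite-difference gradient
            dK = np.zeros_like(K); dK[idx] = 1e-4
            grad[idx] = (rollout_cost(K + dK) - rollout_cost(K - dK)) / 2e-4
        K -= 0.05 * grad / (np.linalg.norm(grad) + 1e-8)

    print("feedback gains learned through the model:", K)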