
    Non-parametric regression for robot learning on manifolds

    Many of the tools available for robot learning were designed for Euclidean data. However, many applications in robotics involve manifold-valued data. A common example is orientation; this can be represented as a 3-by-3 rotation matrix or a quaternion, the spaces of which are non-Euclidean manifolds. In robot learning, manifold-valued data are often handled by relating the manifold to a suitable Euclidean space, either by embedding the manifold or by projecting the data onto one or several tangent spaces. These approaches can result in poor predictive accuracy and convoluted algorithms. In this paper, we propose an "intrinsic" approach to regression that works directly within the manifold. It involves taking a suitable probability distribution on the manifold, letting its parameter be a function of a predictor variable, such as time, then estimating that function non-parametrically via a "local likelihood" method that incorporates a kernel. We name the method kernelised likelihood estimation. The approach is conceptually simple and generally applicable to different manifolds. We implement it with three different types of manifold-valued data that commonly appear in robotics applications. The results of these experiments show better predictive accuracy than projection-based algorithms. Comment: 17 pages, 15 figures
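
The local-likelihood idea in this abstract can be illustrated on the simplest manifold, the circle. The sketch below is not the paper's implementation: the von Mises distribution, the Gaussian kernel, the fixed concentration, and the names `gaussian_kernel` and `local_mean_direction` are all assumptions made for illustration. With those choices, maximising the kernel-weighted von Mises log-likelihood over the mean direction has a closed form, namely the angle of the kernel-weighted sum of unit vectors.

```python
import math


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel (an assumed choice of kernel)."""
    return math.exp(-0.5 * u * u)


def local_mean_direction(t_query, times, angles, bandwidth):
    """Kernel-weighted von Mises estimate of the mean direction at t_query.

    Maximises sum_i K((t_query - t_i) / h) * cos(angle_i - mu) over mu,
    which is the mu-dependent part of the weighted von Mises log-likelihood
    when the concentration is held fixed. The maximiser is the angle of the
    weighted resultant vector.
    """
    sin_sum = 0.0
    cos_sum = 0.0
    for t, a in zip(times, angles):
        w = gaussian_kernel((t_query - t) / bandwidth)
        sin_sum += w * math.sin(a)
        cos_sum += w * math.cos(a)
    return math.atan2(sin_sum, cos_sum)


# Angles that straddle the +/- pi wrap-around point.
times = [0.0, 1.0, 2.0, 3.0]
angles = [3.0, 3.1, -3.1, -3.0]
estimate = local_mean_direction(1.5, times, angles, bandwidth=1.0)
```

For this data the estimate lies at the wrap-around point (close to pi in absolute value), whereas a naive Euclidean average of the raw angles would give 0, which points in the opposite direction.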

    Memory-efficient episodic control reinforcement learning with dynamic online k-means

    Recently, neuro-inspired episodic control (EC) methods have been developed to overcome the data-inefficiency of standard deep reinforcement learning approaches. Using non-/semi-parametric models to estimate the value function, they learn rapidly, retrieving cached values from similar past states. In realistic scenarios, with limited resources and noisy data, maintaining meaningful representations in memory is essential to speed up learning and avoid catastrophic forgetting. Unfortunately, EC methods have a large space and time complexity. We investigate different solutions to these problems based on prioritising and ranking stored states, as well as online clustering techniques. We also propose a new dynamic online k-means algorithm that is both computationally efficient and yields significantly better performance at smaller memory sizes; we validate this approach on classic reinforcement learning environments and Atari games.
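
The abstract does not describe the proposed dynamic variant, so the sketch below shows only the textbook sequential (online) k-means update applied to a fixed-capacity value memory. The class name `OnlineKMeansMemory`, the choice to average stored values alongside centroids, and the squared Euclidean distance are assumptions for illustration, not the authors' algorithm.

```python
class OnlineKMeansMemory:
    """Fixed-capacity memory of (state centroid, value) pairs.

    Hypothetical sketch: once the memory is full, each new state is merged
    into its nearest centroid with the standard sequential k-means step
    size 1 / count, and the cached value is averaged the same way.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.centroids = []  # one list of floats per cluster
        self.values = []     # cached value estimate per cluster
        self.counts = []     # number of states merged into each cluster

    def _nearest(self, state):
        """Index of the centroid closest to state (squared Euclidean)."""
        best_index, best_dist = -1, float("inf")
        for i, centroid in enumerate(self.centroids):
            dist = sum((c - s) ** 2 for c, s in zip(centroid, state))
            if dist < best_dist:
                best_index, best_dist = i, dist
        return best_index

    def add(self, state, value):
        """Store a new entry, or merge it into the nearest cluster if full."""
        if len(self.centroids) < self.capacity:
            self.centroids.append([float(s) for s in state])
            self.values.append(float(value))
            self.counts.append(1)
            return
        i = self._nearest(state)
        self.counts[i] += 1
        step = 1.0 / self.counts[i]
        self.centroids[i] = [
            c + step * (s - c) for c, s in zip(self.centroids[i], state)
        ]
        self.values[i] += step * (value - self.values[i])

    def lookup(self, state):
        """Return the cached value of the nearest cluster."""
        if not self.centroids:
            raise LookupError("memory is empty")
        return self.values[self._nearest(state)]


memory = OnlineKMeansMemory(capacity=2)
memory.add([0.0, 0.0], 1.0)
memory.add([10.0, 10.0], 5.0)
memory.add([2.0, 0.0], 3.0)  # memory is full, so this merges into cluster 0
```

Memory use stays bounded by `capacity` regardless of how many states are observed, which is the property the abstract is concerned with; how clusters are created, prioritised, or discarded in the paper's dynamic version is not covered here.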

    Sample-efficient reinforcement learning with maximum entropy mellowmax episodic control

    Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspired algorithms which use non-/semi-parametric models that predict values based on storing and retrieving previously experienced transitions. One way to further improve the sample efficiency of these approaches is to use more principled exploration strategies. In this work, we therefore propose maximum entropy mellowmax episodic control (MEMEC), which samples actions according to a Boltzmann policy with a state-dependent temperature. We demonstrate that MEMEC outperforms other uncertainty- and softmax-based exploration methods on classic reinforcement learning environments and Atari games, achieving both more rapid learning and higher final rewards.
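
A Boltzmann policy with a state-dependent temperature can be sketched using the maximum-entropy mellowmax construction from the general mellowmax literature: for the action values of one state, find the inverse temperature `beta` at which the Boltzmann policy's expected value equals the mellowmax of those values. The code below is a minimal illustration under that assumption. The function names, the bisection solver, and the example values are hypothetical, and the episodic-control memory that would supply the action values is omitted.

```python
import math
import random


def mellowmax(q_values, omega):
    """Mellowmax operator: log(mean(exp(omega * q))) / omega, computed stably."""
    m = max(q_values)
    mean_exp = sum(math.exp(omega * (q - m)) for q in q_values) / len(q_values)
    return m + math.log(mean_exp) / omega


def max_entropy_beta(q_values, omega, iterations=100):
    """Solve sum_a exp(beta * d_a) * d_a = 0 for beta >= 0 by bisection,
    where d_a = q_a - mellowmax(q). Requires omega > 0."""
    mm = mellowmax(q_values, omega)
    diffs = [q - mm for q in q_values]
    d_max = max(diffs)
    if d_max <= 1e-12:
        return 0.0  # all values (nearly) equal: uniform policy

    def g(beta):
        # Rescaled by exp(-beta * d_max) to avoid overflow; the sign is unchanged.
        return sum(math.exp(beta * (d - d_max)) * d for d in diffs)

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:  # g is increasing in beta, so expand until it turns positive
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def boltzmann_policy(q_values, beta):
    """Action probabilities proportional to exp(beta * q)."""
    m = max(q_values)
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(q_values, omega, rng=random):
    """Sample an action index from the state-dependent Boltzmann policy."""
    probs = boltzmann_policy(q_values, max_entropy_beta(q_values, omega))
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q = [1.0, 2.0, 3.0]  # illustrative action values for a single state
probs = boltzmann_policy(q, max_entropy_beta(q, omega=5.0))
```

Because `beta` is recomputed from each state's own action values, the effective temperature varies from state to state, while `omega` remains the single tunable parameter.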

    Micromechanical model for off-axis creep rupture in unidirectional composites undergoing finite strains

    A microscale numerical framework for modeling creep rupture in unidirectional composites under off-axis loading is presented, building on recent work on imposing off-axis loading on a representative volume element. Creep deformation of the thermoplastic polymer matrix is accounted for by means of the Eindhoven Glassy Polymer material model. Creep rupture is represented with cohesive cracks, combining an energy-based initiation criterion with a time-dependent cohesive law and a global failure criterion based on the minimum in homogenized creep strain-rate. The model is compared against experiments on carbon/PEEK composite material tested at different off-axis angles, stress levels and temperatures. Creep deformation is accurately reproduced by the model, except for small off-axis angles, where the observed difference is ascribed to macroscopic variations in the experiment. Trends in rupture time are also reproduced, although quantitative rupture time predictions are not accurate for all test cases.