Non-parametric regression for robot learning on manifolds
Many of the tools available for robot learning were designed for Euclidean
data. However, many applications in robotics involve manifold-valued data. A
common example is orientation; this can be represented as a 3-by-3 rotation
matrix or a quaternion, the spaces of which are non-Euclidean manifolds. In
robot learning, manifold-valued data are often handled by relating the manifold
to a suitable Euclidean space, either by embedding the manifold or by
projecting the data onto one or several tangent spaces. These approaches can
result in poor predictive accuracy and convoluted algorithms. In this paper,
we propose an "intrinsic" approach to regression that works directly within the
manifold. It involves taking a suitable probability distribution on the
manifold, letting its parameter be a function of a predictor variable, such as
time, then estimating that function non-parametrically via a "local likelihood"
method that incorporates a kernel. We name the method kernelised likelihood
estimation. The approach is conceptually simple, and generally applicable to
different manifolds. We implement it with three different types of
manifold-valued data that commonly appear in robotics applications. The results
of these experiments show better predictive accuracy than projection-based
algorithms.
Comment: 17 pages, 15 figures
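To make the idea of a kernel-weighted local likelihood concrete, here is a minimal sketch for the simplest manifold, the circle. It is an illustration under stated assumptions, not the paper's implementation: the choice of a von Mises distribution, the Gaussian kernel, and the names `local_vonmises_mean` and `bandwidth` are all hypothetical. For a von Mises model with fixed concentration, maximising the kernel-weighted log-likelihood over the mean direction has a closed form, namely the weighted circular mean.

```python
import math


def local_vonmises_mean(t_query, times, angles, bandwidth):
    """Local-likelihood estimate of a von Mises mean direction at t_query.

    Assumption (not from the abstract): angles are in radians and each
    observation is weighted by a Gaussian kernel in the predictor (time).
    Maximising sum_i w_i * kappa * cos(theta_i - mu) over mu gives
    mu = atan2(sum_i w_i sin(theta_i), sum_i w_i cos(theta_i)).
    """
    sin_sum = 0.0
    cos_sum = 0.0
    for t, theta in zip(times, angles):
        # Gaussian kernel weight: observations near t_query count more.
        w = math.exp(-0.5 * ((t - t_query) / bandwidth) ** 2)
        sin_sum += w * math.sin(theta)
        cos_sum += w * math.cos(theta)
    # atan2 keeps the estimate on the circle, so wrap-around is handled.
    return math.atan2(sin_sum, cos_sum)
```

Because the estimate is computed from sines and cosines rather than from the raw angles, two observations at 3.1 and -3.1 radians average to a direction near pi rather than to 0, which is the kind of error a naive Euclidean average would make.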
Memory-efficient episodic control reinforcement learning with dynamic online k-means
Recently, neuro-inspired episodic control (EC) methods have been developed to overcome the data-inefficiency of standard deep reinforcement learning approaches. Using non-/semi-parametric models to estimate the value function, they learn rapidly, retrieving cached values from similar past states. In realistic scenarios, with limited resources and noisy data, maintaining meaningful representations in memory is essential to speed up learning and avoid catastrophic forgetting. Unfortunately, EC methods have high space and time complexity. We investigate different solutions to these problems based on prioritising and ranking stored states, as well as online clustering techniques. We also propose a new dynamic online k-means algorithm that is both computationally efficient and yields significantly better performance at smaller memory sizes; we validate this approach on classic reinforcement learning environments and Atari games.
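As a rough illustration of how online clustering can bound an episodic memory, the sketch below uses plain sequential k-means (a running-mean update of the nearest centroid). It is not the dynamic online k-means algorithm proposed in the paper, whose details the abstract does not give; the class name `OnlineKMeansMemory` and its methods are hypothetical.

```python
class OnlineKMeansMemory:
    """Fixed-capacity value memory compressed with sequential k-means.

    Assumption: states are equal-length sequences of floats and each
    stored value is a scalar return estimate.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.centroids = []  # one state-space centroid per cluster
        self.values = []     # running mean of values per cluster
        self.counts = []     # number of points merged into each cluster

    def _nearest(self, state):
        # Index of the centroid with the smallest squared Euclidean distance.
        return min(
            range(len(self.centroids)),
            key=lambda i: sum((c - x) ** 2 for c, x in zip(self.centroids[i], state)),
        )

    def add(self, state, value):
        if len(self.centroids) < self.capacity:
            # Memory not full yet: store the state as a new centroid.
            self.centroids.append([float(x) for x in state])
            self.values.append(float(value))
            self.counts.append(1)
            return
        # Memory full: merge into the nearest cluster with a running mean.
        i = self._nearest(state)
        self.counts[i] += 1
        step = 1.0 / self.counts[i]
        self.centroids[i] = [c + step * (x - c) for c, x in zip(self.centroids[i], state)]
        self.values[i] += step * (value - self.values[i])

    def lookup(self, state):
        # Return the cached value of the closest cluster.
        return self.values[self._nearest(state)]
```

With this design the memory never grows beyond `capacity` entries, and each insertion costs one nearest-centroid search, at the price of averaging together states that fall in the same cluster.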
Sample-efficient reinforcement learning with maximum entropy mellowmax episodic control
Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspired algorithms that use non-/semi-parametric models to predict values by storing and retrieving previously experienced transitions. One way to further improve the sample efficiency of these approaches is to use more principled exploration strategies. In this work, we therefore propose maximum entropy mellowmax episodic control (MEMEC), which samples actions according to a Boltzmann policy with a state-dependent temperature. We demonstrate that MEMEC outperforms other uncertainty- and softmax-based exploration methods on classic reinforcement learning environments and Atari games, achieving both more rapid learning and higher final rewards.
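The following sketch shows one way a Boltzmann policy with a state-dependent temperature can be derived from the mellowmax operator. It follows the standard maximum entropy mellowmax construction, in which the inverse temperature `beta` is chosen so that the policy's expected action value equals the mellowmax of the action values. How MEMEC integrates this with episodic memory is not described in the abstract, so the function names, the bisection root-finder, and the parameter `omega` below are assumptions made for illustration.

```python
import math


def mellowmax(q, omega):
    """Mellowmax of action values q: log(mean(exp(omega * q))) / omega."""
    m = max(q)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(omega * (x - m)) for x in q) / len(q)) / omega


def max_entropy_mellowmax_policy(q, omega, iters=100):
    """Boltzmann policy whose expected value equals mellowmax(q, omega).

    beta solves sum_a exp(beta * d_a) * d_a = 0 with d_a = q_a - mellowmax(q).
    The left-hand side is increasing in beta, so bisection is sufficient.
    """
    n = len(q)
    if max(q) - min(q) < 1e-12:
        return [1.0 / n] * n  # all actions equal: uniform policy
    mm = mellowmax(q, omega)
    d = [x - mm for x in q]

    def f(beta):
        return sum(math.exp(beta * di) * di for di in d)

    lo, hi = 0.0, 1.0
    for _ in range(60):  # grow the bracket until the root is enclosed
        if f(hi) > 0.0:
            break
        hi *= 2.0
    for _ in range(iters):  # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    m = max(q)
    weights = [math.exp(beta * (x - m)) for x in q]
    z = sum(weights)
    return [w / z for w in weights]
```

Because `beta` is recomputed from the action values of each state, states with widely separated values get a different effective temperature from states whose actions are nearly tied, which is what makes the temperature state-dependent.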
Micromechanical model for off-axis creep rupture in unidirectional composites undergoing finite strains
A microscale numerical framework for modeling creep rupture in unidirectional composites under off-axis loading is presented, building on recent work on imposing off-axis loading on a representative volume element. Creep deformation of the thermoplastic polymer matrix is accounted for by means of the Eindhoven Glassy Polymer material model. Creep rupture is represented with cohesive cracks, combining an energy-based initiation criterion with a time-dependent cohesive law and a global failure criterion based on the minimum in homogenized creep strain-rate. The model is compared against experiments on carbon/PEEK composite material tested at different off-axis angles, stress levels and temperatures. Creep deformation is accurately reproduced by the model, except for small off-axis angles, where the observed difference is ascribed to macroscopic variations in the experiment. Trends in rupture time are also reproduced, although quantitative rupture time predictions are not accurate for all test cases.