Physics-informed reinforcement learning via probabilistic co-adjustment functions
Reinforcement learning of real-world tasks is very data-inefficient, and
extensive simulation-based modelling has become the dominant approach for
training systems. However, in human-robot interaction and many other real-world
settings, there is no appropriate one-model-for-all due to differences in
individual instances of the system (e.g. different people) or necessary
oversimplifications in the simulation models. This leaves two options: either (1)
learning the individual system's dynamics approximately from data, which
requires data-intensive training, or (2) using a complete digital twin of the
instances, which may not be realisable in many cases. We introduce two
methods, co-kriging adjustment (CKA) and ridge regression adjustment (RRA),
as novel ways to combine the advantages of both. Our adjustment
methods are based on an auto-regressive AR1 co-kriging model that we integrate
with GP priors. This yields a data- and simulation-efficient way of using
simplistic simulation models (e.g., simple two-link model) and rapidly adapting
them to individual instances (e.g., biomechanics of individual people). Using
CKA and RRA, we obtain more accurate uncertainty quantification of the entire
system's dynamics than pure GP-based and AR1 methods. We demonstrate the
efficiency of co-kriging adjustment with an interpretable reinforcement
learning control example, learning to control a biomechanical human arm using
only a two-link arm simulation model (offline part) and CKA derived from a
small amount of interaction data (online, on the fly). Our method unlocks an
efficient and uncertainty-aware way to implement reinforcement learning methods
in real-world complex systems for which only imperfect simulation models exist.
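A minimal sketch of the AR(1) co-kriging adjustment idea may make the construction concrete: the real system is modelled as rho * f_sim(x) + delta(x), with the residual delta captured by a GP. The toy simulator, the "real" system, and the least-squares estimate of rho below are illustrative assumptions, not the paper's implementation.

```python
# AR(1) co-kriging adjustment sketch: model the real system as
# rho * f_sim(x) + delta(x), where f_sim is the cheap simulator and
# delta is a GP fitted to the residuals on a small real dataset.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def f_sim(x):
    """Cheap, simplistic simulation model (stand-in for a two-link model)."""
    return np.sin(3.0 * x)

def f_real(x):
    """The individual system, which we can only sample sparsely."""
    return 1.2 * np.sin(3.0 * x) + 0.3 * x

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(8, 1))              # small interaction dataset
y = f_real(X.ravel()) + 0.01 * rng.standard_normal(8)

# Step 1: estimate the AR(1) scale rho by least squares against the simulator.
s = f_sim(X.ravel())
rho = (s @ y) / (s @ s)

# Step 2: fit a GP prior to the residual delta(x) = y - rho * f_sim(x).
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(1e-4))
gp.fit(X, y - rho * s)

# Prediction combines the simulator with the GP correction; the residual GP
# supplies the uncertainty estimate.
X_test = np.linspace(0.0, 2.0, 5).reshape(-1, 1)
delta_mean, delta_std = gp.predict(X_test, return_std=True)
y_pred = rho * f_sim(X_test.ravel()) + delta_mean
print(np.c_[y_pred, delta_std])
```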
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in sample efficiency and in generalization to novel
environments when learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations. Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
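To make the object-based dynamics pattern concrete, here is a hedged numpy sketch of per-object prediction with pairwise relational effects. The linear maps stand in for learned networks, and the sketch does not attempt MAOP's pixel-level object discovery or its three-level architecture.

```python
# Object-centric dynamics sketch: each object's next state depends on its
# own state, the action, and aggregated pairwise effects from other objects.
import numpy as np

rng = np.random.default_rng(1)
n_obj, d_state, d_action = 3, 4, 2

# Stand-ins for learned parameters.
W_self = rng.standard_normal((d_state, d_state)) * 0.1
W_act = rng.standard_normal((d_state, d_action)) * 0.1
W_rel = rng.standard_normal((d_state, 2 * d_state)) * 0.1

def pairwise_effect(s_i, s_j):
    """Relational term: effect of object j on object i."""
    return W_rel @ np.concatenate([s_i, s_j])

def step(states, action):
    """Predict next per-object states as a residual update."""
    nxt = np.empty_like(states)
    for i in range(n_obj):
        rel = sum(pairwise_effect(states[i], states[j])
                  for j in range(n_obj) if j != i)
        nxt[i] = states[i] + W_self @ states[i] + W_act @ action + rel
    return nxt

states = rng.standard_normal((n_obj, d_state))
print(step(states, np.array([1.0, 0.0])))
```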
Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming
Autonomously training interpretable control strategies, called policies,
using pre-existing plant trajectory data is of great interest in industrial
applications. Fuzzy controllers have been used in industry for decades as
interpretable and efficient system controllers. In this study, we introduce a
fuzzy genetic programming (GP) approach called fuzzy GP reinforcement learning
(FGPRL) that can select the relevant state features, determine the size of the
required fuzzy rule set, and automatically adjust all the controller parameters
simultaneously. Each GP individual's fitness is computed using model-based
batch reinforcement learning (RL), which first trains a model using available
system samples and subsequently performs Monte Carlo rollouts to predict each
policy candidate's performance. We compare FGPRL to an extended version of a
related method called fuzzy particle swarm reinforcement learning (FPSRL),
which uses swarm intelligence to tune the fuzzy policy parameters. Experiments
using an industrial benchmark show that FGPRL is able to autonomously learn
interpretable fuzzy policies with high control performance. Comment: Accepted at the Genetic and Evolutionary Computation Conference 2018 (GECCO '18).
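The fitness computation described above lends itself to a short sketch: score a candidate fuzzy rule set by Monte Carlo rollouts on a learned model. The one-dimensional toy model, reward, and two-rule Takagi-Sugeno style policy below are placeholder assumptions, not the paper's industrial benchmark.

```python
# Fitness of a fuzzy policy candidate via Monte Carlo rollouts on a model.
import numpy as np

def model(s, a):
    """Stand-in for a dynamics model trained on batch system samples."""
    return 0.9 * s + 0.5 * a

def reward(s):
    return -s ** 2  # drive the state to zero

def fuzzy_policy(s, rules):
    """Gaussian memberships weight constant rule actions (Takagi-Sugeno style)."""
    w = np.array([np.exp(-0.5 * ((s - c) / sig) ** 2) for c, sig, _ in rules])
    a = np.array([out for _, _, out in rules])
    return float(w @ a / (w.sum() + 1e-9))

def fitness(rules, starts, horizon=20):
    """Mean return over Monte Carlo rollouts from sampled start states."""
    total = 0.0
    for s in starts:
        for _ in range(horizon):
            s = model(s, fuzzy_policy(s, rules))
            total += reward(s)
    return total / len(starts)

rng = np.random.default_rng(2)
rules = [(-1.0, 1.0, 0.8), (1.0, 1.0, -0.8)]   # (center, width, action)
print(fitness(rules, rng.uniform(-2, 2, size=10)))
```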
Combined Reinforcement Learning via Abstract Representations
In the quest for efficient and robust reinforcement learning methods, both
model-free and model-based approaches offer advantages. In this paper we
propose a new way of explicitly bridging both approaches via a shared
low-dimensional learned encoding of the environment, meant to capture
summarizing abstractions. We show that the modularity brought by this approach
leads to good generalization while being computationally efficient, with
planning happening in a smaller latent state space. In addition, this approach
recovers a sufficient low-dimensional representation of the environment, which
opens up new strategies for interpretable AI, exploration and transfer
learning. Comment: Accepted to the Thirty-Third AAAI Conference on Artificial Intelligence, 2019.
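A structural sketch of the shared-encoding idea, under simplifying assumptions: linear maps in place of learned networks, and a one-step lookahead in place of full planning.

```python
# Shared low-dimensional encoding feeding both a model-free Q head and a
# model-based latent transition/reward model; planning happens in the small
# latent space.
import numpy as np

rng = np.random.default_rng(3)
d_obs, d_latent, n_actions = 16, 3, 4

W_enc = rng.standard_normal((d_latent, d_obs)) * 0.1             # shared encoder
W_q = rng.standard_normal((n_actions, d_latent)) * 0.1           # model-free Q head
W_T = rng.standard_normal((n_actions, d_latent, d_latent)) * 0.1 # latent dynamics
w_r = rng.standard_normal((n_actions, d_latent)) * 0.1           # latent reward

def encode(obs):
    return W_enc @ obs

def plan_one_step(z, gamma=0.95):
    """One-step lookahead in latent space, backed up with model-free Q-values."""
    returns = [w_r[a] @ z + gamma * np.max(W_q @ (z + W_T[a] @ z))
               for a in range(n_actions)]
    return int(np.argmax(returns))

obs = rng.standard_normal(d_obs)
print(plan_one_step(encode(obs)))
```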
Learning a Structured Neural Network Policy for a Hopping Task
In this work we present a method for learning a reactive policy for a simple
dynamic locomotion task involving hard impact and switching contacts where we
assume the contact location and contact timing to be unknown. To learn such a
policy, we use optimal control to optimize a local controller for a fixed
environment and contacts. We learn the contact-rich dynamics for our
underactuated systems along these trajectories in a sample-efficient manner. We
use the optimized policies to learn the reactive policy in the form of a neural
network. Using a new neural network architecture, we are able to preserve more
information from the local policy and make the network's output interpretable,
in the sense that it can be read as desired trajectories, feedforward commands,
and gains. Extensive simulations demonstrate the robustness
of the approach to changing environments, outperforming model-free
policy-gradient methods on the same tasks in simulation. Finally, we show that the
learned policy can be robustly transferred to a real robot. Comment: IEEE Robotics and Automation Letters, 2018.
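The interpretable output structure corresponds to the classical control law u = u_ff + K (x_des - x). A minimal sketch follows, with a random linear layer standing in for the trained network.

```python
# Structured policy head: the network output is split into a desired state,
# a feedforward command, and feedback gains; the final command follows
# u = u_ff + K (x_des - x).
import numpy as np

rng = np.random.default_rng(4)
d_state, d_ctrl = 4, 2
n_out = d_state + d_ctrl + d_ctrl * d_state
W = rng.standard_normal((n_out, d_state)) * 0.1   # stand-in network weights

def policy_head(x):
    """Stand-in network head returning (x_des, u_ff, K)."""
    h = np.tanh(W @ x)
    x_des = h[:d_state]
    u_ff = h[d_state:d_state + d_ctrl]
    K = h[d_state + d_ctrl:].reshape(d_ctrl, d_state)
    return x_des, u_ff, K

def control(x):
    """Interpretable feedback structure: u = u_ff + K (x_des - x)."""
    x_des, u_ff, K = policy_head(x)
    return u_ff + K @ (x_des - x)

print(control(rng.standard_normal(d_state)))
```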
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
Combining deep model-free reinforcement learning with on-line planning is a
promising approach to building on the successes of deep RL. On-line planning
with look-ahead trees has proven successful in environments where transition
models are known a priori. However, in complex environments where transition
models need to be learned from data, the deficiencies of learned models have
limited their utility for planning. To address these challenges, we propose
TreeQN, a differentiable, recursive, tree-structured model that serves as a
drop-in replacement for any value function network in deep RL with discrete
actions. TreeQN dynamically constructs a tree by recursively applying a
transition model in a learned abstract state space and then aggregating
predicted rewards and state-values using a tree backup to estimate Q-values. We
also propose ATreeC, an actor-critic variant that augments TreeQN with a
softmax layer to form a stochastic policy network. Both approaches are trained
end-to-end, such that the learned model is optimised for its actual use in the
tree. We show that TreeQN and ATreeC outperform n-step DQN and A2C on a
box-pushing task, as well as n-step DQN and value prediction networks (Oh et
al. 2017) on multiple Atari games. Furthermore, we present ablation studies
that demonstrate the effect of different auxiliary losses on learning
transition models.
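A hedged sketch of the recursive tree construction and backup: linear stand-ins replace the learned transition, reward, and value modules, and the mixing weight lam is an assumed simplification of the paper's tree backup.

```python
# TreeQN-style recursive Q estimation: apply a learned transition model in a
# latent space to build a depth-d tree, then back up predicted rewards and
# leaf values to estimate Q(z, a).
import numpy as np

rng = np.random.default_rng(5)
d_latent, n_actions = 3, 3

W_T = rng.standard_normal((n_actions, d_latent, d_latent)) * 0.1  # transition
w_r = rng.standard_normal((n_actions, d_latent)) * 0.1            # reward
w_v = rng.standard_normal(d_latent) * 0.1                         # value

def tree_q(z, depth, gamma=0.99, lam=0.8):
    """Return Q(z, a) for all actions by recursive tree expansion."""
    q = np.empty(n_actions)
    for a in range(n_actions):
        z_next = z + W_T[a] @ z               # latent transition (residual)
        v = w_v @ z_next                      # value estimate at the child
        if depth > 1:
            backup = (1 - lam) * v + lam * np.max(tree_q(z_next, depth - 1,
                                                         gamma, lam))
        else:
            backup = v
        q[a] = w_r[a] @ z + gamma * backup    # predicted reward + backed-up value
    return q

print(tree_q(rng.standard_normal(d_latent), depth=3))
```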
SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning
Deep reinforcement learning (DRL) has gained great success by learning
directly from high-dimensional sensory inputs, yet is notorious for the lack of
interpretability. Interpretability of the subtasks is critical in hierarchical
decision-making, as it increases the transparency of black-box-style DRL
approaches and helps RL practitioners understand the high-level behavior
of the system better. In this paper, we introduce symbolic planning into DRL
and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can
handle both high-dimensional sensory inputs and symbolic planning.
Task-level interpretability is enabled by relating symbolic actions to
options. This framework features a planner -- controller -- meta-controller
architecture, which takes charge of subtask scheduling, data-driven subtask
learning, and subtask evaluation, respectively. The three components
cross-fertilize each other and eventually converge to an optimal symbolic plan
along with the learned subtasks, bringing together the advantages of long-term
planning capability with symbolic knowledge and end-to-end reinforcement
learning directly from a high-dimensional sensory input. Experimental results
validate the interpretability of subtasks, along with improved data efficiency
compared with state-of-the-art approaches.
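A purely structural sketch of that three-part loop follows, with a hard-coded outcome table standing in for the DRL-trained options and a score-sorting heuristic standing in for the symbolic planner; both are illustrative assumptions.

```python
# Planner -- controller -- meta-controller loop: the planner schedules
# symbolic subtasks, each symbolic action runs as a learned option, and the
# meta-controller evaluates subtasks to refine future plans.
def planner(subtask_scores):
    """Stand-in symbolic planner: order symbolic actions by learned score."""
    return sorted(subtask_scores, key=subtask_scores.get, reverse=True)

def execute_option(symbolic_action):
    """Stand-in for the DRL option tied to this symbolic action."""
    outcome = {"pick_key": 1.0, "open_door": 0.5, "reach_goal": 0.2}
    return outcome.get(symbolic_action, 0.0)   # pretend intrinsic return

scores = {"pick_key": 0.0, "open_door": 0.0, "reach_goal": 0.0}
for _ in range(3):
    plan = planner(scores)                        # planner: subtask scheduling
    for act in plan:                              # controller: subtask learning
        ret = execute_option(act)
        scores[act] += 0.5 * (ret - scores[act])  # meta-controller: evaluation
print(scores)
```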