A Tour of Reinforcement Learning: The View from Continuous Control
This manuscript surveys reinforcement learning from the perspective of
optimization and control with a focus on continuous control applications. It
surveys the general formulation, terminology, and typical experimental
implementations of reinforcement learning and reviews competing solution
paradigms. In order to compare the relative merits of various techniques, this
survey presents a case study of the Linear Quadratic Regulator (LQR) with
unknown dynamics, perhaps the simplest and best-studied problem in optimal
control. The manuscript describes how merging techniques from learning theory
and control can provide non-asymptotic characterizations of LQR performance and
shows that these characterizations tend to match experimental behavior. In
turn, when revisiting more complex applications, many of the observed phenomena
in LQR persist. In particular, theory and experiment demonstrate the role and
importance of models and the cost of generality in reinforcement learning
algorithms. This survey concludes with a discussion of some of the challenges
in designing learning systems that safely and reliably interact with complex
and uncertain environments and how tools from reinforcement learning and
control might be combined to approach these challenges.Comment: minor revision with a few clarifying passages and corrected typo
Psychiatric Illnesses as Disorders of Network Dynamics
This review provides a dynamical systems perspective on psychiatric symptoms
and disease, and discusses its potential implications for diagnosis, prognosis,
and treatment. After a brief introduction into the theory of dynamical systems,
we will focus on the idea that cognitive and emotional functions are
implemented in terms of dynamical systems phenomena in the brain, a common
assumption in theoretical and computational neuroscience. Specific
computational models, anchored in biophysics, for generating different types of
network dynamics, and with a relation to psychiatric symptoms, will be briefly
reviewed, as well as methodological approaches for reconstructing the system
dynamics from observed time series (like fMRI or EEG recordings). We then
attempt to outline how psychiatric phenomena, associated with schizophrenia,
depression, PTSD, ADHD, phantom pain, and others, could be understood in
dynamical systems terms. Most importantly, we will try to convey that the
dynamical systems level may provide a central, hub-like level of convergence
which unifies and links multiple biophysical and behavioral phenomena, in the
sense that diverse biophysical changes can give rise to the same dynamical
phenomena and, vice versa, similar changes in dynamics may yield different
behavioral symptoms depending on the brain area where these changes manifest.
If this assessment is correct, it may have profound implications for the
diagnosis, prognosis, and treatment of psychiatric conditions, as it puts the
focus on dynamics. We therefore argue that consideration of dynamics should
play an important role in the choice and target of interventions.
Recurrent Predictive State Policy Networks
We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent
architecture that brings insights from predictive state representations to
reinforcement learning in partially observable environments. Predictive state
policy networks consist of a recursive filter, which keeps track of a belief
about the state of the environment, and a reactive policy that directly maps
beliefs to actions, to maximize the cumulative reward. The recursive filter
leverages predictive state representations (PSRs) (Rosencrantz and Gordon,
2004; Sun et al., 2016) by modeling the predictive state -- a prediction of the
distribution of future observations conditioned on history and future actions.
This representation gives rise to a rich class of statistically consistent
algorithms (Hefny et al., 2018) to initialize the recursive filter. Predictive
state serves as an equivalent representation of a belief state. Therefore, the
policy component of the RPSP-network can be purely reactive, simplifying
training while still allowing optimal behaviour. Moreover, we use the PSR
interpretation during training as well, by incorporating prediction error in
the loss function. The entire network (recursive filter and reactive policy) is
still differentiable and can be trained using gradient based methods. We
optimize our policy using a combination of policy gradient based on rewards
(Williams, 1992) and gradient descent based on prediction error. We show the
efficacy of RPSP-networks under partial observability on a set of robotic
control tasks from OpenAI Gym. We empirically show that RPSP-networks perform
well compared with memory-preserving networks such as GRUs, as well as finite
memory models, being the overall best-performing method.
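The auxiliary objective described above can be sketched in isolation: the filter parameters descend a one-step observation-prediction error, which in RPSP training is blended with the policy-gradient term. The linear prediction map, dimensions, and learning rate below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
states = rng.normal(size=(100, 3))            # toy predictive states
W_true = rng.normal(size=(3, 3))              # ground-truth observation map
obs = states @ W_true.T + 0.01 * rng.normal(size=(100, 3))

W = np.zeros((3, 3))                          # filter's prediction map

def pred_loss(W):
    err = states @ W.T - obs
    return float(np.mean(err ** 2))

start = pred_loss(W)
for _ in range(200):
    err = states @ W.T - obs
    grad = 2 * err.T @ states / len(states)   # gradient of squared error
    # In RPSP this gradient is combined with a policy-gradient term;
    # here only the prediction-error descent is shown.
    W -= 0.05 * grad
end = pred_loss(W)
```

Descending the prediction error drives the internal state toward a statistically grounded predictive representation, which is what makes the purely reactive policy viable.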
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
This paper addresses the general problem of reinforcement learning (RL) in
partially observable environments. In 2013, our large RL recurrent neural
networks (RNNs) learned from scratch to drive simulated cars from
high-dimensional video input. However, real brains are more powerful in many
ways. In particular, they learn a predictive model of their initially unknown
environment, and somehow use it for abstract (e.g., hierarchical) planning and
reasoning. Guided by algorithmic information theory, we describe RNN-based AIs
(RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending
sequences of tasks, some of them provided by the user, others invented by the
RNNAI itself in a curious, playful fashion, to improve its RNN-based world
model. Unlike our previous model-building RNN-based RL machines dating back to
1990, the RNNAI learns to actively query its model for abstract reasoning and
planning and decision making, essentially "learning to think." The basic ideas
of this report can be applied to many other cases where one RNN-like system
exploits the algorithmic information content of another. They are taken from a
grant proposal submitted in Fall 2014, and also explain concepts such as
"mirror neurons." Experimental results will be described in separate papers.Comment: 36 pages, 1 figure. arXiv admin note: substantial text overlap with
arXiv:1404.782
Robot Introspection with Bayesian Nonparametric Vector Autoregressive Hidden Markov Models
Robot introspection, as opposed to anomaly detection typical in process
monitoring, helps a robot understand what it is doing at all times. A robot
should be able to identify its actions not only when failure or novelty occurs,
but also as it executes any number of sub-tasks. As robots continue their quest
of functioning in unstructured environments, it is imperative they understand
what it is that they are actually doing, to render them more robust. This work
investigates the modeling ability of Bayesian nonparametric techniques on
Markov Switching Process to learn complex dynamics typical in robot contact
tasks. We study whether the Markov switching process, together with Bayesian
priors can outperform the modeling ability of its counterparts: an HMM with
Bayesian priors and without. The work was tested in a snap assembly task
characterized by high elastic forces. The task consists of an insertion subtask
with very complex dynamics. Our approach showed a stronger ability to
generalize and was able to better model the subtask with complex dynamics in a
computationally efficient way. The modeling technique is also used to learn a
growing library of robot skills, one that when integrated with low-level
control allows for robot online decision making.
Comment: final version submitted to humanoids 201
Symphony from Synapses: Neocortex as a Universal Dynamical Systems Modeller using Hierarchical Temporal Memory
Reverse engineering the brain is proving difficult, perhaps impossible. While
many believe that this is just a matter of time and effort, a different
approach might help. Here, we describe a very simple idea which explains the
power of the brain as well as its structure, exploiting complex dynamics rather
than abstracting it away. Just as a Turing Machine is a Universal Digital
Computer operating in a world of symbols, we propose that the brain is a
Universal Dynamical Systems Modeller, evolved bottom-up (itself using nested
networks of interconnected, self-organised dynamical systems) to prosper in a
world of dynamical systems.
Recent progress in Applied Mathematics has produced startling evidence of
what happens when abstract Dynamical Systems interact. Key latent information
describing system A can be extracted by system B from very simple signals, and
signals can be used by one system to control and manipulate others. Using these
facts, we show how a region of the neocortex uses its dynamics to intrinsically
"compute" about the external and internal world.
Building on an existing "static" model of cortical computation (Hawkins'
Hierarchical Temporal Memory - HTM), we describe how a region of neocortex can
be viewed as a network of components which together form a Dynamical Systems
modelling module, connected via sensory and motor pathways to the external
world, and forming part of a larger dynamical network in the brain.
Empirical modelling and simulations of Dynamical HTM are possible with simple
extensions and combinations of currently existing open source software. We list
a number of relevant projects.
Depth Control of Model-Free AUVs via Reinforcement Learning
In this paper, we consider depth control problems of an autonomous underwater
vehicle (AUV) for tracking the desired depth trajectories. Due to the unknown
dynamical model of the AUV, the problems cannot be solved by most of
model-based controllers. To this purpose, we formulate the depth control
problems of the AUV as continuous-state, continuous-action Markov decision
processes (MDPs) under unknown transition probabilities. Based on deterministic
policy gradient (DPG) and neural network approximation, we propose a model-free
reinforcement learning (RL) algorithm that learns a state-feedback controller
from sampled trajectories of the AUV. To improve the performance of the RL
algorithm, we further propose a batch-learning scheme through replaying
previous prioritized trajectories. We illustrate with simulations that our
model-free method is comparable even to model-based controllers such as LQI and
NMPC. Moreover, we validate the effectiveness of the proposed RL algorithm on a
seafloor data set sampled from the South China Sea.
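The batch-learning scheme above replays previously seen transitions with a bias toward informative ones. A minimal sketch, assuming a simple priority rule (here, tracking-error magnitude) and buffer interface that are illustrative rather than the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(2)

class PrioritizedReplay:
    """Replay buffer that samples transitions in proportion to a priority."""

    def __init__(self, capacity):
        self.capacity, self.data, self.prio = capacity, [], []

    def add(self, transition, priority):
        if len(self.data) == self.capacity:       # drop the oldest entry
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append(priority)

    def sample(self, batch_size):
        p = np.asarray(self.prio)
        idx = rng.choice(len(self.data), size=batch_size, p=p / p.sum())
        return [self.data[i] for i in idx]

buf = PrioritizedReplay(capacity=1000)
for t in range(100):
    depth_error = abs(float(rng.normal()))        # stand-in tracking error
    buf.add({"t": t, "err": depth_error}, priority=depth_error + 1e-3)

batch = buf.sample(32)                            # minibatch for the RL update
```

Sampling with probability proportional to priority concentrates updates on transitions with large tracking error, which is the intuition behind replaying prioritized trajectories.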
Predictive-State Decoders: Encoding the Future into Recurrent Networks
Recurrent neural networks (RNNs) are a vital modeling technique that rely on
internal states learned indirectly by optimization of a supervised,
unsupervised, or reinforcement training loss. RNNs are used to model dynamic
processes that are characterized by underlying latent states whose form is
often unknown, precluding their analytic representation inside an RNN. In the
Predictive-State Representation (PSR) literature, latent state processes are
modeled by an internal state representation that directly models the
distribution of future observations, and most recent work in this area has
relied on explicitly representing and targeting sufficient statistics of this
probability distribution. We seek to combine the advantages of RNNs and PSRs by
augmenting existing state-of-the-art recurrent neural networks with
Predictive-State Decoders (PSDs), which add supervision to the network's
internal state representation to target predicting future observations.
Predictive-State Decoders are simple to implement and easily incorporated into
existing training pipelines via additional loss regularization. We demonstrate
the effectiveness of PSDs with experimental results in three different domains:
probabilistic filtering, Imitation Learning, and Reinforcement Learning. In
each, our method improves statistical performance of state-of-the-art recurrent
baselines and does so with fewer iterations and less data.
Comment: NIPS 201
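The loss regularization described above can be sketched directly: a decoder maps the network's internal state to a prediction of upcoming observations, and its squared error is added to the task loss. The linear decoder and the weight `lam` are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def psd_total_loss(task_loss, hidden, future_obs, W_dec, lam=0.1):
    """Task loss plus a predictive-state decoding penalty."""
    pred = hidden @ W_dec                  # decode internal state -> future obs
    aux = float(np.mean((pred - future_obs) ** 2))
    return task_loss + lam * aux, aux

rng = np.random.default_rng(3)
hidden = rng.normal(size=(4, 8))           # batch of RNN internal states
W_dec = np.zeros((8, 5))                   # untrained decoder weights
future = rng.normal(size=(4, 5))           # observations the decoder targets

total, aux = psd_total_loss(1.0, hidden, future, W_dec)
```

Because the penalty is just an extra term in the loss, it slots into an existing training pipeline without changing the recurrent architecture itself.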
Multi-Task Generative Adversarial Nets with Shared Memory for Cross-Domain Coordination Control
Generating sequential decision processes from large amounts of measured process
data is a promising research direction for collaborative factory automation:
online or offline process data can be used directly to design flexible
decision-making policies and to evaluate their performance. The key challenges
are to generate sequential decision-making policies directly and online, and to
transfer knowledge across task domains. Most multi-task policy-generation
algorithms struggle to learn a shared cross-task structure for discrete-time
nonlinear systems in practical applications. This paper proposes multi-task
generative adversarial nets with shared memory for cross-domain coordination
control, which generate sequential decision policies directly from the raw
sensory input of all tasks and evaluate the performance of system actions
online in discrete-time nonlinear systems. Experiments were undertaken on a
professional flexible manufacturing testbed deployed within a smart factory of
Weichai Power in China. Results on three groups of discrete-time nonlinear
control tasks show that the proposed model can effectively improve the
performance of a task with the help of other related tasks.
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
We introduce Embed to Control (E2C), a method for model learning and control
of non-linear dynamical systems from raw pixel images. E2C consists of a deep
generative model, belonging to the family of variational autoencoders, that
learns to generate image trajectories from a latent space in which the dynamics
is constrained to be locally linear. Our model is derived directly from an
optimal control formulation in latent space, supports long-term prediction of
image sequences and exhibits strong performance on a variety of complex control
problems.
Comment: Final NIPS version
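The locally linear latent transition at the heart of E2C can be sketched as follows. In E2C the matrices A(z), B(z) and offset o(z) are produced by a neural network conditioned on the latent state; here a fixed toy function stands in for that network, and all shapes are illustrative assumptions.

```python
import numpy as np

def local_dynamics(z):
    # Stand-in for the network that outputs locally linear parameters.
    A = np.eye(2) + 0.1 * np.tanh(z[0]) * np.array([[0.0, 1.0],
                                                    [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    o = 0.01 * np.tanh(z)
    return A, B, o

def latent_step(z, u):
    # One locally linear transition in latent space: z' = A(z) z + B(z) u + o(z).
    A, B, o = local_dynamics(z)
    return A @ z + B @ u + o

z = np.array([0.5, -0.2])
z_next = latent_step(z, np.array([0.3]))
```

Because each transition is linear given the current latent state, standard locally linear optimal-control machinery (e.g., iLQR-style solvers) can be applied in the latent space, which is what makes the latent-space control formulation tractable.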