14,770 research outputs found
Temporal-Difference Networks for Dynamical Systems with Continuous Observations and Actions
Temporal-difference (TD) networks are a class of predictive state
representations that use well-established TD methods to learn models of
partially observable dynamical systems. Previous research with TD networks has
dealt only with dynamical systems with finite sets of observations and actions.
We present an algorithm for learning TD network representations of dynamical
systems with continuous observations and actions. Our results show that the
algorithm is capable of learning accurate and robust models of several noisy
continuous dynamical systems. The algorithm presented here is the first fully
incremental method for learning a predictive representation of a continuous
dynamical system.
Comment: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI 2009).
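The incremental TD update underlying this line of work can be sketched as follows. This is a generic TD(0)-style update of a linear predictor of a continuous observation, not the authors' exact TD-network algorithm; all names and the toy signal are illustrative:

```python
import numpy as np

def td_update(w, phi_t, phi_next, y_next, alpha=0.1, gamma=0.9):
    """One TD(0)-style update of a linear predictor w.
    phi_t, phi_next: feature vectors at times t and t+1;
    y_next: continuous observation received at t+1."""
    # Prediction is v(phi) = w . phi; the target bootstraps on the
    # next prediction instead of waiting for the full future.
    target = y_next + gamma * (w @ phi_next)
    td_error = target - w @ phi_t
    return w + alpha * td_error * phi_t

# Toy use: a self-looping state emitting a constant observation y = 1.
w = np.zeros(2)
phi = np.array([1.0, 0.5])
for _ in range(2000):
    w = td_update(w, phi, phi, y_next=1.0, alpha=0.05)

# The prediction converges to the fixed point y / (1 - gamma) = 10.
print(round(float(w @ phi), 2))
```

The fully incremental character claimed in the abstract corresponds to the fact that each update touches only the current transition, with no stored history.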
Efficient Learning and Planning with Compressed Predictive States
Predictive state representations (PSRs) offer an expressive framework for
modelling partially observable systems. By compactly representing systems as
functions of observable quantities, the PSR learning approach avoids using
local-minima prone expectation-maximization and instead employs a globally
optimal moment-based algorithm. Moreover, since PSRs do not require a
predetermined latent state structure as an input, they offer an attractive
framework for model-based reinforcement learning when agents must plan without
a priori access to a system model. Unfortunately, the expressiveness of PSRs
comes with significant computational cost, and this cost is a major factor
inhibiting the use of PSRs in applications. In order to alleviate this
shortcoming, we introduce the notion of compressed PSRs (CPSRs). The CPSR
learning approach combines recent advancements in dimensionality reduction,
incremental matrix decomposition, and compressed sensing. We show how this
approach provides a principled avenue for learning accurate approximations of
PSRs, drastically reducing the computational costs associated with learning
while also providing effective regularization. Going further, we propose a
planning framework which exploits these learned models. And we show that this
approach facilitates model-learning and planning in large complex partially
observable domains, a task that is infeasible without the principled use of
compression.
Comment: 45 pages, 10 figures, submitted to the Journal of Machine Learning Research.
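The compression step at the heart of the CPSR idea can be sketched with a Gaussian random projection. The matrix sizes are illustrative; in CPSR learning the large matrix would be a history/test feature matrix estimated from trajectories before the moment-based step:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a large estimated history/test feature matrix.
H = rng.standard_normal((2000, 500))

# Gaussian random projection to d columns (compressed-sensing style).
d = 50
Phi = rng.standard_normal((500, d)) / np.sqrt(d)
H_c = H @ Phi                         # 10x fewer columns to decompose

# Johnson-Lindenstrauss-style guarantee: geometry (norms, inner products)
# is approximately preserved, which is what makes learning in the
# compressed space a principled approximation rather than a truncation.
rel_err = abs(np.linalg.norm(H_c) ** 2 - np.linalg.norm(H) ** 2) / np.linalg.norm(H) ** 2
print(H_c.shape, round(float(rel_err), 3))
```

The subsequent spectral/moment computations then run on the small matrix, which is the source of the claimed reduction in learning cost.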
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
This paper addresses the general problem of reinforcement learning (RL) in
partially observable environments. In 2013, our large RL recurrent neural
networks (RNNs) learned from scratch to drive simulated cars from
high-dimensional video input. However, real brains are more powerful in many
ways. In particular, they learn a predictive model of their initially unknown
environment, and somehow use it for abstract (e.g., hierarchical) planning and
reasoning. Guided by algorithmic information theory, we describe RNN-based AIs
(RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending
sequences of tasks, some of them provided by the user, others invented by the
RNNAI itself in a curious, playful fashion, to improve its RNN-based world
model. Unlike our previous model-building RNN-based RL machines dating back to
1990, the RNNAI learns to actively query its model for abstract reasoning and
planning and decision making, essentially "learning to think." The basic ideas
of this report can be applied to many other cases where one RNN-like system
exploits the algorithmic information content of another. They are taken from a
grant proposal submitted in Fall 2014, and also explain concepts such as
"mirror neurons." Experimental results will be described in separate papers.
Comment: 36 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1404.782
Online Gaussian Process State-Space Model: Learning and Planning for Partially Observable Dynamical Systems
This paper proposes an online learning method of Gaussian process state-space
model (GP-SSM). GP-SSM is a probabilistic representation learning scheme that
represents unknown state transition and/or measurement models as Gaussian
processes (GPs). While most prior literature on learning GP-SSMs focuses on processing a given batch of time-series data, in most dynamical systems data arrive and accumulate sequentially over time. Storing all such sequential data and updating the model over the entire dataset incurs a large cost in computation and memory. To overcome this difficulty, we
propose a practical method, termed \textit{onlineGPSSM}, that combines stochastic variational inference (VI) and online VI in a novel formulation. The proposed method mitigates the computational complexity without catastrophic forgetting and also supports adaptation to changes in the system and/or its environment. Furthermore, we present an application of onlineGPSSM to the
reinforcement learning (RL) of partially observable dynamical systems by
integrating onlineGPSSM with Bayesian filtering and trajectory optimization
algorithms. Numerical examples demonstrate the applicability of the proposed method.
Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
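The flavour of a constant-cost streaming GP transition model can be sketched with random Fourier features, under which GP regression becomes Bayesian linear regression with an O(D^2) per-sample posterior update. This is a hypothetical substitute for the paper's stochastic/online VI machinery; the transition function, noise level, and feature count are toy choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Random Fourier features approximating an RBF kernel for 1-D inputs.
D = 100
omega = rng.standard_normal(D)          # spectral frequencies
b = rng.uniform(0.0, 2.0 * np.pi, D)

def feat(x):
    return np.sqrt(2.0 / D) * np.cos(omega * x + b)

noise = 0.05
P = np.eye(D)                           # posterior precision (prior = I)
q = np.zeros(D)

# Transition pairs (x_t, x_{t+1}) arrive one at a time; memory and
# per-step compute do not grow with the length of the stream.
for _ in range(500):
    x_t = rng.uniform(-1.0, 1.0)
    x_next = 0.8 * np.sin(x_t) + noise * rng.standard_normal()
    phi = feat(x_t)
    P += np.outer(phi, phi) / noise**2  # rank-one precision update
    q += phi * x_next / noise**2

w = np.linalg.solve(P, q)               # posterior mean weights
pred = float(w @ feat(0.5))             # learned transition model at x = 0.5
print(round(pred, 2))                   # near 0.8*sin(0.5)
```

The recursive precision update is what avoids re-touching old data, the difficulty the abstract identifies with batch GP-SSM learning.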
Learning to Make Predictions In Partially Observable Environments Without a Generative Model
When faced with the problem of learning a model of a high-dimensional
environment, a common approach is to limit the model to make only a restricted
set of predictions, thereby simplifying the learning problem. These partial
models may be directly useful for making decisions or may be combined together
to form a more complete, structured model. However, in partially observable
(non-Markov) environments, standard model-learning methods learn generative
models, i.e., models that provide a probability distribution over all possible
futures (such as POMDPs). It is not straightforward to restrict such models to
make only certain predictions, and doing so does not always simplify the
learning problem. In this paper we present prediction profile models:
non-generative partial models for partially observable systems that make only a
given set of predictions, and are therefore far simpler than generative models
in some cases. We formalize the problem of learning a prediction profile model
as a transformation of the original model-learning problem, and show
empirically that one can learn prediction profile models that make a small set
of important predictions even in systems that are too complex for standard
generative models.
Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs
This paper is concerned with the problem of stochastic control of gene
regulatory networks (GRNs) observed indirectly through noisy measurements and
with uncertainty in the intervention inputs. The partial observability of the
gene states and uncertainty in the intervention process are accounted for by
modeling GRNs using the partially-observed Boolean dynamical system (POBDS)
signal model with noisy gene expression measurements. The optimal infinite-horizon control strategy for this problem is unattainable in general, and we apply reinforcement learning and Gaussian process techniques to
find a near-optimal solution. The POBDS is first transformed to a
directly-observed Markov Decision Process in a continuous belief space, and the
Gaussian process is used for modeling the cost function over the belief and
intervention spaces. Reinforcement learning is then used to learn the cost
function from the available gene expression data. In addition, we employ
sparsification, which enables the control of large partially-observed GRNs. The
performance of the resulting algorithm is studied through a comprehensive set
of numerical experiments using synthetic gene expression data generated from a
melanoma gene regulatory network.
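The transformation of a POBDS into a directly observed process over the belief simplex can be sketched with a minimal Boolean-network Bayes filter. The two-gene regulatory logic, noise rates, and measurement channel below are illustrative toys, not the melanoma network of the paper:

```python
import numpy as np

n = 2
states = [(a, b) for a in (0, 1) for b in (0, 1)]   # the 2^n gene states

def next_state(s):
    # Toy regulatory logic: gene0 <- NOT gene1, gene1 <- gene0.
    return (1 - s[1], s[0])

# Transition matrix: deterministic logic plus per-gene flip noise.
p_flip = 0.05
T = np.zeros((4, 4))
for i, s in enumerate(states):
    det = next_state(s)
    for j, t in enumerate(states):
        d = sum(a != b for a, b in zip(det, t))     # genes flipped by noise
        T[i, j] = (p_flip ** d) * ((1 - p_flip) ** (n - d))

def likelihood(obs, s, p_err=0.1):
    # Each gene's expression is read through a binary symmetric channel.
    return np.prod([(1 - p_err) if o == g else p_err for o, g in zip(obs, s)])

def belief_step(belief, obs):
    predicted = belief @ T                          # push belief through dynamics
    weighted = predicted * np.array([likelihood(obs, s) for s in states])
    return weighted / weighted.sum()                # Bayes update + normalise

belief = np.full(4, 0.25)                           # uniform prior
for obs in [(1, 0), (0, 1), (1, 1)]:
    belief = belief_step(belief, obs)
print(belief.round(3), round(float(belief.sum()), 6))
```

A controller (here, the paper's RL agent with a GP cost model) then acts on this belief vector as if it were a fully observed state.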
Analog Forecasting with Dynamics-Adapted Kernels
Analog forecasting is a nonparametric technique introduced by Lorenz in 1969
which predicts the evolution of states of a dynamical system (or observables
defined on the states) by following the evolution of the sample in a historical
record of observations which most closely resembles the current initial data.
Here, we introduce a suite of forecasting methods which improve traditional
analog forecasting by combining ideas from kernel methods, developed in harmonic analysis and machine learning, with state-space reconstruction for dynamical
systems. A key ingredient of our approach is to replace single-analog
forecasting with weighted ensembles of analogs constructed using local
similarity kernels. The kernels used here employ a number of dynamics-dependent
features designed to improve forecast skill, including Takens' delay-coordinate
maps (to recover information in the initial data lost through partial
observations) and a directional dependence on the dynamical vector field
generating the data. Mathematically, our approach is closely related to kernel
methods for out-of-sample extension of functions, and we discuss alternative
strategies based on the Nystr\"om method and the multiscale Laplacian pyramids
technique. We illustrate these techniques in applications to forecasting in a
low-order deterministic model for atmospheric dynamics with chaotic
metastability, and interannual-scale forecasting in the North Pacific sector of
a comprehensive climate model. We find that forecasts based on kernel-weighted
ensembles have significantly higher skill than the conventional approach
following a single analog.
Comment: submitted to Nonlinearity.
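The replacement of single-analog forecasting with a kernel-weighted ensemble can be sketched on a toy series. The Gaussian similarity kernel and noisy sine record below are illustrative stand-ins; the paper's kernels add dynamics-dependent features beyond the plain delay embedding used here:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(0.0, 60.0, 0.1)
x = np.sin(t) + 0.01 * rng.standard_normal(t.size)   # historical record

def delay_embed(x, dim=3, lag=5):
    # Row i is the Takens delay vector (x[i], x[i-lag], ..., x[i-(dim-1)*lag]).
    idx = np.arange((dim - 1) * lag, x.size)
    return np.stack([x[idx - k * lag] for k in range(dim)], axis=1), idx

def analog_forecast(x, query, horizon, eps=0.05):
    E, idx = delay_embed(x)
    keep = idx + horizon < x.size        # analogs whose future is recorded
    E, idx = E[keep], idx[keep]
    w = np.exp(-np.sum((E - query) ** 2, axis=1) / eps)
    w /= w.sum()                          # weighted ensemble, not one analog
    return float(w @ x[idx + horizon])    # average of the analogs' successors

# Forecast 1.0 time units (10 steps) ahead from a fresh initial condition.
t0 = 7.3
query = np.array([np.sin(t0 - 0.5 * k) for k in range(3)])  # lag*dt = 0.5
pred = analog_forecast(x, query, horizon=10)
print(round(pred, 2), round(float(np.sin(t0 + 1.0)), 2))
```

Lorenz's original method would return the successor of the single nearest delay vector; the kernel weights smooth over many near matches, which is the source of the skill gain reported in the abstract.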
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Optimization of parameterized policies for reinforcement learning (RL) is an
important and challenging problem in artificial intelligence. Among the most
common approaches are algorithms based on gradient ascent of a score function
representing discounted return. In this paper, we examine the role of these
policy gradient and actor-critic algorithms in partially-observable multiagent
environments. We show several candidate policy update rules and relate them to
a foundation of regret minimization and multiagent learning techniques for the
one-shot and tabular cases, leading to previously unknown convergence
guarantees. We apply our method to model-free multiagent reinforcement learning
in adversarial sequential decision problems (zero-sum imperfect information
games), using RL-style function approximation. We evaluate on commonly used
benchmark Poker domains, showing performance against fixed policies and
empirical convergence to approximate Nash equilibria in self-play with rates
similar to or better than a baseline model-free algorithm for zero sum games,
without any domain-specific state space reductions.
Comment: NeurIPS 201
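The actor-critic structure the abstract builds on can be sketched in the one-shot tabular case it analyzes: a softmax actor updated by the score-function gradient, with a learned baseline as critic. This toy single-agent bandit is illustrative only, not the paper's multiagent update rules:

```python
import numpy as np

rng = np.random.default_rng(4)

theta = np.zeros(2)                 # actor: softmax logits over two actions
v = 0.0                             # critic: running baseline value
rewards = np.array([1.0, 0.0])      # action 0 is better in expectation

for _ in range(3000):
    p = np.exp(theta - theta.max()); p /= p.sum()
    a = rng.choice(2, p=p)
    r = rewards[a] + 0.1 * rng.standard_normal()
    adv = r - v                     # advantage w.r.t. the critic baseline
    grad = -p; grad[a] += 1.0       # score function: grad of log pi(a)
    theta += 0.1 * adv * grad       # actor: policy-gradient ascent
    v += 0.05 * (r - v)             # critic: exponential moving average

p = np.exp(theta - theta.max()); p /= p.sum()
print(p.round(2))                   # policy concentrates on action 0
```

The paper's contribution is relating update rules of this shape to regret minimization, which yields the convergence guarantees claimed for the zero-sum setting.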
Scalable Variational Inference for Dynamical Systems
Gradient matching is a promising tool for learning parameters and state
dynamics of ordinary differential equations. It is a grid-free inference approach which, for fully observable systems, is at times competitive with
numerical integration. However, for many real-world applications, only sparse
observations are available, or unobserved variables are included in the
model description. In these cases most gradient matching methods are difficult
to apply or simply do not provide satisfactory results. That is why, despite
the high computational cost, numerical integration is still the gold standard
in many applications. Using an existing gradient matching approach, we propose
a scalable variational inference framework which can infer states and parameters simultaneously, offers computational speedups and improved accuracy, and works well even under model misspecification in partially observable systems.
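The gradient-matching principle can be shown in a few lines: rather than numerically integrating the ODE, differentiate a smooth fit to the data and regress the slopes onto the ODE right-hand side. This toy uses centred finite differences in place of the GP interpolant of full gradient-matching methods, and a hypothetical one-parameter model dx/dt = -theta * x:

```python
import numpy as np

t = np.linspace(0.0, 2.0, 201)
x = np.exp(-2.0 * t)              # observations of the true solution (theta = 2)

dx = np.gradient(x, t)            # slope estimates from the fitted trajectory
# Least squares for dx = -theta * x  =>  theta = -<dx, x> / <x, x>
theta_hat = -(dx @ x) / (x @ x)
print(round(float(theta_hat), 2))  # close to the true theta = 2
```

Because no initial-value problem is ever solved, the cost is independent of the integration grid, which is the "grid-free" property the abstract contrasts with numerical integration.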
Predictive-State Decoders: Encoding the Future into Recurrent Networks
Recurrent neural networks (RNNs) are a vital modeling technique that rely on
internal states learned indirectly by optimization of a supervised,
unsupervised, or reinforcement training loss. RNNs are used to model dynamic
processes that are characterized by underlying latent states whose form is
often unknown, precluding their analytic representation inside an RNN. In the
Predictive-State Representation (PSR) literature, latent state processes are
modeled by an internal state representation that directly models the
distribution of future observations, and most recent work in this area has
relied on explicitly representing and targeting sufficient statistics of this
probability distribution. We seek to combine the advantages of RNNs and PSRs by
augmenting existing state-of-the-art recurrent neural networks with
Predictive-State Decoders (PSDs), which add supervision to the network's
internal state representation to target predicting future observations.
Predictive-State Decoders are simple to implement and easily incorporated into
existing training pipelines via additional loss regularization. We demonstrate
the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, imitation learning, and reinforcement learning. In
each, our method improves statistical performance of state-of-the-art recurrent
baselines and does so with fewer iterations and less data.
Comment: NIPS 201
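The decoder-as-regularizer idea can be sketched in one forward pass: a linear head is asked to predict the next k observations from the RNN's internal state, and its squared error is added to the task loss. The tanh RNN, shapes, and random data below are illustrative, not the paper's architectures:

```python
import numpy as np

rng = np.random.default_rng(5)
T, d_obs, d_h, k = 20, 3, 8, 2            # steps, obs dim, hidden dim, horizon

obs = rng.standard_normal((T, d_obs))
W_h = 0.1 * rng.standard_normal((d_h, d_h))
W_x = 0.1 * rng.standard_normal((d_h, d_obs))
W_task = rng.standard_normal((1, d_h))        # primary task head
W_psd = rng.standard_normal((k * d_obs, d_h))  # predictive-state decoder
targets = rng.standard_normal(T)              # stand-in task targets

h = np.zeros(d_h)
task_loss, psd_loss, count = 0.0, 0.0, 0
for t in range(T):
    h = np.tanh(W_h @ h + W_x @ obs[t])
    task_loss += float((W_task @ h - targets[t]) ** 2)
    if t + k < T:                             # supervise on the next k observations
        future = obs[t + 1 : t + 1 + k].ravel()
        psd_loss += float(np.sum((W_psd @ h - future) ** 2))
        count += 1

lam = 0.5                                     # PSD regularization weight
total_loss = task_loss / T + lam * psd_loss / count
print(np.isfinite(total_loss))
```

Training minimizes `total_loss` end to end, so the auxiliary term pushes the hidden state toward a sufficient statistic of future observations, which is the PSR-style supervision the abstract describes.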