Dyna Planning using a Feature Based Generative Model
Dyna-style reinforcement learning is a powerful approach for problems where
not much real data is available. The main idea is to supplement real
trajectories, or sequences of sampled states over time, with simulated ones
sampled from a learned model of the environment. However, in large state
spaces, the problem of learning a good generative model of the environment has
been open so far. We propose to use deep belief networks to learn an
environment model for use in Dyna. We present our approach and validate it
empirically on problems where the state observations consist of images. Our
results demonstrate that using deep belief networks, which are full generative
models, significantly outperforms the use of linear expectation models,
proposed in Sutton et al. (2008).
Comment: 8 pages, 7 figures
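The Dyna idea described above is easiest to see in tabular form. Below is a minimal Dyna-Q sketch on a toy chain MDP; the environment, hyperparameters, and state layout are invented for illustration and stand in for the paper's deep-belief-network model, which learns the environment model rather than memorizing it:

```python
import random

random.seed(0)

# Toy chain MDP (invented for illustration): states 0..4, actions 0=left, 1=right.
# Entering state 4 yields reward 1 and the episode restarts at state 0.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def env_step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
model = {}                                     # learned model: (s, a) -> (s', r)
alpha, gamma, n_planning = 0.5, 0.95, 10

s = 0
for _ in range(500):
    a = random.randrange(N_ACTIONS)            # explore; Q-learning is off-policy
    s2, r = env_step(s, a)
    # Direct RL update from the real transition.
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    model[(s, a)] = (s2, r)                    # deterministic world: memorizing suffices
    # Dyna planning: extra updates from simulated transitions drawn from the model.
    for _ in range(n_planning):
        ps, pa = random.choice(list(model))
        ps2, pr = model[(ps, pa)]
        Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
    s = 0 if s2 == GOAL else s2

print(round(Q[3][1], 2))
```

The planning loop is what lets Dyna squeeze many value updates out of each real transition; the paper's contribution is replacing the memorized `model` with a generative model that works when states are images.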
Improved Estimation in Time Varying Models
Locally adapted parameterizations of a model (such as locally weighted
regression) are expressive but often suffer from high variance. We describe an
approach for reducing the variance, based on the idea of estimating
simultaneously a transformed space for the model, as well as locally adapted
parameterizations in this new space. We present a new problem formulation that
captures this idea and illustrate it in the important context of time varying
models. We develop an algorithm for learning a set of bases for approximating a
time varying sparse network; each learned basis constitutes an archetypal
sparse network structure. We also provide an extension for learning task-driven
bases. We present empirical results on synthetic data sets, as well as on a BCI
EEG classification task.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
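The basis-learning idea above can be caricatured in a few lines: if the per-time parameter vectors really are mixtures of a few archetypes, a truncated SVD recovers a shared low-dimensional basis. This is only a stand-in for the paper's algorithm (which additionally enforces sparsity and supports task-driven bases); all sizes and data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic analogue: T per-time parameter vectors that are mixtures of 2 archetypes.
T, d, k = 50, 10, 2
bases_true = rng.normal(size=(k, d))            # archetypal structures
weights = np.abs(rng.normal(size=(T, k)))       # time-varying mixing weights
Theta = weights @ bases_true + 0.01 * rng.normal(size=(T, d))

# Recover a k-dimensional shared basis by truncated SVD (a simple stand-in for
# the paper's learning algorithm).
U, S, Vt = np.linalg.svd(Theta, full_matrices=False)
Theta_hat = U[:, :k] * S[:k] @ Vt[:k]           # rank-k reconstruction

err = np.linalg.norm(Theta - Theta_hat) / np.linalg.norm(Theta)
print(round(err, 3))
```

Estimating all T parameter vectors through k shared bases is what reduces variance relative to fitting each time point independently.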
Variational Generative Stochastic Networks with Collaborative Shaping
We develop an approach to training generative models based on unrolling a
variational auto-encoder into a Markov chain, and shaping the chain's
trajectories using a technique inspired by recent work in approximate Bayesian
computation. We show that the global minimizer of the resulting objective is
achieved when the generative model reproduces the target distribution. To allow
finer control over the behavior of the models, we add a regularization term
inspired by techniques used for regularizing certain types of policy search in
reinforcement learning. We present empirical results on the MNIST and TFD
datasets which show that our approach offers state-of-the-art performance, both
quantitatively and from a qualitative point of view.
Comment: Old paper, from ICML 201
Attend Before you Act: Leveraging human visual attention for continual learning
When humans perform a task, such as playing a game, they selectively pay
attention to certain parts of the visual input, gathering relevant information
and sequentially combining it to build a representation from the sensory data.
In this work, we explore leveraging where humans look in an image as an
implicit indication of what is salient for decision making. We build on top of
the UNREAL architecture in DeepMind Lab's 3D navigation maze environment. We
train the agent both with original images and foveated images, which were
generated by overlaying the original images with saliency maps generated using
a real-time spectral residual technique. We investigate the effectiveness of
this approach in transfer learning by measuring performance in the context of
noise in the environment.
Comment: Lifelong Learning: A Reinforcement Learning Approach (LLARLA) Workshop, ICML 201
Data Generation as Sequential Decision Making
We connect a broad class of generative models through their shared reliance
on sequential decision making. Motivated by this view, we develop extensions to
an existing model, and then explore the idea further in the context of data
imputation -- perhaps the simplest setting in which to investigate the relation
between unconditional and conditional generative modelling. We formulate data
imputation as an MDP and develop models capable of representing effective
policies for it. We construct the models using neural networks and train them
using a form of guided policy search. Our models generate predictions through
an iterative process of feedback and refinement. We show that this approach can
learn effective policies for imputation problems of varying difficulty and
across multiple datasets.
Comment: Accepted for publication at Advances in Neural Information Processing Systems (NIPS) 201
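The feedback-and-refinement loop can be illustrated with a far simpler imputation "policy" than the learned ones in the paper: alternately project the current guess onto low-rank matrices and write the projection back into the missing entries. The data, rank, and iteration count below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: a rank-2 matrix with roughly 30% of entries missing.
A = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 30))
mask = rng.random(A.shape) < 0.7          # True = observed
X = np.where(mask, A, 0.0)                # initial guess: zeros in the gaps

# Iterative feedback-and-refinement (a hand-coded stand-in for the paper's
# learned imputation policies): project onto rank-2 matrices, then restore
# the observed entries, and repeat.
for _ in range(50):
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    X_low = U[:, :2] * S[:2] @ Vt[:2]     # rank-2 projection
    X = np.where(mask, A, X_low)          # keep observed entries fixed

err = np.linalg.norm((X - A)[~mask]) / np.linalg.norm(A[~mask])
print(round(err, 3))
```

Each pass uses the previous guess as feedback for the next, which is the sequential-decision view of imputation the abstract describes; the paper replaces the fixed low-rank projection with neural networks trained by guided policy search.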
Learning with Pseudo-Ensembles
We formalize the notion of a pseudo-ensemble, a (possibly infinite)
collection of child models spawned from a parent model by perturbing it
according to some noise process. For example, dropout (Hinton et al., 2012) in a deep
neural network trains a pseudo-ensemble of child subnetworks generated by
randomly masking nodes in the parent network. We present a novel regularizer
based on making the behavior of a pseudo-ensemble robust with respect to the
noise process generating it. In the fully-supervised setting, our regularizer
matches the performance of dropout. But, unlike dropout, our regularizer
naturally extends to the semi-supervised setting, where it produces
state-of-the-art results. We provide a case study in which we transform the
Recursive Neural Tensor Network of Socher et al. (2013) into a
pseudo-ensemble, which significantly improves its performance on a real-world
sentiment analysis benchmark.
Comment: To appear in Advances in Neural Information Processing Systems 27 (NIPS 2014), Dec. 2014
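A pseudo-ensemble and a consistency-style regularizer can be sketched in a few lines of NumPy. The network, sizes, and weights below are invented, and the penalty here compares only the outputs of two children, whereas the paper's regularizer matches noisy activations layer by layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Parent model: a tiny two-layer network (weights invented for illustration).
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))

def child_forward(x, drop_p, rng):
    """One child of the pseudo-ensemble: the parent under a random dropout mask."""
    h = np.maximum(x @ W1, 0.0)
    mask = rng.random(h.shape) > drop_p
    h = h * mask / (1.0 - drop_p)          # inverted dropout scaling
    return h @ W2

x = rng.normal(size=(32, 8))               # a batch of (unlabeled) inputs

# Consistency regularizer: penalize disagreement between two children spawned
# from the same parent under independent noise draws. No labels are needed,
# which is why this kind of term extends to the semi-supervised setting.
out_a = child_forward(x, 0.5, rng)
out_b = child_forward(x, 0.5, rng)
consistency_loss = np.mean((out_a - out_b) ** 2)
print(consistency_loss > 0.0)
```

Minimizing terms like `consistency_loss` alongside the supervised loss pushes the parent toward parameters whose children agree, which is the robustness-to-noise idea the abstract describes.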
Metrics for Finite Markov Decision Processes
We present metrics for measuring the similarity of states in a finite Markov
decision process (MDP). The formulation of our metrics is based on the notion
of bisimulation for MDPs, with an aim towards solving discounted infinite
horizon reinforcement learning tasks. Such metrics can be used to aggregate
states, as well as to better structure other value function approximators
(e.g., memory-based or nearest-neighbor approximators). We provide bounds that
relate our metric distances to the optimal values of states in the given MDP.
Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI 2004).
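Schematically, metrics of this kind arise as the least fixed point of an operator that compares, for each action, immediate rewards and next-state distributions; in the paper's formulation this has roughly the form

```latex
d(s, t) \;=\; \max_{a \in A} \Big( c_R \,\bigl| r_s^a - r_t^a \bigr|
  \;+\; c_T \, T_K(d)\bigl( P_s^a, P_t^a \bigr) \Big)
```

where $c_R, c_T$ weight the reward and transition terms and $T_K(d)$ is the Kantorovich (optimal-transport) metric induced by $d$ on next-state distributions. States at distance zero are exactly the bisimilar ones, and the paper's bounds control the gap between optimal state values in terms of $d$.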
A Canonical Form for Weighted Automata and Applications to Approximate Minimization
We study the problem of constructing approximations to a weighted automaton.
Weighted finite automata (WFA) are closely related to the theory of rational
series. A rational series is a function from strings to real numbers that can
be computed by a finite WFA. Among others, this includes probability
distributions generated by hidden Markov models and probabilistic automata. The
relationship between rational series and WFA is analogous to the relationship
between regular languages and ordinary automata. Associated with such rational
series are infinite matrices called Hankel matrices which play a fundamental
role in the theory of minimal WFA. Our contributions are: (1) an effective
procedure for computing the singular value decomposition (SVD) of such infinite
Hankel matrices based on their representation in terms of finite WFA; (2) a new
canonical form for finite WFA based on this SVD decomposition; and, (3) an
algorithm to construct approximate minimizations of a given WFA. The goal of
our approximate minimization algorithm is to start from a minimal WFA and
produce a smaller WFA that is close to the given one in a certain sense. The
desired size of the approximating automaton is given as input. We give bounds
describing how well the approximation emulates the behavior of the original
WFA.
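The Hankel side of this construction is easy to see in miniature: a finite slice of the Hankel matrix of a rational series has rank equal to the number of states of the minimal WFA, and SVD truncation gives the best low-rank approximation of that slice. The series below is invented for the example, and this finite-matrix sketch only stands in for the paper's procedure, which operates on the WFA representation of the full infinite matrix:

```python
import numpy as np

# Finite Hankel slice of the rational series f(w) = 0.8^{|w|} + 0.3^{|w|}
# over a one-letter alphabet: H[i, j] = f(a^{i+j}). This f is computed by a
# 2-state WFA, so the slice has rank 2.
n = 8
H = np.array([[0.8 ** (i + j) + 0.3 ** (i + j) for j in range(n)]
              for i in range(n)])

U, S, Vt = np.linalg.svd(H)
print(np.round(S[:3], 4))                 # two significant singular values

# Approximate minimization by truncation: keep only the dominant singular
# direction, i.e. a 1-state approximation of the 2-state automaton.
H1 = S[0] * np.outer(U[:, 0], Vt[0])
approx_err = np.linalg.norm(H - H1, 2)
print(np.isclose(approx_err, S[1]))       # Eckart-Young: error = dropped singular value
```

The singular values thus quantify how much each state dimension contributes to the series, which is what makes the SVD-based canonical form a natural starting point for approximate minimization.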
Neural Network Based Nonlinear Weighted Finite Automata
Weighted finite automata (WFA) can expressively model functions defined over
strings but are inherently linear models. Given the recent successes of
nonlinear models in machine learning, it is natural to wonder whether
extending WFA to the nonlinear setting would be beneficial. In this paper, we
propose a novel neural-network-based nonlinear WFA model (NL-WFA), along with a
learning algorithm. Our learning algorithm is inspired by the spectral
learning algorithm for WFA and relies on a nonlinear decomposition of the
so-called Hankel matrix, by means of an auto-encoder network. The expressive
power of NL-WFA and the proposed learning algorithm are assessed on both
synthetic and real-world data, showing that NL-WFA can lead to smaller model
sizes and infer complex grammatical structures from data.Comment: AISTATS 201
Singular value automata and approximate minimization
The present paper uses spectral theory of linear operators to construct
approximately minimal realizations of weighted languages. Our new contributions
are: (i) a new algorithm for the SVD decomposition of infinite Hankel matrices
based on their representation in terms of weighted automata, (ii) a new
canonical form for weighted automata arising from the SVD of its corresponding
Hankel matrix and (iii) an algorithm to construct approximate minimizations of
given weighted automata by truncating the canonical form. We give bounds on the
quality of our approximation.