Experimental results: Reinforcement Learning of POMDPs using Spectral Methods
We propose a new reinforcement learning algorithm for partially observable
Markov decision processes (POMDP) based on spectral decomposition methods.
While spectral methods have been previously employed for consistent learning of
(passive) latent variable models such as hidden Markov models, POMDPs are more
challenging since the learner interacts with the environment and possibly
changes the future observations in the process. We devise a learning algorithm
that runs through epochs; in each epoch we employ spectral techniques to learn
the POMDP parameters from a trajectory generated by a fixed policy. At the end
of the epoch, an optimization oracle returns the optimal memoryless planning
policy which maximizes the expected reward based on the estimated POMDP model.
We prove an order-optimal regret bound with respect to the optimal memoryless
policy and efficient scaling with respect to the dimensionality of observation
and action spaces.
Comment: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
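The epoch structure described in the abstract can be sketched in a few lines. This is a minimal toy illustration, not the paper's algorithm: it substitutes simple empirical reward counts for the spectral decomposition, a greedy argmax for the optimization oracle, and a hypothetical two-state, two-observation POMDP of my own invention, with a dash of exploration so both actions are sampled.

```python
import random

# Toy two-state POMDP (hypothetical dynamics, for illustration only):
# the hidden state flips with probability 0.3, the observation equals
# the state with probability 0.85, and the reward is 1 when the action
# matches the hidden state.
def run_epoch(policy, horizon, rng, eps=0.2):
    """Generate a trajectory under a fixed memoryless policy and
    collect per-(observation, action) reward statistics."""
    s = 0
    stats = {(o, a): [0, 0.0] for o in (0, 1) for a in (0, 1)}
    for _ in range(horizon):
        o = s if rng.random() < 0.85 else 1 - s
        # Small exploration rate so every action is observed per o.
        a = policy[o] if rng.random() > eps else rng.randrange(2)
        r = 1.0 if a == s else 0.0
        stats[(o, a)][0] += 1
        stats[(o, a)][1] += r
        if rng.random() < 0.3:
            s = 1 - s
    return stats

def oracle(stats):
    """Stand-in for the optimization oracle: greedy memoryless policy
    with respect to the estimated expected rewards."""
    policy = {}
    for o in (0, 1):
        means = {a: stats[(o, a)][1] / max(stats[(o, a)][0], 1)
                 for a in (0, 1)}
        policy[o] = max(means, key=means.get)
    return policy

rng = random.Random(0)
policy = {0: 1, 1: 0}          # deliberately bad initial policy
for _ in range(3):             # epochs: estimate the model, then re-plan
    stats = run_epoch(policy, 5000, rng)
    policy = oracle(stats)
print(policy)                  # the learner recovers the sensible policy
```

The real algorithm replaces the counting step with spectral (tensor) decomposition of multi-view statistics, which is what yields the consistency and regret guarantees.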
ToyArchitecture: Unsupervised Learning of Interpretable Models of the World
Research in Artificial Intelligence (AI) has focused mostly on two extremes:
either on small improvements in narrow AI domains, or on universal theoretical
frameworks which are usually uncomputable, incompatible with theories of
biological intelligence, or lack practical implementations. The goal of this
work is to combine the main advantages of the two: to follow a big picture
view, while providing a particular theory and its implementation. In contrast
with purely theoretical approaches, the resulting architecture should be usable
in realistic settings, but also form the core of a framework containing all the
basic mechanisms, into which it should be easier to integrate additional
required functionality.
In this paper, we present a novel, purposely simple, and interpretable
hierarchical architecture which combines multiple different mechanisms into one
system: unsupervised learning of a model of the world, learning the influence
of one's own actions on the world, model-based reinforcement learning,
hierarchical planning and plan execution, and symbolic/sub-symbolic integration
in general. The learned model is stored in the form of hierarchical
representations with the following properties: 1) they are increasingly more
abstract, but can retain details when needed, and 2) they are easy to
manipulate in their local and symbolic-like form, thus also allowing one to
observe the learning process at each level of abstraction. On all levels of the
system, the representation of the data can be interpreted in both a symbolic
and a sub-symbolic manner. This enables the architecture to learn efficiently
using sub-symbolic methods and to employ symbolic inference.
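The flavour of "increasingly abstract, but details retained when needed" hierarchical representations can be conveyed with a deliberately tiny chunking scheme. This is my own analogy, not the paper's mechanism: each level replaces the most frequent adjacent pair of symbols with a new, more abstract symbol, and the stored rules let any level be expanded back down to the raw details.

```python
from collections import Counter

def abstract_one_level(seq, new_symbol):
    """Replace the most frequent adjacent pair with a new symbol,
    returning the more abstract sequence plus the rule to undo it."""
    (a, b), _ = Counter(zip(seq, seq[1:])).most_common(1)[0]
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out, (new_symbol, (a, b))

def expand(seq, rules):
    """Recover the concrete low-level sequence from any abstraction level."""
    table = dict(rules)
    stack, out = list(reversed(seq)), []
    while stack:
        s = stack.pop()
        if s in table:
            a, b = table[s]
            stack.extend([b, a])     # push so 'a' is processed first
        else:
            out.append(s)
    return out

raw = list("abcabcabc")
level1, rule1 = abstract_one_level(raw, "X")     # one level more abstract
level2, rule2 = abstract_one_level(level1, "Y")  # more abstract still
assert expand(level2, [rule1, rule2]) == raw     # details are retained
```

Each level's symbols are shorter and more abstract, yet the rules keep the mapping explicit and inspectable, which is the symbolic-like, manipulable property the abstract emphasizes.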
An echo state model of non-Markovian reinforcement learning
Department Head: Dale H. Grit. Spring 2008. Includes bibliographical references (pages 137-142).
There exists a growing need for intelligent, autonomous control strategies that operate in real-world domains. In theory, the state-action space must exhibit the Markov property for reinforcement learning to be applicable. Empirical evidence, however, suggests that reinforcement learning also applies to domains where the state-action space is only approximately Markovian, as is the case for the overwhelming majority of real-world domains. These domains, termed non-Markovian reinforcement learning domains, raise a unique set of practical challenges. The reconstruction dimension required to approximate a Markovian state-space is unknown a priori and can potentially be large. Further, the spatial complexity of local function approximation of the reinforcement learning domain grows exponentially with the reconstruction dimension. Parameterized dynamic systems alleviate both embedding-length and state-space-dimensionality concerns by reconstructing an approximate Markovian state-space via a compact, recurrent representation. Yet this representation extracts a cost: modeling reinforcement learning domains via adaptive, parameterized dynamic systems is characterized by instability, slow convergence, and high computational or spatial training complexity.
The objectives of this research are to demonstrate a stable, convergent, accurate, and scalable model of non-Markovian reinforcement learning domains. These objectives are fulfilled via fixed point analysis of the dynamics underlying the reinforcement learning domain and the Echo State Network, a class of parameterized dynamic systems. Understanding models of non-Markovian reinforcement learning domains requires understanding the interactions between learning domains and their models.
Fixed point analysis of the Mountain Car Problem reinforcement learning domain, for both local and nonlocal function approximations, suggests a close relationship between the locality of the approximation and the number and severity of bifurcations of the fixed point structure. This research suggests the likely cause of this relationship: reinforcement learning domains exist within a dynamic feature space in which trajectories are analogous to states, and the fixed point structure maps dynamic space onto state-space. This explanation suggests two testable hypotheses. First, reinforcement learning is sensitive to state-space locality because states cluster as trajectories in time rather than space. Second, models using trajectory-based features should exhibit good modeling performance and few changes in fixed point structure. Analysis of the performance of a lookup table, a feedforward neural network, and an Echo State Network (ESN) on the Mountain Car Problem reinforcement learning domain confirms these hypotheses. The ESN is a large, sparse, randomly generated, unadapted recurrent neural network which adapts a linear projection of the target domain onto the hidden layer. ESN modeling results on reinforcement learning domains show that it achieves performance comparable to the lookup table and neural network architectures on the Mountain Car Problem with minimal changes to the fixed point structure. The ESN also achieves lookup-table-caliber performance when modeling Acrobot, a four-dimensional control problem, but is less successful modeling the lower-dimensional Modified Mountain Car Problem. These performance discrepancies are attributed to the ESN's excellent ability to represent complex short-term dynamics, and its inability to consolidate long temporal dependencies into a static memory. Without memory consolidation, reinforcement learning domains exhibiting attractors with multiple dynamic scales are unlikely to be well-modeled via ESN.
To mitigate this problem, a simple ESN memory consolidation method is presented and tested on stationary dynamic systems. These results indicate the potential to improve modeling performance in reinforcement learning domains via memory consolidation.
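The "echo state" (fading-memory) behaviour at the heart of this abstract can be demonstrated with a minimal, dependency-free sketch. This is my own toy reservoir, not one of the dissertation's models: two different initial reservoir states, driven by the same input sequence, converge to the same trajectory, which is precisely why long temporal dependencies wash out without a separate consolidation mechanism.

```python
import math
import random

def make_reservoir(n, density, scale, rng):
    """Random, sparse, unadapted recurrent weight matrix."""
    return [[scale * (2 * rng.random() - 1) if rng.random() < density else 0.0
             for _ in range(n)] for _ in range(n)]

def step(W, w_in, x, u):
    """One tanh reservoir update driven by scalar input u."""
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + wi * u)
            for row, wi in zip(W, w_in)]

rng = random.Random(1)
n = 30
# Small weight scale keeps the spectral radius well below 1 (contraction).
W = make_reservoir(n, density=0.1, scale=0.2, rng=rng)
w_in = [2 * rng.random() - 1 for _ in range(n)]
xa = [2 * rng.random() - 1 for _ in range(n)]  # two different initial states
xb = [2 * rng.random() - 1 for _ in range(n)]

for t in range(200):                            # identical input drive
    u = math.sin(0.3 * t)
    xa = step(W, w_in, xa, u)
    xb = step(W, w_in, xb, u)

gap = max(abs(p - q) for p, q in zip(xa, xb))
print(gap)   # the initial-state difference has been forgotten
```

In a full ESN, only a linear readout from the reservoir state is trained; the recurrent weights above stay fixed, exactly as described in the abstract.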
Inferring broken detailed balance in the absence of observable currents
Identifying dissipation is essential for understanding the physical
mechanisms underlying nonequilibrium processes. In living systems, for
example, the dissipation is directly related to the hydrolysis of fuel
molecules such as adenosine triphosphate (ATP). Nevertheless, detecting broken
time-reversal symmetry, which is the hallmark of dissipative processes, remains
a challenge in the absence of observable directed motion, flows, or fluxes.
Furthermore, quantifying the entropy production in a complex system requires
detailed information about its dynamics and internal degrees of freedom. Here
we introduce a novel approach to detect time irreversibility and estimate the
entropy production from time-series measurements, even in the absence of
observable currents. We apply our technique to two different physical systems,
namely, a partially hidden network and a molecular motor. Our method does not
require complete information about the system dynamics and thus provides a new
tool for studying nonequilibrium phenomena.
Comment: 14 pages, 6 figures.
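A standard baseline for the quantities discussed here, though not the authors' technique, is the plug-in estimator of the entropy production rate of a Markov approximation: the Kullback-Leibler divergence rate between forward and time-reversed transition statistics. The sketch below, on a hypothetical driven three-state ring of my own choosing, shows the estimate is clearly positive for a biased (dissipative) process and near zero when detailed balance holds.

```python
import math
import random
from collections import Counter

def entropy_production_rate(traj):
    """Plug-in estimate of the per-step KL divergence between forward
    and time-reversed transition statistics of a trajectory."""
    counts = Counter(zip(traj, traj[1:]))
    total = sum(counts.values())
    ep = 0.0
    for (i, j), c in counts.items():
        rev = counts.get((j, i), 0)
        if i != j and rev > 0:
            ep += (c / total) * math.log(c / rev)
    return ep

def ring_trajectory(p_cw, steps, rng):
    """Three-state ring: step clockwise with probability p_cw."""
    s, traj = 0, [0]
    for _ in range(steps):
        s = (s + 1) % 3 if rng.random() < p_cw else (s - 1) % 3
        traj.append(s)
    return traj

rng = random.Random(0)
driven = entropy_production_rate(ring_trajectory(0.8, 20000, rng))
balanced = entropy_production_rate(ring_trajectory(0.5, 20000, rng))
print(driven, balanced)  # clearly positive vs. close to zero
```

The point of the paper is precisely that such direct current-based signatures can vanish for coarse-grained observations, which is why a more refined estimator is needed; this snippet only illustrates the baseline the paper improves upon.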
Compressibility, laws of nature, initial conditions and complexity
We critically analyse the point of view according to which laws of nature are
just a means to compress data. Discussing some basic notions of dynamical
systems and information theory, we show that the idea that analysing large
amounts of data with a compression algorithm is equivalent to the knowledge one
can obtain from scientific laws is rather naive. In particular, we discuss the
subtle conceptual topic of the initial conditions of phenomena, which are
generally incompressible. Starting from this point, we argue that laws of
nature represent more than a pure compression of data, and that the
availability of large amounts of data is, in general, not particularly useful
for understanding the behaviour of complex phenomena.
Comment: 19 pages, no figures; published in Foundations of Physics.
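The contrast the authors draw, between law-like regularity that a compressor can exploit and generic initial conditions that it cannot, is easy to experience directly. The snippet below is my own illustration, not from the paper: it compresses a periodic byte sequence and an equally long random one with a general-purpose compressor.

```python
import random
import zlib

periodic = bytes(i % 7 for i in range(10_000))            # "law-like" regularity
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(10_000))  # "initial conditions"

ratio_periodic = len(zlib.compress(periodic, 9)) / len(periodic)
ratio_noise = len(zlib.compress(noise, 9)) / len(noise)
print(ratio_periodic, ratio_noise)  # tiny vs. roughly 1 (incompressible)
```

The periodic stream shrinks by orders of magnitude while the random stream does not compress at all, mirroring the paper's point that the incompressible part of a phenomenon's description (its initial conditions) is not captured by compression-as-law.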
Quantum State Estimation and Tracking for Superconducting Processors Using Machine Learning
Quantum technology has been growing rapidly; in particular, the experiments that have been performed with superconducting qubits and circuit QED have allowed us to explore the light-matter interaction at its most fundamental level. The study of coherent dynamics between two-level systems and resonator modes can provide insight into fundamental aspects of quantum physics, such as how the state of a system evolves while being continuously observed. To study such an evolving quantum system, experimenters need to verify the accuracy of state preparation and control, since quantum systems are very fragile and sensitive to environmental disturbance. In this thesis, I look at these continuous monitoring and state estimation problems from a modern point of view. With the help of machine learning techniques, it has become possible to explore regimes that are not accessible with traditional methods: for example, tracking the state of a superconducting transmon qubit continuously with dynamics that are fast compared with the detector bandwidth. These results open up a new area of quantum state tracking, enabling us to potentially diagnose errors that occur during quantum gates.
In addition, I investigate the use of supervised machine learning, in the form of a modified denoising autoencoder, to simultaneously remove experimental noise while encoding one- and two-qubit quantum state estimates into a minimum number of nodes within the latent layer of a neural network. I automate the decoding of these latent representations into positive density matrices and compare them to similar estimates obtained via linear inversion and maximum likelihood estimation. Using a superconducting multiqubit chip, I experimentally verify that the neural network estimates the quantum state with greater fidelity than either traditional method. Furthermore, the network can be trained using only product states and still achieve high fidelity for entangled states.
This simplification of the training overhead permits the network to aid experimental calibration, such as the diagnosis of multi-qubit crosstalk. As quantum processors increase in size and complexity, I expect automated methods such as those presented in this thesis to become increasingly attractive.
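The decoding step that maps a raw network output to a valid (positive semidefinite, trace-one) density matrix can be illustrated for a single qubit, where positivity reduces to the Bloch vector lying inside the unit ball. This is a simplified one-qubit stand-in of my own, not the thesis's multi-qubit decoder.

```python
import math

def project_bloch(r):
    """Shrink a Bloch vector onto the unit ball; for one qubit this gives
    the nearest valid state to rho = (I + r . sigma) / 2."""
    norm = math.sqrt(sum(c * c for c in r))
    return r if norm <= 1.0 else tuple(c / norm for c in r)

def density_matrix(r):
    """rho = (I + rx X + ry Y + rz Z) / 2 as a 2x2 complex matrix."""
    rx, ry, rz = r
    return [[(1 + rz) / 2, (rx - 1j * ry) / 2],
            [(rx + 1j * ry) / 2, (1 - rz) / 2]]

# A noisy estimate with |r| > 1 is unphysical (a negative eigenvalue)...
noisy = (0.0, 0.0, 1.4)
rho = density_matrix(project_bloch(noisy))
# ...projection restores positivity: the eigenvalues (1 +/- |r|) / 2 >= 0.
trace = rho[0][0] + rho[1][1]
print(rho, trace)
```

For multiple qubits the analogous step (clipping negative eigenvalues of the reconstructed Hermitian matrix and renormalizing the trace) requires a full eigendecomposition, but the one-qubit picture captures the constraint being enforced.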