Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

Author: Anandkumar Animashree
Azizzadenesheli Kamyar
Lazaric Alessandro
Publication date: 06/05/2017

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDPs) based on spectral decomposition methods. While spectral methods have previously been employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging because the learner interacts with the environment and may change future observations in the process. We devise a learning algorithm that runs in epochs: in each epoch, we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the epoch, an optimization oracle returns the optimal memoryless planning policy, which maximizes the expected reward under the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy, together with efficient scaling in the dimensionality of the observation and action spaces.
Comment: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain

arXiv.org e-Print Archive

Caltech Authors
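
The spectral step in the first abstract can be illustrated with a minimal sketch. It assumes that the fixed policy of one epoch has been folded into the dynamics, so the observation trajectory behaves like a hidden Markov model. It then builds the matrix of consecutive-observation co-occurrences. When the emission matrix has full column rank and the joint hidden-state pair matrix is nonsingular, this matrix has rank equal to the number of hidden states, and an SVD exposes that rank. Spectral learning of latent variable models typically needs higher-order (for example tensor) moments to recover the actual parameters. This sketch shows only the simplest matrix step and is not the authors' estimator. The matrices `T` and `O`, the helper names, and the sample size are all hypothetical.

```python
import numpy as np

def simulate_hmm(T, O, steps, rng):
    """Sample observations from a hidden Markov chain.

    T[s_next, s] is the hidden-state transition matrix and O[o, s] the
    emission matrix; both are column-stochastic.
    """
    n_hidden, n_obs = T.shape[0], O.shape[0]
    s = rng.integers(n_hidden)
    obs = np.empty(steps, dtype=int)
    for t in range(steps):
        obs[t] = rng.choice(n_obs, p=O[:, s])
        s = rng.choice(n_hidden, p=T[:, s])
    return obs

def pair_moment(obs, n_obs):
    """Empirical second moment P[i, j] = Pr(o_{t+1} = i, o_t = j)."""
    P = np.zeros((n_obs, n_obs))
    np.add.at(P, (obs[1:], obs[:-1]), 1.0)
    return P / P.sum()

# Hypothetical model with 2 hidden states and 4 observation symbols.
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])
O = np.array([[0.6, 0.0],
              [0.3, 0.1],
              [0.1, 0.3],
              [0.0, 0.6]])

rng = np.random.default_rng(0)
obs = simulate_hmm(T, O, 50_000, rng)
P = pair_moment(obs, n_obs=4)

# In expectation P = O T diag(pi) O^T, with pi the stationary hidden-state
# distribution, so it has rank 2: only two singular values are far from zero.
sv = np.linalg.svd(P, compute_uv=False)
print(np.round(sv, 4))
```

The gap between the second and third singular values reveals the latent dimension. Identifying `T` and `O` themselves would require additional moments.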

ToyArchitecture: Unsupervised Learning of Interpretable Models of the World

Author: Andersson Simon
Davidson Joseph
Dluhoš Petr
Feyereisl Jan
Hlubuček Petr
Hyben Martin
Nikl Matěj
Paška Přemysl
Poliak Martin
Rosa Marek
Stránský Martin
Vítků Jaroslav
Šinkora Jan
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2020

Research in Artificial Intelligence (AI) has focused mostly on two extremes: either on small improvements in narrow AI domains, or on universal theoretical frameworks that are usually uncomputable, incompatible with theories of biological intelligence, or lacking practical implementations. The goal of this work is to combine the main advantages of the two: to follow a big-picture view while providing a particular theory and its implementation. In contrast with purely theoretical approaches, the resulting architecture should be usable in realistic settings and should also form the core of a framework containing all the basic mechanisms, into which it should be easier to integrate additional required functionality. In this paper, we present a novel, purposely simple, and interpretable hierarchical architecture which combines multiple different mechanisms into one system: unsupervised learning of a model of the world, learning the influence of one's own actions on the world, model-based reinforcement learning, hierarchical planning and plan execution, and symbolic/sub-symbolic integration in general. The learned model is stored in the form of hierarchical representations with the following properties: 1) they are increasingly more abstract, but can retain details when needed, and 2) they are easy to manipulate in their local and symbolic-like form, thus also allowing one to observe the learning process at each level of abstraction. On all levels of the system, the representation of the data can be interpreted in both a symbolic and a sub-symbolic manner. This enables the architecture to learn efficiently using sub-symbolic methods and to employ symbolic inference.
Comment: Revision: changed the pdftitle

arXiv.org e-Print Archive

Directory of Open Access Journals

An echo state model of non-Markovian reinforcement learning

Author: Bush Keith A.
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2008

Department Head: Dale H. Grit. 2008 Spring. Includes bibliographical references (pages 137-142).
There exists a growing need for intelligent, autonomous control strategies that operate in real-world domains. Theoretically, the state-action space must exhibit the Markov property for reinforcement learning to be applicable. Empirical evidence, however, suggests that reinforcement learning also applies to domains where the state-action space is only approximately Markovian, which describes the overwhelming majority of real-world domains. These domains, termed non-Markovian reinforcement learning domains, raise a unique set of practical challenges. The reconstruction dimension required to approximate a Markovian state-space is unknown a priori and can potentially be large. Further, the spatial complexity of local function approximation of the reinforcement learning domain grows exponentially with the reconstruction dimension. Parameterized dynamic systems alleviate both embedding length and state-space dimensionality concerns by reconstructing an approximate Markovian state-space via a compact, recurrent representation. Yet this representation exacts a cost: modeling reinforcement learning domains via adaptive, parameterized dynamic systems is characterized by instability, slow convergence, and high computational or spatial training complexity. The objectives of this research are to demonstrate a stable, convergent, accurate, and scalable model of non-Markovian reinforcement learning domains. These objectives are fulfilled via fixed point analysis of the dynamics underlying the reinforcement learning domain and the Echo State Network, a class of parameterized dynamic system. Understanding models of non-Markovian reinforcement learning domains requires understanding the interactions between learning domains and their models.
Fixed point analysis of the Mountain Car Problem reinforcement learning domain, for both local and nonlocal function approximations, suggests a close relationship between the locality of the approximation and the number and severity of bifurcations of the fixed point structure. This research suggests the likely cause of this relationship: reinforcement learning domains exist within a dynamic feature space in which trajectories are analogous to states. The fixed point structure maps dynamic space onto state-space. This explanation suggests two testable hypotheses. First, reinforcement learning is sensitive to state-space locality because states cluster as trajectories in time rather than space. Second, models using trajectory-based features should exhibit good modeling performance and few changes in fixed point structure. An analysis of the performance of a lookup table, a feedforward neural network, and an Echo State Network (ESN) on the Mountain Car Problem reinforcement learning domain confirms these hypotheses. The ESN is a large, sparse, randomly generated, unadapted recurrent neural network, which adapts a linear projection of the target domain onto the hidden layer. ESN modeling results on reinforcement learning domains show that it achieves performance comparable to lookup table and neural network architectures on the Mountain Car Problem with minimal changes to fixed point structure. The ESN also achieves lookup-table-caliber performance when modeling Acrobot, a four-dimensional control problem, but is less successful at modeling the lower-dimensional Modified Mountain Car Problem. These performance discrepancies are attributed to the ESN’s excellent ability to represent complex short-term dynamics and its inability to consolidate long temporal dependencies into a static memory. Without memory consolidation, reinforcement learning domains exhibiting attractors with multiple dynamic scales are unlikely to be well modeled via ESN.
To mitigate this problem, a simple ESN memory consolidation method is presented and tested for stationary dynamic systems. These results indicate the potential to improve modeling performance in reinforcement learning domains via memory consolidation.

Mountain Scholar (Digital Collections of Colorado and Wyoming)
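
The ESN described in the thesis abstract keeps its recurrent weights fixed and trains only a linear readout. The following minimal sketch illustrates that idea under several assumptions: a tanh reservoir rescaled to spectral radius 0.9, a ridge-regression readout mapping reservoir states to the target, and a toy one-step-ahead sine prediction task in place of Mountain Car or Acrobot. The helper names, sizes, and constants are illustrative and are not taken from the thesis.

```python
import numpy as np

def make_reservoir(n_res, n_in, density, spectral_radius, rng):
    """Random sparse recurrent weights, rescaled to a target spectral radius."""
    W = rng.uniform(-1.0, 1.0, (n_res, n_res))
    W *= rng.random((n_res, n_res)) < density          # keep roughly `density` of the entries
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    return W, W_in

def run_reservoir(W, W_in, inputs):
    """Drive the fixed (never trained) reservoir and collect its states."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(inputs), W.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(W @ x + W_in @ u)
        states[t] = x
    return states

def train_readout(states, targets, ridge=1e-6):
    """Ridge regression for the linear readout, the only trained weights."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

rng = np.random.default_rng(42)
signal = np.sin(0.2 * np.arange(2100))
inputs, targets = signal[:-1, None], signal[1:]   # predict u(t+1) from u(t)

W, W_in = make_reservoir(n_res=200, n_in=1, density=0.1, spectral_radius=0.9, rng=rng)
states = run_reservoir(W, W_in, inputs)

washout, split = 100, 1600          # discard the initial transient, then split train/test
W_out = train_readout(states[washout:split], targets[washout:split])
test_mse = np.mean((states[split:] @ W_out - targets[split:]) ** 2)
print(f"test MSE: {test_mse:.2e}")
```

A spectral radius below one is a common heuristic for obtaining the echo state property, not a guarantee. Because only `W_out` is fitted, training reduces to a single linear solve.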

Inferring broken detailed balance in the absence of observable currents

Author: Bisker Gili
Horowitz Jordan M.
Martínez Ignacio A.
Parrondo Juan M. R.
Publication venue: Springer Science and Business Media LLC
Publication date: 23/07/2019

Identifying dissipation is essential for understanding the physical mechanisms underlying nonequilibrium processes. In living systems, for example, dissipation is directly related to the hydrolysis of fuel molecules such as adenosine triphosphate (ATP). Nevertheless, detecting broken time-reversal symmetry, which is the hallmark of dissipative processes, remains a challenge in the absence of observable directed motion, flows, or fluxes. Furthermore, quantifying the entropy production in a complex system requires detailed information about its dynamics and internal degrees of freedom. Here we introduce a novel approach to detect time irreversibility and estimate the entropy production from time-series measurements, even in the absence of observable currents. We apply our technique to two different physical systems, namely a partially hidden network and a molecular motor. Our method does not require complete information about the system dynamics and thus provides a new tool for studying nonequilibrium phenomena.
Comment: 14 pages, 6 figures

arXiv.org e-Print Archive

Docta Complutense
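
As a baseline for the irreversibility problem described above, the following sketch uses a standard plug-in estimator rather than the authors' technique. For a sequence of discrete states, it compares each transition count i→j with its reverse j→i and computes the sum of P(i,j) ln[P(i,j)/P(j,i)]. For a fully observed stationary Markov chain, this quantity converges to the entropy production rate in nats per step (with k_B = 1). The two transition matrices are hypothetical. The estimator also shows why hidden currents are hard to detect. For a two-state observed signal, the numbers of 0→1 and 1→0 transitions differ by at most one, so this pair-based estimate is essentially zero regardless of the underlying dissipation.

```python
import math
import random
from collections import Counter

def simulate_chain(W, steps, rng):
    """Sample a discrete Markov trajectory; W[i][j] = Pr(next = j | current = i)."""
    states = range(len(W))
    x, seq = 0, []
    for _ in range(steps):
        seq.append(x)
        x = rng.choices(states, weights=W[x])[0]
    return seq

def pair_kl_rate(seq):
    """Plug-in estimate of sum_ij P(i,j) ln[P(i,j) / P(j,i)], in nats per step."""
    pairs = Counter(zip(seq, seq[1:]))
    total = sum(pairs.values())
    rate = 0.0
    for (i, j), n_ij in pairs.items():
        n_ji = pairs.get((j, i), 0)
        if n_ji > 0:                       # skip transitions never observed in reverse
            rate += (n_ij / total) * math.log(n_ij / n_ji)
    return rate

rng = random.Random(1)
# Hypothetical chains: a biased 3-state cycle and a symmetric (reversible) one.
driven = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
balanced = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]

est_driven = pair_kl_rate(simulate_chain(driven, 50_000, rng))
est_balanced = pair_kl_rate(simulate_chain(balanced, 50_000, rng))
# Exact rate for `driven` (uniform stationary law): 0.7 * ln 8, about 1.456 nats/step.
print(est_driven, est_balanced)
```

The biased cycle yields an estimate near its exact rate, while the reversible chain yields an estimate near zero.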

Compressibility, laws of nature, initial conditions and complexity

Author: Chibbaro Sergio
Vulpiani Angelo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

We critically analyse the view according to which laws of nature are just a means to compress data. Discussing some basic notions of dynamical systems and information theory, we show that the idea that analysing large amounts of data by means of a compression algorithm is equivalent to the knowledge one can gain from scientific laws is rather naive. In particular, we discuss the subtle conceptual topic of the initial conditions of phenomena, which are generally incompressible. Starting from this point, we argue that laws of nature represent more than a pure compression of data, and that the availability of large amounts of data is, in general, not particularly useful for understanding the behaviour of complex phenomena.
Comment: 19 pages, no figures, published in Foundations of Physics

arXiv.org e-Print Archive

Crossref

PhilSci Archive

Archivio della ricerca- Università di Roma La Sapienza

Quantum State Estimation and Tracking for Superconducting Processors Using Machine Learning

Author: Lotfallahzadeh Barzili Shiva
Publication venue: Chapman University Digital Commons
Publication date: 01/12/2021

Quantum technology has been growing rapidly; in particular, experiments performed with superconducting qubits and circuit QED have allowed us to explore the light-matter interaction at its most fundamental level. The study of coherent dynamics between two-level systems and resonator modes can provide insight into fundamental aspects of quantum physics, such as how the state of a system evolves while being continuously observed. To study such an evolving quantum system, experimenters need to verify the accuracy of state preparation and control, since quantum systems are very fragile and sensitive to environmental disturbance. In this thesis, I look at these continuous monitoring and state estimation problems from a modern point of view. With the help of machine learning techniques, it has become possible to explore regimes that are not accessible with traditional methods: for example, continuously tracking the state of a superconducting transmon qubit whose dynamics are fast compared with the detector bandwidth. These results open up a new area of quantum state tracking, potentially enabling the diagnosis of errors that occur during quantum gates. In addition, I investigate the use of supervised machine learning, in the form of a modified denoising autoencoder, to simultaneously remove experimental noise and encode one- and two-qubit quantum state estimates into a minimum number of nodes within the latent layer of a neural network. I automate the decoding of these latent representations into positive density matrices and compare them to similar estimates obtained via linear inversion and maximum likelihood estimation. Using a superconducting multiqubit chip, I experimentally verify that the neural network estimates the quantum state with greater fidelity than either traditional method. Furthermore, the network can be trained using only product states and still achieve high fidelity for entangled states.
This simplification of the training overhead permits the network to aid experimental calibration, such as the diagnosis of multi-qubit crosstalk. As quantum processors increase in size and complexity, I expect automated methods such as those presented in this thesis to become increasingly attractive.

Chapman University Digital Commons
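
Linear inversion, one of the two traditional baselines named in the thesis abstract, is easy to state for a single qubit. The estimate is ρ = (I + ⟨X⟩X + ⟨Y⟩Y + ⟨Z⟩Z)/2, built from measured Pauli expectation values. Noisy expectation values can produce a Bloch vector longer than one, and the resulting estimate then has a negative eigenvalue. The sketch below assumes hypothetical noisy values for a qubit prepared in |0⟩. It repairs the estimate by clipping negative eigenvalues and renormalizing. This crude projection is only a stand-in; it is neither maximum likelihood estimation nor the thesis's autoencoder.

```python
import numpy as np

# Pauli matrices
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def linear_inversion(ex, ey, ez):
    """Single-qubit linear inversion: rho = (I + <X>X + <Y>Y + <Z>Z) / 2."""
    return 0.5 * (I2 + ex * X + ey * Y + ez * Z)

def clip_to_physical(rho):
    """Crude repair: zero out negative eigenvalues, renormalize the trace (not MLE)."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, 0.0, None)
    vals = vals / vals.sum()
    return (vecs * vals) @ vecs.conj().T

# Hypothetical noisy Pauli expectation values for a qubit prepared in |0>.
rho_lin = linear_inversion(0.1, -0.05, 1.0)
rho_phys = clip_to_physical(rho_lin)
fidelity = float(np.real(rho_phys[0, 0]))   # <0|rho|0>, fidelity with the pure target |0>
print(np.linalg.eigvalsh(rho_lin))          # Bloch vector longer than 1: one eigenvalue < 0
print(np.linalg.eigvalsh(rho_phys), fidelity)
```

For a single qubit, this clipping maps an overlong Bloch vector to the pure state pointing in the same direction. Multi-qubit estimates need a more careful projection or a likelihood-based fit.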