Feature Reinforcement Learning: Part I: Unstructured MDPs

Author: Hutter Marcus
Publication date: 01/01/2009

General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II. The role of POMDPs is also considered there.

Comment: 24 LaTeX pages, 5 diagrams

arXiv.org e-Print Archive

CiteSeerX

The Australian National University

Many Roads to Synchrony: Natural Time Scales and Their Algorithms

Author: A. S. Weigend
Christopher J. Ellison
D. Lind
J. E. Hopcroft
James P. Crutchfield
John R. Mahoney
K. Wiesner
L. E. Reichl
Robert B. Ash
Ryan G. James
T. H. Cormen
T. M. Cover
Publication venue: 'American Physical Society (APS)'
Publication date: 20/12/2013

We consider two important time scales, the Markov and cryptic orders, that monitor how an observer synchronizes to a finitary stochastic process. We show how to compute these orders exactly and that they are most efficiently calculated from the epsilon-machine, a process's minimal unifilar model. Surprisingly, though the Markov order is a basic concept from stochastic process theory, it is not a probabilistic property of a process. Rather, it is a topological property and, moreover, it is not computable from any finite-state model other than the epsilon-machine. Via an exhaustive survey, we close by demonstrating that infinite Markov and infinite cryptic orders are a dominant feature in the space of finite-memory processes. We draw out the roles played in statistical mechanical spin systems by these two complementary length scales.

Comment: 17 pages, 16 figures: http://cse.ucdavis.edu/~cmg/compmech/pubs/kro.htm. Santa Fe Institute Working Paper 10-11-02

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Learning models of plant behavior for anomaly detection and condition monitoring

Author: Brown A.J.
Catterson V.M.
Fox M.
IEEE
Long D.
McArthur S.D.J.
Publication date: 01/06/2007

Providing engineers and asset managers with a tool that can diagnose faults within transformers can greatly assist decision making on such issues as maintenance, performance and safety. However, the onus has always been on personnel to accurately decide how serious a problem is and how urgently maintenance is required. In dealing with the large volumes of data involved, it is possible that faults may not be noticed until serious damage has occurred. This paper proposes the integration of a newly developed anomaly detection technique with an existing diagnosis system. By learning a Hidden Markov Model of healthy transformer behavior, unexpected operation, such as when a fault develops, can be flagged for attention. Faults can then be diagnosed using the existing system and maintenance scheduled as required, all at a much earlier stage than would previously have been possible.

Crossref

University of Strathclyde Institutional Repository
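
The idea in the abstract above, flagging behavior that a Hidden Markov Model of healthy operation finds unlikely, can be illustrated with a minimal sketch. It is not the authors' implementation: the two-state model, its probabilities, the discretised readings (0 = normal, 1 = high) and the threshold are all hypothetical values chosen for illustration, and the step of learning the model from data is not shown. The sketch scores a sequence with the scaled forward algorithm and flags it when the average log-likelihood per reading falls below the threshold.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs) under a discrete HMM."""
    n = len(start)
    # Joint probability of each hidden state and the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transition matrix, then
            # weight each state by how well it explains obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


def is_anomalous(obs, model, threshold):
    """Flag obs when its average log-likelihood per reading is below threshold."""
    start, trans, emit = model
    return log_likelihood(obs, start, trans, emit) / len(obs) < threshold


# Hypothetical "healthy" model (assumed values, not taken from the paper).
MODEL = (
    [0.9, 0.1],                    # start probabilities
    [[0.95, 0.05], [0.5, 0.5]],    # state transition probabilities
    [[0.9, 0.1], [0.6, 0.4]],      # emission probabilities per state
)
THRESHOLD = -0.8                   # assumed cut-off, tuned by hand

healthy = [0, 0, 0, 0, 1, 0, 0, 0]
faulty = [1, 1, 1, 1, 1, 1, 1, 1]
print(is_anomalous(healthy, MODEL, THRESHOLD))
print(is_anomalous(faulty, MODEL, THRESHOLD))
```

Normalising by sequence length keeps the threshold comparable across windows of different sizes; in practice the threshold would be set from held-out healthy data rather than by hand.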