Search CORE

1,777 research outputs found

Learning Automata: a Survey

Author: K. Narendra
M. Thatcher
Publication venue
Publication date
Field of study

Learning Behaviors of Stochastic Automata and Some Applications

Author: Baba N.
Publication venue: WP-83-119
Publication date: 01/11/1983
Field of study

It is known that stochastic automata can be applied to describe the behavior of a decision maker or manager in the condition of uncertainty. This paper discusses learning behaviors of stochastic automata under unknown nonstationary multi-teacher environment. The consistency of sequential decision making procedures is proved under some mild conditions

International Institute for Applied Systems Analysis (IIASA)

Reinforcement Learning via AIXI Approximation

Author: Hutter Marcus
Ng Kee Siong
Silver David
Veness Joel
Publication venue
Publication date: 01/01/2010
Field of study

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.Comment: 8 LaTeX pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

The Australian National University

Association for the Advancement of Artificial Intelligence: AAAI Publications

Learning Task Specifications from Demonstrations

Author: Ho Mark K.
Jha Susmit
Seshia Sanjit A.
Tiwari Ashish
Vazquez-Chanlatte Marcell
Publication venue
Publication date: 01/01/2018
Field of study

Real world applications often naturally decompose into several sub-tasks. In many settings (e.g., robotics) demonstrations provide a natural way to specify the sub-tasks. However, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely recombined or limit the types of composition available. Motivated by this deficit, we consider the problem of inferring Boolean non-Markovian rewards (also known as logical trace properties or specifications) from demonstrations provided by an agent operating in an uncertain, stochastic environment. Crucially, specifications admit well-defined composition rules that are typically easy to interpret. In this paper, we formulate the specification inference task as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model and give an efficient approach to search for the most likely specification in a large candidate pool of specifications. In our experiments, we demonstrate how learning specifications can help avoid common problems that often arise due to ad-hoc reward composition.Comment: NIPS 201

arXiv.org e-Print Archive

eScholarship - University of California

Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

Author: Hutter Marcus
Publication venue
Publication date: 01/01/2002
Field of study

Various optimality properties of universal sequence predictors based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing

x_t

at time

t

, given past observations

x_1...x_{t-1}

can be computed with the chain rule if the true generating distribution

\mu

of the sequences

x_1x_2x_3...

is known. If

\mu

is unknown, but known to belong to a countable or continuous class \M one can base ones prediction on the Bayes-mixture

\xi

defined as a

w_\nu

-weighted sum or integral of distributions \nu\in\M. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on

\xi

is shown to be close to the loss of the Bayes-optimal, but infeasible prediction scheme based on

\mu

. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of

\xi

and give an Occam's razor argument that the choice

w_\nu\sim 2^{-K(\nu)}

for the weights is optimal, where

K(\nu)

is the length of the shortest program describing

\nu

. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed.Comment: 34 page

arXiv.org e-Print Archive

CiteSeerX

The Complexity of POMDPs with Long-run Average Objectives

Author: Chatterjee Krishnendu
Saona Raimundo
Ziliotto Bruno
Publication venue
Publication date: 30/04/2019
Field of study

We study the problem of approximation of optimal values in partially-observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. In long-run average objectives rewards are associated with every transition of the POMDP and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for approximation of optimal values, and the related decision problem is recursively enumerable complete

arXiv.org e-Print Archive

IST Austria: PubRep (Institute of Science and Technology)