Optimistic Agents are Asymptotically Optimal
We use optimism to introduce generic asymptotically optimal reinforcement
learning agents. They achieve, with an arbitrary finite or compact class of
environments, asymptotically optimal behavior. Furthermore, in the finite
deterministic case we provide finite error bounds. Comment: 13 LaTeX pages
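As a rough illustration of the optimism principle in the finite deterministic case, the sketch below keeps only the environments in a finite class that are still consistent with the history. It then acts greedily for the most optimistic one, maximizing value jointly over environments and first actions. This is a toy under stated assumptions, not the paper's agent. The `Env` interface, the finite-horizon lookahead (`horizon`, `gamma`) and all function names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: a deterministic environment maps (history, action) to
# an (observation, reward) pair; a history is a tuple of (action, obs, reward).
Env = Callable[[tuple, int], Tuple[int, float]]


def consistent(env: Env, history: tuple) -> bool:
    """True if env reproduces every recorded (observation, reward) in history."""
    return all(env(history[:t], a) == (obs, r) for t, (a, obs, r) in enumerate(history))


def value(env: Env, history: tuple, horizon: int, actions: List[int], gamma: float) -> float:
    """Optimal discounted return over a finite horizon, by exhaustive lookahead."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        obs, r = env(history, a)
        best = max(best, r + gamma * value(env, history + ((a, obs, r),), horizon - 1, actions, gamma))
    return best


def optimistic_action(envs: List[Env], history: tuple, actions: List[int],
                      horizon: int = 3, gamma: float = 0.9) -> Optional[int]:
    """Maximize value jointly over still-consistent environments and first actions."""
    best_a, best_v = None, float("-inf")
    for env in (e for e in envs if consistent(e, history)):
        for a in actions:
            obs, r = env(history, a)
            v = r + gamma * value(env, history + ((a, obs, r),), horizon - 1, actions, gamma)
            if v > best_v:  # strict: ties go to the earlier environment/action
                best_a, best_v = a, v
    return best_a


# Toy class: env_a rewards action 0, env_b rewards action 1.
env_a = lambda h, a: (0, 1.0 if a == 0 else 0.0)
env_b = lambda h, a: (0, 1.0 if a == 1 else 0.0)
print(optimistic_action([env_a, env_b], (), [0, 1]))               # optimistic guess: 0
print(optimistic_action([env_a, env_b], ((0, 0, 0.0),), [0, 1]))   # after env_a is refuted: 1
```

If the true environment is `env_b`, the first optimistic guess is wrong, the observed reward contradicts `env_a`, and that environment is discarded. A contradicted optimistic prediction always eliminates the environment that made it, which gives some intuition for why finite error bounds are possible in the finite deterministic case.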
On the Computability of Solomonoff Induction and Knowledge-Seeking
Solomonoff induction is held as a gold standard for learning, but it is known
to be incomputable. We quantify its incomputability by placing various flavors
of Solomonoff's prior M in the arithmetical hierarchy. We also derive
computability bounds for knowledge-seeking agents, and give a limit-computable
weakly asymptotically optimal reinforcement learning agent. Comment: ALT 201
Extreme State Aggregation Beyond MDPs
We consider a Reinforcement Learning setup where an agent interacts with an
environment in observation-reward-action cycles without any (esp. MDP)
assumptions on the environment. State aggregation and more generally feature
reinforcement learning is concerned with mapping histories/raw-states to
reduced/aggregated states. The idea behind both is that the resulting reduced
process (approximately) forms a small stationary finite-state MDP, which can
then be efficiently solved or learnt. We considerably generalize existing
aggregation results by showing that even if the reduced process is not an MDP,
the (q-)value functions and (optimal) policies of an associated MDP with the same
state-space size solve the original problem, as long as the solution can
approximately be represented as a function of the reduced states. This implies
an upper bound on the required state space size that holds uniformly for all RL
problems. It may also explain why RL algorithms designed for MDPs sometimes
perform well beyond MDPs. Comment: 28 LaTeX pages. 8 Theorems
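To make the feature-RL pipeline concrete, here is a minimal sketch. A map `phi` sends each history to a reduced state, a tabular model on reduced states is estimated from transition counts, and Q-value iteration solves it. The empirical counting model is a stand-in for the paper's associated MDP, whose exact construction the abstract does not give. `phi`, the trajectory format and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical trajectory format: a list of (action, observation, reward) steps.
# The history before step t is traj[:t]; phi maps a history to a reduced state.


def fit_reduced_model(trajectories, phi):
    """Count reduced-state transitions and rewards for each (state, action)."""
    nxt = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s_next: count}
    rew = defaultdict(float)                     # (s, a) -> summed reward
    n = defaultdict(int)                         # (s, a) -> visit count
    for traj in trajectories:
        for t, (a, _obs, r) in enumerate(traj):
            s = phi(tuple(traj[:t]))
            s_next = phi(tuple(traj[:t + 1]))
            nxt[(s, a)][s_next] += 1
            rew[(s, a)] += r
            n[(s, a)] += 1
    return nxt, rew, n


def q_iteration(nxt, rew, n, gamma=0.9, iters=200):
    """Value iteration on the empirical reduced MDP; states never acted from get value 0."""
    q = {sa: 0.0 for sa in n}
    for _ in range(iters):
        v = {}
        for (s, _a), val in q.items():
            v[s] = max(v.get(s, float("-inf")), val)
        q = {
            (s, a): rew[(s, a)] / n[(s, a)]
            + gamma * sum(c / n[(s, a)] * v.get(s2, 0.0) for s2, c in nxt[(s, a)].items())
            for (s, a) in n
        }
    return q


# Hypothetical feature map: the reduced state is the last observation (None at start).
phi = lambda h: h[-1][1] if h else None
trajs = [[(1, 1, 1.0), (1, 1, 1.0), (0, 0, 0.0)],
         [(0, 0, 0.0), (0, 0, 0.0), (1, 1, 1.0)]]
q = q_iteration(*fit_reduced_model(trajs, phi))
print(max([0, 1], key=lambda a: q[(None, a)]))  # greedy first action: 1
```

In this toy the reduced process happens to be an MDP. The abstract's claim is stronger: the associated MDP can still be useful when the reduced process is not Markov, provided the solution is approximately a function of the reduced states.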
Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities
The follow-the-leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are positively curved. In this paper we ask whether there are other “lucky” settings in which FTL achieves sublinear, “small” regret. In particular, we study the fundamental problem of linear prediction over a non-empty convex, compact domain. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: in this case, we prove that as long as the mean of the loss vectors has positive length bounded away from zero, FTL enjoys a logarithmic growth rate of regret, while, e.g., for polyhedral domains and stochastic data it enjoys finite expected regret. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the bound available for FTL.
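To see the role of the mean loss vector numerically, the sketch below runs FTL for linear losses on the unit Euclidean ball, a simple domain with a curved boundary. There the leader has the closed form w_t = -F_{t-1}/||F_{t-1}|| for the cumulative loss vector F_{t-1}. The choice of domain, playing the origin when F_{t-1} = 0, and the name `ftl_ball` are assumptions made for illustration.

```python
import math


def ftl_ball(losses):
    """Run FTL with linear losses on the unit Euclidean ball and return its regret."""
    d = len(losses[0])
    cum = [0.0] * d   # cumulative loss vector F_{t-1}
    total = 0.0       # learner's cumulative loss
    for f in losses:
        norm = math.sqrt(sum(c * c for c in cum))
        # Leader: argmin over ||w|| <= 1 of <F, w> is -F/||F||; play the origin if F = 0.
        w = [-c / norm for c in cum] if norm > 0 else [0.0] * d
        total += sum(fi * wi for fi, wi in zip(f, w))
        cum = [c + fi for c, fi in zip(cum, f)]
    best = -math.sqrt(sum(c * c for c in cum))  # min over the ball of <F_T, w>
    return total - best


print(ftl_ball([[1.0, 0.0]] * 10))                 # mean bounded away from 0: regret 1.0
print(ftl_ball([[1.0, 0.0], [-1.0, 0.0]] * 5))     # mean zero: regret 5.0, i.e. T/2
```

With constant losses the regret stays at 1 however long the run, while alternating losses with mean zero make it grow linearly, consistent with the condition that the mean loss vector be bounded away from zero.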
Universal knowledge-seeking agents for stochastic environments
We define an optimal Bayesian knowledge-seeking agent, KL-KSA, designed for countable hypothesis classes of stochastic environments and whose goal is to gather as much information about the unknown world as possible. Although this agent works for arbitrary countable classes and priors, we focus on the especially interesting case where all stochastic computable environments are considered and the prior is based on Solomonoff’s universal prior. Among other properties, we show that KL-KSA learns the true environment in the sense that it learns to predict the consequences of actions it does not take. We show that it does not consider noise to be information and avoids taking actions leading to inescapable traps. We also present a variety of toy experiments demonstrating that KL-KSA behaves as expected.
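The information-gain idea behind knowledge-seeking can be sketched for a finite hypothesis class. The value of an action is taken to be the expected KL divergence from the prior to the posterior after seeing the resulting observation. This one-step form over a finite class is a simplification for illustration; KL-KSA itself is defined for countable classes and, in the case of interest, a Solomonoff-based prior. `expected_info_gain` and the toy `model` are hypothetical.

```python
import math


def posterior(prior, likelihoods):
    """Bayes rule over a finite hypothesis class."""
    z = sum(p * l for p, l in zip(prior, likelihoods))
    return [p * l / z for p, l in zip(prior, likelihoods)]


def kl(p, q):
    """KL divergence KL(p || q) in nats, skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_info_gain(prior, model, action, observations):
    """Expected KL(posterior || prior); model(h, a, o) is P(o | a) under hypothesis h."""
    gain = 0.0
    for o in observations:
        lik = [model(h, action, o) for h in range(len(prior))]
        p_o = sum(w * l for w, l in zip(prior, lik))  # Bayes-mixture probability of o
        if p_o > 0:
            gain += p_o * kl(posterior(prior, lik), prior)
    return gain


# Toy model: "noise" gives a fair coin under every hypothesis; "probe" reveals h.
model = lambda h, a, o: 0.5 if a == "noise" else (1.0 if o == h else 0.0)
prior = [0.5, 0.5]
for a in ["noise", "probe"]:
    print(a, expected_info_gain(prior, model, a, [0, 1]))  # noise: 0.0, probe: log 2
```

The `noise` action produces random observations yet scores zero, because every hypothesis predicts the same noise. This mirrors the abstract's point that the agent does not mistake noise for information.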
Bayesian reinforcement learning with exploration
We consider a general reinforcement learning problem and
show that carefully combining the Bayesian optimal policy and an exploring
policy leads to minimax sample-complexity bounds in a very general
class of (history-based) environments. We also prove lower bounds
and show that the new algorithm displays adaptive behaviour when the
environment is easier than worst-case.
Irus and his jovial crew : representations of beggars in Vincent Bourne and other eighteenth-century writers of Latin verse
Alastair Fowler has written, with reference to the time of Milton, of “Latin's special role in a bilingual culture”, and this was still true in the early eighteenth century. The education of the elite placed great emphasis on the art of writing Latin verse and modern, as well as ancient, writers of Latin continued to be widely read. Collections of Latin verse, by individual writers such as Vincent Bourne (c. 1694–1747) or by groups such as Westminster schoolboys or bachelors of Christ Church, Oxford, could run into multiple editions, and included poems on a wide range of contemporary topics, as well as reworkings of classical themes. This paper examines a number of eighteenth-century Latin poems dealing with beggars, several of which are here translated for the first time. Particular attention is paid to the way in which the Latin poems recycled well-worn tropes about beggary which were often at variance with the experience of real-life beggars, and to how the specificities of Latin verse might heighten negative representations of beggars in a genre which, as a manifestation of elite culture, appealed to the very class which was politically and legally responsible for controlling them.
Sequential Extensions of Causal and Evidential Decision Theory
Moving beyond the dualistic view in AI where agent and environment are
separated incurs new challenges for decision making, as calculation of expected
utility is no longer straightforward. The non-dualistic decision theory
literature is split between causal decision theory and evidential decision
theory. We extend these decision algorithms to the sequential setting where the
agent alternates between taking actions and observing their consequences. We
find that evidential decision theory has two natural extensions while causal
decision theory only has one. Comment: ADT 201