8,928 research outputs found
Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach
Reinforcement learning (RL) agents have traditionally been tasked with
maximizing the value function of a Markov decision process (MDP), either in
continuous settings, with fixed discount factor , or in episodic
settings, with . While this has proven effective for specific tasks
with well-defined objectives (e.g., games), it has never been established that
fixed discounting is suitable for general purpose use (e.g., as a model of
human preferences). This paper characterizes rationality in sequential decision
making using a set of seven axioms and arrives at a form of discounting that
generalizes traditional fixed discounting. In particular, our framework admits
a state-action dependent "discount" factor that is not constrained to be less
than 1, so long as there is eventual long run discounting. Although this
broadens the range of possible preference structures in continuous settings, we
show that there exists a unique "optimizing MDP" with fixed whose
optimal value function matches the true utility of the optimal policy, and we
quantify the difference between value and utility for suboptimal policies. Our
work can be seen as providing a normative justification for (a slight
generalization of) Martha White's RL task formalism (2017) and other recent
departures from the traditional RL, and is relevant to task specification in
RL, inverse RL and preference-based RL.Comment: 8 pages + 1 page supplement. In proceedings of AAAI 2019. Slides,
poster and bibtex available at
https://silviupitis.com/#rethinking-the-discount-factor-in-reinforcement-learning-a-decision-theoretic-approac
The cost of information
We develop an axiomatic theory of information acquisition that captures the
idea of constant marginal costs in information production: the cost of
generating two independent signals is the sum of their costs, and generating a
signal with probability half costs half its original cost. Together with a
monotonicity and a continuity conditions, these axioms determine the cost of a
signal up to a vector of parameters. These parameters have a clear economic
interpretation and determine the difficulty of distinguishing states. We argue
that this cost function is a versatile modeling tool that leads to more
realistic predictions than mutual information.Comment: 52 pages, 4 figure
A unified theory of cone metric spaces and its applications to the fixed point theory
In this paper we develop a unified theory for cone metric spaces over a solid
vector space. As an application of the new theory we present full statements of
the iterated contraction principle and the Banach contraction principle in cone
metric spaces over a solid vector space.Comment: 51 page
- …