Reinforcement learning (RL) is a general paradigm for studying intelligent
behaviour, with applications ranging from artificial intelligence to psychology
and economics. AIXI is a universal solution to the RL problem; it can learn any
computable environment. A technical subtlety of AIXI is that it is defined
using a mixture over semimeasures that need not sum to 1, rather than over
proper probability measures. In this work we argue that the shortfall of a
semimeasure can naturally be interpreted as the agent's estimate of the
probability of its death. We formally define death for generally intelligent
agents like AIXI, and prove a number of related theorems about their behaviour.
Notable discoveries include that agent behaviour can change radically under
positive linear transformations of the reward signal (from suicidal to
dogmatically self-preserving), and that the agent's posterior belief that it
will survive increases over time.
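
As a minimal sketch of the intended reading (the notation $\nu$, $e_t$, $h_{<t}$, $a_t$, $\mathcal{E}$ is assumed here for illustration and is not spelled out in the abstract): if $\nu$ is a semimeasure in the agent's mixture, $h_{<t}$ the interaction history, and $a_t$ the current action, the shortfall of $\nu$ on the next percept can be read as a death probability,
$$
P_\nu(\text{death at } t \mid h_{<t} a_t) \;=\; 1 - \sum_{e_t \in \mathcal{E}} \nu(e_t \mid h_{<t} a_t),
$$
which is zero for a proper probability measure and strictly positive exactly when $\nu$ assigns less than full mass to the set of possible percepts, i.e. when the agent may receive no percept at all.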