We introduce a physiological model-based agent as a proof of principle that it
is possible to define a flexible self-preserving system that uses neither a
reward signal nor reward maximization as an objective. We achieve this by
introducing the Self-Preserving Agent (SPA), which has a physiological
structure in which the system can become trapped in an absorbing state if the
agent does not solve and execute goal-directed policies. Our agent is defined
using a new class of Bellman equations, called Operator Bellman Equations
(OBEs), for encoding tasks that are jointly non-stationary and non-Markovian,
formalized as a Temporal Goal Markov Decision Process (TGMDP). OBEs produce
optimal goal-conditioned spatiotemporal transition operators that map an
initial state-time to the final state-times of a policy used to complete a
goal; these operators can also be used to forecast future states in multiple
dynamic physiological state-spaces.
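As a schematic sketch (the notation here is illustrative, not the paper's formal definition), such an operator can be viewed as a kernel
\[
P^{\pi_g}\!\big((s_f, t_f)\,\big|\,(s_0, t_0)\big),
\]
mapping an initial state-time $(s_0, t_0)$ to a distribution over the final state-times $(s_f, t_f)$ at which the policy $\pi_g$ completes goal $g$; chaining goals then corresponds to composing these kernels,
\[
\big(P^{\pi_{g_2}} \circ P^{\pi_{g_1}}\big)\!\big((s_f, t_f)\,\big|\,(s_0, t_0)\big)
= \sum_{s', t'} P^{\pi_{g_2}}\!\big((s_f, t_f)\,\big|\,(s', t')\big)\,
P^{\pi_{g_1}}\!\big((s', t')\,\big|\,(s_0, t_0)\big),
\]
which is the sense in which the operators support hierarchy and forecasting.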
SPA is equipped with an intrinsic motivation function, called the valence
function, which quantifies the change in empowerment (the channel capacity of
a transition operator) induced by following a policy. Because empowerment is a
function of a transition operator, there is a natural synergy between
empowerment and OBEs: the OBEs create hierarchical transition operators, and
the valence function can evaluate the hierarchical empowerment change defined
on these operators.
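As a minimal sketch (assuming the standard $n$-step definition of empowerment as a channel capacity, and treating the policy's final state as given), empowerment and valence can be written as
\[
\mathcal{E}(s) \;=\; \max_{p(a^n)} I\big(A^n; S' \,\big|\, s\big),
\qquad
\mathcal{V}(\pi \mid s_0) \;=\; \mathcal{E}(s_f) - \mathcal{E}(s_0),
\]
i.e., the valence of a policy is the empowerment change between its initial and final states; the paper's exact formulation (e.g., averaging over final state-times under the operator) may differ.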
The valence function can then be used for goal selection, wherein the agent
chooses the policy sequence that realizes the goal states producing maximum
empowerment gain.
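Schematically (again with illustrative notation), goal selection over a candidate set $\mathcal{G}$ is
\[
g^* \;=\; \arg\max_{g \in \mathcal{G}} \; \mathbb{E}\big[\mathcal{V}(\pi_g \mid s_0)\big].
\]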
In doing so, the agent seeks freedom and avoids internal death-states that
would undermine its ability to control both external and internal states in
the future, thereby exhibiting a capacity for predictive and anticipatory
self-preservation. We also compare SPA to multi-objective RL, and discuss its
capacity for symbolic reasoning and life-long learning.