Time irreversibility in neuronal dynamics has recently been demonstrated to
correlate with various indicators of cognitive effort in living systems. Using
Landauer's principle, which posits that time-irreversible information
processing consumes energy, we establish a thermodynamically consistent measure
of cognitive energy cost associated with belief dynamics. We use this
concept to analyze a two-armed bandit game, a standard framework for
decision-making under uncertainty, considering exploitation, finite memory, and
concurrent allocation to both game options, or arms. Through exploitative,
prediction-error-based belief dynamics, the decision maker incurs a cognitive
energy cost. First, we observe the emergence of dissipative structures in the
steady state of the belief space due to time-reversal symmetry breaking at
intermediate levels of exploitation. To analyze the belief dynamics further, we
liken them to the behavior of an active particle subjected to state-dependent
noise. This analogy enables us to relate emergent risk aversion to standard
thermophoresis, connecting two apparently unrelated concepts. Finally, we
numerically compute the time irreversibility of belief dynamics in the steady
state, revealing a strong correlation between elevated, yet optimized,
cognitive energy cost and optimal decision-making outcomes. This correlation
suggests a mechanism for the evolution of living systems towards maximally
out-of-equilibrium structures.
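
To make the last step concrete, the following minimal sketch simulates prediction-error-based belief dynamics in a two-armed bandit and estimates their time irreversibility from a sampled trajectory. It is an illustration under stated assumptions, not the model or estimator used in this work: the softmax allocation, the allocation-weighted delta-rule update, the Gaussian rewards, the parameter names (`beta` for exploitation, `alpha` for finite memory, `means`, `stds`), and the binned transition-count estimator with a pseudocount are all hypothetical choices made for brevity.

```python
import numpy as np

def simulate_beliefs(beta, alpha=0.1, means=(1.0, 0.8), stds=(1.0, 0.2),
                     steps=20000, seed=0):
    """Simulate beliefs q about two arms (assumed model, for illustration)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    q = np.zeros(2)
    traj = np.empty((steps, 2))
    for t in range(steps):
        # Concurrent allocation to both arms via a softmax over beliefs;
        # larger beta means more exploitation.
        z = beta * (q - q.max())
        f = np.exp(z) / np.exp(z).sum()
        # Noisy rewards from both arms (assumed Gaussian).
        r = rng.normal(means, stds)
        # Prediction-error update; alpha sets the memory length, and
        # weighting by f makes the effective noise state dependent.
        q = q + alpha * f * (r - q)
        traj[t] = q
    return traj

def irreversibility(traj, bins=8, pseudocount=1e-3):
    """KL divergence between forward and time-reversed transition statistics."""
    idx = []
    for k in range(traj.shape[1]):
        x = traj[:, k]
        # Interior bin edges; np.digitize then yields indices 0..bins-1.
        edges = np.linspace(x.min(), x.max(), bins + 1)[1:-1]
        idx.append(np.digitize(x, edges))
    state = idx[0] * bins + idx[1]
    n = bins * bins
    counts = np.zeros((n, n))
    # Count observed transitions state[t] -> state[t+1].
    np.add.at(counts, (state[:-1], state[1:]), 1.0)
    # The pseudocount avoids log(0); it biases the estimate towards zero.
    p = counts + pseudocount
    p /= p.sum()
    return float(np.sum(p * np.log(p / p.T)))
```

A call such as `irreversibility(simulate_beliefs(beta=2.0)[2000:])` discards an initial transient before estimating, so that the statistics are closer to the steady state. The estimate is sensitive to the number of bins and the trajectory length, so comparisons across values of `beta` should hold both fixed.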