Asynchrony, overlaps and delays in sensory-motor signals introduce ambiguity
as to which stimuli, actions, and rewards are causally related. Only the
repetition of reward episodes helps distinguish true cause-effect relationships
from coincidental occurrences. In the model proposed here, a novel plasticity
rule employs short- and long-term changes to evaluate hypotheses about cause-effect
relationships. Transient weights represent hypotheses that are consolidated in
long-term memory only when they consistently predict or cause future rewards.
The main objective of the model is to preserve existing network topologies when
learning with ambiguous information flows. Learning is also improved by biasing
the exploration of the stimulus-response space towards actions that in the past
occurred before rewards. The model indicates under which conditions beliefs can
be consolidated in long-term memory, suggests a solution to the
plasticity-stability dilemma, and proposes an interpretation of the role of
short-term plasticity.
Comment: Biological Cybernetics, September 201