Reinforcement Learning Approaches to Instrumental Contingency Degradation in Rats

Abstract

Goal-directed action involves a representation of the consequences of an action. Rats with lesions of the medial prefrontal cortex do not adapt their instrumental response in a Skinner box when food delivery becomes unrelated to lever pressing. This indicates a role for the prefrontal region in adapting to contingency changes, a form of causal learning. We attempted to model this phenomenon in a reinforcement learning framework. Behavioural sequences of normal and lesioned rats were used to feed models based on the SARSA algorithm. One model (factorized-states) focused on temporal factors, representing continuous states as vectors of decaying event traces. The second model (event sequence) emphasized sequences, representing states as n-tuples of events. The values of state-action pairs were incorporated into a softmax policy to derive predicted action probabilities and adjust model parameters. Both models revealed a number of discrepancies between predicted and actual behaviour, emphasising changes in magazine visits rather than lever presses. The models also did not reproduce the differential adaptation of normal and prefrontal-lesioned rats to contingency degradation. These data suggest that temporal difference learning models fail to capture the causal relationships involved in the adaptation to contingency changes.
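
As an illustration only, the general scheme described above (SARSA values passed through a softmax policy and scored against an observed behavioural sequence) might be sketched as follows. This is a minimal sketch, not the authors' implementation: the action names, the tuple-based state encoding, the parameter values, and the function names are all assumptions made for the example.

```python
import math

# Hypothetical action set; the labels are assumptions for this sketch.
ACTIONS = ("lever_press", "magazine_visit", "other")

def softmax_policy(q_values, beta):
    """Turn state-action values into action probabilities (beta = inverse temperature)."""
    m = max(q_values.values())  # subtract the max for numerical stability
    exps = {a: math.exp(beta * (q - m)) for a, q in q_values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    q_sa = Q.setdefault(s, {}).get(a, 0.0)
    q_next = Q.setdefault(s_next, {}).get(a_next, 0.0)
    td_error = r + gamma * q_next - q_sa
    Q[s][a] = q_sa + alpha * td_error
    return td_error

def replay_log_likelihood(steps, alpha=0.1, gamma=0.9, beta=2.0):
    """Replay observed (state, action, reward) steps and return the
    log-likelihood of the observed actions under SARSA + softmax."""
    Q = {}
    loglik = 0.0
    for (s, a, r), (s_next, a_next, _) in zip(steps, steps[1:]):
        q_s = {act: Q.get(s, {}).get(act, 0.0) for act in ACTIONS}
        loglik += math.log(softmax_policy(q_s, beta)[a])
        sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma)
    return loglik
```

In such a sketch, states could be tuples of the most recent events (in the spirit of the event-sequence model), and the parameters alpha, gamma and beta would be adjusted to maximise the returned log-likelihood for each animal's recorded sequence.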
