
    Uncertainty-based decision-making in reinforcement learning and the distributed adaptive control cognitive architecture

Master's thesis of the Master in Cognitive Systems and Interactive Media. Directors: Adrián Fernández Amil, Ismael Tito Freire.

This thesis explores the role of uncertainty estimation during training in Reinforcement Learning as a potential way of increasing sample efficiency, acting as a regulator between two subsystems that shape a policy: memory and stimulus-response. Memory-based subsystems are related to Episodic Reinforcement Learning, where exact snapshots or sequences of tuples generated during training are stored and then retrieved to perform the action that maximizes reward based solely on these past experiences. This way of learning is closer to how the hippocampus operates in the brain. In contrast, stimulus-response subsystems can be expressed as models that map states to actions in a model-free fashion. In humans and other animals, the dorsal striatum is responsible for this stimulus-response mapping. However, this mapping process does not take into account the inherent uncertainty or variability of stimuli (i.e., perceptual uncertainty) in stochastic environments with partial observability, and thus the optimal policy may sometimes be to rely more on the sequential nature of (model-based) memory. Several studies have shown that uncertainty plays a significant role in decision-making; we therefore studied how it can arbitrate between the two systems. Concretely, we used an agent based on the Distributed Adaptive Control (DAC-ML) cognitive architecture, comprising the two subsystems and an arbitration module that regulated their respective use based on the entropies of their policies. The agent was trained on a foraging task and showed dynamics aligned with human behaviour: the memory-based system dominates at first, and throughout training the stimulus-response system slowly takes over.
This research could potentially lead to more flexible and efficient Reinforcement Learning algorithms that combine different ways of learning and operating depending on the available knowledge about the environment.
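The entropy-based arbitration described in the abstract can be sketched minimally as follows. This is an illustrative assumption, not the thesis's actual DAC-ML implementation: the function names, the winner-take-all rule, and the example distributions are all hypothetical.

```python
import numpy as np

def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()  # normalize defensively
    return float(-np.sum(p * np.log(p + 1e-12)))

def arbitrate(memory_policy, reactive_policy):
    """Select the subsystem whose policy is less uncertain (lower entropy).

    A hypothetical stand-in for the arbitration module: the memory-based
    (episodic) policy and the stimulus-response policy each propose an
    action distribution, and the more confident one drives behaviour.
    """
    h_mem = policy_entropy(memory_policy)
    h_sr = policy_entropy(reactive_policy)
    if h_mem < h_sr:
        return "memory", memory_policy
    return "stimulus-response", reactive_policy

# Early in training the stimulus-response policy is near-uniform (high
# entropy), so the memory-based system dominates; as training sharpens the
# reactive policy, arbitration shifts toward it.
system, _ = arbitrate([0.7, 0.2, 0.1], [0.34, 0.33, 0.33])
```

Under this sketch, the qualitative dynamics the thesis reports (memory first, stimulus-response later) fall out of nothing more than comparing the two entropies at each decision point.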