Ecological adaptation in the context of an actor-critic

Abstract

Biological beings are the result of an evolutionary and developmental process of adaptation to the environment they perceive and in which they act. Animals and plants have successfully adapted to a large variety of environments, which supports the idea of drawing inspiration for artificial agents from biology and ethology. This idea has already been suggested by previous studies and is extended throughout this thesis. However, the role of perception in the process of adaptation, and its integration in an agent capable of acting for survival, is not clear.

Robotic architectures in AI proposed throughout the last decade have broadly addressed the problems of behaviour selection, namely deciding "what to do next", and of learning as the two main adaptive processes. Behaviour selection has commonly been related to theories of motivation, and learning has been bound to theories of reinforcement. However, the formulation of a general theory including both processes as particular cases of the same phenomenon is still an incomplete task. This thesis focuses again on behaviour selection and learning; however, it proposes to integrate both processes by stressing the ecological relationship between the agent and its environment. If the selection of behaviour is an expression of the agent's motivations, the feedback from the environment due to behaviour execution can be viewed as part of the same process, since it also influences the agent's internal motivations and the learning processes via reinforcement. I relate this to an argument supporting the existence of a common neural substrate to compute motivation and reward, thereby relating the elicitation of a behaviour to the perception of the reward resulting from its execution.

As in previous studies, behaviour selection is viewed as a competition among parallel pathways to gain control over the agent's actuators.
Unlike in the previous cases, the computation of each motivation in this thesis is no longer the result of an additive or multiplicative formula combining inner and outer stimuli. Instead, the ecological principle is proposed to constrain the combination of stimuli in a novel fashion that leads to adaptive behavioural patterns. This method aims at overcoming the intrinsic limitations of any formula, the use of which results in behavioural responses restricted to a set of specific patterns, and therefore to the set of ethological cases they can justify. External stimuli and internal physiology in the model introduced in this thesis are not combined a priori. Instead, they are viewed from the perspective of the agent as modulatory elements biasing the selection of one behaviour over another, guided by the reward provided by the environment. The selection is performed by an actor-critic reinforcement learning algorithm that aims at maximising cumulative reward.

In this context, the agent's drives are the expression of the deficit or excess of internal resources, and they serve as the agent's reference for defining its relationship with the environment. The schema to learn object affordances is integrated in an actor-critic reinforcement learning algorithm, which is the core of a motivation and reinforcement framework driving behaviour selection and learning. Its working principle is based on the capacity to perceive changes in the environment via internal hormonal responses and to modify the agent's behavioural patterns accordingly. To this end, the concept of reward is defined in the framework of the agent's internal physiology and is related to the condition of physiological stability introduced by Ashby, and supported by Dawkins and Meyer, as a requirement for survival.
In this light, the reward used for learning is defined in terms of the physiological state, where the effect of interacting with the environment can be quantified in an ethologically consistent manner.

The above ideas on motivation, behaviour selection, learning and perception have been made explicit in an architecture integrated in a simulated robotic platform. To demonstrate the extent of their validity, extensive simulations have been performed to address the affordance learning paradigm and the adaptation offered by the actor-critic framework. To this end, three different metrics have been proposed to measure the effect of external and internal perception on the learning and behaviour selection processes: performance in terms of flexibility of adaptation, physiological stability, and the cycles of behaviour execution in each situation. In addition, the thesis has begun to frame the integration of behaviours of an appetitive and consummatory nature in a single schema. Finally, it also contributes to the arguments disambiguating the role of dopamine as a neurotransmitter in the Basal Ganglia.
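
As an illustration of the idea that reward can be defined in the agent's physiological state and fed to an actor-critic learner, a minimal sketch follows. It is not the architecture of the thesis, whose formulas are not given in this abstract. It assumes a single internal resource with a fixed setpoint, a reward equal to the reduction in drive (deficit or excess), a two-state tabular critic and a softmax actor. All names (`SETPOINT`, `drive`, `physiological_reward`, `step`, `train`) and the toy "eat"/"rest" environment are hypothetical.

```python
import math
import random

SETPOINT = 1.0  # assumed ideal level of one internal resource (hypothetical)


def drive(level):
    """Drive as the absolute deficit or excess relative to the setpoint."""
    return abs(SETPOINT - level)


def physiological_reward(level_before, level_after):
    """Reward is the reduction in drive: positive when moving toward the setpoint."""
    return drive(level_before) - drive(level_after)


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(level, action):
    """Toy environment: action 0 = 'eat' (+0.2), action 1 = 'rest'; both cost 0.05."""
    intake = 0.2 if action == 0 else 0.0
    return max(0.0, level + intake - 0.05)


def train(episodes=200, steps=30, alpha_actor=0.1, alpha_critic=0.1,
          gamma=0.9, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = 2, 2  # state 0: below setpoint, state 1: at or above it
    values = [0.0] * n_states                              # critic
    prefs = [[0.0] * n_actions for _ in range(n_states)]   # actor
    for _ in range(episodes):
        level = 0.2
        for _ in range(steps):
            s = 0 if level < SETPOINT else 1
            probs = softmax(prefs[s])
            a = rng.choices(range(n_actions), weights=probs)[0]
            new_level = step(level, a)
            s2 = 0 if new_level < SETPOINT else 1
            r = physiological_reward(level, new_level)
            # Temporal-difference error computed by the critic
            td_error = r + gamma * values[s2] - values[s]
            values[s] += alpha_critic * td_error
            # Actor update: gradient of the log softmax policy, scaled by the TD error
            for b in range(n_actions):
                grad = (1.0 if b == a else 0.0) - probs[b]
                prefs[s][b] += alpha_actor * td_error * grad
            level = new_level
    return prefs, values


if __name__ == "__main__":
    prefs, values = train()
    print("preferences in deficit state (eat, rest):", prefs[0])
```

In this toy setting the only external feedback the learner receives is the change in its own internal state, which is the point the abstract makes; the choice of a single resource and of a drive-reduction reward is an assumption made to keep the sketch short.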
