28 research outputs found

    Ecological adaptation in the context of an actor-critic

    Biological beings are the result of an evolutionary and developmental process of adaptation to the environment they perceive and in which they act. Animals and plants have successfully adapted to a large variety of environments, which supports the idea of drawing inspiration for artificial agents from biology and ethology. This idea has already been suggested by previous studies and is extended throughout this thesis. However, the role of perception in the process of adaptation and its integration in an agent capable of acting for survival is not clear. Robotic architectures in AI proposed throughout the last decade have broadly addressed the problems of behaviour selection, namely deciding "what to do next", and of learning as the two main adaptive processes. Behaviour selection has commonly been related to theories of motivation, and learning has been bound to theories of reinforcement. However, the formulation of a general theory including both processes as particular cases of the same phenomenon is still an incomplete task. This thesis focuses again on behaviour selection and learning; however, it proposes to integrate both processes by stressing the ecological relationship between the agent and its environment. If the selection of behaviour is an expression of the agent's motivations, the feedback of the environment due to behaviour execution can be viewed as part of the same process, since it also influences the agent's internal motivations and the learning processes via reinforcement. I relate this to an argument supporting the existence of a common neural substrate to compute motivation and reward, thereby relating the elicitation of a behaviour to the perception of reward resulting from its execution. As in previous studies, behaviour selection is viewed as a competition among parallel pathways to gain control over the agent's actuators.
Unlike in those previous studies, the computation of each motivation in this thesis is no longer the result of an additive or multiplicative formula combining inner and outer stimuli. Instead, the ecological principle is proposed to constrain the combination of stimuli in a novel fashion that leads to adaptive behavioural patterns. This method aims at overcoming the intrinsic limitations of any formula, the use of which results in behavioural responses restricted to a set of specific patterns, and therefore to the set of ethological cases they can justify. External stimuli and internal physiology in the model introduced in this thesis are not combined a priori. Instead, these are viewed from the perspective of the agent as modulatory elements biasing the selection of one behaviour over another guided by the reward provided by the environment, with the selection performed by an actor-critic reinforcement learning algorithm aiming at the maximum cumulative reward. In this context, the agent's drives are the expression of the deficit or excess of internal resources and the reference the agent uses to define its relationship with the environment. The schema to learn object affordances is integrated in an actor-critic reinforcement learning algorithm, which is the core of a motivation and reinforcement framework driving behaviour selection and learning. Its working principle is based on the capacity of perceiving changes in the environment via internal hormonal responses and of modifying the agent's behavioural patterns accordingly. To this end, the concept of reward is defined in the framework of the agent's internal physiology and is related to the condition of physiological stability introduced by Ashby, and supported by Dawkins and Meyer as a requirement for survival.
In this light, the reward used for learning is defined in terms of the physiological state, where the effect of interacting with the environment can be quantified in an ethologically consistent manner. The above ideas on motivation, behaviour selection, learning and perception have been made explicit in an architecture integrated in a simulated robotic platform. To demonstrate the reach of their validity, extensive simulation has been performed to address the affordance learning paradigm and the adaptation offered by the framework of the actor-critic. To this end, three different metrics have been proposed to measure the effect of external and internal perception on the learning and behaviour selection processes: the performance in terms of flexibility of adaptation, the physiological stability and the cycles of behaviour execution at every situation. In addition to this, the thesis has begun to frame the integration of behaviours of an appetitive and consummatory nature in a single schema. Finally, it also contributes to the arguments disambiguating the role of dopamine as a neurotransmitter in the basal ganglia.
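The abstract above describes behaviour selection by an actor-critic whose reward is tied to physiological stability. The following is a minimal sketch of that general idea, not the architecture of the thesis: the one-resource physiology, the `SETPOINT`, the two behaviours ("eat", "rest"), the reward defined as the negative distance from the setpoint, and all learning rates are assumptions made for illustration.

```python
import math
import random

SETPOINT = 2   # assumed ideal level of the single internal resource
LEVELS = 5     # resource levels 0..4 (hypothetical discretisation)


def step(level, action):
    """Toy physiology: action 0 ("eat") raises the resource, 1 ("rest") lets it decay."""
    nxt = min(level + 1, LEVELS - 1) if action == 0 else max(level - 1, 0)
    reward = -abs(nxt - SETPOINT)  # assumed reward: negative drive (deficit or excess)
    return nxt, reward


def policy(prefs, level):
    """Softmax over the actor's action preferences in one state."""
    top = max(prefs[level])
    exps = [math.exp(p - top) for p in prefs[level]]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=5000, steps=10, alpha=0.1, beta=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    values = [0.0] * LEVELS                       # critic: state values
    prefs = [[0.0, 0.0] for _ in range(LEVELS)]   # actor: action preferences
    for _ in range(episodes):
        level = rng.randrange(LEVELS)             # random start so every state is visited
        for _ in range(steps):
            probs = policy(prefs, level)
            action = 0 if rng.random() < probs[0] else 1
            nxt, reward = step(level, action)
            # The TD error from the critic drives both updates.
            td_error = reward + gamma * values[nxt] - values[level]
            values[level] += alpha * td_error
            for a in range(2):
                grad = (1.0 if a == action else 0.0) - probs[a]
                prefs[level][a] += beta * td_error * grad
            level = nxt
    return prefs, values
```

Calling `prefs, values = train()` and then `policy(prefs, 1)` gives the learned probabilities of "eat" and "rest" when the resource is below the setpoint. In this toy setting the drive itself is the only source of reward, which is the point the sketch is meant to illustrate.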

    Correction: Perceived Effort for Motor Control and Decision-Making.

    [This corrects the article DOI: 10.1371/journal.pbio.2002885.]

    Perceived effort for motor control and decision-making - Fig 1

    (A) Utility function traces (solid traces) resulting from trading off benefits (black solid trace) against their associated costs (dash-dot traces) as a function of movement time (T) for reaching movements of path distance (D) between 5 and 25 cm. The dots on the utility traces indicate the optimal time (T*) obtained by maximizing utility for that specific reach, which increases with distance. (B) Effect of temporal discount (γ) on utility. Optimal movement times derived by maximizing utility are plotted as a function of distance for 3 specific temporal discount values. Movement times decrease as discount rates increase.
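The caption describes utility as benefit minus cost over movement time, with an optimal time T* that grows with distance and shrinks with the discount rate. A minimal numerical sketch of that trade-off follows. The functional forms are assumptions, not the ones used in the article: a hyperbolically discounted reward `reward / (1 + gamma * t)` and an effort cost `cost_gain * distance**2 / t`, with distance in metres and time in seconds.

```python
def utility(t, distance, gamma, reward=1.0, cost_gain=1.0):
    """Utility of reaching `distance` metres in `t` seconds (assumed forms)."""
    benefit = reward / (1.0 + gamma * t)   # reward discounted by movement time
    cost = cost_gain * distance ** 2 / t   # effort: larger for long, fast reaches
    return benefit - cost


def optimal_time(distance, gamma, t_max=3.0, dt=0.001):
    """Grid search for the movement time T* that maximises utility."""
    times = [dt * k for k in range(1, int(t_max / dt) + 1)]
    return max(times, key=lambda t: utility(t, distance, gamma))


if __name__ == "__main__":
    for gamma in (0.5, 1.0, 2.0):
        row = [round(optimal_time(d, gamma), 3) for d in (0.05, 0.15, 0.25)]
        print(f"gamma={gamma}: T* for D=5, 15, 25 cm -> {row}")
```

With these assumed forms and the parameter values shown, T* increases with distance and decreases as `gamma` increases, matching the qualitative pattern stated in the caption. Other cost or discount functions could behave differently.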

    Effect of biomechanical costs in motor control and decision-making.

    (A) Decision-making task in which movements were aimed at the black rectangles. In each arrangement, the difference in biomechanical costs between T1 (right) and T2 (left) is maximal, although the relative path distance may vary. (B) Predicted versus measured average movement times for each of the 12 possible movements shown in (A). The equation below is the utility function used to obtain the optimal trajectory (and movement time). (C) Predicted versus measured group patterns of choices for T1 as a function of relative target path distance (D1 and D2: path distances from the origin to T1 and T2, respectively) (see [11, 15, 19] for further detail). The predicted pattern is obtained by fitting the softmax temperature to the utilities (J) obtained for either movement and for each relative distance.
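Panel (C) refers to fitting a softmax temperature to the utilities of the two candidate movements. The sketch below shows one common way to do this, under assumptions: the function names, the least-squares objective and the grid of candidate temperatures are hypothetical and are not taken from the article.

```python
import math


def p_choose_t1(j1, j2, temperature):
    """Two-option softmax: probability of choosing T1 given utilities J1 and J2."""
    x = (j1 - j2) / temperature
    return 0.5 * (1.0 + math.tanh(x / 2.0))  # equals 1 / (1 + exp(-x)), overflow-safe


def fit_temperature(utilities, observed, grid=None):
    """Pick the temperature whose predicted P(T1) best matches the observed choices.

    utilities: list of (J1, J2) pairs, one per relative target distance.
    observed:  list of measured proportions of T1 choices, same order.
    """
    grid = grid or [0.01 * k for k in range(1, 501)]  # assumed search range

    def squared_error(temp):
        return sum((p_choose_t1(j1, j2, temp) - p) ** 2
                   for (j1, j2), p in zip(utilities, observed))

    return min(grid, key=squared_error)
```

A low temperature makes choices follow the utility difference almost deterministically, while a high temperature flattens the choice curve toward 0.5. A maximum-likelihood fit on trial-level choices would be a reasonable alternative to the least-squares objective used here.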

    Balancing out dwelling and moving: optimal sensorimotor synchronization

    Sensorimotor synchronization is a fundamental skill involved in the performance of many individual and ensemble artistic tasks (e.g., music, dance). Separate theories may explain how time intervals are estimated, and how bodily displacements toward spatial goals are controlled. However, the manner in which the nervous system produces movements toward concomitant temporal and spatial goals remains poorly understood. Previous studies have shown that typical rhythmic movements involve a motion phase and a motionless phase (dwell). The dwell phase represents a sizeable fraction of the rhythm period, and scales with this period. The rationale for this specific organization remains unexplained, and is the object of this study. Two groups of participants (drummers, non-drummers) performed tapping movements paced at 0.5-2.5 Hz. The participants consistently organized their behavior between acoustic cues into dwell and movement phases, and movement kinematics varied with the period of the rhythm, yielding velocity profiles that became increasingly asymmetric as the period expanded. The main new results were that the temporal variability of both the dwell and movement phases was consistent with Weber's law, and that the longest phase always exhibited the smallest variability. We developed an optimal statistical model that formalized the distribution of time into dwell and movement intervals as a function of their temporal variability. The model accurately predicted the participants' dwell and movement durations irrespective of musical skill, strongly suggesting that the distribution of dwell and movement intervals results from an optimization process, dependent on each participant's skill to predict time during rest and movement.
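To make the idea of distributing a rhythm period according to temporal variability concrete, here is a simplified sketch. It is not the model from the study. It assumes that each phase obeys Weber's law with its own Weber fraction (standard deviation = fraction × duration), that the two phases are independent, and that the split minimises the summed timing variance. The function name and parameters are hypothetical.

```python
def split_period(period, weber_dwell, weber_move):
    """Split `period` into (dwell, move) durations minimising total timing variance.

    Minimising weber_dwell**2 * d**2 + weber_move**2 * (period - d)**2 over d
    gives d = period * weber_move**2 / (weber_dwell**2 + weber_move**2).
    """
    wd2 = weber_dwell ** 2
    wm2 = weber_move ** 2
    dwell = period * wm2 / (wd2 + wm2)
    return dwell, period - dwell


if __name__ == "__main__":
    # Hypothetical Weber fractions: timing is more precise during dwell.
    for period in (0.4, 1.0, 2.0):
        dwell, move = split_period(period, weber_dwell=0.05, weber_move=0.10)
        print(f"period={period:.1f}s  dwell={dwell:.2f}s  move={move:.2f}s")
```

Under these assumptions the phase that is timed more precisely receives the larger share of the period, and both durations scale with the period.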

    Rapid prediction of biomechanical costs during action decisions.

    When given a choice between actions that yield the same reward, we tend to prefer the one that requires the least effort. Recent studies have shown that humans are remarkably accurate at evaluating the effort of potential reaching actions and can predict the subtle energetic demand caused by the nonisotropic biomechanical properties of the arm. In the present study, we investigated the time course over which such information is computed and comes to influence decisions. Two independent approaches were used. First, subjects performed a reach decision task in which the time interval for deciding between two candidate reaching actions was varied from 200 to 800 ms. Second, we measured motor-evoked potentials (MEPs) to single-pulse transcranial magnetic stimulation (TMS) over the primary motor cortex (M1) to probe the evolving decision at different times after stimulus presentation. Both studies yielded a consistent conclusion: that a prediction of the effort associated with candidate movements is computed very quickly and influences decisions within 200 ms after presentation of the candidate actions. Furthermore, whereas the MEPs measured 150 ms after stimulus presentation were well correlated with the choices that subjects ultimately made, later in the trial the MEP amplitudes were primarily related to the muscular requirements of the chosen movement. This suggests that corticospinal excitability (CSE) initially reflects a competition between candidate actions and later changes to reflect the processes of preparing to implement the winning action choice.

    The influence of predicted arm biomechanics on decision making


    Visual-reward driven changes of movement during action execution

    Motor decision-making is often described as a sequential process, beginning with the assessment of available options and leading to the execution of a selected movement. While this view is likely to be accurate for decisions requiring significant deliberation, it seems ill-suited to choices between movements in dynamic environments. In this study, we examined whether and how non-selected motor options may be considered after movement onset. We hypothesized that a change in reward at any point in time implies a dynamic reassessment of options, even after an initial decision has been made. To test this, we performed a decision-making task in which human participants were instructed to execute a reaching movement from an origin to a rectangular target to attain a reward. Reward depended on arrival precision and on the specific distribution of reward presented along the target. On a third of trials, we changed the initial reward distribution after movement onset. Our results indicate that participants frequently change their initially selected movements when a change is associated with an increase in reward. This process occurs faster than overall average reaction times. Finally, changes in movement depend not only on reward but also on the current state of the motor apparatus. IC was funded by the Marie Sklodowska-Curie Research Grant Scheme (grant number IF-656262). The project was also funded by the HBP SGA3 Human Brain Project Specific Grant Agreement 3 (Grant Agreement No. 945539), funded by the EU H2020 FET Flagship program.