307 research outputs found

    When Does Reward Maximization Lead to Matching Law?

    What kinds of strategies subjects follow in various behavioral circumstances has been a central issue in decision making. In particular, which behavioral strategy, maximizing or matching, is more fundamental to animals' decision behavior has been a matter of debate. Here, we prove that any algorithm that achieves the stationary condition for maximizing the average reward should lead to matching when it ignores the dependence of the expected outcome on the subject's past choices. We may term this strategy of partial reward maximization the “matching strategy”. Then, this strategy is applied to the case where the subject's decision system updates the information for making a decision. Such information includes the subject's past actions or sensory stimuli, and the internal storage of this information is often called “state variables”. We demonstrate that the matching strategy provides an easy way to maximize reward when combined with the exploration of the state variables that correctly represent the crucial information for reward maximization. Our results reveal for the first time how a strategy to achieve matching behavior is beneficial to reward maximization, achieving a novel insight into the relationship between maximizing and matching.
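
For orientation, the matching law that this abstract refers to is commonly written in Herrnstein's two-option form, shown below. The notation is an assumption made here for illustration and is not taken from the paper: p_i is the fraction of choices allocated to option i, and \bar r_i is the mean reward obtained per choice of option i, so that p_i \bar r_i is the income from option i.

```latex
\frac{p_1}{p_1 + p_2}
  = \frac{p_1 \bar r_1}{p_1 \bar r_1 + p_2 \bar r_2}
\quad\Longleftrightarrow\quad
\bar r_1 = \bar r_2
\qquad (p_1,\, p_2 > 0)
```

Cross-multiplying the left-hand equation gives p_1 p_2 \bar r_2 = p_1 p_2 \bar r_1, which is why matching can be read as equalizing the mean return across the options that are actually chosen.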

    Robustness of Learning That Is Based on Covariance-Driven Synaptic Plasticity

    It is widely believed that learning is due, at least in part, to long-lasting modifications of the strengths of synapses in the brain. Theoretical studies have shown that a family of synaptic plasticity rules, in which synaptic changes are driven by covariance, is particularly useful for many forms of learning, including associative memory, gradient estimation, and operant conditioning. Covariance-based plasticity is inherently sensitive. Even a slight mistuning of the parameters of a covariance-based plasticity rule is likely to result in substantial changes in synaptic efficacies. Therefore, the biological relevance of covariance-based plasticity models is questionable. Here, we study the effects of mistuning parameters of the plasticity rule in a decision-making model in which synaptic plasticity is driven by the covariance of reward and neural activity. An exact covariance plasticity rule yields Herrnstein's matching law. We show that although the effect of slight mistuning of the plasticity rule on the synaptic efficacies is large, the behavioral effect is small. Thus, matching behavior is robust to mistuning of the parameters of the covariance-based plasticity rule. Furthermore, the mistuned covariance rule results in undermatching, which is consistent with experimentally observed behavior. These results substantiate the hypothesis that approximate covariance-based synaptic plasticity underlies operant conditioning. However, we show that the mistuning of the mean subtraction makes behavior sensitive to the mistuning of the properties of the decision-making network. Thus, there is a tradeoff between the robustness of matching behavior to changes in the plasticity rule and its robustness to changes in the properties of the decision-making network.
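
The abstract states that an exact covariance rule yields matching but gives no equations. The sketch below is a simplified illustration, not the authors' model: it assumes the neural activity can be reduced to a binary indicator N of choosing option 1, and the function name and parameters are hypothetical. Under that assumption, the covariance between reward R and N equals p(1 - p)(r1 - r2), so it vanishes exactly when both options yield the same mean reward per choice.

```python
def reward_choice_covariance(p, r1, r2):
    """Cov(R, N) for a binary choice indicator N (N = 1 when option 1 is chosen).

    p  : probability of choosing option 1
    r1 : mean reward obtained when option 1 is chosen
    r2 : mean reward obtained when option 2 is chosen
    """
    mean_reward = p * r1 + (1 - p) * r2   # E[R]
    mean_choice = p                       # E[N]
    mean_product = p * r1                 # E[R N]; N is 0 on option-2 trials
    return mean_product - mean_reward * mean_choice


# Unequal returns: the covariance is nonzero, so a covariance-driven
# update would keep changing the synaptic weights.
drifting = reward_choice_covariance(p=0.5, r1=0.8, r2=0.4)

# Equal returns (matching): the covariance vanishes for any choice probability.
stationary = reward_choice_covariance(p=0.3, r1=0.6, r2=0.6)
```

In this reduced picture, a rule whose expected weight change is proportional to the covariance is stationary only when the returns are equal, which is one way to read the link to Herrnstein's matching law.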

    Autonomous vehicle decision-making: Should we be bio-inspired?

    © Springer International Publishing AG 2017. On our crowded roads, drivers must compete for space but cooperate to avoid occupying the same space at the same time. Decision-making is strategic and requires mutual understanding of others’ choices. Fully autonomous vehicles (AVs) will need risk management software to make these types of strategic decisions without human arbitration. Accidents will occur, and what constitutes rational and ‘safe’ decisions will be scrutinized by the legal system. It is far from clear how AV-Human and AV-AV interactions should be managed. Game Theory provides a framework for analyzing mutual ‘games’ with 2 or more players. It assumes that players mutually optimize their outcomes according to Nash equilibria (NE), but do humans follow Nash equilibria in Human-Human interactions? We implemented simple two-player competitive games to see whether people played rationally according to Nash equilibria. On each of 100 trials, each player was instructed to maximise their reward by pressing one of three buttons labelled “4”, “6”, and “12”, without knowing the other player’s choice. If players pressed different buttons, they received a reward of 4, 6, or 12 points accordingly. If players pressed the same button, the reward was reduced depending on the game type. Results showed that players did not follow NE, but played a probabilistic game that included the “4” button, even though pressing this button is always suboptimal. We suggest that this may be an evolutionary strategy, but it clearly shows that people do not follow the ‘rational’ Nash strategy. It seems that AV-human interactions will be probabilistic. In AV-AV interactions, software may be playing itself, and may also require probabilistic optimal evolutionary-type strategies. We doubt that the full implications of autonomous decision-making have been fully worked out. Whether probabilistic decisions will be tolerated legally and actuarially is doubtful. One way to avoid them would be to allow regulated AV-AV communications, and force software decisions to be deterministic according to some protocol. However, AV-Human interactions seem likely to remain problematic.

    Policy Adjustment in a Dynamic Economic Game

    Making sequential decisions to harvest rewards is a notoriously difficult problem. One difficulty is that the real world is not stationary and the reward expected from a contemplated action may depend in complex ways on the history of an animal's choices. Previous functional neuroimaging work combined with principled models has detected brain responses that correlate with computations thought to guide simple learning and action choice. Those works generally employed instrumental conditioning tasks with fixed action-reward contingencies. For real-world learning problems, the history of reward-harvesting choices can change the likelihood of rewards collected by the same choices in the near-term future. We used functional MRI to probe brain and behavioral responses in a continuous decision-making task where reward contingency is a function of both a subject's immediate choice and his choice history. In these more complex tasks, we demonstrated that a simple actor-critic model can account for both the subjects' behavioral and brain responses, and identified a reward prediction error signal in ventral striatal structures active during these non-stationary decision tasks. However, a sudden introduction of new reward structures engages more complex control circuitry in the prefrontal cortex (inferior frontal gyrus and anterior insula) and is not captured by a simple actor-critic model. Taken together, these results extend our knowledge of reward-learning signals into more complex, history-dependent choice tasks. They also highlight the important interplay between striatum and prefrontal cortex as decision-makers respond to the strategic demands imposed by non-stationary reward environments more reminiscent of real-world tasks.

    An Integrated Decision Making Approach for Adaptive Shared Control of Mobility Assistance Robots

    © 2016, Springer Science+Business Media Dordrecht. Mobility assistance robots provide support to elderly people or patients during walking. The design of a safe and intuitive assistance behavior is one of the major challenges in this context. We present an integrated approach for the context-specific, on-line adaptation of the assistance level of a rollator-type mobility assistance robot by gain-scheduling of low-level robot control parameters. A human-inspired decision-making model, the drift-diffusion model, is introduced as the key principle to gain-schedule parameters and thereby adapt the provided robot assistance in order to achieve a human-like assistive behavior. The mobility assistance robot is designed to provide (a) cognitive assistance to help the user follow a desired path towards a predefined destination as well as (b) sensorial assistance to avoid collisions with obstacles while allowing for an intentional approach of them. Further, the robot observes the user's long-term performance and fatigue to adapt the overall level of (c) physical assistance provided. For each type of assistance a decision-making problem is formulated that affects different low-level control parameters. The effectiveness of the proposed approach is demonstrated in technical validation experiments. Moreover, the proposed approach is evaluated in a user study with 35 elderly persons. The results indicate that the proposed gain-scheduling technique, which incorporates ideas from human decision-making models, shows generally high potential for application in adaptive shared control of mobility assistance robots.
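
The abstract names the drift-diffusion model without describing it. The sketch below is a generic textbook version (evidence accumulating with a constant drift plus Gaussian noise until it reaches one of two symmetric bounds), not the authors' gain-scheduling scheme. The function name, parameter names, and default values are all assumptions chosen for illustration.

```python
import random


def simulate_ddm(drift, threshold, noise=1.0, dt=0.01, max_time=10.0, rng=None):
    """Simulate one drift-diffusion trial with symmetric bounds at +/- threshold.

    Returns (choice, decision_time): choice is +1 or -1 for the bound that was
    hit, or 0 if no bound was reached before max_time.
    """
    rng = rng or random.Random()
    evidence, elapsed = 0.0, 0.0
    while elapsed < max_time:
        # Euler-Maruyama step: deterministic drift plus Gaussian diffusion noise.
        evidence += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        elapsed += dt
        if evidence >= threshold:
            return 1, elapsed
        if evidence <= -threshold:
            return -1, elapsed
    return 0, elapsed


# One noisy trial with a fixed seed so the run is reproducible.
choice, decision_time = simulate_ddm(drift=0.5, threshold=1.0, rng=random.Random(0))
```

In a shared-control setting, one could imagine the bound that is reached selecting between two assistance levels, but how the decision maps onto control gains is specific to the paper and is not reproduced here.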

    Labelling and Family Resemblance in the discrimination of polymorphous categories by pigeons

    publication-status: Accepted; types: Article. © 2011 Springer Verlag. This is a post-print version of the article published in Animal Cognition, 2011, 14 (1), pp 21-34. The final publication is available at link.springer.com. Two experiments examined whether pigeons discriminate polymorphous categories on the basis of a single highly predictive feature or overall similarity. In the first experiment, pigeons were trained to discriminate between categories of photographs of complex real objects. Within these pictures, single features had been manipulated to produce a highly salient texture cue. Either the picture or the texture provided a reliable cue for discrimination during training, but in probe tests, the picture and texture cues were put into conflict. Some pigeons showed a significant tendency to discriminate on the basis of the picture cue (overall similarity or family resemblance), whereas others appeared to rely on the manipulated texture cue. The second experiment used artificial polymorphous categories in which one dimension of the stimulus provided a completely reliable cue to category membership, whereas three other dimensions provided cues that were individually unreliable but collectively provided a completely reliable basis for discrimination. Most pigeons came under the control of the reliable cue rather than the unreliable cues. A minority, however, came under the control of single dimensions from the unreliable set. We conclude that cue salience can be more important than cue reliability in determining what features will control behavior when multiple cues are available.

    Has education lost sight of children?

    The reflections presented in this chapter are informed by clinical and personal experiences of school education in the UK. There are many challenges for children and young people in the modern education system and for the professionals who support them. In the UK, there are significant gaps between the highly selective education provided to those who pay privately for it and to the majority of those educated in the state-funded system. Though literacy rates have improved around the world, many children, particularly boys, do not finish their education for reasons such as boredom, behavioural difficulties or because education does not ‘pay’. Violence, bullying, and sexual harassment are issues faced by many children in schools and there are disturbing trends of excluding children who present with behavioural problems at school whose origins are not explored. Excluded children are then educated with other children who may also have multiple problems which often just make the situation worse. The experience of clinicians suggests that school-related mental health problems are increasing in severity. Are mental health services dealing with the consequences of an education system that is not meeting children’s needs? An education system that is testing- and performance-based may not be serving many children well if it is driving important decisions about them at increasingly younger ages. Labelling of children and setting them on educational career paths can occur well before they reach secondary schools, limiting potential very early on in their developmental trajectory. Furthermore, the emphasis at school on testing may come at the expense of creativity and other forms of intelligence, which are also valuable and important. Meanwhile the employment marketplace requires people with widely different skills, with an emphasis on innovation, creativity, and problem solving. Is education losing sight of the children it is educating?

    When One Hemisphere Takes Control: Metacontrol in Pigeons (Columba livia)

    Vertebrate brains are composed of two hemispheres that receive input, compute, and interact to form a unified response. How the partially different processes of both hemispheres are integrated to create a single output is largely unknown. In some cases one hemisphere takes charge of the response selection, a process known as metacontrol. Thus far, this phenomenon has only been shown in a handful of studies with primates, mostly conducted in humans. Metacontrol, however, is even more relevant for animals like birds with laterally placed eyes and complete chiasmatic decussation since visual input to the hemispheres is largely different. Homing pigeons (Columba livia) were trained with a color discrimination task. Each hemisphere was trained with a different color pair and therefore had a different experience. Subsequently, the pigeons were binocularly examined with two additional stimuli that combined the positive color of one hemisphere with a negative color that had been shown to the other, omitting the availability of a coherent solution and confronting the pigeons with a conflicting situation. Some of the pigeons responded to both stimuli, indicating that neither hemisphere dominated the overall preference. Some birds, however, responded primarily to one of the conflicting stimuli, showing that they based their choice on the left- or right-monocularly learned color pair, indicating hemispheric metacontrol. We could demonstrate for the first time that metacontrol is a widespread phenomenon that also exists in birds, and thus in principle requires no corpus callosum. Our results are closely similar to those in humans: monocular performance was higher than binocular performance, and animals displayed different modes of hemispheric dominance. Thus, metacontrol is a dynamic and widely distributed process that possibly constitutes a requirement for all animals with a bipartite brain to confront the problem of choosing between two hemisphere-bound behavioral options.

    Pervasiveness of the IQ Rise: A Cross-Temporal Meta-Analysis

    Background: Generational IQ gains in the general population (termed the Flynn effect) show an erratic pattern across different nations as well as across different domains of intelligence (fluid vs crystallized). Gains of fluid intelligence in different countries have been subject to extensive research, but less attention was directed towards gains of crystallized intelligence, probably due to evidence from the Anglo-American sphere suggesting only slight gains on this measure. In the present study, development of crystallized intelligence in the German-speaking general population is assessed. Methodology/Principal Findings: To investigate whether IQ gains for crystallized intelligence are in progress in German-speaking countries, two independent meta-analyses were performed. By means of a cited reference search in ISI Web of Science, all studies citing test manuals and review articles of two widely-used salient measures of crystallized intelligence were obtained. Additionally, the electronic database for German academic theses was searched to identify unpublished studies employing these tests. All studies reporting participants' mean IQ or raw scores of at least one of the two measures were included in the present analyses, yielding over 500 studies (>1,000 samples; >45,000 individuals). We found a significant positive association between years of test performance and intelligence (1971–2007) amounting to about 3.5 IQ points per decade. Conclusions/Significance: This study clearly demonstrates that crystallized IQ gains are substantial and of comparabl

    Behavioural Correlate of Choice Confidence in a Discrete Trial Paradigm

    How animals make choices in a changing and often uncertain environment is a central theme in the behavioural sciences. There is a substantial literature on how animals make choices in various experimental paradigms but less is known about the way they assess a choice after it has been made in terms of the expected outcome. Here, we used a discrete trial paradigm to characterise how the reward history shaped the behaviour on a trial-by-trial basis. Rats initiated each trial, which consisted of a choice between two drinking spouts that differed in their probability of delivering a sucrose solution. Critically, sucrose was delivered after a delay from the first lick at the spouts; this allowed us to characterise the behavioural profile during the window between the time of choice and its outcome. Rats' behaviour converged to optimum choice, both during the acquisition phase and after the reversal of contingencies. We monitored the post-choice behaviour at a temporal precision of 1 millisecond; lick-response profiles revealed that rats spent more time at the spout with the higher reward probability and exhibited a sparser lick pattern. This was the case even when we exclusively examined the unrewarded trials, where the outcome was identical. The differential licking profiles preceded the differential choice ratios and could thus predict the changes in choice behaviour.