
    Modelling crypto markets by multi-agent reinforcement learning

    Full text link
    Building on previous foundational work (Lussange et al. 2020), this study introduces a multi-agent reinforcement learning (MARL) model simulating crypto markets, calibrated to Binance's daily closing prices of 153 cryptocurrencies that were continuously traded between 2018 and 2022. Unlike previous agent-based models (ABM) or multi-agent systems (MAS), which relied on zero-intelligence agents or single-agent methodologies, our approach endows agents with reinforcement learning (RL) techniques in order to model crypto markets. This integration is designed to emulate, with a bottom-up approach to complexity inference, both individual and collective agents, ensuring robustness under the recent volatile conditions of such markets and during the COVID-19 era. A key feature of our model is that its autonomous agents perform asset price valuation based on two sources of information: the market prices themselves, and an approximation of the crypto assets' fundamental values beyond those market prices. Calibrating our MAS against real market data allows an accurate emulation of crypto market microstructure and the probing of key market behaviors, in both the bearish and bullish regimes of that time period.
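
    The abstract above describes autonomous agents that value assets from two signals: the market price and an estimate of fundamental value. As a minimal sketch only, assuming a simple Q-learning agent and a discretized mispricing signal (none of which is taken from the paper), the snippet below illustrates how such a valuation-driven trading agent could be structured.

```python
import numpy as np

# Minimal sketch of an RL trading agent that values an asset from two signals:
# the observed market price and an independent estimate of its fundamental value.
# The discretization scheme and all parameters are illustrative assumptions,
# not the agent design from the paper.

class QLearningTrader:
    def __init__(self, n_states=9, n_actions=3, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = np.zeros((n_states, n_actions))  # actions: 0 = sell, 1 = hold, 2 = buy
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def state(self, market_price, fundamental_estimate):
        # Discretize the relative gap between estimated fundamental value and market price.
        gap = (fundamental_estimate - market_price) / market_price
        return int(np.clip(np.digitize(gap, np.linspace(-0.2, 0.2, 8)), 0, 8))

    def act(self, s):
        # Epsilon-greedy action selection over the learned values.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[s]))

    def update(self, s, a, reward, s_next):
        # One-step Q-learning update on the realized trading profit.
        td_target = reward + self.gamma * np.max(self.q[s_next])
        self.q[s, a] += self.alpha * (td_target - self.q[s, a])
```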

    Studying and improving reasoning in humans and machines

    Full text link
    In the present study, we investigate and compare reasoning in large language models (LLMs) and humans using a selection of cognitive psychology tools traditionally dedicated to the study of (bounded) rationality. To do so, we presented new variants of classical cognitive experiments to human participants and an array of pretrained LLMs, and cross-compared their performances. Our results showed that most of the included models presented reasoning errors akin to those frequently ascribed to error-prone, heuristic-based human reasoning. Notwithstanding this superficial similarity, an in-depth comparison between humans and LLMs indicated important differences from human-like reasoning, with model limitations disappearing almost entirely in more recent LLM releases. Moreover, we show that while it is possible to devise strategies to induce better performance, humans and machines are not equally responsive to the same prompting schemes. We conclude by discussing the epistemological implications and challenges of comparing human and machine behavior for both artificial intelligence and cognitive psychology. Comment: the paper is split into four parts: main text (pages 2-27), methods (pages 28-34), technical appendix (pages 35-45), and supplementary methods (pages 46-125).

    Relative Value Biases in Large Language Models

    Full text link
    Studies of reinforcement learning in humans and animals have demonstrated a preference for options that yielded relatively better outcomes in the past, even when those options are associated with lower absolute reward. The present study tested whether large language models would exhibit a similar bias. We had gpt-4-1106-preview (GPT-4 Turbo) and Llama-2-70B make repeated choices between pairs of options with the goal of maximizing payoffs. A complete record of previous outcomes was included in each prompt. Both models exhibited relative value decision biases similar to those observed in humans and animals. Making relative comparisons among outcomes more explicit magnified the bias, whereas prompting the models to estimate expected outcomes caused the bias to disappear. These results have implications for the potential mechanisms that contribute to context-dependent choice in human agents.
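
    As an illustration of the prompt-based choice task described above, the sketch below assembles a choice prompt that includes the complete record of previous outcomes. The machine labels, payoff ranges, and wording are hypothetical stand-ins, not the study's actual materials.

```python
import random

# Illustrative sketch of a prompt-based two-option choice task with a complete
# outcome history, in the spirit of the study above. The machine labels, payoff
# ranges, and wording are hypothetical, not the original experimental materials.

PAYOFFS = {"F": (36, 40), "J": (72, 80)}  # hypothetical (min, max) point ranges

def simulate_trial(choice, history):
    lo, hi = PAYOFFS[choice]
    reward = random.randint(lo, hi)
    history.append((choice, reward))
    return reward

def build_prompt(history):
    lines = ["You are choosing between slot machines F and J to maximize total points.",
             "Previous outcomes:"]
    lines += [f"  Trial {i + 1}: chose {c}, received {r} points"
              for i, (c, r) in enumerate(history)]
    lines.append("Which machine do you choose next? Answer with a single letter.")
    return "\n".join(lines)

history = []
for option in ["F", "J", "F", "J"]:  # forced sampling so both options appear in the record
    simulate_trial(option, history)

print(build_prompt(history))  # this text would be sent to the model as the next prompt
```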

    The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning

    Get PDF
    While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists in transiently biasing the learner's action selection without affecting their value function. According to the second hypothesis, model-based imitation (MB), the learner infers the demonstrator's value function through inverse reinforcement learning and uses it to bias action selection. Finally, according to the third hypothesis, value shaping (VS), the demonstrator's actions directly affect the learner's value function. We tested these three hypotheses in two experiments (N = 24 and N = 44) featuring a new variant of a social reinforcement learning task. We show through model comparison and model simulation that VS provides the best explanation of learners' behavior. These results were replicated in a third independent experiment featuring a larger cohort and a different design (N = 302). In our experiments, we also manipulated the quality of the demonstrators' choices and found that learners were able to adapt their imitation rate, so that only skilled demonstrators were imitated. We proposed and tested an efficient meta-learning process to account for this effect, in which imitation is regulated by the agreement between the learner and the demonstrator. In sum, our findings provide new insights and perspectives on the computational mechanisms underlying adaptive imitation in human reinforcement learning.
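
    A minimal sketch, assuming simple two-option value arrays, of the contrast between two of the hypotheses named above: value shaping, where the demonstrator's observed action directly modifies the learner's values, and decision biasing, where only action selection is altered. Parameter names and magnitudes are illustrative, not the fitted models from the paper.

```python
import numpy as np

# Illustrative contrast between two of the hypotheses described above:
# value shaping (VS) and decision biasing (DB). Parameter names and values
# are assumptions, not the fitted models from the paper.

def softmax(values, beta=3.0):
    z = np.exp(beta * (values - values.max()))
    return z / z.sum()

def value_shaping_update(q, observed_action, shaping_bonus=0.2):
    # VS: the demonstrator's action acts as a pseudo-reward that directly
    # increases the value of the imitated action.
    q = q.copy()
    q[observed_action] += shaping_bonus * (1.0 - q[observed_action])
    return q

def decision_biasing_policy(q, observed_action, bias=0.3):
    # DB: values are left untouched; the observed action only transiently
    # biases the learner's choice probabilities.
    p = softmax(q)
    p[observed_action] += bias
    return p / p.sum()

q = np.array([0.4, 0.4])
print(value_shaping_update(q, observed_action=1))     # values change under VS
print(decision_biasing_policy(q, observed_action=1))  # only choice probabilities change under DB
```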

    An Empirical Investigation of the Emergence of Money: Contrasting Temporal Difference and Opportunity Cost Reinforcement Learning

    Get PDF
    Money is a fundamental and ubiquitous institution in modern economies. However, the question of its emergence remains a central one for economists. The monetary search-theoretic approach studies the conditions under which commodity money emerges as a solution to override frictions inherent to inter-individual exchanges in a decentralized economy. Although among these conditions, agents' rationality is classically essential and a prerequisite to any theoretical monetary equilibrium, human subjects often fail to adopt optimal strategies in tasks implementing a search-theoretic paradigm when these strategies are speculative, i.e., involve the use of a costly medium of exchange to increase the probability of subsequent and successful trades. In the present work, we hypothesize that implementing such speculative behaviors relies on reinforcement learning instead of lifetime utility calculations, as supposed by classical economic theory. To test this hypothesis, we operationalized the Kiyotaki and Wright paradigm of money emergence in a multi-step exchange task and fitted behavioral data from human subjects performing this task with two reinforcement learning models. Each model implements a distinct cognitive hypothesis regarding the weight of future or counterfactual rewards in current decisions. We found that both models outperformed theoretical predictions of subjects' behavior regarding the implementation of speculative strategies, and that such strategies rely on the degree to which opportunity costs are taken into account in the learning process. Speculating about the marketability advantage of money thus seems to depend on the mental simulations of counterfactual events that agents perform in exchange situations.
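
    As a rough sketch of the two cognitive hypotheses contrasted above, the functions below implement a temporal-difference update, which weights discounted future rewards, and an opportunity-cost update, which penalizes a reward by the forgone value of the alternative action. The (state, action) encoding and all parameter values are assumptions, not the paper's fitted models.

```python
# Rough sketch of the two learning hypotheses contrasted above: a temporal-difference
# update weighting discounted future rewards, and an opportunity-cost update weighting
# the forgone value of the alternative action. The (state, action) encoding and all
# parameter values are assumptions, not the paper's fitted models.

def td_update(q, s, a, reward, next_state_value, alpha=0.1, gamma=0.9):
    # Temporal difference: current decisions are shaped by discounted future value.
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (reward + gamma * next_state_value - old)

def opportunity_cost_update(q, s, a, reward, forgone_reward, alpha=0.1, omega=0.5):
    # Opportunity cost: the obtained reward is discounted by what the alternative
    # action (e.g., refusing a costly trade) would have yielded.
    old = q.get((s, a), 0.0)
    net = reward - omega * forgone_reward
    q[(s, a)] = old + alpha * (net - old)

q = {}
td_update(q, s="holds_good_1", a="accept_trade", reward=0.0, next_state_value=1.0)
opportunity_cost_update(q, s="holds_good_1", a="accept_trade", reward=0.0, forgone_reward=0.5)
print(q)
```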

    Specific effect of a dopamine partial agonist on counterfactual learning: evidence from Gilles de la Tourette syndrome.

    Get PDF
    The dopamine partial agonist aripiprazole is increasingly used to treat pathologies for which other antipsychotics are indicated because it displays fewer side effects, such as sedation and depression-like symptoms, than other dopamine receptor antagonists. Previously, we showed that aripiprazole may protect motivational function by preserving reinforcement-related signals used to sustain reward maximization. However, the effect of aripiprazole on more cognitive facets of human reinforcement learning, such as learning from the forgone outcomes of alternative courses of action (i.e., counterfactual learning), is unknown. To test the influence of aripiprazole on counterfactual learning, we administered a reinforcement learning task involving both direct learning from obtained outcomes and indirect learning from forgone outcomes to two groups of Gilles de la Tourette syndrome (GTS) patients, one completely unmedicated and the other receiving aripiprazole monotherapy, as well as to healthy subjects. We found that whereas learning performance improved in the presence of counterfactual feedback in both healthy controls and unmedicated GTS patients, this was not the case in aripiprazole-medicated GTS patients. Our results suggest that whereas aripiprazole preserves direct learning of action-outcome associations, it may impair more complex inferential processes, such as counterfactual learning from forgone outcomes, in GTS patients treated with this medication.
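
    The task described above combines direct learning from obtained outcomes with indirect learning from forgone outcomes. A minimal sketch of that combination, with hypothetical learning-rate names rather than the paper's fitted parameters, is shown below.

```python
# Minimal sketch of factual vs counterfactual value updates in a two-option task
# with complete feedback, as in the paradigm described above. Learning-rate names
# and values are assumptions, not the paper's fitted parameters.

def update_values(q_chosen, q_unchosen, obtained, forgone,
                  alpha_factual=0.3, alpha_counterfactual=0.3):
    # Direct (factual) learning from the obtained outcome of the chosen option.
    q_chosen += alpha_factual * (obtained - q_chosen)
    # Indirect (counterfactual) learning from the forgone outcome of the unchosen option.
    q_unchosen += alpha_counterfactual * (forgone - q_unchosen)
    return q_chosen, q_unchosen

# Example: the chosen option paid off while the unchosen option would not have.
print(update_values(q_chosen=0.0, q_unchosen=0.0, obtained=1.0, forgone=0.0))
```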

    Critical Roles for Anterior Insula and Dorsal Striatum in Punishment-Based Avoidance Learning

    Get PDF
    The division of human learning systems into reward and punishment opponent modules is still a debated issue. While the implication of ventral prefrontostriatal circuits in reward-based learning is well established, the neural underpinnings of punishment-based learning remain unclear. To elucidate the causal implication of brain regions that were related to punishment learning in a previous functional neuroimaging study, we tested the effects of brain damage on behavioral performance, using the same task contrasting monetary gains and losses. Cortical and subcortical candidate regions, the anterior insula and dorsal striatum, were assessed in patients presenting with brain tumor and Huntington disease, respectively. Both groups exhibited selective impairment of punishment-based learning. Computational modeling suggested complementary roles for these structures: the anterior insula might be involved in learning the negative value of loss-predicting cues, whereas the dorsal striatum might be involved in choosing between those cues so as to avoid the worst ones.
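
    As an illustrative sketch of the two complementary roles suggested by the computational modeling, the snippet below separates value learning for loss-predicting cues from the choice process that avoids the worst cue; all parameters are assumptions rather than the study's fitted model.

```python
import numpy as np

# Illustrative separation of the two computational roles suggested above:
# learning the (negative) value of loss-predicting cues, and choosing between
# cues so as to avoid the worst one. All parameters are assumptions.

def learn_cue_value(v, outcome, alpha=0.2):
    # Value learning: track how much loss a cue predicts (outcome is 0 or -1).
    return v + alpha * (outcome - v)

def choose_between_cues(values, beta=5.0):
    # Choice: softmax over learned values, making the most loss-predicting cue unlikely.
    z = np.exp(beta * (values - values.max()))
    p = z / z.sum()
    return np.random.choice(len(values), p=p)

values = np.array([learn_cue_value(0.0, -1.0), learn_cue_value(0.0, 0.0)])
print(values, choose_between_cues(values))
```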

    Enhanced habit formation in Gilles de la Tourette syndrome.

    Get PDF
    Tics are sometimes described as voluntary movements performed in an automatic or habitual way. Here, we addressed the question of the balance between goal-directed and habitual behavioural control in Gilles de la Tourette syndrome and formally tested the hypothesis of enhanced habit formation in these patients. To this end, we administered a three-stage instrumental learning paradigm to 17 unmedicated and 17 antipsychotic-medicated patients with Gilles de la Tourette syndrome and matched controls. In the first stage of the task, participants learned stimulus-response-outcome associations. The subsequent outcome devaluation and 'slip-of-action' tests allowed evaluation of the participants' capacity to flexibly adjust their behaviour to changes in action outcome value. In this task, unmedicated patients relied predominantly on habitual, outcome-insensitive behavioural control. Moreover, in these patients, engagement in habitual responses correlated with more severe tics. Medicated patients performed at an intermediate level between unmedicated patients and controls. Using diffusion tensor imaging in a subset of patients, we also addressed whether engagement in habitual responding was related to structural connectivity within cortico-striatal networks. We showed that engagement in habitual behaviour in patients with Gilles de la Tourette syndrome correlated with greater structural connectivity within the right motor cortico-striatal network. In unmedicated patients, stronger structural connectivity of the supplementary motor cortex with the sensorimotor putamen predicted more severe tics. Overall, our results indicate enhanced habit formation in unmedicated patients with Gilles de la Tourette syndrome. Aberrant reinforcement signals to the sensorimotor striatum may be fundamental for the formation of stimulus-response associations and may contribute to the habitual behaviour and tics of this syndrome. The study received support from the Association Française du Syndrome de Gilles de la Tourette. S.P. is supported by a Marie Sklodowska-Curie Individual European Fellowship (PIEF-GA-2012 Grant 328822). Cécile Delorme received a research grant from the Agence Régionale de Santé d'Ile de France.

    Choice-confirmation bias and gradual perseveration in human reinforcement learning

    No full text
    Do we preferentially learn from outcomes that confirm our choices? This is one of the most basic, and yet consequence-bearing, questions concerning reinforcement learning. In recent years, we investigated this question in a series of studies implementing increasingly complex behavioral protocols. The learning rates fitted in experiments featuring partial or complete feedback, as well as free and forced choices, were systematically found to be consistent with a choice-confirmation bias. This result is robust across a broad range of outcome contingencies and response modalities. One of the prominent behavioral consequences of the confirmatory learning-rate pattern is choice hysteresis, that is, the tendency to repeat previous choices despite contradictory evidence. As robust and replicable as they have proven to be, these findings were (legitimately) challenged by a couple of studies pointing out that a choice-confirmatory pattern of learning rates may spuriously arise from not taking into consideration an explicit choice autocorrelation term in the model. In the present study, we re-analyze data from four previously published papers (nine experiments in total; N = 363), originally included in the studies demonstrating (or criticizing) the choice-confirmation bias in human participants. We fitted two models: one featuring valence-specific updates (i.e., different learning rates for confirmatory and disconfirmatory outcomes) and one additionally including an explicit choice autocorrelation process (gradual perseveration). Our analysis confirms that the inclusion of the gradual perseveration process in the model significantly reduces the estimated choice-confirmation bias. However, in all considered experiments, the choice-confirmation bias remains present at the meta-analytical level and significantly different from zero in most experiments. Our results demonstrate that the choice-confirmation bias resists the inclusion of an explicit choice autocorrelation term, thus proving to be a robust feature of human reinforcement learning. We conclude by discussing the psychological plausibility of the gradual perseveration process in the context of these behavioral paradigms and by pointing to additional computational processes that may play an important role in estimating and interpreting the computational biases under scrutiny.
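
    A minimal sketch, under assumed parameter values, of the two model components discussed above: valence-specific (confirmatory vs disconfirmatory) learning rates and a gradual-perseveration choice trace added at the decision stage.

```python
import numpy as np

# Minimal sketch of the two model components discussed above: valence-specific
# (confirmation-biased) learning rates and a gradual-perseveration choice trace.
# Parameter names and values are assumptions, not the papers' fitted estimates.

def confirmatory_update(q, chosen, unchosen, obtained, forgone,
                        alpha_conf=0.4, alpha_disc=0.1):
    # Confirmatory outcomes (good news for the chosen option, bad news for the
    # unchosen one) are learned with a larger rate than disconfirmatory outcomes.
    pe_chosen = obtained - q[chosen]
    pe_unchosen = forgone - q[unchosen]
    q[chosen] += (alpha_conf if pe_chosen > 0 else alpha_disc) * pe_chosen
    q[unchosen] += (alpha_conf if pe_unchosen < 0 else alpha_disc) * pe_unchosen
    return q

def choice_probabilities(q, trace, beta=5.0, phi=1.0):
    # Gradual perseveration: a decaying trace of past choices is added to the
    # decision values, capturing choice repetition independently of learning.
    v = beta * q + phi * trace
    z = np.exp(v - v.max())
    return z / z.sum()

q, trace = np.zeros(2), np.array([0.6, 0.0])  # option 0 was chosen on recent trials
print(choice_probabilities(q, trace))
```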

    How to prepare a rebuttal letter: Some advice from a scientist, reviewer and editor

    No full text
    The goal of the present piece is to provide some experience-based advice on how to write an optimal (or effective) rebuttal letter. After many years as a reviewer and some years as an editor, I realized that great diversity exists in the way rebuttal letters are organized, but also that not all formats are equally effective. This is why I thought that the community at large could benefit from sharing my thoughts.