14,311 research outputs found

    Reinstated episodic context guides sampling-based decisions for reward.

    How does experience inform decisions? In episodic sampling, decisions are guided by a few episodic memories of past choices. This process can yield choice patterns similar to model-free reinforcement learning; however, samples can vary from trial to trial, causing decisions to vary. Here we show that context retrieved during episodic sampling can cause choice behavior to deviate sharply from the predictions of reinforcement learning. Specifically, we show that, when a given memory is sampled, choices (in the present) are influenced by the properties of other decisions made in the same context as the sampled event. This effect is mediated by fMRI measures of context retrieval on each trial, suggesting a mechanism whereby cues trigger retrieval of context, which then triggers retrieval of other decisions from that context. This result establishes a new avenue by which experience can guide choice and, as such, has broad implications for the study of decisions.

    Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems

    Crowdsourcing markets have emerged as a popular platform for matching available workers with tasks to complete. The payment for a particular task is typically set by the task's requester, and may be adjusted based on the quality of the completed work, for example, through the use of "bonus" payments. In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks. We consider a multi-round version of the well-known principal-agent model, whereby in each round a worker makes a strategic choice of the effort level which is not directly observable by the requester. In particular, our formulation significantly generalizes the budget-free online task pricing problems studied in prior work. We treat this problem as a multi-armed bandit problem, with each "arm" representing a potential contract. To cope with the large (and in fact, infinite) number of arms, we propose a new algorithm, AgnosticZooming, which discretizes the contract space into a finite number of regions, effectively treating each region as a single arm. This discretization is adaptively refined, so that more promising regions of the contract space are eventually discretized more finely. We analyze this algorithm, showing that it achieves regret sublinear in the time horizon and substantially improves over non-adaptive discretization (which is the only competing approach in the literature). Our results advance the state of the art on several different topics: the theory of crowdsourcing markets, principal-agent problems, multi-armed bandits, and dynamic pricing. Comment: This is the full version of a paper in the ACM Conference on Economics and Computation (ACM-EC), 201

    The Allocation of Software Development Resources In ‘Open Source’ Production Mode

    This paper aims to develop a stochastic simulation structure capable of describing the decentralized, micro-level decisions that allocate programming resources both within and among open source/free software (OS/FS) projects, and that thereby generate an array of OS/FS system products each of which possesses particular qualitative attributes. The core or behavioral kernel of the simulation tool presented here represents the effects of the reputational reward structure of OS/FS communities (as characterized by Raymond 1998) to be the key mechanism governing the probabilistic allocation of agents’ individual contributions among the constituent components of an evolving software system. In this regard, our approach follows the institutional analysis approach associated with studies of academic researchers in “open science” communities. For the purposes of this first step, the focus of the analysis is confined to showing the ways in which the specific norms of the reward system and organizational rules can shape emergent properties of successive releases of code for a given project, such as its range of functions and reliability. The global performance of the OS/FS mode, in matching the functional and other characteristics of the variety of software systems that are produced with the needs of users in various sectors of the economy and polity, obviously, is a matter of considerable importance that will bear upon the long-term viability and growth of this mode of organizing production and distribution. Our larger objective, therefore, is to arrive at a parsimonious characterization of the workings of OS/FS communities engaged across a number of projects, and their collective productive performance in dimensions that are amenable to “social welfare” evaluation. Seeking that goal will pose further new and interesting problems for study, a number of which are identified in the essay’s conclusion. Yet, it is argued that these too will be found to be tractable within the framework provided by refining and elaborating on the core (“proof of concept”) model that is presented in this paper.

    Decision-making model for adaptive impedance control of teleoperation systems

    © 2008-2011 IEEE. This paper presents a haptic assistance strategy for teleoperation that makes a task- and situation-specific compromise between improving tracking performance and improving human-machine interaction in partially structured environments by scheduling the parameters of an admittance controller. The proposed assistance strategy builds on decision-making models and combines one of them with impedance control techniques that are standard in bilateral teleoperation systems. Even though several decision-making models have been proposed in cognitive science, their application to assisted teleoperation and assisted robotics has hardly been explored yet. Experimental data support the Drift-Diffusion model as a suitable scheduling strategy for haptic shared control, in which the assistance mechanism can be adapted via the parameters of reward functions. Guidelines for tuning the decision-making model are presented. The influence of the reward structure on the realized haptic assistance is evaluated in a user study, and the results are compared to the no-assistance and human-assistance cases.
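For readers unfamiliar with the Drift-Diffusion model named in this abstract, the following is a minimal, generic sketch of it, not the scheduling strategy from the paper. Evidence accumulates with a constant drift plus Gaussian noise until it crosses one of two bounds; the bound reached gives the decision and the elapsed time gives the decision time. The function name `simulate_ddm` and all parameter names and default values are illustrative assumptions.

```python
import math
import random


def simulate_ddm(drift, threshold, noise=1.0, dt=0.01, max_time=10.0, rng=None):
    """Simulate one drift-diffusion trial with a simple Euler-Maruyama step.

    Returns (choice, decision_time): choice is +1 for the upper bound,
    -1 for the lower bound, or 0 if no bound is reached before max_time.
    All names and defaults are illustrative, not taken from the paper.
    """
    rng = rng or random.Random()
    evidence = 0.0
    elapsed = 0.0
    while elapsed < max_time:
        # Deterministic drift plus Gaussian noise scaled by sqrt(dt).
        evidence += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        elapsed += dt
        if evidence >= threshold:
            return 1, elapsed
        if evidence <= -threshold:
            return -1, elapsed
    return 0, elapsed


if __name__ == "__main__":
    # A seeded generator makes the single trial reproducible.
    choice, decision_time = simulate_ddm(drift=0.5, threshold=1.0, rng=random.Random(0))
    print(choice, round(decision_time, 2))
```

In this sketch a larger drift makes the upper bound more likely and faster to reach, while a larger threshold trades speed for fewer noise-driven decisions.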

    Instructional control in choice tasks: the relation between type of schedule and relative expected values

    The present work aims to improve our understanding of the boundaries of instructional control. It does so by resolving contradictory results obtained in two different fields: three studies conducted in the description-experience gap field, showing that instructions are neglected when personal experience is available, and several others conducted within the experimental analysis of behavior paradigm, reaching the opposite conclusion. Two factors were studied: the type of schedule, and the relative expected values between options. The present work showed that (1) positive evidence of instructional control was found in a choice task with probability schedules and different expected values between options; (2) negative evidence of instructional control was found in a choice task with VI schedules and similar expected values between options; and (3) these results, together with previous research, suggest that relative expected values are a fundamental factor in understanding the presence of instructional control in choice tasks. We conclude that the relevance of this factor relies on its capacity to make participants' decisions easier: all else being equal, adding descriptions enables participants to better discriminate optimal behavior in choice tasks. This study was conducted at Psychology University Center for Biological and Agricultural Sciences, University of Guadalajara, and supported by the Portuguese Foundation for Science and Technology and the Portuguese Ministry of Education and Science through national funds and, when applicable, co-financed by FEDER under the PT2020 Partnership Agreement (UID/PSI/01662/2013).

    Probability matching on a simple simulated foraging task: The effects of reward persistence and accumulation on choice behavior

    Over a series of decisions between two or more probabilistically rewarded options, humans have a tendency to diversify their choices, even when this will lead to diminished overall reward. In the extreme case of probability matching, this tendency is expressed through allocation of choices in proportion to their likelihood of reward. Research suggests that this behaviour is an instinctive response, driven by heuristics, and that it may be overruled through the application of sufficient deliberation and self-control. However, if this is the case, then how and why did this response become established? The present study explores the hypothesis that diversification of choices, and potentially probability matching, represents an overextension of a historically normative foraging strategy. This is done through examining choice behaviour on a simple simulated foraging task, designed to model the natural process of accumulation of unharvested resources over time. Behaviour was then directly compared with that observed on a standard fixed probability task (cf. Ellerby & Tunney, 2017). Results indicated a convergence of choice patterns on the simulated foraging task, between participants who acted intuitively and those who took a more strategic approach. These findings are also compared with those of another similarly motivated study (Schulze, van Ravenzwaaij, & Newell, 2017).
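The cost of probability matching described in this abstract can be checked with a short calculation. The sketch below assumes a two-option task in which exactly one option is rewarded on each trial, option A with probability p and option B with probability 1 - p. This setup and the function names are illustrative assumptions, not the design used in the study.

```python
def matching_reward_rate(p):
    """Expected reward per trial when choices are allocated in proportion
    to reward probability: choose A with probability p, B with 1 - p.
    (Illustrative helper, assuming exactly one option pays on each trial.)
    """
    return p * p + (1 - p) * (1 - p)


def maximizing_reward_rate(p):
    """Expected reward per trial when always choosing the likelier option."""
    return max(p, 1 - p)


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.7, 0.9):
        matching = matching_reward_rate(p)
        maximizing = maximizing_reward_rate(p)
        print(f"p={p:.1f}  matching={matching:.2f}  maximizing={maximizing:.2f}")
```

Under these assumptions, with p = 0.7 matching yields an expected reward rate of about 0.58 per trial, against 0.70 for always choosing the better option. The two strategies coincide only when p = 0.5.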

    CGAMES'2009
