105 research outputs found

    Incentivizing Exploration with Heterogeneous Value of Money

    Recently, Frazier et al. proposed a natural model for crowdsourced exploration of different a priori unknown options: a principal is interested in the long-term welfare of a population of agents who arrive one by one in a multi-armed bandit setting. However, each agent is myopic, so in order to incentivize him to explore options with better long-term prospects, the principal must offer the agent money. Frazier et al. showed that a simple class of policies called time-expanded are optimal in the worst case, and characterized their budget-reward tradeoff. The previous work assumed that all agents are equally and uniformly susceptible to financial incentives. In reality, agents may have different utility for money. We therefore extend the model of Frazier et al. to allow agents that have heterogeneous and non-linear utilities for money. The principal is informed of the agent's tradeoff via a signal that could be more or less informative. Our main result is to show that a convex program can be used to derive a signal-dependent time-expanded policy which achieves the best possible Lagrangian reward in the worst case. The worst-case guarantee is matched by so-called "Diamonds in the Rough" instances; the proof that the guarantees match is based on showing that two different convex programs have the same optimal solution for these specific instances. These results also extend to the budgeted case as in Frazier et al. We also show that the optimal policy is monotone with respect to information, i.e., the approximation ratio of the optimal policy improves as the signals become more informative. Comment: WINE 201

    A Tight 2-Approximation for Preemptive Stochastic Scheduling


    Bandit strategies in social search: the case of the DARPA red balloon challenge

    Collective search for people and information has tremendously benefited from emerging communication technologies that leverage the wisdom of the crowds, and has been increasingly influential in solving time-critical tasks such as the DARPA Network Challenge (DNC, also known as the Red Balloon Challenge). However, while collective search often invests significant resources in encouraging the crowd to contribute new information, the effort invested in verifying this information is comparable, yet often neglected in crowdsourcing models. This paper studies how the exploration-verification trade-off displayed by the teams modulated their success in the DNC, as teams had limited human resources that they had to divide between recruitment (exploration) and verification (exploitation). Our analysis suggests that team performance in the DNC can be modelled as a modified multi-armed bandit (MAB) problem, where information arrives to the team originating from sources of different levels of veracity that need to be assessed in real time. We use these insights to build a data-driven agent-based model, based on the DNC’s data, to simulate team performance. The simulation results match the observed teams’ behavior and demonstrate how to achieve the best balance between exploration and exploitation for general time-critical collective search tasks.

    Structure Learning in Human Sequential Decision-Making

    Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than being suboptimal, we argue that the learning problem humans face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate that humans can perform structure learning in a near-optimal manner.

    Methods for specifying the target difference in a randomised controlled trial : the Difference ELicitation in TriAls (DELTA) systematic review

    Peer reviewed. Publisher PD

    Strategies for the Use of Fallback Foods in Apes

    Researchers have suggested that fallback foods (FBFs) shape primate food processing adaptations, whereas preferred foods drive harvesting adaptations, and that the dietary importance of FBFs is central in determining the expression of a variety of traits. We examine these hypotheses in extant apes. First, we compare the nature and dietary importance of FBFs used by each taxon. FBF importance appears greatest in gorillas, followed by chimpanzees and siamangs, and least in orangutans and gibbons (bonobos are difficult to place). Next, we compare 20 traits among taxa to assess whether the relative expression of traits expected for consumption of FBFs matches their observed dietary importance. Trait manifestation generally conforms to predictions based on dietary importance of FBFs. However, some departures from predictions exist, particularly for orangutans, which express relatively more food harvesting and processing traits predicted for consuming large amounts of FBFs than expected based on observed dietary importance. This is probably due to the chemical, mechanical, and phenological properties of the apes’ main FBFs, in particular high importance of figs for chimpanzees and hylobatids, compared to use of bark and leaves (plus figs in at least some Sumatran populations) by orangutans. This may have permitted more specialized harvesting adaptations in chimpanzees and hylobatids, and required enhanced processing adaptations in orangutans. Possible intercontinental differences in the availability and quality of preferred foods and FBFs may also be important. Our analysis supports previous hypotheses suggesting a critical influence of the dietary importance and quality of FBFs on ape ecology and, consequently, evolution.

    Safety out of control: dopamine and defence


    Neuroinflammation and psychiatric illness
