Incentivizing Exploration with Heterogeneous Value of Money
Recently, Frazier et al. proposed a natural model for crowdsourced
exploration of different a priori unknown options: a principal is interested in
the long-term welfare of a population of agents who arrive one by one in a
multi-armed bandit setting. However, each agent is myopic, so to incentivize the agent to explore options with better long-term prospects, the principal must offer money. Frazier et al. showed that a simple class of policies, called time-expanded policies, is optimal in the worst case, and
characterized their budget-reward tradeoff.
The previous work assumed that all agents are equally and uniformly
susceptible to financial incentives. In reality, agents may have different
utility for money. We therefore extend the model of Frazier et al. to allow
agents that have heterogeneous and non-linear utilities for money. The
principal is informed of the agent's tradeoff via a signal that could be more
or less informative.
Our main result shows that a convex program can be used to derive a
signal-dependent time-expanded policy which achieves the best possible
Lagrangian reward in the worst case. The worst-case guarantee is matched by
so-called "Diamonds in the Rough" instances; the proof that the guarantees
match is based on showing that two different convex programs have the same
optimal solution for these specific instances. These results also extend to the
budgeted case as in Frazier et al. We also show that the optimal policy is
monotone with respect to information, i.e., the approximation ratio of the
optimal policy improves as the signals become more informative.
Comment: WINE 201
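To make the incentive mechanism concrete, the following is a minimal illustrative sketch, not the paper's actual time-expanded policy: a principal facing a myopic agent in a Beta-Bernoulli bandit computes the minimum payment that makes an under-explored arm myopically attractive. The linear utility for money is a simplifying assumption; the paper's point is precisely that agents may have heterogeneous, non-linear utilities.

```python
# Illustrative sketch only (assumptions ours, not Frazier et al.'s policy):
# a myopic agent picks the arm with the highest posterior mean, so the
# principal must pay the gap to redirect the agent toward another arm.

def posterior_mean(successes, pulls, prior=(1, 1)):
    """Beta-Bernoulli posterior mean of an arm's reward rate."""
    a, b = prior
    return (a + successes) / (a + b + pulls)

def payment_to_explore(means, target):
    """Minimum payment making `target` myopically optimal, assuming the
    agent's utility is linear in money (a simplifying assumption)."""
    best = max(means)
    return max(0.0, best - means[target])
```

For example, with posterior means `[0.6, 0.4]`, steering the agent to arm 1 costs a payment of 0.2; steering to the already-best arm 0 costs nothing.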
Bandit strategies in social search: the case of the DARPA red balloon challenge
Collective search for people and information has tremendously benefited from emerging communication technologies that leverage the wisdom of the crowds, and has been increasingly influential in solving time-critical tasks such as the DARPA Network Challenge (DNC, also known as the Red Balloon Challenge). However, while collective search often invests significant resources in encouraging the crowd to contribute new information, the comparable effort required to verify this information is often neglected in crowdsourcing models. This paper studies how the exploration-verification trade-off displayed by the teams modulated their success in the DNC, as teams had limited human resources that they had to divide between recruitment (exploration) and verification (exploitation). Our analysis suggests that team performance in the DNC can be modelled as a modified multi-armed bandit (MAB) problem, where information arrives to the team from sources of different levels of veracity that need to be assessed in real time. We use these insights to build a data-driven agent-based model, based on the DNC's data, to simulate team performance. The simulation results match the observed teams' behavior and demonstrate how to achieve the best balance between exploration and exploitation for general time-critical collective search tasks.
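A minimal agent-based sketch of the recruitment-versus-verification trade-off described above (the model structure and parameters here are our own illustration, not the paper's calibrated simulation): each time step a team either recruits a new source, whose report is true with some source-dependent veracity, or verifies a queued report, scoring only verified true reports.

```python
import random

def simulate(hours, explore_frac, veracities, seed=0):
    """Score earned by a team splitting effort between recruiting new
    reports (exploration) and verifying queued ones (exploitation)."""
    rng = random.Random(seed)
    queue, score = [], 0
    for _ in range(hours):
        if rng.random() < explore_frac:
            # recruitment: a report arrives from a random source type
            queue.append(rng.choice(veracities))
        elif queue:
            # verification: check the oldest report; count it if true
            if rng.random() < queue.pop(0):
                score += 1
    return score
```

Note the two degenerate extremes: all-exploration never verifies anything, and all-verification has nothing to verify, so both score zero; performance peaks at an interior balance, mirroring the trade-off the paper analyzes.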
Structure Learning in Human Sequential Decision-Making
Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than being suboptimal, we argue that the learning problem humans face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate that humans can perform structure learning in a near-optimal manner.
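The core idea of structure learning can be sketched as Bayesian model comparison. The following toy example (our own illustration, not the authors' full Bayesian RL formulation) maintains a posterior over two candidate reward structures for a Bernoulli arm: a "known" structure with a fixed rate of 0.5, versus an "unknown" structure whose rate has a uniform prior.

```python
from math import comb

def marginal_known(successes, n, p=0.5):
    """Likelihood of the reward sequence under a fixed, known rate p."""
    return p**successes * (1 - p)**(n - successes)

def marginal_uniform(successes, n):
    """Marginal likelihood under an unknown rate with a uniform prior:
    integral of p^s (1-p)^(n-s) dp = 1 / ((n + 1) * C(n, s))."""
    return 1.0 / ((n + 1) * comb(n, successes))

def structure_posterior(successes, n, prior_known=0.5):
    """Posterior probability that the 'known rate' structure is correct."""
    mk = marginal_known(successes, n) * prior_known
    mu = marginal_uniform(successes, n) * (1 - prior_known)
    return mk / (mk + mu)
```

Observing 5 successes in 10 trials favors the fixed-rate structure, while 10 out of 10 strongly favors the unknown-rate structure; an optimal agent's subsequent choices change qualitatively depending on which structure it believes it is in.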
Methods for specifying the target difference in a randomised controlled trial : the Difference ELicitation in TriAls (DELTA) systematic review
Strategies for the Use of Fallback Foods in Apes
Researchers have suggested that fallback foods (FBFs) shape primate food processing adaptations, whereas preferred foods drive harvesting adaptations, and that the dietary importance of FBFs is central in determining the expression of a variety of traits. We examine these hypotheses in extant apes. First, we compare the nature and dietary importance of FBFs used by each taxon. FBF importance appears greatest in gorillas, followed by chimpanzees and siamangs, and least in orangutans and gibbons (bonobos are difficult to place). Next, we compare 20 traits among taxa to assess whether the relative expression of traits expected for consumption of FBFs matches their observed dietary importance. Trait manifestation generally conforms to predictions based on dietary importance of FBFs. However, some departures from predictions exist, particularly for orangutans, which express relatively more food harvesting and processing traits predicted for consuming large amounts of FBFs than expected based on observed dietary importance. This is probably due to the chemical, mechanical, and phenological properties of the apes' main FBFs, in particular the high importance of figs for chimpanzees and hylobatids, compared to the use of bark and leaves, plus figs in at least some Sumatran populations, by orangutans. This may have permitted more specialized harvesting adaptations in chimpanzees and hylobatids, and required enhanced processing adaptations in orangutans. Possible intercontinental differences in the availability and quality of preferred foods and FBFs may also be important. Our analysis supports previous hypotheses suggesting a critical influence of the dietary importance and quality of FBFs on ape ecology and, consequently, evolution.