Incentivizing Exploration with Heterogeneous Value of Money
Recently, Frazier et al. proposed a natural model for crowdsourced
exploration of different a priori unknown options: a principal is interested in
the long-term welfare of a population of agents who arrive one by one in a
multi-armed bandit setting. However, each agent is myopic, so to incentivize the agent to explore options with better long-term prospects, the principal must offer money. Frazier et al. showed that a simple class of policies, called time-expanded policies, is optimal in the worst case, and
characterized their budget-reward tradeoff.
The previous work assumed that all agents are equally and uniformly
susceptible to financial incentives. In reality, agents may have different
utility for money. We therefore extend the model of Frazier et al. to allow
agents that have heterogeneous and non-linear utilities for money. The
principal is informed of the agent's tradeoff via a signal that could be more
or less informative.
Our main result shows that a convex program can be used to derive a
signal-dependent time-expanded policy which achieves the best possible
Lagrangian reward in the worst case. The worst-case guarantee is matched by
so-called "Diamonds in the Rough" instances; the proof that the guarantees
match is based on showing that two different convex programs have the same
optimal solution for these specific instances. These results also extend to the
budgeted case as in Frazier et al. We also show that the optimal policy is
monotone with respect to information, i.e., the approximation ratio of the
optimal policy improves as the signals become more informative.
Comment: WINE 201
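To make the incentive mechanism concrete, the following is a minimal illustrative sketch, not the paper's actual time-expanded policy: a principal facing a myopic agent in a Beta-Bernoulli bandit computes the minimum payment that makes an under-explored arm myopically attractive. The linear utility for money is a simplifying assumption; the paper's point is precisely that agents may have heterogeneous, non-linear utilities.

```python
# Illustrative sketch only (assumptions ours, not Frazier et al.'s policy):
# a myopic agent picks the arm with the highest posterior mean, so the
# principal must pay the gap to redirect the agent toward another arm.

def posterior_mean(successes, pulls, prior=(1, 1)):
    """Beta-Bernoulli posterior mean of an arm's reward rate."""
    a, b = prior
    return (a + successes) / (a + b + pulls)

def payment_to_explore(means, target):
    """Minimum payment making `target` myopically optimal, assuming the
    agent's utility is linear in money (a simplifying assumption)."""
    best = max(means)
    return max(0.0, best - means[target])
```

For example, with posterior means `[0.6, 0.4]`, steering the agent to arm 1 costs a payment of 0.2; steering to the already-best arm 0 costs nothing.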
Bandit strategies in social search: the case of the DARPA red balloon challenge
Collective search for people and information has tremendously benefited from emerging communication technologies that leverage the wisdom of the crowds, and has been increasingly influential in solving time-critical tasks such as the DARPA Network Challenge (DNC, also known as the Red Balloon Challenge). However, while collective search often invests significant resources in encouraging the crowd to contribute new information, the comparable effort required to verify this information is often neglected in crowdsourcing models. This paper studies how the exploration-verification trade-off displayed by the teams modulated their success in the DNC, as teams had limited human resources that they had to divide between recruitment (exploration) and verification (exploitation). Our analysis suggests that team performance in the DNC can be modelled as a modified multi-armed bandit (MAB) problem, where information arrives to the team from sources of different levels of veracity that need to be assessed in real time. We use these insights to build a data-driven agent-based model, based on the DNC's data, to simulate team performance. The simulation results match the observed teams' behavior and demonstrate how to achieve the best balance between exploration and exploitation for general time-critical collective search tasks.
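A minimal agent-based sketch of the recruitment-versus-verification trade-off described above (the model structure and parameters here are our own illustration, not the paper's calibrated simulation): each time step a team either recruits a new source, whose report is true with some source-dependent veracity, or verifies a queued report, scoring only verified true reports.

```python
import random

def simulate(hours, explore_frac, veracities, seed=0):
    """Score earned by a team splitting effort between recruiting new
    reports (exploration) and verifying queued ones (exploitation)."""
    rng = random.Random(seed)
    queue, score = [], 0
    for _ in range(hours):
        if rng.random() < explore_frac:
            # recruitment: a report arrives from a random source type
            queue.append(rng.choice(veracities))
        elif queue:
            # verification: check the oldest report; count it if true
            if rng.random() < queue.pop(0):
                score += 1
    return score
```

Note the two degenerate extremes: all-exploration never verifies anything, and all-verification has nothing to verify, so both score zero; performance peaks at an interior balance, mirroring the trade-off the paper analyzes.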
Structure Learning in Human Sequential Decision-Making
Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than being suboptimal, we argue that the learning problem humans face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate that humans can perform structure learning in a near-optimal manner.
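The core idea of structure learning can be sketched as Bayesian model comparison. The following toy example (our own illustration, not the authors' full Bayesian RL formulation) maintains a posterior over two candidate reward structures for a Bernoulli arm: a "known" structure with a fixed rate of 0.5, versus an "unknown" structure whose rate has a uniform prior.

```python
from math import comb

def marginal_known(successes, n, p=0.5):
    """Likelihood of the reward sequence under a fixed, known rate p."""
    return p**successes * (1 - p)**(n - successes)

def marginal_uniform(successes, n):
    """Marginal likelihood under an unknown rate with a uniform prior:
    integral of p^s (1-p)^(n-s) dp = 1 / ((n + 1) * C(n, s))."""
    return 1.0 / ((n + 1) * comb(n, successes))

def structure_posterior(successes, n, prior_known=0.5):
    """Posterior probability that the 'known rate' structure is correct."""
    mk = marginal_known(successes, n) * prior_known
    mu = marginal_uniform(successes, n) * (1 - prior_known)
    return mk / (mk + mu)
```

Observing 5 successes in 10 trials favors the fixed-rate structure, while 10 out of 10 strongly favors the unknown-rate structure; an optimal agent's subsequent choices change qualitatively depending on which structure it believes it is in.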
Methods for specifying the target difference in a randomised controlled trial : the Difference ELicitation in TriAls (DELTA) systematic review
Strategies for the Use of Fallback Foods in Apes
Researchers have suggested that fallback foods (FBFs) shape primate food processing adaptations, whereas preferred foods drive harvesting adaptations, and that the dietary importance of FBFs is central in determining the expression of a variety of traits. We examine these hypotheses in extant apes. First, we compare the nature and dietary importance of FBFs used by each taxon. FBF importance appears greatest in gorillas, followed by chimpanzees and siamangs, and least in orangutans and gibbons (bonobos are difficult to place). Next, we compare 20 traits among taxa to assess whether the relative expression of traits expected for consumption of FBFs matches their observed dietary importance. Trait manifestation generally conforms to predictions based on dietary importance of FBFs. However, some departures from predictions exist, particularly for orangutans, which express relatively more food harvesting and processing traits predicted for consuming large amounts of FBFs than expected based on observed dietary importance. This is probably due to the chemical, mechanical, and phenological properties of the apes' main FBFs, in particular the high importance of figs for chimpanzees and hylobatids, compared to the use of bark and leaves, plus figs in at least some Sumatran populations, by orangutans. This may have permitted more specialized harvesting adaptations in chimpanzees and hylobatids, and required enhanced processing adaptations in orangutans. Possible intercontinental differences in the availability and quality of preferred foods and FBFs may also be important. Our analysis supports previous hypotheses suggesting a critical influence of the dietary importance and quality of FBFs on ape ecology and, consequently, evolution.