Search CORE

461 research outputs found

Incentivizing Exploration with Selective Data Disclosure

Author: Immorlica Nicole
Mao Jieming
Slivkins Aleksandrs
Wu Zhiwei Steven
Publication date: 19/02/2020

We study the design of rating systems that incentivize (more) efficient social learning among self-interested agents. Agents arrive sequentially and are presented with a set of possible actions, each of which yields a positive reward with an unknown probability. A disclosure policy sends messages about the rewards of previously chosen actions to arriving agents. These messages can alter agents' incentives towards exploration, that is, towards taking potentially suboptimal actions for the sake of learning more about their rewards. Prior work has made much progress with disclosure policies that merely recommend an action to each user, but relies heavily on standard, yet very strong, rationality assumptions. We study a particular class of disclosure policies that use messages, called unbiased subhistories, consisting of the actions and rewards from a subsequence of past agents. Each subsequence is chosen ahead of time, according to a predetermined partial order on the rounds. We posit a flexible model of frequentist agent response, which we argue is plausible for this class of "order-based" disclosure policies. We measure the success of a policy by its regret, i.e., the difference, over all rounds, between the expected reward of the best action and the reward induced by the policy. A disclosure policy that reveals the full history in each round risks inducing herding behavior among the agents, and typically has regret linear in the time horizon $T$. Our main result is an order-based disclosure policy that obtains regret $\tilde{O}(\sqrt{T})$. This regret is known to be optimal in the worst case over reward distributions, even absent incentives. We also exhibit simpler order-based policies with higher, but still sublinear, regret. These policies can be interpreted as dividing a sublinear number of agents into constant-sized focus groups, whose histories are then revealed to future agents.
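As a rough illustration of the regret measure described above, the sketch below computes pseudo-regret for a fixed sequence of actions. It assumes Bernoulli rewards whose success probabilities are known to the evaluator (never to the agents) and scores each round by its expected rather than realized reward; the function name `pseudo_regret` is hypothetical and not taken from the paper.

```python
from typing import Sequence


def pseudo_regret(means: Sequence[float], actions: Sequence[int]) -> float:
    """Sum over rounds of (best expected reward - expected reward of the chosen action).

    Hypothetical helper: means[a] is the probability that action a yields a
    positive reward (known only for evaluation); actions[t] is the action in round t.
    """
    best = max(means)
    return sum(best - means[a] for a in actions)


# Two actions with success probabilities 0.6 and 0.5. Herding onto the worse
# action in every round costs 0.1 per round, so regret grows linearly in T.
print(pseudo_regret([0.6, 0.5], [1, 1, 1, 1]))
```

A policy that always plays the best action has zero pseudo-regret, while one that herds onto a suboptimal action accumulates a constant loss per round, which is the linear-in-$T$ behavior the abstract attributes to full-history disclosure.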

arXiv.org e-Print Archive

Incentivized Exploration for Multi-Armed Bandits under Reward Drift

Author: Chen Lijun
Liu Kai
Liu Zhiyuan
Shen Fan
Wang Huazheng
Publication date: 15/12/2019

We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on reward. We seek to understand the impact of this drifted reward feedback by analyzing the performance of three instantiations of the incentivized MAB algorithm: UCB, $\varepsilon$-Greedy, and Thompson Sampling. Our results show that they all achieve $\mathcal{O}(\log T)$ regret and compensation under the drifted reward, and are therefore effective in incentivizing exploration. Numerical examples are provided to complement the theoretical analysis. Comment: 10 pages, 2 figures, AAAI 202
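To make the incentivized $\varepsilon$-Greedy instantiation concrete, here is a minimal simulation sketch under assumptions the abstract does not spell out: rewards are Bernoulli, the payment for following a non-greedy recommendation equals the gap between the greedy arm's and the recommended arm's empirical means, and the reported reward drifts upward by a fixed fraction (`drift_coef`) of that payment. The function and parameter names are hypothetical, and this is not the paper's algorithm or drift model.

```python
import random
from typing import List, Tuple


def incentivized_eps_greedy(
    probs: List[float],
    horizon: int,
    epsilon: float = 0.1,
    drift_coef: float = 0.5,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Hypothetical sketch of epsilon-greedy with compensation and drifted feedback.

    Assumptions (illustrative only):
    - rewards are Bernoulli(probs[arm]);
    - the payment is the empirical-mean gap between the greedy and recommended arm;
    - the reported reward is the true reward plus drift_coef * payment.
    Returns per-arm pull counts and the total compensation paid.
    """
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k
    reported_sum = [0.0] * k

    def mean(a: int) -> float:
        # Unpulled arms get an optimistic mean of 1.0 so each is tried early.
        return reported_sum[a] / pulls[a] if pulls[a] else 1.0

    total_comp = 0.0
    for _ in range(horizon):
        greedy = max(range(k), key=mean)
        # Explore uniformly with probability epsilon, otherwise follow the greedy arm.
        arm = rng.randrange(k) if rng.random() < epsilon else greedy
        # Hypothetical payment: zero when the recommendation is the greedy arm.
        comp = max(0.0, mean(greedy) - mean(arm))
        total_comp += comp
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        reported_sum[arm] += reward + drift_coef * comp  # drifted feedback
    return pulls, total_comp


pulls, paid = incentivized_eps_greedy([0.7, 0.4], horizon=1000)
print(pulls, round(paid, 3))
```

With `epsilon=0.0` the recommendation always equals the greedy arm, so no compensation is ever paid; raising `epsilon` buys more exploration at the cost of more compensation, and `drift_coef` controls how strongly payments bias the feedback.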

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Bandit Social Learning: Exploration under Myopic Behavior

Author: Banihashem Kiarash
Hajiaghayi MohammadTaghi
Shin Suho
Slivkins Aleksandrs
Publication date: 14/02/2023

We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms and receive associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face an exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on failure of the greedy algorithm in multi-armed bandits. This is the first such result in the literature, to the best of our knowledge.
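The confidence-interval view of myopic behavior can be sketched as a small simulation. This is a hypothetical parameterization, not the paper's exact model: each agent sees the full history and picks the arm maximizing its empirical mean plus `eta` times a confidence width, so `eta = 0` is the greedy ("unbiased") agent and large `eta` resembles UCB-style optimism. The names `myopic_agents` and `eta` are illustrative assumptions.

```python
import math
import random
from typing import List


def myopic_agents(probs: List[float], n_agents: int, eta: float, seed: int = 0) -> List[int]:
    """Return per-arm pull counts when myopic agents arrive in sequence.

    Hypothetical model: agent t sees the full history and picks the arm that
    maximizes empirical_mean + eta * sqrt(log(n_agents) / pulls).
    eta = 0 is the greedy agent; large eta acts like UCB.
    Assumes n_agents >= len(probs); each arm is tried once at the start.
    """
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k
    wins = [0] * k

    def index(a: int) -> float:
        width = math.sqrt(math.log(n_agents) / pulls[a])
        return wins[a] / pulls[a] + eta * width

    for t in range(n_agents):
        # The first k agents each try a different arm so every mean is defined.
        arm = t if t < k else max(range(k), key=index)
        pulls[arm] += 1
        wins[arm] += 1 if rng.random() < probs[arm] else 0  # Bernoulli reward
    return pulls


# Count, over 200 seeds, runs in which greedy agents (eta = 0) herd onto the
# worse arm; rerun with a larger eta to compare.
herded = sum(myopic_agents([0.6, 0.5], 1000, eta=0.0, seed=s)[1] > 500 for s in range(200))
print(herded)
```

For example, if the better arm's first reward is 0 and the worse arm's first reward is 1, greedy agents may never revisit the better arm, which is the kind of exploration failure that yields regret linear in the number of agents.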

arXiv.org e-Print Archive

On Statistical Discrimination as a Failure of Social Learning: A Multi-Armed Bandit Approach

Author: Komiyama Junpei
Noda Shunya
Publication date: 06/07/2021

We analyze statistical discrimination in hiring markets using a multi-armed bandit model. Myopic firms face workers arriving with heterogeneous observable characteristics. The association between the worker's skill and characteristics is unknown ex ante; thus, firms need to learn it. Laissez-faire causes perpetual underestimation: minority workers are rarely hired, and therefore, underestimation towards them tends to persist. Even a slight population-ratio imbalance frequently produces perpetual underestimation. We propose two policy solutions: a novel subsidy rule (the hybrid mechanism) and the Rooney Rule. Our results indicate that temporary affirmative actions effectively mitigate discrimination caused by insufficient data.

arXiv.org e-Print Archive

What Lies Ahead: An Exploration of Future Orientation, Self-Control, and Delinquency

Author: Clinkinbeard Samantha S.
Publication venue: DigitalCommons@UNO
Publication date: 22/08/2013

Self-control has been consistently linked to antisocial behavior, and although low self-control makes delinquency more likely, neither the findings nor the theory suggests that low self-control necessitates participation in such behavior. There remains a shortage of research on those situational factors or individual characteristics that might lessen the effects of low self-control on antisocial behavior. Future orientation is one such characteristic that can have implications for the control of behavior. The purpose of the current study was to explore the independent and interactive effects of future orientation and low self-control on delinquency using data from Wave 1 of the National Longitudinal Study of Adolescent Health. A series of regressions showed that self-control and future orientation had independent effects on delinquent behavior. Further, future-oriented achievement expectations conditioned the effect of self-control on delinquency such that the effects of self-control were weakened with increases in future orientation. The findings suggest that prevention programs should place more emphasis on helping youth plan for the future. Further, research should more fully explore the other aspects of future orientation (e.g., specificity of planning and change/stability of aspirations), as they relate to self-control and delinquency.

The University of Nebraska, Omaha

The Impact of Social Impact Bond Financing

Author: Fraser Alec
Geuke Gemma
Hevenstone Debra
Hobi Lukas
Przepiorka Wojtek
Publication venue: Wiley
Publication date: 01/01/2023

Social impact bonds (SIBs), also known as Pay for Success, are an innovation in Payment by Results contracting. Investors finance programs and are repaid based on the “SIB effect,” which includes changes in outcomes attributable to financing. We generate a quantitative estimate of this part of the SIB effect for two active labor market programs in the Netherlands and Switzerland. Comparing program impacts within providers using SIB and non-SIB contracts suggests financing has positive impacts on public benefit receipt, employment, and income. Qualitative research suggests this is because SIB contracts increased pressure for all involved parties, leading to the institutionalization of selection and greater resources for SIB-financed services. Contracts with high pressure, like SIBs, may compromise both performance requirements and the potential to measure performance. We examine the implications of these findings in relation to agency and stewardship theories and highlight the significance of SIBs as multilateral as opposed to bilateral contracts.

Berner Fachhochschule: ARBOR

King's Research Portal