
    Incentivizing Exploration with Selective Data Disclosure

    We study the design of rating systems that incentivize (more) efficient social learning among self-interested agents. Agents arrive sequentially and are presented with a set of possible actions, each of which yields a positive reward with an unknown probability. A disclosure policy sends messages about the rewards of previously-chosen actions to arriving agents. These messages can alter agents' incentives towards exploration, taking potentially sub-optimal actions for the sake of learning more about their rewards. Prior work achieves much progress with disclosure policies that merely recommend an action to each user, but relies heavily on standard, yet very strong rationality assumptions. We study a particular class of disclosure policies that use messages, called unbiased subhistories, consisting of the actions and rewards from a subsequence of past agents. Each subsequence is chosen ahead of time, according to a predetermined partial order on the rounds. We posit a flexible model of frequentist agent response, which we argue is plausible for this class of "order-based" disclosure policies. We measure the success of a policy by its regret, i.e., the difference, over all rounds, between the expected reward of the best action and the reward induced by the policy. A disclosure policy that reveals full history in each round risks inducing herding behavior among the agents, and typically has regret linear in the time horizon $T$. Our main result is an order-based disclosure policy that obtains regret $\tilde{O}(\sqrt{T})$. This regret is known to be optimal in the worst case over reward distributions, even absent incentives. We also exhibit simpler order-based policies with higher, but still sublinear, regret. These policies can be interpreted as dividing a sublinear number of agents into constant-sized focus groups, whose histories are then revealed to future agents.
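
    The regret measure described in this abstract can be written out as follows. This is a sketch in standard bandit notation, not notation taken from the paper: $\mu_a$ (expected reward of action $a$), $\mu^*$ (the best expected reward), and $a_t$ (the action chosen in round $t$) are assumed symbols.

```latex
\[
  R(T) \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right],
  \qquad \mu^{*} = \max_{a} \mu_{a}.
\]
% Full-history disclosure (herding):  R(T) = \Theta(T) in typical cases.
% Main order-based policy:            R(T) = \tilde{O}(\sqrt{T}).
```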

    Incentivized Exploration for Multi-Armed Bandits under Reward Drift

    We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on reward. We seek to understand the impact of this drifted reward feedback by analyzing the performance of three instantiations of the incentivized MAB algorithm: UCB, $\varepsilon$-Greedy, and Thompson Sampling. Our results show that they all achieve $\mathcal{O}(\log T)$ regret and compensation under the drifted reward, and are therefore effective in incentivizing exploration. Numerical examples are provided to complement the theoretical analysis. Comment: 10 pages, 2 figures, AAAI 202
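
    To make the setting concrete, here is a minimal sketch of a UCB-style instantiation. It is not the paper's algorithm: the compensation rule (pay the gap between the greedy arm's empirical mean and the pulled arm's) and the drift model (compensated feedback inflated by a constant `drift`) are assumptions for illustration, and the function name `incentivized_ucb` is hypothetical.

```python
import math
import random


def incentivized_ucb(means, horizon, drift=0.0, seed=0):
    """Run UCB1 on Bernoulli arms, paying compensation on non-greedy pulls.

    Assumed (hypothetical) rules: compensation equals the gap between the
    greedy arm's empirical mean and the pulled arm's empirical mean, and
    feedback from a compensated pull is inflated by `drift`.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of (possibly drifted) feedback per arm
    compensation = 0.0    # total compensation paid so far
    for t in range(1, horizon + 1):
        compensated = False
        if t <= k:
            arm = t - 1   # pull each arm once to initialize the estimates
        else:
            est = [sums[i] / counts[i] for i in range(k)]
            greedy = max(range(k), key=lambda i: est[i])
            arm = max(
                range(k),
                key=lambda i: est[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
            if arm != greedy:
                # The player is paid to deviate from the greedy choice.
                compensation += est[greedy] - est[arm]
                compensated = True
        reward = 1.0 if rng.random() < means[arm] else 0.0
        feedback = reward + drift if compensated else reward
        counts[arm] += 1
        sums[arm] += feedback
    return counts, compensation
```

    Calling `incentivized_ucb([0.2, 0.8], horizon=2000, drift=0.1)` returns the per-arm pull counts and the total compensation paid, which can be tracked against the horizon to see how both grow.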

    Bandit Social Learning: Exploration under Myopic Behavior

    We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms and receive associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face an exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on failure of the greedy algorithm in multi-armed bandits. This is the first such result in the literature, to the best of our knowledge.
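
    The greedy failure mentioned in this abstract can be illustrated with a small simulation. This is a sketch under assumptions, not the paper's model or proof: arms are Bernoulli, each arm gets one initial sample, and a run counts as a "failure" when the best arm is pulled in fewer than half of the rounds. The names `greedy_run` and `failure_rate` are hypothetical.

```python
import random


def greedy_run(means, horizon, rng):
    """Pull each arm once, then always pull the arm with the best empirical mean."""
    k = len(means)
    counts = [1] * k
    sums = [1.0 if rng.random() < m else 0.0 for m in means]
    for _ in range(horizon - k):
        arm = max(range(k), key=lambda i: sums[i] / counts[i])
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
    return counts


def failure_rate(means, horizon=500, runs=2000, seed=0):
    """Fraction of runs in which the best arm gets under half of the pulls."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda i: means[i])
    failures = 0
    for _ in range(runs):
        counts = greedy_run(means, horizon, rng)
        if counts[best] < horizon / 2:
            failures += 1
    return failures / runs
```

    With means `[0.5, 0.6]`, the best arm is abandoned forever whenever its first sample is 0 and the other arm's first sample is 1, which happens with probability 0.4 × 0.5 = 0.2. Each such run loses 0.1 in expected reward per round, so expected regret grows linearly in the horizon.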

    On Statistical Discrimination as a Failure of Social Learning: A Multi-Armed Bandit Approach

    We analyze statistical discrimination in hiring markets using a multi-armed bandit model. Myopic firms face workers arriving with heterogeneous observable characteristics. The association between the worker's skill and characteristics is unknown ex ante; thus, firms need to learn it. Laissez-faire causes perpetual underestimation: minority workers are rarely hired, and therefore, underestimation towards them tends to persist. Even a slight population-ratio imbalance frequently produces perpetual underestimation. We propose two policy solutions: a novel subsidy rule (the hybrid mechanism) and the Rooney Rule. Our results indicate that temporary affirmative actions effectively mitigate discrimination caused by insufficient data.

    What Lies Ahead: An Exploration of Future Orientation, Self-Control, and Delinquency

    Self-control has been consistently linked to antisocial behavior, and though low self-control makes delinquency more likely, neither the findings nor the theory suggests that low self-control necessitates participation in such behavior. There remains a shortage of research on those situational factors or individual characteristics that might lessen the effects of low self-control on antisocial behavior. Future orientation is one such characteristic that can have implications for the control of behavior. The purpose of the current study was to explore the independent and interactive effects of future orientation and low self-control on delinquency using data from Wave 1 of the National Longitudinal Study of Adolescent Health. A series of regressions showed that self-control and future orientation had independent effects on delinquent behavior. Further, future-oriented achievement expectations conditioned the effect of self-control on delinquency such that the effects of self-control were weakened with increases in future orientation. The findings suggest that prevention programs should place more emphasis on helping youth plan for the future. Further, research should more fully explore the other aspects of future orientation (e.g., specificity of planning and change/stability of aspirations), as they relate to self-control and delinquency.

    The Impact of Social Impact Bond Financing

    Social impact bonds (SIBs), also known as Pay for Success, are an innovation in Payment by Results contracting. Investors finance programs and are repaid based on the “SIB effect,” which includes changes in outcomes attributable to financing. We generate a quantitative estimate of this part of the SIB effect for two active labor market programs in the Netherlands and Switzerland. Comparing program impacts within providers using SIB and non-SIB contracts suggests financing has positive impacts on public benefit receipt, employment, and income. Qualitative research suggests this is because SIB contracts increased pressure for all involved parties, leading to the institutionalization of selection and greater resources for SIB-financed services. Contracts with high pressure, like SIBs, may compromise both performance requirements and the potential to measure performance. We examine the implications of these findings in relation to agency and stewardship theories and highlight the significance of SIBs as multilateral as opposed to bilateral contracts.