8 research outputs found

    Optimism Based Exploration in Large-Scale Recommender Systems

    Bandit learning algorithms have been an increasingly popular design choice for recommender systems. Despite the strong interest in bandit learning from the community, there remain multiple bottlenecks that prevent many bandit learning approaches from reaching production. Two of the most important bottlenecks are scaling to multi-task settings and A/B testing. Classic bandit algorithms, especially those leveraging contextual information, often require reward for uncertainty estimation, which hinders their adoption in multi-task recommender systems. Moreover, unlike supervised learning algorithms, bandit learning algorithms place great emphasis on the data collection process through their explorative nature. Such explorative behavior induces unfair evaluation for bandit learning agents in a classic A/B test setting. In this work, we present a novel design for the production bandit learning life cycle for recommender systems, along with a novel set of metrics to measure their efficiency in user exploration. We show through large-scale production recommender system experiments and in-depth analysis that our bandit agent design improves personalization for the production recommender system and our experiment design fairly evaluates the performance of bandit learning algorithms.
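As a rough illustration of what "optimism-based exploration" means in general, the sketch below implements the textbook UCB1 rule on a toy click simulation. It is not the agent design described in the abstract; the function names, the click probabilities, and the exploration constant `c` are all assumptions made for this example.

```python
import math
import random


def ucb1_select(counts, means, t, c=2.0):
    """Return the index of the arm with the highest optimistic score."""
    # Play every arm once before trusting any estimate.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Optimism: empirical mean plus a bonus that shrinks as an arm is played more.
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(click_probs, steps=5000, seed=0):
    """Simulate UCB1 on Bernoulli arms (hypothetical click probabilities)."""
    rng = random.Random(seed)
    counts = [0] * len(click_probs)
    means = [0.0] * len(click_probs)
    for t in range(1, steps + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

A rarely played arm receives a large bonus, so it can be selected even when its observed mean is lower. That is the explorative data collection the abstract says complicates a classic A/B comparison.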

    Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

    Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing temporal difference learning, a fundamental reinforcement learning algorithm, we implement a one-step policy improvement approach that biases the system towards recommendations with higher long-term user engagement metrics. This optimizes value over long horizons while maintaining compatibility with the auction framework. Our approach is grounded in dynamic programming ideas which show that our method provably improves upon the existing auction-based base policy. Through an online A/B test conducted on an auction-based recommender system which handles billions of impressions and users daily, we empirically establish that our proposed method outperforms the current production system in terms of long-term user engagement metrics.
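The two ingredients named in the abstract, temporal difference learning and a one-step improvement over the base policy, can be sketched as follows. This is a minimal tabular illustration under stated assumptions, not the authors' system: the state and item keys, the `auction_score` field, the additive scoring rule, and the `weight`, `alpha` and `gamma` parameters are all hypothetical.

```python
from collections import defaultdict


def td0_q_update(q, state, item, reward, next_state, next_item,
                 alpha=0.1, gamma=0.9):
    """One on-policy TD(0) (SARSA-style) update of Q(state, item).

    The next item is the one the base policy actually showed, so Q estimates
    the long-term value of the base policy itself.
    """
    target = reward + gamma * q[(next_state, next_item)]
    q[(state, item)] += alpha * (target - q[(state, item)])
    return q[(state, item)]


def rank_with_long_term_value(candidates, q, state, weight=1.0):
    """Rank candidates by auction score plus a weighted long-term value term.

    With weight=0 this reduces to the original auction ranking; a positive
    weight biases the ranking towards items with higher estimated Q.
    """
    return sorted(
        candidates,
        key=lambda c: c["auction_score"] + weight * q[(state, c["item"])],
        reverse=True,
    )


# Usage: learn from one logged transition, then re-rank two candidates.
q = defaultdict(float)
td0_q_update(q, "s", "a", reward=1.0, next_state="s2", next_item="b")
candidates = [
    {"item": "a", "auction_score": 1.00},
    {"item": "b", "auction_score": 1.05},
]
ranked = rank_with_long_term_value(candidates, q, "s")
```

Acting greedily on the adjusted score for one step, while the value estimate still refers to the base policy, is the one-step improvement idea; keeping the auction score as one term of the sum is one way such a scheme could stay compatible with the auction.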

    Assessing the Impact of U.S. Food Assistance Delivery Policies on Child Mortality in Northern Kenya

    The U.S. is the main country in the world that delivers its food assistance primarily via transoceanic shipments of commodity-based in-kind food. This approach is costlier and less timely than cash-based assistance, which includes cash transfers, food vouchers, and local and regional procurement, where food is bought in or near the recipient country. These drawbacks are exacerbated by a requirement that half of the U.S.’s transoceanic food shipments be sent on U.S.-flag vessels. We estimate the effect of these U.S. food assistance distribution policies on child mortality in northern Kenya by formulating and optimizing a supply chain model. In our model, monthly orders of transoceanic shipments and cash-based interventions are chosen to minimize child mortality subject to an annual budget constraint and to policy constraints on the allowable proportions of cash-based interventions and non-U.S.-flag shipments. By varying the restrictiveness of these policy constraints, we assess the impact of possible changes in U.S. food aid policies on child mortality. The model includes an existing regression model that uses household survey data and geospatial data to forecast the mean mid-upper-arm circumference Z scores among children in a community, and allows food assistance to increase Z scores, and Z scores to influence mortality rates. We find that cash-based interventions are a much more powerful policy lever than the U.S.-flag vessel requirement: switching to cash-based interventions reduces child mortality from 4.4% to 3.7% (a 16.2% relative reduction) in our model, whereas eliminating the U.S.-flag vessel restriction without increasing the use of cash-based interventions generates a relative reduction in child mortality of only 1.1%. The great majority of the gains achieved by cash-based interventions are due to their reduced cost, not their reduced delivery lead times; i.e., the reduction of shipping expenses allows for more food to be delivered, which reduces child mortality.
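The cost mechanism in the last sentence, where a fixed budget buys more food when per-ton costs fall, can be made concrete with a toy calculation. This is only a sketch: every unit cost below is a hypothetical placeholder, treating the cash share as a share of the budget is an assumption, and the function is not part of the supply chain model described in the abstract.

```python
def tons_delivered(budget, cash_share, non_us_flag_share,
                   cash_cost=1.0, us_flag_cost=2.0, other_flag_cost=1.6):
    """Tons of food obtained with `budget` under a given policy mix.

    cash_share: fraction of the budget spent on cash-based interventions.
    non_us_flag_share: fraction of shipped tonnage on non-U.S.-flag vessels.
    All costs are hypothetical per-ton placeholders, not values from the study.
    """
    if not (0.0 <= cash_share <= 1.0 and 0.0 <= non_us_flag_share <= 1.0):
        raise ValueError("shares must lie in [0, 1]")
    cash_budget = budget * cash_share
    ship_budget = budget - cash_budget
    # Blended per-ton cost of shipped food, weighted by tonnage on each flag.
    shipped_cost = (non_us_flag_share * other_flag_cost
                    + (1.0 - non_us_flag_share) * us_flag_cost)
    return cash_budget / cash_cost + ship_budget / shipped_cost


baseline = tons_delivered(100.0, cash_share=0.0, non_us_flag_share=0.5)
no_flag_rule = tons_delivered(100.0, cash_share=0.0, non_us_flag_share=1.0)
all_cash = tons_delivered(100.0, cash_share=1.0, non_us_flag_share=0.5)
```

With these placeholder costs, lifting the flag requirement raises tonnage modestly and moving the budget to cash raises it more. That ordering is a consequence of the assumed numbers and only illustrates the direction of the argument, not its magnitude.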

    Dependence of the annual mortality rate on the proportion of food assistance utilizing cash-based interventions (l) and the proportion of transoceanic shipments employing non-U.S.-flag carriers (p).

    The current U.S. policy is represented by l = 0.65 and p = 0.5, the elimination of the U.S.-flag vessel requirement corresponds to p = 1.0, and l = 1.0 corresponds to the U.S. switching entirely to cash-based interventions.