Optimism Based Exploration in Large-Scale Recommender Systems
Bandit learning algorithms have been an increasingly popular design choice
for recommender systems. Despite the strong interest in bandit learning from
the community, there remain multiple bottlenecks that prevent many bandit
learning approaches from being productionized. Two of the most important
bottlenecks are scaling to multi-task settings and A/B testing. Classic bandit
algorithms, especially those leveraging contextual information, often require
reward for uncertainty estimation, which hinders their adoption in multi-task
recommender systems. Moreover, unlike supervised learning algorithms,
bandit learning algorithms place great emphasis on the data collection process
through their exploratory nature. Such exploratory behavior induces unfair
evaluation for bandit learning agents in a classic A/B test setting. In this
work, we present a novel design of a production bandit learning life-cycle for
recommender systems, along with a novel set of metrics to measure their
efficiency in user exploration. We show through large-scale production
recommender system experiments and in-depth analysis that our bandit agent
design improves personalization for the production recommender system and our
experiment design fairly evaluates the performance of bandit learning
algorithms.
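
As a minimal illustration of optimism-based exploration, the sketch below implements textbook UCB1 for a non-contextual, single-task bandit. It is not the agent design described in the abstract above; the function name `ucb1`, the bonus constant `c`, and the reward probabilities are assumptions chosen for illustration.

```python
import math
import random


def ucb1(pull, n_arms, n_rounds, c=2.0):
    """Run UCB1: play the arm with the highest optimistic reward estimate."""
    counts = [0] * n_arms      # number of times each arm was played
    means = [0.0] * n_arms     # running mean reward of each arm
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1        # play every arm once to initialise the estimates
        else:
            # Optimism: empirical mean plus a bonus that shrinks with more plays.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Hypothetical Bernoulli reward probabilities for three items (illustrative only).
rng = random.Random(0)
reward_prob = [0.2, 0.5, 0.8]
counts, means = ucb1(
    lambda a: 1.0 if rng.random() < reward_prob[a] else 0.0, n_arms=3, n_rounds=5000
)
```

The bonus term is what makes the agent explore: rarely played arms keep a large bonus and are retried, which is also why the data such an agent collects differs from the data a purely greedy model would collect.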
Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning
Auction-based recommender systems are prevalent in online advertising
platforms, but they are typically optimized to allocate recommendation slots
based on immediate expected return metrics, neglecting the downstream effects
of recommendations on user behavior. In this study, we employ reinforcement
learning to optimize for long-term return metrics in an auction-based
recommender system. Utilizing temporal difference learning, a fundamental
reinforcement learning algorithm, we implement a one-step policy improvement
approach that biases the system towards recommendations with higher long-term
user engagement metrics. This optimizes value over long horizons while
maintaining compatibility with the auction framework. Our approach is grounded
in dynamic programming ideas, which show that our method provably improves upon
the existing auction-based base policy. Through an online A/B test conducted on
an auction-based recommender system that handles billions of impressions and
users daily, we empirically establish that our proposed method outperforms the
current production system in terms of long-term user engagement metrics.
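
The idea of evaluating a base policy with temporal difference learning and then taking one greedy improvement step can be sketched on a toy problem. Everything below is an assumption made for illustration: the two-state MDP, the uniform-random base policy, and the expected-SARSA form of the TD update (the abstract does not specify the exact update or how it is combined with the auction).

```python
import random

GAMMA = 0.9
# Hypothetical toy MDP: (state, action) -> (immediate reward, next state).
# State 0 = engaged user, state 1 = disengaged user.
# Action 0 = item with high immediate return, action 1 = item that retains the user.
MDP = {
    (0, 0): (1.0, 1),
    (0, 1): (0.5, 0),
    (1, 0): (0.0, 1),
    (1, 1): (0.0, 0),
}
STATES, ACTIONS = (0, 1), (0, 1)


def evaluate_base_policy(n_steps=20000, alpha=0.1, seed=0):
    """Estimate Q of a uniform-random base policy with expected-SARSA TD updates."""
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in MDP}
    state = 0
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)          # act with the base policy
        reward, nxt = MDP[(state, action)]
        # Expected value of the next state under the uniform base policy.
        v_next = sum(q[(nxt, a)] for a in ACTIONS) / len(ACTIONS)
        td_error = reward + GAMMA * v_next - q[(state, action)]
        q[(state, action)] += alpha * td_error
        state = nxt
    return q


def one_step_improvement(q):
    """Act greedily with respect to the base policy's Q-values."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


q_base = evaluate_base_policy()
improved = one_step_improvement(q_base)
# Policy that only looks at immediate reward, for comparison.
myopic = {s: max(ACTIONS, key=lambda a: MDP[(s, a)][0]) for s in STATES}
```

In this toy setting the myopic policy picks action 0 for an engaged user, while the one-step improved policy picks action 1, because the base policy's Q-values account for the lower future value of the disengaged state.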
Assessing the Impact of U.S. Food Assistance Delivery Policies on Child Mortality in Northern Kenya
The U.S. is the main country in the world that delivers its food assistance primarily via transoceanic shipments of commodity-based in-kind food. This approach is costlier and less timely than cash-based assistance, which includes cash transfers, food vouchers, and local and regional procurement, where food is bought in or near the recipient country. The drawbacks of the U.S. approach are exacerbated by a requirement that half of its transoceanic food shipments be sent on U.S.-flag vessels. We estimate the effect of these U.S. food assistance distribution policies on child mortality in northern Kenya by formulating and optimizing a supply chain model. In our model, monthly orders of transoceanic shipments and cash-based interventions are chosen to minimize child mortality subject to an annual budget constraint and to policy constraints on the allowable proportions of cash-based interventions and non-US-flag shipments. By varying the restrictiveness of these policy constraints, we assess the impact of possible changes in U.S. food aid policies on child mortality. The model includes an existing regression model that uses household survey data and geospatial data to forecast the mean mid-upper-arm circumference Z scores among children in a community, and allows food assistance to increase Z scores, and Z scores to influence mortality rates. We find that cash-based interventions are a much more powerful policy lever than the U.S.-flag vessel requirement: switching to cash-based interventions reduces child mortality from 4.4% to 3.7% (a 16.2% relative reduction) in our model, whereas eliminating the U.S.-flag vessel restriction without increasing the use of cash-based interventions generates a relative reduction in child mortality of only 1.1%.
The great majority of the gains achieved by cash-based interventions are due to their reduced cost, not their reduced delivery lead times; i.e., the reduction of shipping expenses allows for more food to be delivered, which reduces child mortality.
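
The relative reduction quoted in the abstract can be checked with a short calculation. This sketch assumes, without confirmation from the source, that the two mortality rates were rounded to one decimal place and that the 16.2% figure was computed from unrounded values; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Relative reduction of a rate that falls from `before` to `after`."""
    return (before - after) / before


# Point estimate from the rounded rates quoted in the abstract (about 0.159).
point = relative_reduction(4.4, 3.7)

# Range implied if each quoted rate was rounded to one decimal place.
low = relative_reduction(4.35, 3.75)
high = relative_reduction(4.45, 3.65)

reported = 0.162
consistent = low <= reported <= high
```

The rounded rates give roughly 15.9%, and the reported 16.2% falls inside the range that one-decimal rounding allows, so the two figures are compatible under that assumption.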
The interrelationships among the components of the model.
Dependence of the annual mortality rate on the proportion of food assistance utilizing cash-based interventions (l) and the proportion of transoceanic shipments employing non-US-flag carriers (p).
The current U.S. policy is represented by l = 0.65 and p = 0.5, the elimination of the U.S.-flag vessel requirement corresponds to p = 1.0, and l = 1.0 corresponds to the U.S. switching entirely to cash-based interventions.