Kernel Estimation and Model Combination in a Bandit Problem with Covariates
The multi-armed bandit problem is an important optimization game that requires an exploration-exploitation tradeoff to achieve optimal total reward. Motivated by industrial applications such as online advertising and clinical research, we consider a setting where the rewards of bandit machines are associated with covariates, and accurate estimation of the corresponding mean reward functions plays an important role in the performance of allocation rules. Under a flexible problem setup, we establish asymptotic strong consistency and perform a finite-time regret analysis for a sequential randomized allocation strategy based on kernel estimation. In addition, since many nonparametric and parametric methods in supervised learning may be applied to estimating the mean reward functions, but guidance on how to choose among them is generally unavailable, we propose a model-combining allocation strategy for adaptive performance. Simulations and a real-data evaluation are conducted to illustrate the performance of the proposed allocation strategy.
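As a concrete illustration of the kernel-estimation idea, here is a minimal Nadaraya-Watson sketch of per-arm mean-reward estimation combined with an epsilon-greedy randomized allocation. The Gaussian kernel, the bandwidth, and all names are illustrative assumptions, not the paper's exact strategy.

```python
import numpy as np

def nw_estimate(x, X, Y, h=0.3):
    """Nadaraya-Watson kernel estimate of the mean reward at covariate x,
    given past covariates X and rewards Y observed for one arm."""
    if len(X) == 0:
        return 0.0
    w = np.exp(-0.5 * ((x - np.asarray(X)) / h) ** 2)  # Gaussian kernel weights
    if w.sum() == 0:
        return 0.0
    return float(np.dot(w, Y) / w.sum())

def allocate(x, history, eps=0.1, rng=None):
    """Epsilon-greedy randomized allocation: with probability eps explore
    uniformly, otherwise pull the arm with the highest estimated reward.
    `history` holds one (covariates, rewards) pair of lists per arm."""
    rng = rng or np.random.default_rng(0)
    if rng.random() < eps:
        return int(rng.integers(len(history)))
    ests = [nw_estimate(x, X, Y) for X, Y in history]
    return int(np.argmax(ests))
```

The randomization (`eps`) keeps every arm's reward function identifiable, which is what consistency arguments for such strategies typically rely on.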
Optimal treatment allocations in space and time for on-line control of an emerging infectious disease
A key component in controlling the spread of an epidemic is deciding where, when and to whom to apply an intervention. We develop a framework for using data to inform these decisions in real time. We formalize a treatment allocation strategy as a sequence of functions, one per treatment period, that map up-to-date information on the spread of an infectious disease to a subset of locations where treatment should be allocated. An optimal allocation strategy optimizes some cumulative outcome, e.g. the number of uninfected locations, the geographic footprint of the disease or the cost of the epidemic. Estimation of an optimal allocation strategy for an emerging infectious disease is challenging because spatial proximity induces interference between locations, the number of possible allocations is exponential in the number of locations, and because disease dynamics and intervention effectiveness are unknown at outbreak. We derive a Bayesian on-line estimator of the optimal allocation strategy that combines simulation-optimization with Thompson sampling. The estimator proposed performs favourably in simulation experiments. This work is motivated by and illustrated using data on the spread of white-nose syndrome, which is a highly fatal infectious disease devastating bat populations in North America.
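The Thompson-sampling ingredient can be sketched with a Beta-Bernoulli toy model: draw one posterior sample of treatment effectiveness per location and treat the top draws within a budget. This is a stand-in to show the mechanism, not the paper's simulation-optimization estimator; all names and the conjugate model are assumptions.

```python
import numpy as np

def thompson_allocate(successes, failures, budget, rng=None):
    """Choose `budget` locations to treat by Thompson sampling: draw one
    Beta(1 + successes, 1 + failures) sample per location from its posterior
    over treatment effectiveness, then treat the top-`budget` draws."""
    rng = rng or np.random.default_rng(0)
    draws = rng.beta(1 + np.asarray(successes), 1 + np.asarray(failures))
    return sorted(np.argsort(draws)[-budget:].tolist())
```

Because allocations are sampled from the posterior rather than maximized over it, uncertain locations still get treated occasionally, which handles the unknown-dynamics-at-outbreak problem the abstract highlights.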
Modeling item-item similarities for personalized recommendations on Yahoo! front page
We consider the problem of algorithmically recommending items to users on a
Yahoo! front page module. Our approach is based on a novel multilevel
hierarchical model that we refer to as a User Profile Model with Graphical
Lasso (UPG). The UPG provides a personalized recommendation to users by
simultaneously incorporating both user covariates and historical user
interactions with items in a model based way. In fact, we build a per-item
regression model based on a rich set of user covariates and estimate individual
user affinity to items by introducing a latent random vector for each user. The
vector random effects are assumed to be drawn from a prior with a precision
matrix that measures residual partial associations among items. To ensure
better estimates of a precision matrix in high-dimensions, the matrix elements
are constrained through a Lasso penalty. Our model is fitted through a
penalized-quasi likelihood procedure coupled with a scalable EM algorithm. We
employ several computational strategies like multi-threading, conjugate
gradients and heavily exploit problem structure to scale our computations in
the E-step. For the M-step we take recourse to a scalable variant of the
Graphical Lasso algorithm for covariance selection. Through extensive
experiments on a new data set obtained from Yahoo! front page and a benchmark
data set from a movie recommender application, we show that our UPG model
significantly improves performance compared to several state-of-the-art methods
in the literature, especially those based on a bilinear random effects model
(BIRE). In particular, we show that the gains of UPG are significant compared
to BIRE when the number of users is large and the number of items to select
from is small. For large item sets and relatively small user sets the results
of UPG and BIRE are comparable. The UPG leads to faster model building and
produces outputs which are interpretable.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS475 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
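The M-step's covariance-selection problem maximizes the penalized log-likelihood log det(Theta) - tr(S Theta) - lam * ||Theta||_1 over precision matrices, with the diagonal unpenalized. Below is a textbook proximal-gradient (ISTA) sketch of that objective, not the scalable Graphical Lasso variant the paper uses; it assumes the iterates stay positive-definite, which a production solver would safeguard.

```python
import numpy as np

def soft(x, t):
    """Elementwise soft-thresholding, the prox of the L1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def graphical_lasso_ista(S, lam=0.1, step=0.1, iters=500):
    """ISTA sketch of the graphical lasso: maximize
    log det(Theta) - tr(S Theta) - lam * ||Theta||_1
    over symmetric precision matrices, leaving the diagonal unpenalized."""
    p = S.shape[0]
    Theta = np.eye(p)
    for _ in range(iters):
        grad = np.linalg.inv(Theta) - S       # gradient of the smooth part
        Theta = Theta + step * grad           # ascent step
        off = soft(Theta, step * lam)         # shrink entries toward zero
        np.fill_diagonal(off, np.diag(Theta)) # restore unpenalized diagonal
        Theta = (off + off.T) / 2             # keep the iterate symmetric
    return Theta
```

The soft-thresholding step is what zeroes out small partial associations among items, producing the sparse residual structure the UPG model exploits.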
A Contextual-Bandit Approach to Personalized News Article Recommendation
Personalized web services strive to adapt their services (advertisements,
news articles, etc.) to individual users by making use of both content and user
information. Despite a few recent advances, this problem remains challenging
for at least two reasons. First, web services feature dynamically
changing pools of content, rendering traditional collaborative filtering
methods inapplicable. Second, the scale of most web services of practical
interest calls for solutions that are fast in both learning and computation.
In this work, we model personalized recommendation of news articles as a
contextual bandit problem, a principled approach in which a learning algorithm
sequentially selects articles to serve users based on contextual information
about the users and articles, while simultaneously adapting its
article-selection strategy based on user-click feedback to maximize total user
clicks.
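A well-known contextual bandit algorithm in this spirit keeps a per-arm ridge regression on context features and adds an upper-confidence bonus at selection time. The sketch below is illustrative (class names and the `alpha` exploration parameter are assumptions) and is not necessarily the exact algorithm the paper proposes.

```python
import numpy as np

class UCBArm:
    """Per-arm ridge regression with an upper-confidence bonus."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)        # regularized Gram matrix X'X + I
        self.b = np.zeros(d)      # X'y
        self.alpha = alpha        # width of the confidence bonus

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                       # ridge estimate
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def select(arms, x):
    """Serve the article whose upper confidence bound is largest."""
    return int(np.argmax([a.ucb(x) for a in arms]))
```

The bonus term shrinks as an arm accumulates observations in the direction of `x`, so exploration concentrates on articles whose click rate is still uncertain for this user context.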
The contributions of this work are three-fold. First, we propose a new,
general contextual bandit algorithm that is computationally efficient and well
motivated from learning theory. Second, we argue that any bandit algorithm can
be reliably evaluated offline using previously recorded random traffic.
Finally, using this offline evaluation method, we successfully applied our new
algorithm to a Yahoo! Front Page Today Module dataset containing over 33
million events. Results showed a 12.5% click lift compared to a standard
context-free bandit algorithm, and the advantage becomes even greater when data
gets more scarce.
Comment: 10 pages, 5 figures
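The offline evaluation idea, replaying logged uniformly-random traffic and scoring a policy only on events where its choice matches the logged arm, can be sketched as follows; the function name and log layout are illustrative assumptions.

```python
def replay_evaluate(policy, log):
    """Replay evaluation on uniformly-random logged traffic: keep only
    events where the policy's pick matches the logged arm, and average
    the observed rewards over those matched events.
    `log` is a list of (context, logged_arm, reward) tuples."""
    total, matched = 0.0, 0
    for x, arm, reward in log:
        if policy(x) == arm:
            matched += 1
            total += reward
    return total / matched if matched else float("nan")
```

Because the logged arms were chosen uniformly at random, the matched events form an unbiased sample of what the candidate policy would have seen online, which is what makes this reliable for offline comparison.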
Machine Learning and Causality for Interpretable and Automated Decision Making
This abstract explores two key areas in decision science: automated and interpretable decision making. In the first part, we address challenges related to sparse user-interaction data and high item turnover rates in recommender systems. We introduce a novel algorithm called Multi-View Interactive Collaborative Filtering (MV-ICTR) that integrates user-item ratings and contextual information, improving performance, particularly in cold-start scenarios. In the second part, we focus on Student Prescription Trees (SPTs), which are interpretable decision trees. These trees use a black-box teacher model to predict counterfactuals based on observed covariates. We experiment with a Bayesian hierarchical binomial regression model as the teacher and employ statistical significance testing to control tree growth, ensuring interpretable decision trees. Overall, our research advances the field of decision science by addressing challenges in automated and interpretable decision making, offering solutions for improved performance and interpretability.
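The teacher-student distillation at the heart of an SPT can be sketched with a toy depth-1 "student" fit to labels queried from a black-box "teacher". The stump, the SSE criterion, and all names are illustrative assumptions, and the sketch omits the significance testing the abstract uses to control tree growth.

```python
import numpy as np

def best_stump(X, y):
    """Fit a depth-1 regression stump (feature index, threshold) by
    minimizing the sum of squared errors of the two resulting leaves."""
    best = (np.inf, 0, 0.0)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:        # candidate split points
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            sse = (((left - left.mean()) ** 2).sum()
                   + ((right - right.mean()) ** 2).sum())
            if sse < best[0]:
                best = (sse, j, t)
    return best[1], best[2]

def distill(teacher, X):
    """Student step: query the black-box teacher for a label at each
    covariate vector, then fit the interpretable stump to those labels."""
    y = np.array([teacher(x) for x in X])
    return best_stump(X, y)
```

The student never sees the teacher's internals, only its predictions, so any sufficiently accurate black box can supply the counterfactual labels while the tree stays human-readable.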