Stochastic Bandit Models for Delayed Conversions
Online advertising and product recommendation are important domains of
applications for multi-armed bandit methods. In these fields, the reward that
is immediately available is most often only a proxy for the actual outcome of
interest, which we refer to as a conversion. For instance, in web advertising,
clicks can be observed within a few seconds after an ad display but the
corresponding sale --if any-- will take hours, if not days to happen. This
paper proposes and investigates a new stochastic multi-armed bandit model in
the framework proposed by Chapelle (2014) --based on empirical studies in the
field of web advertising-- in which each action may trigger a future reward
that will then happen with a stochastic delay. We assume that the probability
of conversion associated with each action is unknown while the distribution of
the conversion delay is known, distinguishing between the (idealized) case
where the conversion events may be observed whatever their delay and the more
realistic setting in which late conversions are censored. We provide
performance lower bounds as well as two simple but efficient algorithms based
on the UCB and KLUCB frameworks. The latter algorithm, which is preferable when
conversion rates are low, is based on a Poissonization argument, of independent
interest in other settings where aggregation of Bernoulli observations with
different success probabilities is required.
Comment: Conference on Uncertainty in Artificial Intelligence, Aug 2017,
Sydney, Australi
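
The censored setting in the abstract above can be illustrated with a small sketch. It assumes a known delay distribution and discounts each past pull by the probability that its conversion would already have been revealed. The geometric delay, the function names, and the UCB1-style bonus are all assumptions made for illustration; they are not taken from the paper, whose exact indices are not given in this listing.

```python
import math


def geometric_delay_cdf(delay, q=0.5):
    """P(conversion delay <= delay) for a hypothetical geometric delay law."""
    if delay < 0:
        return 0.0
    return 1.0 - (1.0 - q) ** (delay + 1)


def delay_corrected_rate(pull_times, observed_conversions, now, cdf):
    """Estimate an arm's conversion rate when late conversions are not yet seen.

    A pull made at time s has had its conversion (if any) revealed by time
    `now` with probability cdf(now - s), so recent pulls count for less than
    one full observation.
    """
    effective_pulls = sum(cdf(now - s) for s in pull_times)
    if effective_pulls == 0.0:
        return 0.0, 0.0
    return observed_conversions / effective_pulls, effective_pulls


def ucb_index(rate, effective_pulls, now):
    """Generic UCB1-style optimistic index (an assumption, not the paper's)."""
    if effective_pulls == 0.0:
        return float("inf")
    return rate + math.sqrt(2.0 * math.log(max(now, 2)) / effective_pulls)


# Four pulls at times 0..3, one conversion seen so far, evaluated at time 3.
rate, effective = delay_corrected_rate([0, 1, 2, 3], 1, 3, geometric_delay_cdf)
index = ucb_index(rate, effective, 3)
```

Dividing by the effective number of pulls rather than the raw count avoids underestimating arms whose most recent pulls have not yet had time to convert.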
Cost-sensitive Learning for Utility Optimization in Online Advertising Auctions
One of the most challenging problems in computational advertising is the
prediction of click-through and conversion rates for bidding in online
advertising auctions. An unaddressed problem in previous approaches is the
existence of highly non-uniform misprediction costs. While for model evaluation
these costs have been taken into account through recently proposed
business-aware offline metrics -- such as the Utility metric which measures the
impact on advertiser profit -- this is not the case when training the models
themselves. In this paper, to bridge the gap, we formally analyze the
relationship between optimizing the Utility metric and the log loss, which is
considered one of the state-of-the-art approaches in conversion modeling.
Our analysis motivates the idea of weighting the log loss with the business
value of the predicted outcome. We present and analyze a new cost weighting
scheme and show that significant gains in offline and online performance can be
achieved.
Comment: First version of the paper was presented at NIPS 2015 Workshop on
E-Commerce: https://sites.google.com/site/nips15ecommerce/papers Third
version of the paper will be presented at AdKDD 2017 Workshop:
adkdd17.wixsite.com/adkddtargetad201
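
The idea of weighting the log loss by business value, mentioned in the abstract above, can be sketched as follows. The abstract does not state the paper's actual weighting scheme, so the per-example `weights` here are a hypothetical stand-in (for instance, the value of the sale attached to each impression), and the normalization by total weight is likewise an assumption.

```python
import math


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Log loss in which each example is scaled by a business-value weight.

    labels  : 0/1 outcomes (e.g. conversion or no conversion)
    probs   : predicted conversion probabilities
    weights : hypothetical per-example values; all ones gives plain log loss
    """
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep the logarithm finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


# The second example is badly mispredicted; raising its weight raises the loss.
uniform = weighted_log_loss([1, 0], [0.9, 0.9], [1.0, 1.0])
costly = weighted_log_loss([1, 0], [0.9, 0.9], [1.0, 10.0])
```

With uniform weights the function reduces to the ordinary average log loss, which makes it easy to compare a weighted model against an unweighted baseline.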
Real-Time Bidding by Reinforcement Learning in Display Advertising
The majority of online display ads are served through real-time bidding (RTB)
--- each ad display impression is auctioned off in real-time when it is just
being generated from a user visit. To place an ad automatically and optimally,
it is critical for advertisers to devise a learning algorithm to cleverly bid
on an ad impression in real-time. Most previous works consider the bid decision as
a static optimization problem of either treating the value of each impression
independently or setting a bid price to each segment of ad volume. However, the
bidding for a given ad campaign would repeatedly happen during its life span
before the budget runs out. As such, each bid is strategically correlated by
the constrained budget and the overall effectiveness of the campaign (e.g., the
rewards from generated clicks), which is only observed after the campaign has
completed. Thus, it is of great interest to devise an optimal bidding strategy
sequentially so that the campaign budget can be dynamically allocated across
all the available impressions on the basis of both the immediate and future
rewards. In this paper, we formulate the bid decision process as a
reinforcement learning problem, where the state space is represented by the
auction information and the campaign's real-time parameters, while an action is
the bid price to set. By modeling the state transition via auction competition,
we build a Markov Decision Process framework for learning the optimal bidding
policy to optimize the advertising performance in the dynamic real-time bidding
environment. Furthermore, the scalability problem from the large real-world
auction volume and campaign budget is well handled by state value approximation
using neural networks.
Comment: WSDM 201
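
To make the Markov Decision Process formulation in the abstract above concrete, here is a toy dynamic program over the state (auctions remaining, budget remaining), with the bid price as the action. Everything specific in it is an assumption for illustration: the second-price rule, the tiny market price distribution `MARKET_PRICE_PMF`, and the constant `CLICK_VALUE`. It solves the problem exactly on a small grid and does not attempt the neural-network value approximation the abstract mentions.

```python
from functools import lru_cache

# Hypothetical distribution of the highest competing bid (the market price).
MARKET_PRICE_PMF = {1: 0.5, 2: 0.3, 3: 0.2}
# Hypothetical expected reward from winning a single impression.
CLICK_VALUE = 1.0


def expected_return(bid, auctions_left, budget):
    """Expected total reward of bidding `bid` now, then acting optimally."""
    total = 0.0
    for price, prob in MARKET_PRICE_PMF.items():
        if price <= bid:
            # Win: collect the reward and pay the market price (second price).
            total += prob * (CLICK_VALUE + value(auctions_left - 1, budget - price))
        else:
            # Lose: budget is untouched and the campaign moves on.
            total += prob * value(auctions_left - 1, budget)
    return total


@lru_cache(maxsize=None)
def value(auctions_left, budget):
    """Optimal expected reward from the state (auctions_left, budget)."""
    if auctions_left == 0:
        return 0.0
    return max(expected_return(bid, auctions_left, budget)
               for bid in range(budget + 1))


def best_bid(auctions_left, budget):
    """Smallest bid that attains the optimal value in the given state."""
    return max(range(budget + 1),
               key=lambda bid: (expected_return(bid, auctions_left, budget), -bid))
```

Because bids never exceed the remaining budget, the budget constraint is enforced by the action set itself, which is what couples successive bids across the campaign.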