Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising
Real-time bidding (RTB) is an important mechanism in online display
advertising, where a proper bid for each page view plays an essential role for
good marketing results. Budget constrained bidding is a typical scenario in RTB
where the advertisers hope to maximize the total value of the winning
impressions under a pre-set budget constraint. However, the optimal bidding
strategy is hard to derive due to the complexity and volatility of the auction
environment. To address these challenges, in this paper, we formulate
budget constrained bidding as a Markov Decision Process and propose a
model-free reinforcement learning framework to resolve the optimization
problem. Our analysis shows that the immediate reward from the environment is
misleading under a critical resource constraint. We therefore propose a
reward-function design methodology for reinforcement learning problems with
constraints. Based on the new reward design, we employ a deep neural network to
learn the appropriate reward so that the optimal policy can be learned
effectively. Unlike prior model-based work, which suffers from scalability
problems, our framework is easy to deploy in large-scale
industrial applications. The experimental evaluations demonstrate the
effectiveness of our framework on large-scale real datasets.
Comment: In The 27th ACM International Conference on Information and Knowledge
Management (CIKM 18), October 22-26, 2018, Torino, Italy. ACM, New York, NY,
USA, 9 pages
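The budget-constrained bidding formulation above can be illustrated with a minimal episodic simulator. This is a sketch under assumed uniform value and price distributions, with hypothetical names, not the paper's framework or reward design:

```python
import random

def run_episode(policy, budget, n_steps, seed=0):
    """Simulate budget-constrained bidding as an episodic MDP.

    State: (steps_left, budget_left). Action: a bid price.
    Each step draws an impression value and a competing market price
    (both uniform on [0, 1] here, an assumption for illustration);
    winning a second-price auction costs the market price and earns
    the impression value.
    """
    rng = random.Random(seed)
    budget_left, total_value = budget, 0.0
    for steps_left in range(n_steps, 0, -1):
        value = rng.random()         # predicted value of this impression
        market_price = rng.random()  # highest competing bid
        bid = min(policy(steps_left, budget_left, value), budget_left)
        if bid >= market_price:      # win: pay second price, collect value
            budget_left -= market_price
            total_value += value
    return total_value, budget_left

# A naive value-proportional policy; a learned policy would replace this.
total, remaining = run_episode(lambda t, b, v: 1.5 * v, budget=10.0, n_steps=200)
```

Because the bid is capped at the remaining budget and the second price never exceeds the winning bid, the constraint can never be violated, which is exactly the coupling between bids that makes the per-step reward misleading on its own.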
Learning Adaptive Display Exposure for Real-Time Advertising
In E-commerce advertising, where product recommendations and product ads are
presented to users simultaneously, the traditional setting is to display ads at
fixed positions. However, under such a setting, the advertising system loses
the flexibility to control the number and positions of ads, resulting in
sub-optimal platform revenue and user experience. Consequently, major
e-commerce platforms (e.g., Taobao.com) have begun to consider more flexible
ways to display ads. In this paper, we investigate the problem of advertising
with adaptive exposure: can we dynamically determine the number and positions
of ads for each user visit under certain business constraints so that the
platform revenue can be increased? More specifically, we consider two types of
constraints: a request-level constraint that safeguards the user experience for
each visit, and a platform-level constraint that controls the overall platform
monetization rate. We model this problem as a Constrained Markov Decision Process with
per-state constraint (psCMDP) and propose a constrained two-level reinforcement
learning approach to decompose the original problem into two relatively
independent sub-problems. To accelerate policy learning, we also devise a
constrained hindsight experience replay mechanism. Experimental evaluations on
industry-scale real-world datasets show that our approach obtains higher
revenue under the constraints and confirm the effectiveness of the constrained
hindsight experience replay mechanism.
Comment: accepted by CIKM201
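The adaptive-exposure decision, choosing how many ads to show per request under a request-level cap and a platform-level monetization rate, can be sketched as a greedy baseline. The function and its parameters are illustrative assumptions, not the paper's two-level reinforcement learning policy:

```python
def allocate_ads(candidates, max_ads, target_rate, ads_so_far, requests_so_far):
    """Greedy adaptive ad exposure under two constraints (a sketch).

    candidates: (score, is_ad) tuples, one per slot, best score first.
    Request-level constraint: at most max_ads ads in this request.
    Platform-level constraint: the running monetization rate
    (ads shown per request) must stay at or below target_rate.
    Returns the chosen slate and the updated total ad count.
    """
    chosen, n_ads = [], 0
    for score, is_ad in candidates:
        if is_ad:
            projected_rate = (ads_so_far + n_ads + 1) / (requests_so_far + 1)
            if n_ads >= max_ads or projected_rate > target_rate:
                continue  # skip this ad: a constraint would be violated
            n_ads += 1
        chosen.append((score, is_ad))
    return chosen, ads_so_far + n_ads

slate, ads_total = allocate_ads(
    [(0.9, True), (0.8, False), (0.7, True), (0.6, False)],
    max_ads=1, target_rate=1.0, ads_so_far=0, requests_so_far=0)
```

A greedy rule like this satisfies the constraints per request but ignores future requests; the appeal of the psCMDP view is precisely that it optimizes exposure decisions sequentially rather than one request at a time.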
Real-Time Bidding by Reinforcement Learning in Display Advertising
The majority of online display ads are served through real-time bidding (RTB):
each ad impression is auctioned off in real time as it is generated by a user
visit. To place ads automatically and optimally, it is critical for advertisers
to devise a learning algorithm that bids for each impression intelligently in
real time. Most previous work treats the bid decision as a static optimization
problem, either valuing each impression independently or setting a bid price
for each segment of ad volume. However, bidding for a given ad campaign happens
repeatedly over its life span until the budget runs out. As such, each bid is
strategically coupled through
the constrained budget and the overall effectiveness of the campaign (e.g., the
rewards from generated clicks), which is only observed after the campaign has
completed. Thus, it is of great interest to devise an optimal bidding strategy
sequentially so that the campaign budget can be dynamically allocated across
all the available impressions on the basis of both the immediate and future
rewards. In this paper, we formulate the bid decision process as a
reinforcement learning problem, where the state space is represented by the
auction information and the campaign's real-time parameters, while an action is
the bid price to set. By modeling the state transition via auction competition,
we build a Markov Decision Process framework for learning the optimal bidding
policy to optimize the advertising performance in the dynamic real-time bidding
environment. Furthermore, the scalability problem from the large real-world
auction volume and campaign budget is well handled by state value approximation
using neural networks.
Comment: WSDM 201
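At small scale, the MDP above admits an exact solution. The sketch below assumes a discrete budget, a single impression value, and a known market-price distribution, and computes the value function and optimal bids by dynamic programming; the scalable neural value approximation the abstract mentions replaces exactly this table:

```python
def value_iteration_bidding(T, B, imp_value, price_dist):
    """Tabular DP sketch of the MDP view of real-time bidding.

    V[t][b]: expected future value with t auctions left and budget b.
    price_dist[d]: probability that the market price equals d (d >= 1).
    The optimal bid at (t, b) maximizes the expected immediate value
    plus the value of the resulting state; winning pays the second
    price d and moves to state (t - 1, b - d).
    """
    V = [[0.0] * (B + 1) for _ in range(T + 1)]
    policy = [[0] * (B + 1) for _ in range(T + 1)]
    for t in range(1, T + 1):
        for b in range(B + 1):
            best_val, best_bid = -1.0, 0
            for bid in range(0, b + 1):
                v = 0.0
                for d, p in price_dist.items():
                    if d <= bid:   # win: pay market price d, collect value
                        v += p * (imp_value + V[t - 1][b - d])
                    else:          # lose: keep the budget
                        v += p * V[t - 1][b]
                if v > best_val:
                    best_val, best_bid = v, bid
            V[t][b] = best_val
            policy[t][b] = best_bid
    return V, policy

V, policy = value_iteration_bidding(T=2, B=3, imp_value=1.0,
                                    price_dist={1: 0.5, 2: 0.5})
```

The table grows with the product of the number of auctions and budget levels, which is the scalability problem the abstract refers to; a neural network approximating V(t, b) sidesteps enumerating every state.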
Bid Optimization by Multivariable Control in Display Advertising
Real-Time Bidding (RTB) is an important paradigm in display advertising,
where advertisers utilize extended information and algorithms served by Demand
Side Platforms (DSPs) to improve advertising performance. A common problem for
DSPs is to help advertisers gain as much value as possible with budget
constraints. In practice, however, advertisers routinely impose key
performance indicator (KPI) constraints that the advertising campaign must
meet. In this paper, we study the common case where advertisers
aim to maximize the quantity of conversions, and set cost-per-click (CPC) as a
KPI constraint. We convert such a problem into a linear programming problem and
leverage the primal-dual method to derive the optimal bidding strategy. To
address the applicability issue, we propose a feedback-control-based solution
and devise a multivariable control system. An empirical study on real-world
data from Taobao.com verifies the effectiveness and superiority of our approach
compared with the state of the art in industry practice.
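The feedback-control idea can be sketched with a single-variable loop: a PID controller steers the observed CPC toward its target by scaling bids through an exponential actuator. The class, gains, and actuator choice here are illustrative assumptions; the paper's multivariable system couples several such loops:

```python
import math

class PIDController:
    """Minimal PID controller (illustrative sketch, not the paper's
    multivariable design): output is proportional to the current
    error, its running sum, and its change."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured):
        error = self.setpoint - measured
        self.integral += error
        deriv = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# Exponential actuator: the control signal multiplies a bid factor,
# lowering bids while the observed CPC exceeds the target of 1.0.
ctrl = PIDController(kp=0.5, ki=0.1, kd=0.0, setpoint=1.0)
bid_factor = 1.0
for observed_cpc in [1.4, 1.3, 1.1, 1.0]:
    bid_factor *= math.exp(ctrl.update(observed_cpc))
```

The exponential actuator keeps the bid factor positive regardless of the control signal's sign, which is why it is a common choice for multiplicative bid adjustment.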
ROI-Constrained Bidding via Curriculum-Guided Bayesian Reinforcement Learning
Real-Time Bidding (RTB) is an important mechanism in modern online
advertising systems. Advertisers employ bidding strategies in RTB to optimize
their advertising effects subject to various financial requirements, especially
the return-on-investment (ROI) constraint. ROIs change non-monotonically during
the sequential bidding process, and often induce a see-saw effect between
constraint satisfaction and objective optimization. While some existing
approaches show promising results in static or mildly changing ad markets, they
fail to generalize to highly dynamic ad markets with ROI constraints, due to
their inability to adaptively balance constraints and objectives amidst
non-stationarity and partial observability. In this work, we focus on
ROI-constrained bidding in non-stationary markets. Based on a Partially
Observable Constrained Markov Decision Process, our method exploits an
indicator-augmented reward function free of extra trade-off parameters and
develops a Curriculum-Guided Bayesian Reinforcement Learning (CBRL) framework
to adaptively control the constraint-objective trade-off in non-stationary ad
markets. Extensive experiments on a large-scale industrial dataset with two
problem settings reveal that CBRL generalizes well in both in-distribution and
out-of-distribution data regimes, and enjoys superior learning efficiency and
stability.
Comment: Accepted by SIGKDD 202
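The indicator-augmented reward idea can be sketched as follows. Both function names and the linear curriculum schedule are illustrative assumptions; the paper's exact reward formulation and its Bayesian curriculum are more sophisticated:

```python
def indicator_augmented_reward(value, roi, roi_target):
    """Pay out the bidding objective only when the ROI constraint
    holds, so no hand-tuned trade-off weight between objective and
    constraint is needed (a sketch of the idea)."""
    return value if roi >= roi_target else 0.0

def curriculum_roi_target(final_target, stage, n_stages):
    """Linearly tighten the ROI target across training stages, so the
    agent first learns under an easy constraint (a generic curriculum
    sketch, not the paper's curriculum-guided Bayesian scheme)."""
    return final_target * (stage + 1) / n_stages
```

Gating the reward on the indicator avoids the see-saw effect a weighted penalty can cause: the agent cannot trade constraint violations for more objective value at any fixed weight.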
Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising
Real-time advertising allows advertisers to bid for each impression for a
visiting user. To optimize goals such as the revenue and return on investment
(ROI) generated by ad placements, advertisers need not only to estimate the
relevance between ads and users' interests but, more importantly, to respond
strategically to the other advertisers bidding in the
market. In this paper, we formulate bidding optimization with multi-agent
reinforcement learning. To deal with a large number of advertisers, we propose
a clustering method and assign each cluster with a strategic bidding agent. A
practical Distributed Coordinated Multi-Agent Bidding (DCMAB) approach is
proposed and implemented to balance the trade-off between competition and
cooperation among advertisers. An empirical study on industry-scale real-world
data demonstrates the effectiveness of our methods. Our results show that
cluster-based bidding largely outperforms single-agent and bandit approaches,
and that coordinated bidding achieves better overall objectives than purely
self-interested bidding agents.
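The clustering step, which makes a market with many advertisers tractable for multi-agent learning, can be sketched with a deliberately simple one-dimensional grouping. This is an assumption for illustration only; the paper's clustering and the coordinated training of one bidding agent per cluster are more involved:

```python
def cluster_advertisers(features, n_clusters):
    """Assign advertisers to equal-sized clusters by ranking a single
    feature (e.g., average budget) — a simplified stand-in for the
    clustering step, after which one strategic bidding agent would be
    trained per cluster. Returns {advertiser_index: cluster_id}."""
    order = sorted(range(len(features)), key=lambda i: features[i])
    size = -(-len(features) // n_clusters)  # ceiling division
    return {idx: rank // size for rank, idx in enumerate(order)}

# Four advertisers with different average budgets, grouped into two clusters.
assignment = cluster_advertisers([5.0, 1.0, 9.0, 3.0], n_clusters=2)
```

Reducing thousands of advertisers to a handful of cluster-level agents is what keeps the joint action space of the multi-agent formulation manageable.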