33,850 research outputs found
Reinforcement Mechanism Design for E-Commerce
We study the problem of allocating impressions to sellers in e-commerce
websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue
generated by the platform. We employ a general framework of reinforcement
mechanism design, which uses deep reinforcement learning to design efficient
algorithms, taking the strategic behaviour of the sellers into account.
Specifically, we model the impression allocation problem as a Markov decision
process, where the states encode the history of impressions, prices,
transactions and generated revenue and the actions are the possible impression
allocations in each round. To tackle the problem of continuity and
high-dimensionality of states and actions, we adopt the ideas of the DDPG
algorithm to design an actor-critic policy gradient algorithm which takes
advantage of the problem domain in order to achieve convergence and stability.
We evaluate our proposed algorithm, coined IA(GRU), by comparing it against
DDPG, as well as several natural heuristics, under different rationality models
for the sellers - we assume that sellers follow well-known no-regret type
strategies which may vary in their degree of sophistication. We find that
IA(GRU) outperforms all algorithms in terms of the total revenue
Learning Adaptive Display Exposure for Real-Time Advertising
In E-commerce advertising, where product recommendations and product ads are
presented to users simultaneously, the traditional setting is to display ads at
fixed positions. However, under such a setting, the advertising system loses
the flexibility to control the number and positions of ads, resulting in
sub-optimal platform revenue and user experience. Consequently, major
e-commerce platforms (e.g., Taobao.com) have begun to consider more flexible
ways to display ads. In this paper, we investigate the problem of advertising
with adaptive exposure: can we dynamically determine the number and positions
of ads for each user visit under certain business constraints so that the
platform revenue can be increased? More specifically, we consider two types of
constraints: request-level constraint ensures user experience for each user
visit, and platform-level constraint controls the overall platform monetization
rate. We model this problem as a Constrained Markov Decision Process with
per-state constraint (psCMDP) and propose a constrained two-level reinforcement
learning approach to decompose the original problem into two relatively
independent sub-problems. To accelerate policy learning, we also devise a
constrained hindsight experience replay mechanism. Experimental evaluations on
industry-scale real-world datasets demonstrate the merits of our approach in
both obtaining higher revenue under the constraints and the effectiveness of
the constrained hindsight experience replay mechanism.Comment: accepted by CIKM201
Whole-Chain Recommendations
With the recent prevalence of Reinforcement Learning (RL), there have been
tremendous interests in developing RL-based recommender systems. In practical
recommendation sessions, users will sequentially access multiple scenarios,
such as the entrance pages and the item detail pages, and each scenario has its
specific characteristics. However, the majority of existing RL-based
recommender systems focus on optimizing one strategy for all scenarios or
separately optimizing each strategy, which could lead to sub-optimal overall
performance. In this paper, we study the recommendation problem with multiple
(consecutive) scenarios, i.e., whole-chain recommendations. We propose a
multi-agent RL-based approach (DeepChain), which can capture the sequential
correlation among different scenarios and jointly optimize multiple
recommendation strategies. To be specific, all recommender agents (RAs) share
the same memory of users' historical behaviors, and they work collaboratively
to maximize the overall reward of a session. Note that optimizing multiple
recommendation strategies jointly faces two challenges in the existing
model-free RL model - (i) it requires huge amounts of user behavior data, and
(ii) the distribution of reward (users' feedback) are extremely unbalanced. In
this paper, we introduce model-based RL techniques to reduce the training data
requirement and execute more accurate strategy updates. The experimental
results based on a real e-commerce platform demonstrate the effectiveness of
the proposed framework.Comment: 29th ACM International Conference on Information and Knowledge
Managemen
Q-Strategy: A Bidding Strategy for Market-Based Allocation of Grid Services
The application of autonomous agents by the provisioning and usage of computational services is an attractive research field. Various methods and technologies in the area of artificial intelligence, statistics and economics are playing together to achieve i) autonomic service provisioning and usage of Grid services, to invent ii) competitive bidding strategies for widely used market mechanisms and to iii) incentivize consumers and providers to use such market-based systems.
The contributions of the paper are threefold. First, we present a bidding agent framework for implementing artificial bidding agents, supporting consumers and providers in technical and economic preference elicitation as well as automated bid generation by the requesting and provisioning of Grid services. Secondly, we introduce a novel consumer-side bidding strategy, which enables a goal-oriented and strategic behavior by the generation and submission of consumer service requests and selection of provider offers. Thirdly, we evaluate and compare the Q-strategy, implemented within the presented framework, against the Truth-Telling bidding strategy in three mechanisms – a centralized CDA, a decentralized on-line machine scheduling and a FIFO-scheduling mechanisms
Rational bidding using reinforcement learning: an application in automated resource allocation
The application of autonomous agents by the provisioning and usage of computational resources is an attractive research field. Various methods and technologies in the area of artificial intelligence, statistics and economics are playing together to achieve i) autonomic resource provisioning and usage of computational resources, to invent ii) competitive bidding strategies for widely used market mechanisms and to iii) incentivize consumers and providers to use such market-based systems.
The contributions of the paper are threefold. First, we present a framework for supporting consumers and providers in technical and economic preference elicitation and the generation of bids. Secondly, we introduce a consumer-side reinforcement learning bidding strategy which enables rational behavior by the generation and selection of bids. Thirdly, we evaluate and compare this bidding strategy against a truth-telling bidding strategy for two kinds of market mechanisms – one centralized and one decentralized
- …