Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue
Motivated by the observation that overexposure to unwanted marketing
activities leads to customer dissatisfaction, we consider a setting where a
platform offers a sequence of messages to its users and is penalized when users
abandon the platform due to marketing fatigue. We propose a novel sequential
choice model to capture multiple interactions taking place between the platform
and its user: upon receiving a message, a user chooses one of three
actions: accept the message, skip it and receive the next message, or abandon the
platform. Based on user feedback, the platform dynamically learns users'
abandonment distribution and their valuations of messages to determine the
length of the sequence and the order of the messages, while maximizing the
cumulative payoff over a horizon of length T. We refer to this online learning
task as the sequential choice bandit problem. For the offline combinatorial
optimization problem, we show that an efficient polynomial-time algorithm
exists. For the online problem, we propose an algorithm that balances
exploration and exploitation, and characterize its regret bound. Lastly, we
demonstrate how to extend the model with user contexts to incorporate
personalization.
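
The accept/skip/abandon interaction described above can be illustrated with a small sketch. It is not the authors' model or algorithm: it assumes, purely for illustration, that each message has a fixed acceptance probability and revenue, that accepting ends the interaction, and that a user who does not accept abandons with a constant probability `abandon_prob` at a fixed `penalty`. All names and parameters are hypothetical, and the search is brute force rather than the polynomial-time algorithm the abstract mentions.

```python
from itertools import permutations


def expected_payoff(sequence, accept_prob, revenue, abandon_prob, penalty):
    """Expected payoff of a message sequence under the toy model (assumed, not from the paper)."""
    payoff = 0.0
    reach = 1.0  # probability that the user is still present to see the current message
    for message in sequence:
        u = accept_prob[message]
        # The user accepts: the platform collects the revenue and the interaction ends.
        payoff += reach * u * revenue[message]
        # The user does not accept and abandons the platform: the platform pays the penalty.
        payoff -= reach * (1 - u) * abandon_prob * penalty
        # The user does not accept but skips on to the next message.
        reach *= (1 - u) * (1 - abandon_prob)
    return payoff


def best_sequence(messages, accept_prob, revenue, abandon_prob, penalty):
    """Brute-force search over every ordered subset; only feasible for a handful of messages."""
    best, best_value = (), 0.0  # sending nothing yields a payoff of zero
    for length in range(1, len(messages) + 1):
        for candidate in permutations(messages, length):
            value = expected_payoff(candidate, accept_prob, revenue, abandon_prob, penalty)
            if value > best_value:
                best, best_value = candidate, value
    return best, best_value
```

Both the length and the order of the sequence matter here: appending a message adds potential revenue but also another chance of abandonment, which is the trade-off the abstract describes.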
Revenue Maximization and Learning in Products Ranking
We consider the revenue maximization problem for an online retailer who plans
to display a set of products differing in their prices and qualities and rank
them in order. The consumers have random attention spans and view the products
sequentially before purchasing a "satisficing" product or leaving the
platform empty-handed when the attention span gets exhausted. Our framework
extends the cascade model in two directions: the consumers have random
attention spans instead of fixed ones and the firm maximizes revenues instead
of clicking probabilities. We show a nested structure of the optimal product
ranking as a function of the attention span when the attention span is fixed
and design a -approximation algorithm accordingly for the random attention
spans. When the conditional purchase probabilities are not known and may depend
on consumer and product features, we devise an online learning algorithm that
achieves regret relative to the approximation
algorithm, despite the censoring of information: the attention span of a
customer who purchases an item is not observable. Numerical experiments
demonstrate the outstanding performance of the approximation and online
learning algorithms.
Fatigue-aware Bandits for Dependent Click Models
As recommender systems send a massive amount of content to keep users
engaged, users may experience fatigue, which is caused by 1)
overexposure to irrelevant content and 2) boredom from seeing too many similar
recommendations. To address this problem, we consider an online learning
setting where a platform learns a policy to recommend content that takes user
fatigue into account. We propose an extension of the Dependent Click Model
(DCM) to describe users' behavior. We stipulate that for each piece of content,
its attractiveness to a user depends on its intrinsic relevance and a discount
factor that reflects how many similar pieces of content have been shown. Users view the
recommended content sequentially and click on the ones that they find
attractive. Users may leave the platform at any time, and the probability of
exiting is higher when they do not like the content. Based on users' feedback,
the platform learns the relevance of the underlying content as well as the
discounting effect due to content fatigue. We refer to this learning task as
the "fatigue-aware DCM Bandit" problem. We consider two learning scenarios
depending on whether the discounting effect is known. For each scenario, we
propose a learning algorithm which simultaneously explores and exploits, and
characterize its regret bound.
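
The fatigue discount described above can be sketched as follows. The abstract only states that attractiveness depends on intrinsic relevance and on a discount tied to how many similar items have been shown; the multiplicative form, the geometric discount `gamma ** n`, and the use of a topic label to define similarity are assumptions made here for illustration, and all names are hypothetical.

```python
from collections import Counter


def discounted_attractiveness(items, relevance, topic, gamma=0.8):
    """Attractiveness of each item in display order under an assumed geometric fatigue discount."""
    shown = Counter()  # number of items already shown per topic
    result = []
    for item in items:
        t = topic[item]
        # Each earlier item from the same topic shrinks the attractiveness by a factor of gamma.
        result.append(relevance[item] * gamma ** shown[t])
        shown[t] += 1
    return result
```

Under this sketch, interleaving topics keeps attractiveness closer to the intrinsic relevance, whereas showing many items from one topic in a row erodes it.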
Dynamic physical activity recommendation on personalised mobile health information service: A deep reinforcement learning approach
Mobile health (mHealth) information services make healthcare management
easier for users who want to increase physical activity and improve their health.
However, differences in activity preferences among individuals, adherence
problems, and uncertainty about future health outcomes may reduce the effect of
mHealth information services. Current health service systems usually
provide recommendations based on fixed exercise plans that do not satisfy
user-specific needs. This paper seeks an efficient way to make physical
activity recommendation decisions for physical activity promotion in
personalised mHealth information services by establishing a data-driven model. In
this study, we propose a real-time interaction model that selects the optimal
exercise plan for the individual, taking time-varying characteristics into
account to maximise the long-term health utility of the user. We construct a
framework for an mHealth information service system comprising a personalised AI
module, which draws on scientific knowledge about physical activity to
evaluate individual exercise performance and which may increase awareness
of the mHealth artificial intelligence system. The proposed deep reinforcement
learning (DRL) methodology combines two classes of approaches to improve the
learning capability of the mHealth information service system. A deep learning
method is introduced to construct a hybrid neural network combining long short-term
memory (LSTM) network and deep neural network (DNN) techniques to infer
individual exercise behavior from time series data. A reinforcement
learning method based on the asynchronous advantage actor-critic
algorithm is applied to find the optimal policy through exploration and exploitation
- …