Search CORE

164 research outputs found

Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue

Author: Cao Junyu
Sun Wei
Publication venue
Publication date: 19/03/2019
Field of study

Motivated by the observation that overexposure to unwanted marketing activities leads to customer dissatisfaction, we consider a setting where a platform offers a sequence of messages to its users and is penalized when users abandon the platform due to marketing fatigue. We propose a novel sequential choice model to capture multiple interactions taking place between the platform and its user: Upon receiving a message, a user decides on one of the three actions: accept the message, skip and receive the next message, or abandon the platform. Based on user feedback, the platform dynamically learns users' abandonment distribution and their valuations of messages to determine the length of the sequence and the order of the messages, while maximizing the cumulative payoff over a horizon of length T. We refer to this online learning task as the sequential choice bandit problem. For the offline combinatorial optimization problem, we show that an efficient polynomial-time algorithm exists. For the online problem, we propose an algorithm that balances exploration and exploitation, and characterize its regret bound. Lastly, we demonstrate how to extend the model with user contexts to incorporate personalization

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Revenue Maximization and Learning in Products Ranking

Author: Chen Ningyuan
Li Anran
Yang Shuoguang
Publication venue
Publication date: 07/12/2020
Field of study

We consider the revenue maximization problem for an online retailer who plans to display a set of products differing in their prices and qualities and rank them in order. The consumers have random attention spans and view the products sequentially before purchasing a ``satisficing'' product or leaving the platform empty-handed when the attention span gets exhausted. Our framework extends the cascade model in two directions: the consumers have random attention spans instead of fixed ones and the firm maximizes revenues instead of clicking probabilities. We show a nested structure of the optimal product ranking as a function of the attention span when the attention span is fixed and design a

1/e

-approximation algorithm accordingly for the random attention spans. When the conditional purchase probabilities are not known and may depend on consumer and product features, we devise an online learning algorithm that achieves

\tilde{\mathcal{O}}(\sqrt{T})

regret relative to the approximation algorithm, despite of the censoring of information: the attention span of a customer who purchases an item is not observable. Numerical experiments demonstrate the outstanding performance of the approximation and online learning algorithms

arXiv.org e-Print Archive

Fatigue-aware Bandits for Dependent Click Models

Author: Cao Junyu
Ettl Markus
Shen
Sun Wei
Zuo-Jun
Publication venue
Publication date: 03/04/2020
Field of study

As recommender systems send a massive amount of content to keep users engaged, users may experience fatigue which is contributed by 1) an overexposure to irrelevant content, 2) boredom from seeing too many similar recommendations. To address this problem, we consider an online learning setting where a platform learns a policy to recommend content that takes user fatigue into account. We propose an extension of the Dependent Click Model (DCM) to describe users' behavior. We stipulate that for each piece of content, its attractiveness to a user depends on its intrinsic relevance and a discount factor which measures how many similar contents have been shown. Users view the recommended content sequentially and click on the ones that they find attractive. Users may leave the platform at any time, and the probability of exiting is higher when they do not like the content. Based on user's feedback, the platform learns the relevance of the underlying content as well as the discounting effect due to content fatigue. We refer to this learning task as "fatigue-aware DCM Bandit" problem. We consider two learning scenarios depending on whether the discounting effect is known. For each scenario, we propose a learning algorithm which simultaneously explores and exploits, and characterize its regret bound

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A model a day keeps the doctor away:Reinforcement learning for personalized healthcare

Author: el Hassouni Ali
Publication venue
Publication date: 18/01/2022
Field of study

VU Research Portal

Dynamic physical activity recommendation on personalised mobile health information service: A deep reinforcement learning approach

Author: Fang Ji
Lee Vincent CS
Wang Haiyan
Publication venue
Publication date: 02/04/2022
Field of study

Mobile health (mHealth) information service makes healthcare management easier for users, who want to increase physical activity and improve health. However, the differences in activity preference among the individual, adherence problems, and uncertainty of future health outcomes may reduce the effect of the mHealth information service. The current health service system usually provides recommendations based on fixed exercise plans that do not satisfy the user specific needs. This paper seeks an efficient way to make physical activity recommendation decisions on physical activity promotion in personalised mHealth information service by establishing data-driven model. In this study, we propose a real-time interaction model to select the optimal exercise plan for the individual considering the time-varying characteristics in maximising the long-term health utility of the user. We construct a framework for mHealth information service system comprising a personalised AI module, which is based on the scientific knowledge about physical activity to evaluate the individual exercise performance, which may increase the awareness of the mHealth artificial intelligence system. The proposed deep reinforcement learning (DRL) methodology combining two classes of approaches to improve the learning capability for the mHealth information service system. A deep learning method is introduced to construct the hybrid neural network combing long-short term memory (LSTM) network and deep neural network (DNN) techniques to infer the individual exercise behavior from the time series data. A reinforcement learning method is applied based on the asynchronous advantage actor-critic algorithm to find the optimal policy through exploration and exploitation

arXiv.org e-Print Archive