
    Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue

    Motivated by the observation that overexposure to unwanted marketing activities leads to customer dissatisfaction, we consider a setting where a platform offers a sequence of messages to its users and is penalized when users abandon the platform due to marketing fatigue. We propose a novel sequential choice model to capture the multiple interactions between the platform and its users: upon receiving a message, a user decides on one of three actions: accept the message, skip it and receive the next message, or abandon the platform. Based on user feedback, the platform dynamically learns users' abandonment distribution and their valuations of messages in order to determine the length of the sequence and the order of the messages, while maximizing the cumulative payoff over a horizon of length T. We refer to this online learning task as the sequential choice bandit problem. For the offline combinatorial optimization problem, we show that an efficient polynomial-time algorithm exists. For the online problem, we propose an algorithm that balances exploration and exploitation, and we characterize its regret bound. Lastly, we demonstrate how to extend the model with user contexts to incorporate personalization.
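    Below is a minimal, illustrative sketch of the interaction loop this model describes: a platform offers messages in an optimistic (UCB-style) order, and each user either accepts, skips, or abandons. The accept probabilities v, abandonment probability q, and horizon are hypothetical, and this is a toy instance of the exploration-exploitation loop the paper studies, not the paper's algorithm.

```python
# Toy sketch of the sequential choice interaction (illustrative assumptions:
# v = true accept probabilities, q = per-step abandonment probability).
import numpy as np

rng = np.random.default_rng(0)
n_msgs, horizon = 5, 2000
v = rng.uniform(0.1, 0.6, n_msgs)    # true accept probabilities (unknown to platform)
q = 0.2                              # true abandonment probability (unknown to platform)

accepts = np.zeros(n_msgs)           # accepts observed per message
shows = np.zeros(n_msgs)             # times each message was shown
payoff = 0.0

for t in range(1, horizon + 1):
    # Optimistic (UCB-style) estimate of each message's accept probability.
    ucb = accepts / np.maximum(shows, 1) + np.sqrt(2 * np.log(t) / np.maximum(shows, 1))
    order = np.argsort(-ucb)         # offer messages in decreasing optimism
    for i in order:
        shows[i] += 1
        u = rng.random()
        if u < v[i]:                 # user accepts the message
            accepts[i] += 1
            payoff += 1.0
            break
        elif u < v[i] + q:           # user abandons the platform
            break
        # otherwise the user skips and views the next message

print(f"avg payoff per round: {payoff / horizon:.3f}")
```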

    Revenue Maximization and Learning in Products Ranking

    We consider the revenue maximization problem for an online retailer who plans to display a set of products differing in their prices and qualities, and to rank them in order. Consumers have random attention spans and view the products sequentially before purchasing a "satisficing" product or leaving the platform empty-handed when the attention span is exhausted. Our framework extends the cascade model in two directions: consumers have random attention spans instead of fixed ones, and the firm maximizes revenue instead of clicking probabilities. We show a nested structure of the optimal product ranking as a function of the attention span when the attention span is fixed, and accordingly design a 1/e-approximation algorithm for random attention spans. When the conditional purchase probabilities are not known and may depend on consumer and product features, we devise an online learning algorithm that achieves Õ(√T) regret relative to the approximation algorithm, despite the censoring of information: the attention span of a customer who purchases an item is not observable. Numerical experiments demonstrate the outstanding performance of the approximation and online learning algorithms.
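    As a concrete illustration of the ranking trade-off, the sketch below evaluates the expected revenue of a ranking when the attention span is geometric and the consumer buys the first satisficing product she views. The prices, purchase probabilities, and continuation probability are hypothetical, and brute-force enumeration stands in for the paper's approximation algorithm.

```python
# Expected revenue of a ranking under a geometric attention span
# (all numbers are illustrative assumptions).
import numpy as np
from itertools import permutations

prices   = np.array([5.0, 3.0, 8.0, 2.0])   # revenue if product j is bought
buy_prob = np.array([0.3, 0.5, 0.2, 0.6])   # P(purchase | product j viewed)
p_continue = 0.7                             # attention survives each view w.p. 0.7

def expected_revenue(ranking):
    """Sum over slots: P(consumer reaches slot) * P(buy) * price."""
    reach, total = 1.0, 0.0
    for j in ranking:
        total += reach * buy_prob[j] * prices[j]
        # the consumer moves on only if she does not buy and attention survives
        reach *= (1.0 - buy_prob[j]) * p_continue
    return total

# brute force over all rankings (fine for 4 products)
best = max(permutations(range(4)), key=expected_revenue)
print(best, round(expected_revenue(best), 3))
```

    Note how the best ranking need not sort by price or by purchase probability alone: an early high-probability product raises the chance the session ends before a high-price product is ever seen.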

    Fatigue-aware Bandits for Dependent Click Models

    As recommender systems send a massive amount of content to keep users engaged, users may experience fatigue, caused by 1) overexposure to irrelevant content and 2) boredom from seeing too many similar recommendations. To address this problem, we consider an online learning setting where a platform learns a policy to recommend content that takes user fatigue into account. We propose an extension of the Dependent Click Model (DCM) to describe users' behavior. We stipulate that for each piece of content, its attractiveness to a user depends on its intrinsic relevance and on a discount factor that measures how many similar pieces of content have already been shown. Users view the recommended content sequentially and click on the ones they find attractive. Users may leave the platform at any time, and the probability of exiting is higher when they do not like the content. Based on users' feedback, the platform learns the relevance of the underlying content as well as the discounting effect due to content fatigue. We refer to this learning task as the "fatigue-aware DCM bandit" problem. We consider two learning scenarios depending on whether the discounting effect is known. For each scenario, we propose a learning algorithm that simultaneously explores and exploits, and we characterize its regret bound.
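    The discounted-attractiveness mechanism can be made concrete with a short simulation. In the sketch below, an item's click probability is its intrinsic relevance times gamma**s, where s counts similar items already shown in the session, and disliked content raises the exit probability. All numbers and topic names are hypothetical, and the discount is taken as known (the first of the two scenarios).

```python
# Toy simulation of the fatigue-discounted DCM session
# (relevance values, gamma, and exit probabilities are assumptions).
import numpy as np

rng = np.random.default_rng(1)
relevance = {"news": 0.6, "sports": 0.5}   # intrinsic relevance per topic
gamma = 0.7                                # known discount per similar item shown

def session(ranking):
    """Simulate one user session over a ranked list of topics."""
    shown = {topic: 0 for topic in relevance}
    clicks = []
    for topic in ranking:
        attract = relevance[topic] * gamma ** shown[topic]   # fatigue discount
        shown[topic] += 1
        if rng.random() < attract:
            clicks.append(topic)
            exit_prob = 0.1          # satisfied users rarely leave
        else:
            exit_prob = 0.4          # disliked content raises the exit probability
        if rng.random() < exit_prob:
            break                    # user abandons the session
    return clicks

print(session(["news", "news", "sports", "news"]))
```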

    Dynamic physical activity recommendation on personalised mobile health information service: A deep reinforcement learning approach

    Mobile health (mHealth) information services make healthcare management easier for users who want to increase physical activity and improve their health. However, differences in activity preferences among individuals, adherence problems, and uncertainty about future health outcomes may reduce the effectiveness of an mHealth information service. Current health service systems usually provide recommendations based on fixed exercise plans that do not satisfy user-specific needs. This paper seeks an efficient way to make physical activity recommendation decisions for personalised mHealth information services by establishing a data-driven model. In this study, we propose a real-time interaction model that selects the optimal exercise plan for each individual, accounting for time-varying characteristics, so as to maximise the user's long-term health utility. We construct a framework for an mHealth information service system comprising a personalised AI module that draws on scientific knowledge about physical activity to evaluate individual exercise performance, which may increase users' awareness of the mHealth artificial intelligence system. The proposed deep reinforcement learning (DRL) methodology combines two classes of approaches to improve the learning capability of the mHealth information service system. A deep learning method constructs a hybrid neural network combining long short-term memory (LSTM) and deep neural network (DNN) techniques to infer individual exercise behaviour from time-series data. A reinforcement learning method based on the asynchronous advantage actor-critic algorithm then finds the optimal policy through exploration and exploitation.
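    A minimal sketch of the hybrid network described above, assuming a PyTorch implementation: an LSTM encodes the time series of past activity, and DNN heads produce a policy over exercise plans (actor) and a state-value estimate (critic), as in advantage actor-critic methods. Layer sizes, feature counts, and names are assumptions, not the paper's specification.

```python
# Illustrative LSTM + DNN actor-critic network (assumed PyTorch sketch).
import torch
import torch.nn as nn

class HybridActorCritic(nn.Module):
    def __init__(self, n_features, n_plans, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.dnn = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_plans)   # logits over exercise plans
        self.critic = nn.Linear(hidden, 1)        # state-value estimate

    def forward(self, x):
        # x: (batch, time, n_features) sequence of past activity records
        out, _ = self.lstm(x)
        h = self.dnn(out[:, -1])                  # encoding of the last time step
        return self.actor(h), self.critic(h)

net = HybridActorCritic(n_features=6, n_plans=4)
logits, value = net(torch.randn(1, 14, 6))        # e.g. two weeks of daily data
plan = torch.distributions.Categorical(logits=logits).sample()
print(plan.item(), value.item())
```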