
DJ-MC: A Reinforcement-Learning Agent for Music Playlist Recommendation

Authors: Liebman Elad; Saar-Tsechansky Maytal; Stone Peter
Publication date: 25/03/2015

In recent years, there has been growing focus on the study of automated recommender systems. Music recommendation systems serve as a prominent domain for such works, both from an academic and a commercial perspective. A fundamental aspect of music perception is that music is experienced in temporal context and in sequence. In this work we present DJ-MC, a novel reinforcement-learning framework for music recommendation that does not recommend songs individually but rather song sequences, or playlists, based on a model of preferences for both songs and song transitions. The model is learned online and is uniquely adapted for each listener. To reduce exploration time, DJ-MC exploits user feedback to initialize a model, which it subsequently updates by reinforcement. We evaluate our framework with human participants using both real song and playlist data. Our results indicate that DJ-MC's ability to recommend sequences of songs provides a significant improvement over more straightforward approaches, which do not take transitions into account.

Comment: Updated to the most recent and completed version (to be presented at AAMAS 2015); updated author list. In Autonomous Agents and Multiagent Systems (AAMAS) 2015, Istanbul, Turkey, May 201

arXiv.org e-Print Archive

CiteSeerX

Automatic Music Playlist Generation via Simulation-based Reinforcement Learning

Authors: Cauteruccio Joseph; Ciosek Kamil; Dai Zhenwen; Kanoria Surya; Rinaldi Matteo; Tomasi Federico
Publication date: 13/10/2023

Personalization of playlists is a common feature in music streaming services, but conventional techniques, such as collaborative filtering, rely on explicit assumptions regarding content quality to learn how to make recommendations. Such assumptions often result in misalignment between offline model objectives and online user satisfaction metrics. In this paper, we present a reinforcement learning framework that solves for such limitations by directly optimizing for user satisfaction metrics via the use of a simulated playlist-generation environment. Using this simulator we develop and train a modified Deep Q-Network, the action head DQN (AH-DQN), in a manner that addresses the challenges imposed by the large state and action space of our RL formulation. The resulting policy is capable of making recommendations from large and dynamic sets of candidate items with the expectation of maximizing consumption metrics. We analyze and evaluate agents offline via simulations that use environment models trained on both public and proprietary streaming datasets. We show how these agents lead to better user-satisfaction metrics compared to baseline methods during online A/B tests. Finally, we demonstrate that performance assessments produced from our simulator are strongly correlated with observed online metric results.

Comment: 10 pages. KDD 2

arXiv.org e-Print Archive

Sequential decision making in artificial musical intelligence

Author: Liebman Elad
Publication date: 29/04/2018

Over the past 60 years, artificial intelligence has grown from a largely academic field of research to a ubiquitous array of tools and approaches used in everyday technology. Despite its many recent successes and growing prevalence, certain meaningful facets of computational intelligence have not been as thoroughly explored. Such additional facets cover a wide array of complex mental tasks which humans carry out easily, yet are difficult for computers to mimic. A prime example of a domain in which human intelligence thrives, but machine understanding is still fairly limited, is music. Over the last decade, many researchers have applied computational tools to carry out tasks such as genre identification, music summarization, music database querying, and melodic segmentation. While these are all useful algorithmic solutions, we are still a long way from constructing complete music agents, able to mimic (at least partially) the complexity with which humans approach music. One key aspect which hasn't been sufficiently studied is that of sequential decision making in musical intelligence. This thesis strives to answer the following question: Can a sequential decision making perspective guide us in the creation of better music agents, and social agents in general? And if so, how? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and human-agent (and more generally agent-agent) interaction in the context of music. The key contributions of this thesis are the design of better music playlist recommendation algorithms; the design of algorithms for tracking user preferences over time; new approaches for modeling people's behavior in situations that involve music; and the design of agents capable of meaningful interaction with humans and other agents in a setting where music plays a role (either directly or indirectly).
Though motivated primarily by music-related tasks, and focusing largely on people's musical preferences, this thesis also establishes that insights from music-specific case studies can also be applicable in other concrete social domains, such as different types of content recommendation. Showing the generality of insights from musical data in other contexts serves as evidence for the utility of music domains as testbeds for the development of general artificial intelligence techniques. Ultimately, this thesis demonstrates the overall usefulness of taking a sequential decision making approach in settings previously unexplored from this perspective.

Computer Science

Association for the Advancement of Artificial Intelligence: AAAI Publications

Texas ScholarWorks

Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks

Authors: Dereventsov Anton; Starnes Andrew; Webster Clayton G.
Publication date: 21/11/2022

This effort is focused on examining the behavior of reinforcement learning systems in personalization environments and detailing the differences in policy entropy associated with the type of learning algorithm utilized. We demonstrate that Policy Optimization agents often possess low-entropy policies during training, which in practice results in agents prioritizing certain actions and avoiding others. Conversely, we also show that Q-Learning agents are far less susceptible to such behavior and generally maintain high-entropy policies throughout training, which is often preferable in real-world applications. We provide a wide range of numerical experiments as well as theoretical justification to show that these differences in entropy are due to the type of learning being employed.

arXiv.org e-Print Archive

Carousel Personalization in Music Streaming Apps with Contextual Bandits

Authors: Agarwal Alekh; Chu Wei; Garivier Aurélien; Jiang Ray; Katariya Sumeet; Komiyama Junpei; Kveton Branislav; Wang Zhiyang; Zhang Shuai; Zhou Li; Zoghi Masrour; Zong Shi
Publication venue: Association for Computing Machinery (ACM)
Publication date: 30/09/2020

Media services providers, such as music streaming platforms, frequently leverage swipeable carousels to recommend personalized content to their users. However, selecting the most relevant items (albums, artists, playlists...) to display in these carousels is a challenging task, as items are numerous and as users have different preferences. In this paper, we model carousel personalization as a contextual multi-armed bandit problem with multiple plays, cascade-based updates and delayed batch feedback. We empirically show the effectiveness of our framework at capturing characteristics of real-world carousels by addressing a large-scale playlist recommendation task on a global music streaming mobile app. Along with this paper, we publicly release industrial data from our experiments, as well as an open-source environment to simulate comparable carousel personalization learning problems.

Comment: 14th ACM Conference on Recommender Systems (RecSys 2020, Best Short Paper Candidate)

arXiv.org e-Print Archive

Crossref

Improving Advertisement Delivery in Video Streaming

Authors: Witkowski Marshall; Witkowski Mary
Publication venue: Technical Disclosure Commons
Publication date: 06/12/2017

Generally, the present disclosure is directed to improving advertisement delivery based on the content of a video. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict a non-intrusive location for an advertisement based on the content of a video.

Technical Disclosure Commons