The Music Streaming Sessions Dataset
At the core of many important machine learning problems faced by online
streaming services is a need to model how users interact with the content.
These problems can often be reduced to a combination of 1) sequentially
recommending items to the user, and 2) exploiting the user's interactions with
the items as feedback for the machine learning model. Unfortunately, there are
no public datasets currently available that enable researchers to explore this
topic. In order to spur that research, we release the Music Streaming Sessions
Dataset (MSSD), which consists of approximately 150 million listening sessions
and associated user actions. Furthermore, we provide audio features and
metadata for the approximately 3.7 million unique tracks referred to in the
logs. This is the largest collection of such track metadata currently available
to the public. This dataset enables research on important problems including
how to model user listening and interaction behaviour in streaming, as well as
Music Information Retrieval (MIR), and session-based sequential
recommendations. Comment: 3 pages, introducing a new large scale dataset
Personalized Video Recommendation Using Rich Contents from Videos
Video recommendation has become an essential way of helping people explore
the massive volume of videos and discover the ones that may be of interest to them. In
the existing video recommender systems, the models make the recommendations
based on the user-video interactions and single specific content features. When
the specific content features are unavailable, the performance of the existing
models will seriously deteriorate. Inspired by the fact that rich contents
(e.g., text, audio, motion, and so on) exist in videos, in this paper, we
explore how to use these rich contents to overcome the limitations caused by
the unavailability of the specific ones. Specifically, we propose a novel
general framework, the collaborative embedding regression (CER) model, that
incorporates an arbitrary single content feature with user-video interactions
to make effective video recommendations in both in-matrix and out-of-matrix
scenarios. Our extensive experiments on two real-world
large-scale datasets show that CER beats the existing recommender models with
any single content feature and is more time efficient. In addition, we propose
a priority-based late fusion (PRI) method to gain the benefit of integrating
multiple content features. The corresponding experiment shows that PRI brings
real performance improvement to the baseline and outperforms the existing
fusion methods.
How to Retrain Recommender System? A Sequential Meta-Learning Method
Practical recommender systems need to be periodically retrained to refresh the
model with new interaction data. To pursue high model fidelity, it is usually
desirable to retrain the model on both historical and new data, since it can
account for both long-term and short-term user preference. However, a full
model retraining could be very time-consuming and memory-costly, especially
when the scale of historical data is large. In this work, we study the model
retraining mechanism for recommender systems, a topic of high practical value
that has been relatively little explored in the research community.
Our first belief is that retraining the model on historical data is
unnecessary, since the model has been trained on it before. Nevertheless,
normal training on new data only may easily cause overfitting and forgetting
issues, since the new data is of a smaller scale and contains less information
on long-term user preference. To address this dilemma, we propose a new
training method, aiming to abandon the historical data during retraining
through learning to transfer the past training experience. Specifically, we
design a neural network-based transfer component, which transforms the old
model to a new model that is tailored for future recommendations. To learn the
transfer component well, we optimize the "future performance" -- i.e., the
recommendation accuracy evaluated in the next time period. Our Sequential
Meta-Learning (SML) method offers a general training paradigm that is applicable
to any differentiable model. We demonstrate SML on matrix factorization and
conduct experiments on two real-world datasets. Empirical results show that SML
not only achieves significant speed-up, but also outperforms the full model
retraining in recommendation accuracy, validating the effectiveness of our
proposals. We release our codes at: https://github.com/zyang1580/SML. Comment: Appear in SIGIR 202
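The retraining idea in this abstract (fit on new data only, then learn how to combine the old and new models so that the result does well on the next period) can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the "model" is a plain per-item score vector rather than matrix factorization, the transfer is a single mixing weight `alpha` rather than the neural transfer component the authors describe, and `alpha` is picked by grid search rather than gradient-based training. The function names are hypothetical and do not come from the released code.

```python
from typing import List, Optional, Tuple

Interactions = List[Tuple[int, float]]  # (item_id, observed rating)


def fit_on_new_data(old: List[float], data: Interactions) -> List[float]:
    """Fit per-item scores on the new period only.

    Items without new interactions keep their old score (toy assumption).
    """
    sums = [0.0] * len(old)
    counts = [0] * len(old)
    for item, rating in data:
        sums[item] += rating
        counts[item] += 1
    return [sums[i] / counts[i] if counts[i] else old[i] for i in range(len(old))]


def transfer(old: List[float], new: List[float], alpha: float) -> List[float]:
    """Toy transfer: a convex combination of the old and newly fitted models."""
    return [alpha * o + (1.0 - alpha) * n for o, n in zip(old, new)]


def mse(model: List[float], data: Interactions) -> float:
    """Mean squared error of the per-item scores on a set of interactions."""
    return sum((model[item] - rating) ** 2 for item, rating in data) / len(data)


def learn_alpha(
    old: List[float],
    new: List[float],
    future: Interactions,
    grid: Optional[List[float]] = None,
) -> float:
    """Pick the mixing weight that minimises the error on the NEXT period."""
    candidates = grid or [k / 10 for k in range(11)]
    return min(candidates, key=lambda a: mse(transfer(old, new, a), future))


# Usage: the old model says 3.0 for both items, the new period says item 0 is 5.0,
# and the following period rates item 0 at 4.0.
old_model = [3.0, 3.0]
new_only = fit_on_new_data(old_model, [(0, 5.0)])
alpha = learn_alpha(old_model, new_only, future=[(0, 4.0), (1, 3.0)])
next_model = transfer(old_model, new_only, alpha)
```

The point of the sketch is only the objective: the transfer parameter is chosen by its performance on future data, so historical interactions are never revisited.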
Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks
Session-based recommendations are highly relevant in many modern on-line
services (e.g. e-commerce, video streaming) and recommendation settings.
Recently, Recurrent Neural Networks have been shown to perform very well in
session-based settings. While in many session-based recommendation domains user
identifiers are hard to come by, there are also domains in which user profiles
are readily available. We propose a seamless way to personalize RNN models with
cross-session information transfer and devise a Hierarchical RNN model that
relays and evolves latent hidden states of the RNNs across user sessions.
Results on two industry datasets show large improvements over the session-only
RNNs.
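The cross-session transfer described in this abstract can be sketched as two nested recurrences: a session-level state that is updated per item, and a user-level state that is updated once per session and used to initialise the next session. This is a minimal sketch under stated assumptions: the elementwise tanh update stands in for whatever recurrent cell the authors use, the fixed weights are arbitrary, and the names `rnn_step` and `run_user` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def rnn_step(state: Vector, inp: Vector, w_state: float, w_in: float) -> Vector:
    """Elementwise tanh recurrence (a simplified stand-in for a real RNN cell)."""
    return [math.tanh(w_state * s + w_in * x) for s, x in zip(state, inp)]


def run_user(sessions: List[List[Vector]], dim: int) -> List[Vector]:
    """Process one user's sessions in order and return each session's final state."""
    user_state = [0.0] * dim
    final_states = []
    for session in sessions:
        # Relay: the session-level state starts from the user-level state.
        session_state = [math.tanh(u) for u in user_state]
        for item_vec in session:
            session_state = rnn_step(session_state, item_vec, 0.5, 1.0)
        # Evolve: the user-level state is updated once, at the end of the session.
        user_state = rnn_step(user_state, session_state, 0.8, 0.6)
        final_states.append(session_state)
    return final_states


# Usage: two identical sessions yield different final states, because the
# second session is initialised from what was learned in the first.
finals = run_user([[[1.0, 0.0]], [[1.0, 0.0]]], dim=2)
```

A session-only model would return the same state for both sessions in the usage example; the user-level state is what makes the second one differ.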
Diverse personalized recommendations with uncertainty from implicit preference data with the Bayesian Mallows Model
Clicking data, which exists in abundance and contains objective user
preference information, is widely used to produce personalized recommendations
in web-based applications. Current popular recommendation algorithms, typically
based on matrix factorizations, often have high accuracy and achieve good
clickthrough rates. However, diversity of the recommended items, which can
greatly enhance user experiences, is often overlooked. Moreover, most
algorithms do not produce interpretable uncertainty quantifications of the
recommendations. In this work, we propose the Bayesian Mallows for Clicking
Data (BMCD) method, which augments clicking data into compatible full ranking
vectors by enforcing all the clicked items to be top-ranked. User preferences
are learned using a Mallows ranking model. Bayesian inference leads to
interpretable uncertainties of each individual recommendation, and we also
propose a method to make personalized recommendations based on such
uncertainties. With a simulation study and a real life data example, we
demonstrate that compared to state-of-the-art matrix factorization, BMCD makes
personalized recommendations with similar accuracy, while achieving a much higher
level of diversity, and producing interpretable and actionable uncertainty
estimation. Comment: 27 pages
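The augmentation step in this abstract, turning a set of clicks into a full ranking in which every clicked item is top-ranked, can be sketched as follows. The abstract does not say how the order within the clicked and unclicked groups is chosen, so the uniform shuffle below is an assumption, as are the function names. The scoring function uses a commonly seen form of the Mallows model, with weight exp(-(alpha / n) * d(r, rho)) and the footrule distance as d; whether BMCD uses this exact distance is not stated in the abstract.

```python
import random
from typing import List, Set


def augment_clicks(clicked: Set[int], n_items: int, rng: random.Random) -> List[int]:
    """Return ranks (1 = best) for items 0..n_items-1 with clicked items on top.

    The order inside each group is drawn uniformly at random (assumption).
    """
    top = sorted(clicked)
    rest = [i for i in range(n_items) if i not in clicked]
    rng.shuffle(top)
    rng.shuffle(rest)
    ranks = [0] * n_items
    for position, item in enumerate(top + rest, start=1):
        ranks[item] = position
    return ranks


def footrule(r: List[int], rho: List[int]) -> int:
    """Footrule distance: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r, rho))


def mallows_log_weight(r: List[int], rho: List[int], alpha: float) -> float:
    """Unnormalised log-probability of ranking r around consensus rho."""
    return -alpha / len(r) * footrule(r, rho)


# Usage: a user clicked items 1 and 3 out of 5.
ranks = augment_clicks({1, 3}, n_items=5, rng=random.Random(0))
```

Any ranking produced this way is compatible with the clicks, which is the constraint the abstract describes; rankings closer to the consensus receive a higher weight.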