Alpenglow: Open source recommender framework with time-aware learning and evaluation
Measuring the Eccentricity of Items
The long-tail phenomenon tells us that there are many items in the tail.
However, not all tail items are the same. Each item acquires different kinds of
users. Some items are loved by the general public, while some items are
consumed by eccentric fans. In this paper, we propose a novel metric, item
eccentricity, to incorporate this difference between consumers of the items.
Eccentric items are defined as items that are consumed by eccentric users. We
used this metric to analyze two real-world datasets of music and movies and
observed the characteristics of items in terms of eccentricity. The results
showed that our defined eccentricity of an item does not change much over time,
and that the classified eccentric and noneccentric items exhibit significantly
distinct characteristics. The proposed metric effectively separates the
eccentric and noneccentric items mixed in the tail, which previous measures
could not do, as they only consider the popularity of items.

Comment: Accepted at IEEE International Conference on Systems, Man, and
Cybernetics (SMC) 201
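The abstract leaves the formal definition to the paper. As a rough illustration only, here is a minimal sketch of an eccentricity-style score, assuming that a user's eccentricity is the share of below-median-popularity items they consume and that an item's eccentricity is the mean eccentricity of its consumers; both definitions are assumptions for illustration, not the paper's exact formulation:

```python
from collections import defaultdict

def eccentricity_scores(interactions):
    """Toy sketch: interactions is a list of (user, item) pairs.

    User eccentricity here is the fraction of a user's consumed items
    that are unpopular (below-median popularity); item eccentricity is
    the mean eccentricity of the item's consumers. Both rules are
    illustrative assumptions, not the paper's formulation.
    """
    item_pop = defaultdict(int)
    user_items = defaultdict(set)
    for user, item in interactions:
        item_pop[item] += 1
        user_items[user].add(item)

    pops = sorted(item_pop.values())
    median_pop = pops[len(pops) // 2]

    # User eccentricity: share of below-median-popularity items consumed.
    user_ecc = {
        u: sum(1 for i in items if item_pop[i] < median_pop) / len(items)
        for u, items in user_items.items()
    }

    # Item eccentricity: average eccentricity of the item's consumers.
    item_users = defaultdict(list)
    for user, item in interactions:
        item_users[item].append(user)
    return {
        i: sum(user_ecc[u] for u in users) / len(users)
        for i, users in item_users.items()
    }

# A popular item and two tail items with differently eccentric audiences.
interactions = [
    ("alice", "hit_song"), ("bob", "hit_song"), ("carol", "hit_song"),
    ("dave", "hit_song"), ("dave", "obscure_song"), ("erin", "obscure_song"),
    ("erin", "rare_b_side"),
]
scores = eccentricity_scores(interactions)
```

Under these toy definitions the two tail items receive different eccentricity scores even though both are unpopular, which is the separation the abstract says pure popularity measures cannot provide.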
Tuning Word2vec for Large Scale Recommendation Systems
Word2vec is a powerful machine learning tool that emerged from Natural
Language Processing (NLP) and is now applied in multiple domains, including
recommender systems, forecasting, and network analysis. As Word2vec is often
used off the shelf, we address the question of whether the default
hyperparameters are suitable for recommender systems. The answer is
emphatically no. In this paper, we first elucidate the importance of
hyperparameter optimization and show that unconstrained optimization yields an
average 221% improvement in hit rate over the default parameters. However,
unconstrained optimization leads to hyperparameter settings that are very
expensive and not feasible for large scale recommendation tasks. To this end,
we demonstrate 138% average improvement in hit rate with a runtime
budget-constrained hyperparameter optimization. Furthermore, to make
hyperparameter optimization applicable for large scale recommendation problems
where the target dataset is too large to search over, we investigate
generalizing hyperparameter settings from samples. We show that applying
constrained hyperparameter optimization using only a 10% sample of the data
still yields a 91% average improvement in hit rate over the default parameters
when applied to the full datasets. Finally, we apply hyperparameters learned
using our method of constrained optimization on a sample to the Who To Follow
recommendation service at Twitter and are able to increase follow rates by 15%.

Comment: 11 pages, 4 figures, Fourteenth ACM Conference on Recommender Systems
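A budget-constrained search of the kind described above can be sketched in a few lines: sample configurations at random and discard any whose estimated training cost exceeds the budget before evaluating them. The search space, the cost model, and the `hit_rate` objective below are all placeholders standing in for real Word2vec training and evaluation, not the paper's setup:

```python
import random

# Hypothetical search space loosely inspired by Word2vec hyperparameters;
# ranges are illustrative assumptions.
SPACE = {
    "dim": [32, 64, 128, 256],
    "window": [1, 3, 5, 10],
    "negative": [5, 10, 20],
    "ns_exponent": [-0.5, 0.0, 0.5, 0.75, 1.0],
}

def training_cost(params):
    # Stand-in for wall-clock cost: bigger vectors and more negative
    # samples make each training step more expensive.
    return params["dim"] * params["negative"]

def hit_rate(params):
    # Placeholder objective; in practice this would train Word2vec with
    # `params` and measure hit rate on held-out interactions.
    return 1.0 / (1.0 + abs(params["dim"] - 128) + abs(params["window"] - 5))

def constrained_random_search(budget, trials=200, seed=0):
    """Random search that only evaluates configurations within a cost budget."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = {k: rng.choice(v) for k, v in SPACE.items()}
        if training_cost(params) > budget:
            continue  # enforce the runtime budget constraint up front
        score = hit_rate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = constrained_random_search(budget=1280)
```

Skipping over-budget configurations before evaluation is what keeps the search feasible at scale: the expensive objective is only ever called on settings a production system could actually afford to train.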
EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding
Embedding learning transforms discrete data entities into continuous
numerical representations, encoding features/properties of the entities.
Despite the outstanding performance reported from different embedding learning
algorithms, few efforts were devoted to structurally interpreting how features
are encoded in the learned embedding space. This work proposes EmbeddingTree, a
hierarchical embedding exploration algorithm that relates the semantics of
entity features with the less-interpretable embedding vectors. An interactive
visualization tool is also developed based on EmbeddingTree to explore
high-dimensional embeddings. The tool helps users discover nuanced features of
data entities, perform feature denoising/injecting in embedding training, and
generate embeddings for unseen entities. We demonstrate the efficacy of
EmbeddingTree and our visualization tool through embeddings generated for
industry-scale merchant data and the public 30Music listening/playlists
dataset.

Comment: 5 pages, 3 figures, accepted by PacificVis 202
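The core idea of relating entity features to embedding structure can be illustrated with a toy scoring rule: split the entities on each feature and measure how far the per-group embedding centroids spread from the overall centroid, then split on the best feature and recurse within each group to build a hierarchy. This scoring rule is an assumption for illustration, not EmbeddingTree's actual criterion:

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def split_score(entities, feature):
    """Weighted spread of group centroids when splitting on `feature`.

    Higher means the feature explains more of the embedding structure.
    Illustrative assumption, not the paper's exact criterion.
    """
    groups = {}
    for feats, vec in entities:
        groups.setdefault(feats[feature], []).append(vec)
    overall = centroid([vec for _, vec in entities])
    return sum(
        len(vs) * math.dist(centroid(vs), overall) for vs in groups.values()
    ) / len(entities)

def best_feature(entities, features):
    """Pick the feature whose split best separates embedding centroids;
    applying this recursively per group yields a feature hierarchy."""
    return max(features, key=lambda f: split_score(entities, f))

# Toy entities: (feature dict, 2-d embedding). Here "genre" drives the
# embedding geometry while "explicit" does not.
entities = [
    ({"genre": "rock", "explicit": "no"},  (0.9, 0.1)),
    ({"genre": "rock", "explicit": "yes"}, (1.0, 0.0)),
    ({"genre": "jazz", "explicit": "no"},  (0.0, 1.0)),
    ({"genre": "jazz", "explicit": "yes"}, (0.1, 0.9)),
]
root = best_feature(entities, ["genre", "explicit"])
```

On this toy data the genre split cleanly separates the two embedding clusters while the explicit flag does not, so "genre" becomes the root of the hierarchy.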
A Scalable Framework for Automatic Playlist Continuation on Music Streaming Services
Music streaming services often aim to recommend songs for users to extend the
playlists they have created on these services. However, extending playlists
while preserving their musical characteristics and matching user preferences
remains a challenging task, commonly referred to as Automatic Playlist
Continuation (APC). Besides, while these services often need to select the best
songs to recommend in real-time and among large catalogs with millions of
candidates, recent research on APC mainly focused on models with few
scalability guarantees and evaluated on relatively small datasets. In this
paper, we introduce a general framework to build scalable yet effective APC
models for large-scale applications. Based on a represent-then-aggregate
strategy, it ensures scalability by design while remaining flexible enough to
incorporate a wide range of representation learning and sequence modeling
techniques, e.g., based on Transformers. We demonstrate the relevance of this
framework through in-depth experimental validation on Spotify's Million
Playlist Dataset (MPD), the largest public dataset for APC. We also describe
how, in 2022, we successfully leveraged this framework to improve APC in
production on Deezer. We report results from a large-scale online A/B test on
this service, emphasizing the practical impact of our approach in such a
real-world application.

Comment: Accepted as a Full Paper at the SIGIR 2023 conference
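A minimal instance of a represent-then-aggregate pipeline can be sketched as follows: represent each track by an embedding, aggregate the playlist into a single vector by mean pooling, and score catalog candidates against that vector. The track embeddings and the mean-pooling/cosine choices below are illustrative assumptions, not the specific representation or sequence models the paper evaluates:

```python
import math

def aggregate(playlist, track_emb):
    """Mean-pool track embeddings: one cheap pass over the playlist,
    regardless of playlist length (the 'aggregate' half of the strategy)."""
    vecs = [track_emb[t] for t in playlist]
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

def recommend(playlist, track_emb, k=2):
    """Rank candidates by cosine similarity to the playlist vector.

    The exhaustive loop is for clarity; at catalog scale this lookup
    would be served by an approximate nearest-neighbor index, which is
    what makes the represent-then-aggregate design scalable.
    """
    query = aggregate(playlist, track_emb)

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    candidates = [t for t in track_emb if t not in playlist]
    ranked = sorted(candidates, key=lambda t: cosine(track_emb[t], query),
                    reverse=True)
    return ranked[:k]

# Toy catalog: hypothetical 2-d embeddings where each axis loosely
# encodes a genre (assumed for illustration only).
track_emb = {
    "rock_1": (1.0, 0.0), "rock_2": (0.9, 0.1), "rock_3": (0.95, 0.05),
    "jazz_1": (0.0, 1.0), "jazz_2": (0.1, 0.9),
}
recs = recommend(["rock_1", "rock_2"], track_emb, k=1)
```

Because the playlist is collapsed to a single query vector before retrieval, swapping mean pooling for a Transformer encoder changes only the `aggregate` step; the retrieval side, and hence the scalability guarantee, stays the same.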