ContentWise Impressions: An Industrial Dataset with Impressions Included
In this article, we introduce the ContentWise Impressions dataset, a
collection of implicit interactions and impressions of movies and TV series
from an Over-The-Top media service, which delivers its media contents over the
Internet. The dataset is distinguished from other available multimedia
recommendation datasets by its size, by being open-source, and by the
availability of impressions, i.e., the recommendations shown to the user. We
describe the data collection process, the preprocessing applied, and the
dataset's characteristics and statistics compared to other commonly used datasets.
We also highlight several possible use cases and research questions that can
benefit from the availability of user impressions in an open-source dataset.
Furthermore, we release software tools to load and split the data, as well as
examples of how to use both user interactions and impressions in several common
recommendation algorithms.
Comment: 8 pages, 2 figures
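One possible use of impressions can be sketched as follows: items that were shown to a user but never interacted with are collected as candidate negative samples. This is a minimal illustration under assumed data structures, not the software tools released with the dataset; the function name `impression_negatives` and the toy user and item IDs are hypothetical.

```python
from collections import defaultdict


def impression_negatives(interactions, impressions):
    """Return, per user, the items that were shown but never interacted with.

    interactions: iterable of (user, item) pairs (hypothetical format).
    impressions:  iterable of (user, [shown items]) pairs (hypothetical format).
    """
    # Collect every item each user interacted with.
    seen = defaultdict(set)
    for user, item in interactions:
        seen[user].add(item)

    # Keep impressed items that are absent from the user's interactions.
    negatives = defaultdict(set)
    for user, shown_items in impressions:
        for item in shown_items:
            if item not in seen[user]:
                negatives[user].add(item)

    return {user: sorted(items) for user, items in negatives.items()}


# Toy data, invented for illustration only.
interactions = [("u1", "m1"), ("u1", "m3"), ("u2", "m2")]
impressions = [
    ("u1", ["m1", "m2", "m4"]),
    ("u2", ["m2", "m3"]),
    ("u1", ["m3", "m5"]),
]
negatives = impression_negatives(interactions, impressions)
```

Whether an impressed-but-ignored item is a true negative is itself a modelling assumption; it is one of the research questions that impression data makes it possible to study.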
An Evaluation Study of Generative Adversarial Networks for Collaborative Filtering
This work explores the reproducibility of CFGAN. CFGAN and its family of
models (TagRec, MTPR, and CRGAN) learn to generate personalized and
fake-but-realistic rankings of preferences for top-N recommendations by using
previous interactions. This work successfully replicates the results published
in the original paper and discusses the impact of certain differences between
the CFGAN framework and the model used in the original evaluation. The absence
of random noise and the use of real user profiles as condition vectors leave
the generator prone to learning a degenerate solution in which the output vector
is identical to the input vector, so that it behaves essentially as a simple
autoencoder. The work further expands the experimental analysis by comparing CFGAN
against a selection of simple, well-known, and properly optimized baselines,
observing that CFGAN is not consistently competitive against them despite its
high computational cost. To ensure the reproducibility of these analyses, this
work describes the experimental methodology and publishes all datasets and
source code.
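The degenerate solution described above can be illustrated with a toy sketch. It assumes a conditional setup in which a discriminator sees the condition vector concatenated with a profile; the helper names `identity_generator` and `pair` are hypothetical and no actual network is trained. With no noise input and the real profile used as the condition, a generator that simply copies its input produces pairs that match the real pairs exactly.

```python
def identity_generator(condition):
    # No noise input: the output depends only on the condition vector,
    # so copying the condition is a valid (degenerate) generator.
    return list(condition)


def pair(condition, profile):
    # What a conditional discriminator would receive in this toy setup:
    # the condition vector concatenated with a (real or generated) profile.
    return tuple(condition) + tuple(profile)


# Toy binary user profile, invented for illustration only.
real_profile = [1, 0, 0, 1, 0]

# The real profile doubles as the condition vector.
fake_profile = identity_generator(real_profile)

real_pair = pair(real_profile, real_profile)
fake_pair = pair(real_profile, fake_profile)

# The discriminator has nothing to tell the two pairs apart.
indistinguishable = real_pair == fake_pair
```

In this sketch the copying generator already fools any discriminator, which is why the model can collapse to autoencoder-like behaviour instead of learning to rank unseen items.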
Lightweight and Scalable Model for Tweet Engagements Predictions in a Resource-constrained Environment
In this paper, we provide an overview of the approach we used as team Trial&Error for the ACM RecSys Challenge 2021. The competition, organized by Twitter, addresses the problem of predicting different categories of user engagements (Like, Reply, Retweet, and Retweet with Comment), given a dataset of previous interactions on the Twitter platform. Our proposed method relies on efficiently leveraging the massive amount of data, crafting a wide variety of features, and designing a lightweight solution. This results in a significant reduction of computational resource requirements during both the training and inference phases. The final model, an optimized LightGBM, allowed our team to reach the 4th position in the final leaderboard and to rank 1st among the academic teams.