
A framework for dataset benchmarking and its application to a new movie rating dataset

Abstract

This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Intelligent Systems and Technology, http://dx.doi.org/10.1145/2751565

Rating datasets are of paramount importance in recommender systems research. They serve as input for recommendation algorithms, as simulation data, or for evaluation purposes. In the past, publicly accessible rating datasets were scarce, leaving researchers no choice but to work with old and static datasets like MovieLens and Netflix. More recently, however, emerging trends such as social media and smartphones have been found to provide rich data sources that can be turned into valuable research datasets. While dataset availability is growing, a structured way of introducing and comparing new datasets is still lacking. In this work, we propose a five-step framework to introduce and benchmark new datasets in the recommender systems domain. We illustrate our framework on a new movie rating dataset, called Movie Tweetings, collected from Twitter. Following our framework, we detail the origin of the dataset, provide basic descriptive statistics, investigate external validity, report the results of a number of reproducible benchmarks, and conclude by discussing some interesting advantages and appropriate research use cases.

This work is funded by a PhD grant to Simon Dooms of the Agency for Innovation by Science and Technology (IWT Vlaanderen) and the Spanish Ministry of Science and Innovation (TIN2013-47090-C3-2). Part of this work was carried out during the tenure of an ERCIM "Alain Bensoussan" Fellowship Programme, funded by European Commission FP7 grant agreement no. 246016. The experiments in this work were carried out using the Stevin Supercomputer Infrastructure at Ghent University, funded by Ghent University, the Hercules Foundation, and the Flemish Government - department EWI.
