Mining cross-domain rating datasets from structured data on Twitter

Author: De Pessemier Toon
Dooms Simon
Martens Luc
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2014

Crossref

Ghent University Academic Bibliography

A large multilingual and multi-domain dataset for recommender systems

Author: Di Tommaso Giorgia
Faralli Stefano
Velardi Paola
Publication date: 01/01/2018

This paper presents a multi-domain interests dataset to train and test Recommender Systems, and the methodology to create the dataset from Twitter messages in English and Italian. The English dataset includes an average of 90 preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million users. Preferences are either extracted from messages of users who use Spotify, Goodreads and other similar content sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with Wikipedia articles describing them. This unique feature of our dataset provides a means to derive a semantic categorization of the preferred items, exploiting available semantic resources linked to Wikipedia such as the Wikipedia Category Graph, DBpedia, BabelNet and others.

Archivio della ricerca- Università di Roma La Sapienza

A framework for dataset benchmarking and its application to a new movie rating dataset

Author: Bellogín Alejandro
De Pessemier Toon
Dooms Simon
Martens Luc C M
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Intelligent Systems and Technology, http://dx.doi.org/10.1145/2751565

Rating datasets are of paramount importance in recommender systems research. They serve as input for recommendation algorithms, as simulation data, or for evaluation purposes. In the past, publicly accessible rating datasets were not abundantly available, leaving researchers no choice but to work with old and static datasets like MovieLens and Netflix. More recently, however, emerging trends such as social media and smartphones have been found to provide rich data sources that can be turned into valuable research datasets. While dataset availability is growing, a structured way of introducing and comparing new datasets is still lacking. In this work, we propose a five-step framework to introduce and benchmark new datasets in the recommender systems domain. We illustrate our framework on a new movie rating dataset, called Movie Tweetings, collected from Twitter. Following our framework, we detail the origin of the dataset, provide basic descriptive statistics, investigate external validity, report the results of a number of reproducible benchmarks, and conclude by discussing some interesting advantages and appropriate research use cases.

This work is funded by a PhD grant to Simon Dooms of the Agency for Innovation by Science and Technology (IWT Vlaanderen) and the Spanish Ministry of Science and Innovation (TIN2013-47090-C3-2). Part of this work was carried out during the tenure of an ERCIM "Alain Bensoussan" Fellowship Programme, funded by European Commission FP7 grant agreement no. 246016. The experiments in this work were carried out using the Stevin Supercomputer Infrastructure at Ghent University, funded by Ghent University, the Hercules Foundation, and the Flemish Government - department EWI.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Ghent University Academic Bibliography

Biblos-e Archivo

BCS SGAI SMA 2013: the BCS SGAI workshop on social media analysis

Publication venue: M. Jeusfeld
Publication date: 01/01/2013

Portsmouth University Research Portal (Pure)

Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features

Author: Gupta Prakhar
Jaggi Martin
Pagliardini Matteo
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 21/06/2017

The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question if similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings. Comment: NAACL 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref