Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization
Recommender systems leverage user demographic information, such as age,
gender, etc., to personalize recommendations and better place their targeted
ads. Oftentimes, users do not volunteer this information due to privacy
concerns, or due to a lack of initiative in filling out their online profiles.
We illustrate a new threat in which a recommender learns private attributes of
users who do not voluntarily disclose them. We design both passive and active
attacks that solicit ratings for strategically selected items, and could thus
be used by a recommender system to pursue this hidden agenda. Our methods are
based on a novel usage of Bayesian matrix factorization in an active learning
setting. Evaluations on multiple datasets illustrate that such attacks are
indeed feasible and use significantly fewer rated items than static inference
methods. Importantly, they succeed without sacrificing the quality of
recommendations to users.
Comment: This is the extended version of a paper that appeared in ACM RecSys 201
Choosing Attribute Weights for Item Dissimilarity using Clickstream Data with an Application to a Product Catalog Map
In content- and knowledge-based recommender systems, a measure of (dis)similarity between items is often used. Frequently, this measure is based on the attributes of the items. However, which attributes matter to the users of the system remains an important open question. In this paper, we present an approach to determine attribute weights in a dissimilarity measure using clickstream data of an e-commerce website. We count how many times each product is sold and, based on these counts, estimate a Poisson regression model. Estimates of this model are then used to determine the attribute weights in the dissimilarity measure. We show an application of this approach on a product catalog of MP3 players provided by Compare Group, owner of the Dutch price comparison site http://www.vergelijk.nl, and show how the dissimilarity measure can be used to improve 2D product catalog visualizations.
Keywords: dissimilarity measure; attribute weights; clickstream data; comparison
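The pipeline described in this abstract (count sales, fit a Poisson regression, turn the estimates into attribute weights) might look roughly like the sketch below. The abstract does not say how coefficients are mapped to weights, so the normalized absolute-coefficient rule is an assumption made only for illustration. The toy catalog, the function names, and the plain gradient-ascent fitting routine are likewise hypothetical and not taken from the paper.

```python
import math

def fit_poisson(X, y, lr=0.05, steps=3000):
    """Fit log E[y] = b0 + sum_j b_j * x_j by gradient ascent on the
    Poisson log-likelihood. Returns [b0, b_1, ..., b_p]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = yi - math.exp(eta)  # observed minus expected count
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def attribute_weights(beta):
    """ASSUMPTION (not from the paper): weight = normalized absolute
    coefficient, with the intercept dropped."""
    mags = [abs(b) for b in beta[1:]]
    total = sum(mags)
    if total == 0:
        return [1.0 / len(mags)] * len(mags)  # fall back to equal weights
    return [m / total for m in mags]

def dissimilarity(a, b, weights):
    """Weighted absolute-difference dissimilarity between two items."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

# Hypothetical toy catalog: two binary attributes per product and sales counts.
X = [(0, 0), (1, 0), (0, 1), (1, 1)]
sales = [2, 8, 2, 8]

beta = fit_poisson(X, sales)
weights = attribute_weights(beta)
```

In this toy data only the first attribute changes the sales count, so nearly all of the weight falls on it and two products that differ in the second attribute alone come out as almost identical.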
Adversarial Variational Embedding for Robust Semi-supervised Learning
Semi-supervised learning aims to leverage unlabelled data when
labelled data is difficult or expensive to acquire. Deep generative models
(e.g., Variational Autoencoder (VAE)) and semi-supervised Generative Adversarial
Networks (GANs) have recently shown promising performance in semi-supervised
classification owing to their strong discriminative representation ability.
However, the latent code learned by the traditional VAE is not exclusive
(repeatable) for a specific input sample, which limits its classification
performance. In particular, the learned latent representation depends on a
non-exclusive component which is stochastically sampled from the prior
distribution. Moreover, semi-supervised GAN models generate data from a
pre-defined distribution (e.g., Gaussian noise) that is independent of the
input data distribution, which may obstruct convergence and makes the
distribution of the generated data difficult to control. To address the aforementioned
issues, we propose a novel Adversarial Variational Embedding (AVAE) framework
for robust and effective semi-supervised learning to leverage both the
advantage of GAN as a high quality generative model and VAE as a posterior
distribution learner. The proposed approach first produces an exclusive latent
code by the model which we call VAE++, and meanwhile, provides a meaningful
prior distribution for the generator of GAN. The proposed approach is evaluated
over four different real-world applications and we show that our method
outperforms the state-of-the-art models, which confirms that the combination of
VAE++ and GAN can provide significant improvements in semi-supervised
classification.
Comment: 9 pages, Accepted by Research Track in KDD 201
Preference Networks: Probabilistic Models for Recommendation Systems
Recommender systems are important to help users select relevant and
personalised information from the massive amounts of data available. We propose a
unified framework called Preference Network (PN) that jointly models various
types of domain knowledge for the task of recommendation. The PN is a
probabilistic model that systematically combines both content-based filtering
and collaborative filtering into a single conditional Markov random field. Once
estimated, it serves as a probabilistic database that supports various useful
queries such as rating prediction and top-N recommendation. To handle the
challenging problem of learning large networks of users and items, we employ a
simple but effective pseudo-likelihood with regularisation. Experiments on the
movie rating data demonstrate the merits of the PN.
Comment: In Proc. of 6th Australasian Data Mining Conference (AusDM), Gold
Coast, Australia, pages 195--202, 200
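For orientation, a regularised pseudo-likelihood objective of the kind this abstract mentions generally has the shape below. The notation is an assumption made for illustration and is not taken from the paper: $r_{ui}$ is the rating of user $u$ on item $i$, $\mathcal{O}$ the set of observed ratings, $\mathbf{r}_{\setminus ui}$ all other observed ratings, $\theta$ the model parameters, and $\lambda$ the strength of an assumed quadratic penalty.

```latex
\hat{\theta}
  = \arg\max_{\theta}
    \sum_{(u,i) \in \mathcal{O}}
      \log P\bigl(r_{ui} \mid \mathbf{r}_{\setminus ui};\, \theta\bigr)
    \;-\; \lambda \lVert \theta \rVert_2^2
```

Each term conditions a single rating on all the others, so the global partition function of the Markov random field never has to be computed. This is what makes the approach tractable for large networks of users and items.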
Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline
Recommender systems constitute the core engine of most social network
platforms nowadays, aiming to maximize user satisfaction along with other key
business objectives. Twitter is no exception. Despite the fact that Twitter
data has been extensively used to understand socioeconomic and political
phenomena and user behaviour, the implicit feedback provided by users on Tweets
through their engagements on the Home Timeline has only been explored to a
limited extent. At the same time, there is a lack of large-scale public social
network datasets that would enable the scientific community to both benchmark
and build more powerful and comprehensive models that tailor content to user
interests. By releasing an original dataset of 160 million Tweets along with
engagement information, Twitter aims to address exactly that. During this
release, special attention is paid to maintaining compliance with existing
privacy laws. Apart from user privacy, this paper touches on the key challenges
faced by researchers and professionals striving to predict user engagements. It
further describes the key aspects of the RecSys 2020 Challenge that was
organized by ACM RecSys in partnership with Twitter using this dataset.
Comment: 16 pages, 2 table