1,409 research outputs found
Fast Differentially Private Matrix Factorization
Differentially private collaborative filtering is a challenging task, both in
terms of accuracy and speed. We present a simple algorithm that is provably
differentially private, while offering good performance, using a novel
connection of differential privacy to Bayesian posterior sampling via
Stochastic Gradient Langevin Dynamics. Due to its simplicity the algorithm
lends itself to efficient implementation. By careful systems design and by
exploiting the power law behavior of the data to maximize CPU cache bandwidth
we are able to generate 1024 dimensional models at a rate of 8.5 million
recommendations per second on a single PC
Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC
Despite having various attractive qualities such as high prediction accuracy
and the ability to quantify uncertainty and avoid over-fitting, Bayesian Matrix
Factorization has not been widely adopted because of the prohibitive cost of
inference. In this paper, we propose a scalable distributed Bayesian matrix
factorization algorithm using stochastic gradient MCMC. Our algorithm, based on
Distributed Stochastic Gradient Langevin Dynamics, can not only match the
prediction accuracy of standard MCMC methods like Gibbs sampling, but at the
same time is as fast and simple as stochastic gradient descent. In our
experiments, we show that our algorithm can achieve the same level of
prediction accuracy as Gibbs sampling an order of magnitude faster. We also
show that our method reduces the prediction error as fast as distributed
stochastic gradient descent, achieving a 4.1% improvement in RMSE for the
Netflix dataset and an 1.8% for the Yahoo music dataset
Scalable and interpretable product recommendations via overlapping co-clustering
We consider the problem of generating interpretable recommendations by
identifying overlapping co-clusters of clients and products, based only on
positive or implicit feedback. Our approach is applicable on very large
datasets because it exhibits almost linear complexity in the input examples and
the number of co-clusters. We show, both on real industrial data and on
publicly available datasets, that the recommendation accuracy of our algorithm
is competitive to that of state-of-art matrix factorization techniques. In
addition, our technique has the advantage of offering recommendations that are
textually and visually interpretable. Finally, we examine how to implement our
technique efficiently on Graphical Processing Units (GPUs).Comment: In IEEE International Conference on Data Engineering (ICDE) 201
- …