Low-Rank Factorization of Determinantal Point Processes for Recommendation
Determinantal point processes (DPPs) have garnered attention as an elegant
probabilistic model of set diversity. They are useful for a number of subset
selection tasks, including product recommendation. DPPs are parametrized by a
positive semi-definite kernel matrix. In this work we present a new method for
learning the DPP kernel from observed data using a low-rank factorization of
this kernel. We show that this low-rank factorization enables a learning
algorithm that is nearly an order of magnitude faster than previous approaches,
while also providing a method for computing product recommendation
predictions that is far faster (up to 20x faster or more for large item
catalogs) than previous techniques that involve a full-rank DPP kernel.
Furthermore, we show that our method provides equivalent or sometimes better
predictive performance than prior full-rank DPP approaches, and better
performance than several other competing recommendation methods in many cases.
We conduct an extensive experimental evaluation using several real-world
datasets in the domain of product recommendation to demonstrate the utility of
our method, along with its limitations.
Comment: 10 pages, 4 figures. Submitted to KDD 201
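As a rough illustration of the low-rank idea above (a sketch, not the authors' exact algorithm): assuming a factorization L = V V^T with a tall-and-skinny factor V, the log-likelihood of an observed subset only needs the determinant of a small kernel block, and the normalizer det(L + I) collapses to a k x k determinant. Names below are illustrative.

```python
import numpy as np

def low_rank_dpp_log_likelihood(V, subset):
    """Log-likelihood of an observed subset under a DPP with kernel L = V @ V.T.

    V      : (n_items, k) low-rank factor (k << n_items)
    subset : list of item indices that were observed together
    """
    V_sub = V[subset]                        # rows of V for the observed items
    L_sub = V_sub @ V_sub.T                  # |subset| x |subset| kernel block
    _, logdet_sub = np.linalg.slogdet(L_sub)

    # Normalizer det(L + I) reduces to det(V.T V + I), a k x k determinant.
    k = V.shape[1]
    _, logdet_norm = np.linalg.slogdet(V.T @ V + np.eye(k))
    return logdet_sub - logdet_norm

# toy usage
rng = np.random.default_rng(0)
V = rng.normal(size=(1000, 20))              # 1000 items, rank-20 kernel
print(low_rank_dpp_log_likelihood(V, [3, 17, 256]))
```

The identity det(I + V V^T) = det(I + V^T V) is what makes the normalizer cheap when the rank k is much smaller than the item catalog.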
High-performance sampling of generic Determinantal Point Processes
Determinantal Point Processes (DPPs) were introduced by Macchi as a model for
repulsive (fermionic) particle distributions. But their recent popularization
is largely due to their usefulness for encouraging diversity in the final stage
of a recommender system.
The standard sampling scheme for finite DPPs is a spectral decomposition
followed by an equivalent of a randomly diagonally-pivoted Cholesky
factorization of an orthogonal projection, which is only applicable to
Hermitian kernels and has an expensive setup cost. Researchers have begun to
connect DPP sampling to LDL^H factorizations as a means of avoiding the
initial spectral decomposition, but existing approaches have only outperformed
the spectral decomposition approach in special circumstances, where the number
of kept modes is a small percentage of the ground set size.
This article proves that trivial modifications of LU and LDL^H
factorizations yield efficient direct sampling schemes for non-Hermitian and
Hermitian DPP kernels, respectively. Further, it is experimentally shown that
even dynamically-scheduled, shared-memory parallelizations of high-performance
dense and sparse-direct factorizations can be trivially modified to yield DPP
sampling schemes with essentially identical performance.
The software developed as part of this research, Catamari,
https://hodgestar.com/catamari, is released under the Mozilla Public License
v2.0. It contains header-only, C++14 plus OpenMP 4.0 implementations of dense
and sparse-direct, Hermitian and non-Hermitian DPP samplers.
Comment: 25 pages, 11 figures. Submitted to the Royal Society's Philosophical Transactions
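A minimal sketch of a factorization-based sampler in this spirit, assuming a Hermitian marginal kernel K with eigenvalues in [0, 1]: each item is kept with probability equal to its current diagonal entry, and an LDL^H-style elimination step conditions the remaining items on that decision. This mimics, rather than reproduces, the Catamari implementations.

```python
import numpy as np

def sample_dpp_ldl(K, rng=None):
    """Sample from a DPP with Hermitian marginal kernel K (eigenvalues in [0, 1])
    via a diagonally-pivoted LDL^H-style elimination: keep item j with probability
    K[j, j] given earlier decisions, then update the trailing submatrix."""
    rng = np.random.default_rng() if rng is None else rng
    K = np.array(K, dtype=float, copy=True)
    n = K.shape[0]
    sample = []
    for j in range(n):
        if rng.random() < K[j, j]:
            sample.append(j)
        else:
            K[j, j] -= 1.0                   # condition on excluding item j
        if j + 1 < n and K[j, j] != 0.0:
            K[j + 1:, j] /= K[j, j]
            K[j + 1:, j + 1:] -= np.outer(K[j + 1:, j], K[j, j + 1:])
    return sample

# toy usage: a small projection kernel (always yields exactly 3 items)
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(6, 3)))
print(sample_dpp_ldl(Q @ Q.T, rng))
```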
Learning Determinantal Point Processes by Corrective Negative Sampling
Determinantal Point Processes (DPPs) have attracted significant interest from
the machine-learning community due to their ability to elegantly and tractably
model the delicate balance between quality and diversity of sets. DPPs are
commonly learned from data using maximum likelihood estimation (MLE). While
fitting observed sets well, MLE for DPPs may also assign high likelihoods to
unobserved sets that are far from the true generative distribution of the data.
To address this issue, which reduces the quality of the learned model, we
introduce a novel optimization problem, Contrastive Estimation (CE), which
encodes information about "negative" samples into the basic learning model. CE
is grounded in the successful use of negative information in machine-vision and
language modeling. Depending on the chosen negative distribution (which may be
static or evolve during optimization), CE assumes two different forms, which we
analyze theoretically and experimentally. We evaluate our new model on
real-world datasets; on a challenging dataset, CE learning delivers a
considerable improvement in predictive performance over a DPP learned without
using contrastive information.
Comment: Will appear in AISTATS 201
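The abstract does not spell out the objective, so the following is a hedged sketch of one natural reading of contrastive estimation for DPPs: reward the likelihood of observed ("positive") sets and penalize the likelihood assigned to negative sets drawn from a chosen negative distribution. The weighting scheme and helper names are assumptions.

```python
import numpy as np

def dpp_log_likelihood(L, subset):
    """Log-likelihood of `subset` under a DPP with (full-rank) kernel L."""
    _, logdet_sub = np.linalg.slogdet(L[np.ix_(subset, subset)])
    _, logdet_norm = np.linalg.slogdet(L + np.eye(L.shape[0]))
    return logdet_sub - logdet_norm

def contrastive_objective(L, positive_sets, negative_sets, weight=1.0):
    """One plausible contrastive objective: maximize likelihood of observed sets
    while pushing down the likelihood of sampled "negative" sets."""
    pos = sum(dpp_log_likelihood(L, s) for s in positive_sets)
    neg = sum(dpp_log_likelihood(L, s) for s in negative_sets)
    return pos - weight * neg
```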
Learning Nonsymmetric Determinantal Point Processes
Determinantal point processes (DPPs) have attracted substantial attention as
an elegant probabilistic model that captures the balance between quality and
diversity within sets. DPPs are conventionally parameterized by a positive
semi-definite kernel matrix, and this symmetric kernel encodes only repulsive
interactions between items. These so-called symmetric DPPs have significant
expressive power, and have been successfully applied to a variety of machine
learning tasks, including recommendation systems, information retrieval, and
automatic summarization, among many others. Efficient algorithms for learning
symmetric DPPs and sampling from these models have been reasonably well
studied. However, relatively little attention has been given to nonsymmetric
DPPs, which relax the symmetric constraint on the kernel. Nonsymmetric DPPs
allow for both repulsive and attractive item interactions, which can
significantly improve modeling power, resulting in a model that may be a better
fit for some applications. We present a method that enables a tractable algorithm,
based on maximum likelihood estimation, for learning nonsymmetric DPPs from
data composed of observed subsets. Our method imposes a particular
decomposition of the nonsymmetric kernel that enables such tractable learning
algorithms, which we analyze both theoretically and experimentally. We evaluate
our model on synthetic and real-world datasets, demonstrating improved
predictive performance compared to symmetric DPPs, which have previously shown
strong performance on modeling tasks associated with these datasets.
Comment: NeurIPS 201
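One way to realize such a kernel (a sketch, not necessarily the paper's exact parameterization) is a symmetric positive semi-definite part plus a skew-symmetric part: the skew part introduces attractive, direction-dependent interactions, while principal minors, and hence subset probabilities, remain nonnegative.

```python
import numpy as np

def nonsym_dpp_kernel(V, B, C):
    """Build a nonsymmetric DPP kernel as a PSD part plus a skew-symmetric part:
    L = V V^T + (B C^T - C B^T).  Illustrative decomposition; factor shapes are
    assumptions."""
    S = V @ V.T                       # symmetric PSD part: repulsion
    A = B @ C.T - C @ B.T             # skew-symmetric part: asymmetric interactions
    return S + A

def subset_score(L, subset):
    """Unnormalized probability of a subset: determinant of its kernel block."""
    return np.linalg.det(L[np.ix_(subset, subset)])

rng = np.random.default_rng(2)
n, k = 50, 5
L = nonsym_dpp_kernel(rng.normal(size=(n, k)),
                      rng.normal(size=(n, k)),
                      rng.normal(size=(n, k)))
print(subset_score(L, [0, 7, 19]))
```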
Fast Greedy MAP Inference for Determinantal Point Process to Improve Recommendation Diversity
The determinantal point process (DPP) is an elegant probabilistic model of
repulsion with applications in various machine learning tasks including
summarization and search. However, the maximum a posteriori (MAP) inference for
DPPs, which plays an important role in many applications, is NP-hard, and even the
popular greedy algorithm can still be too computationally expensive to be used
in large-scale real-time scenarios. To overcome the computational challenge, in
this paper, we propose a novel algorithm to greatly accelerate the greedy MAP
inference for DPP. In addition, our algorithm also adapts to scenarios where
the repulsion is only required among a few nearby items in the result sequence.
We apply the proposed algorithm to generate relevant and diverse
recommendations. Experimental results show that our proposed algorithm is
significantly faster than state-of-the-art competitors, and provides a better
relevance-diversity trade-off on several public datasets, which is also
confirmed in an online A/B test.
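A compact sketch of the kind of acceleration the abstract refers to: greedy MAP selection in which each step reuses incrementally updated Cholesky-style quantities instead of recomputing determinants from scratch, so adding one item costs roughly O(n) per already-selected item. Variable names and the stopping rule are illustrative.

```python
import numpy as np

def fast_greedy_dpp_map(L, k):
    """Greedy MAP inference for a DPP with kernel L, selecting up to k items,
    using incremental Cholesky-style updates of the marginal gains."""
    n = L.shape[0]
    cis = np.zeros((k, n))                 # accumulated Cholesky coefficients
    d2 = np.diag(L).astype(float)          # current marginal gain of each item
    selected = []
    for t in range(k):
        j = int(np.argmax(d2))
        if d2[j] <= 1e-12:                 # no further positive gain
            break
        selected.append(j)
        eis = (L[j, :] - cis[:t].T @ cis[:t, j]) / np.sqrt(d2[j])
        cis[t, :] = eis
        d2 = d2 - eis ** 2
        d2[j] = -np.inf                    # never pick the same item twice
    return selected

# toy usage with a quality-plus-similarity style kernel
rng = np.random.default_rng(3)
Phi = rng.normal(size=(100, 8))
L = Phi @ Phi.T + 0.1 * np.eye(100)
print(fast_greedy_dpp_map(L, 5))
```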
Towards Bursting Filter Bubble via Contextual Risks and Uncertainties
A rising topic in computational journalism is how to enhance the diversity in
news served to subscribers to foster exploration behavior in news reading.
Despite the success of preference learning in personalized news recommendation,
its over-exploitation causes a filter bubble that isolates readers from
opposing viewpoints and hurts the long-term user experience through a lack of
serendipity. Since news providers cannot recommend opposing or diversified
opinions when those articles are confidently predicted to be unpopular, they can
only bet on articles whose click-through-rate forecasts involve high variability
(risks) or high estimation errors (uncertainties). We propose
a novel Bayesian model of uncertainty-aware scoring and ranking for news
articles. The Bayesian binary classifier models the probability of success
(defined as a news click) as a Beta-distributed random variable conditional on a
context vector (user features, article features, and other contextual
features). The posterior of the contextual coefficients can be computed
efficiently using a low-rank version of Laplace's method via thin Singular
Value Decomposition. Efficiencies in personalized targeting of exceptional
articles, which are chosen by each subscriber in the test period, are evaluated on
real-world news datasets. The proposed estimator slightly outperformed existing
training and scoring algorithms in terms of efficiency in identifying
successful outliers.
Comment: The filter bubble problem; Uncertainty-aware scoring; Empirical-Bayes method; Low-rank Laplace's method
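As a loose, simplified stand-in for the model described above (not the paper's Beta/thin-SVD formulation), the sketch below fits a Bayesian logistic click model with a Laplace approximation and scores articles by the mean prediction plus posterior uncertainty, so that risky or uncertain articles can surface.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_logistic_posterior(X, y, prior_var=1.0):
    """Laplace approximation to a Bayesian logistic regression posterior
    (illustrative stand-in for the paper's model).  Returns the MAP weights
    and the approximate posterior covariance."""
    n, d = X.shape

    def neg_log_post(w):
        z = X @ w
        nll = np.sum(np.logaddexp(0, -z) * y + np.logaddexp(0, z) * (1 - y))
        return nll + 0.5 * np.dot(w, w) / prior_var

    w_map = minimize(neg_log_post, np.zeros(d)).x
    p = 1.0 / (1.0 + np.exp(-(X @ w_map)))
    # Hessian of the negative log posterior at the MAP point
    H = (X * (p * (1 - p))[:, None]).T @ X + np.eye(d) / prior_var
    return w_map, np.linalg.inv(H)

def uncertainty_aware_score(x, w_map, cov):
    """Score an article by its mean linear score plus predictive uncertainty
    (an optimism-in-the-face-of-uncertainty bonus)."""
    mean = x @ w_map
    var = x @ cov @ x
    return mean + np.sqrt(var)
```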
Recent Advances in Diversified Recommendation
With the rapid development of recommender systems, accuracy is no longer the
only golden criterion for evaluating whether the recommendation results are
satisfying or not. In recent years, diversity has gained tremendous attention
in recommender systems research, which has been recognized to be an important
factor for improving user satisfaction. On the one hand, diversified
recommendation helps increase the chance of answering ephemeral user needs. On
the other hand, diversifying recommendation results can help the business
improve product visibility and explore potential user interests. In this paper,
we are going to review the recent advances in diversified recommendation.
Specifically, we first review the various definitions of diversity and generate
a taxonomy to shed light on how diversity has been modeled or measured in
recommender systems. After that, we summarize the major optimization approaches
to diversified recommendation from a taxonomic view. Last but not least, we
project into the future and point out trending research directions on this
topic.
Kronecker Determinantal Point Processes
Determinantal Point Processes (DPPs) are probabilistic models over all
subsets of a ground set of items. They have recently gained prominence in
several applications that rely on "diverse" subsets. However, their
applicability to large problems is still limited due to the O(N^3)
complexity of core tasks such as sampling and learning. We enable efficient
sampling and learning for DPPs by introducing KronDPP, a DPP model whose kernel
matrix decomposes as a tensor product of multiple smaller kernel matrices. This
decomposition immediately enables fast exact sampling. But contrary to what one
may expect, leveraging the Kronecker product structure for speeding up DPP
learning turns out to be more difficult. We overcome this challenge, and derive
batch and stochastic optimization algorithms for efficiently learning the
parameters of a KronDPP.
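A small sketch of why the Kronecker structure pays off: the eigenvalues of kron(A, B) are all pairwise products of the eigenvalues of A and B, so quantities such as the DPP normalizer det(L + I) can be computed from the small factors without ever forming the full kernel. The function name is illustrative.

```python
import numpy as np

def kron_dpp_log_normalizer(A, B):
    """log det(L + I) for a Kronecker kernel L = kron(A, B), computed from the
    small factors alone (A and B assumed symmetric PSD)."""
    lam = np.linalg.eigvalsh(A)
    mu = np.linalg.eigvalsh(B)
    return float(np.sum(np.log1p(np.outer(lam, mu))))

# sanity check against the dense kernel on a small example
rng = np.random.default_rng(4)
MA, MB = rng.normal(size=(4, 4)), rng.normal(size=(5, 5))
A, B = MA @ MA.T, MB @ MB.T
dense = np.linalg.slogdet(np.kron(A, B) + np.eye(20))[1]
print(np.isclose(kron_dpp_log_normalizer(A, B), dense))   # True
```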
Personalized Bundle List Recommendation
Product bundling, offering a combination of items to customers, is one of the
marketing strategies commonly used by online e-commerce platforms and offline
retailers. A high-quality bundle generalizes frequent items of interest, and
diversity across bundles boosts the user experience and eventually increases transaction
volume. In this paper, we formalize the personalized bundle list recommendation
as a structured prediction problem and propose a bundle generation network
(BGN), which decomposes the problem into quality/diversity parts by the
determinantal point processes (DPPs). BGN uses a typical encoder-decoder
framework with a proposed feature-aware softmax to alleviate the inadequate
representation of the traditional softmax, and integrates masked beam search
and DPP selection to produce high-quality and diversified bundle lists with an
appropriate bundle size. We conduct extensive experiments on three public
datasets and one industrial dataset, including two generated from co-purchase
records and the other two extracted from real-world online bundle services. BGN
significantly outperforms the state-of-the-art methods in terms of quality,
diversity and response time over all datasets. In particular, BGN improves the
precision of the best competitors by 16% on average while maintaining the
highest diversity on four datasets, and yields a 3.85x improvement in response
time over the best competitors in the bundle list recommendation problem.
Comment: WWW2019, 11 pages
Diverse Landmark Sampling from Determinantal Point Processes for Scalable Manifold Learning
High computational costs of manifold learning prohibit its application for
large point sets. A common strategy to overcome this problem is to perform
dimensionality reduction on selected landmarks and to successively embed the
entire dataset with the Nyström method. The two main challenges that arise
are: (i) the landmarks selected in non-Euclidean geometries must result in a
low reconstruction error, (ii) the graph constructed from sparsely sampled
landmarks must approximate the manifold well. We propose the sampling of
landmarks from determinantal distributions on non-Euclidean spaces. Since
current determinantal sampling algorithms have the same complexity as those for
manifold learning, we present an efficient approximation running in linear
time. Further, we recover the local geometry after the sparsification by
assigning each landmark a local covariance matrix, estimated from the original
point set. The resulting neighborhood selection based on the Bhattacharyya
distance improves the embedding of sparsely sampled manifolds. Our experiments
show a significant performance improvement compared to state-of-the-art
landmark selection techniques.
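For reference, the Bhattacharyya distance between two Gaussians, which is the kind of quantity used above to compare a landmark's locally estimated covariance with its neighbors (a standard formula; the surrounding usage is an assumption):

```python
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2), e.g. two
    landmarks with their locally estimated covariance matrices."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    maha = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.125 * maha + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

# toy usage: nearer, more similarly shaped neighborhoods get smaller distances
mu1, mu2 = np.zeros(3), np.ones(3)
cov1, cov2 = np.eye(3), 2.0 * np.eye(3)
print(bhattacharyya_distance(mu1, cov1, mu2, cov2))
```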