Fixed-point algorithms for learning determinantal point processes
Determinantal point processes (DPPs) offer an elegant tool for encoding
probabilities over subsets of a ground set. Discrete DPPs are parametrized by a
positive semidefinite matrix (called the DPP kernel), and estimating this
kernel is key to learning DPPs from observed data. We consider the task of
learning the DPP kernel, and develop for it a surprisingly simple yet effective
new algorithm. Our algorithm offers the following benefits over previous
approaches: (a) it is much simpler; (b) it yields equally good and sometimes
even better local maxima; and (c) it runs an order of magnitude faster on large
problems. We present experimental results on both real and simulated data to
illustrate the numerical performance of our technique.
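For concreteness, here is a minimal numpy sketch of the setup: the mean log-likelihood being maximized, and a damped Picard-style fixed-point update L <- L + a * (L @ Delta @ L), which is one reading of the iteration the abstract alludes to. The damping factor, initialization, and toy data are illustrative assumptions, not details from the paper.

```python
import numpy as np

def log_likelihood(L, samples):
    """Mean DPP log-likelihood: (1/n) * sum_i log det(L_Yi) - log det(L + I)."""
    N = L.shape[0]
    log_Z = np.linalg.slogdet(L + np.eye(N))[1]
    return np.mean([np.linalg.slogdet(L[np.ix_(Y, Y)])[1] for Y in samples]) - log_Z

def fixed_point_step(L, samples, a=0.5):
    """One damped fixed-point update, where Delta is the gradient of the
    mean log-likelihood; a < 1 is an illustrative safeguard."""
    N = L.shape[0]
    Delta = -np.linalg.inv(L + np.eye(N))          # d/dL of -log det(L + I)
    for Y in samples:
        # d/dL of log det(L_Y), zero-padded to the full N x N matrix
        Delta[np.ix_(Y, Y)] += np.linalg.inv(L[np.ix_(Y, Y)]) / len(samples)
    return L + a * (L @ Delta @ L)

# toy run: observed subsets of a 5-item ground set
samples = [[0, 2], [1, 3, 4], [0, 4], [2, 3]]
L = np.eye(5)
for _ in range(100):
    L = fixed_point_step(L, samples)
print(log_likelihood(L, samples))
```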
Fast determinantal point processes via distortion-free intermediate sampling
Given a fixed n x d matrix X, where n >> d, we study the
complexity of sampling from a distribution over all subsets of rows where the
probability of a subset is proportional to the squared volume of the
parallelepiped spanned by the rows (a.k.a. a determinantal point process). In
this task, it is important to minimize the preprocessing cost of the procedure
(performed once) as well as the sampling cost (performed repeatedly). To that
end, we propose a new determinantal point process algorithm which has the
following two properties, both of which are novel: (1) a preprocessing step
which runs in time O(nnz(X) log n) + poly(d), and (2) a sampling step which
runs in poly(d) time, independent of the number of rows n. We achieve this by introducing a
new regularized determinantal point process (R-DPP), which serves as an
intermediate distribution in the sampling procedure by reducing the number of
rows from n to poly(d). Crucially, this intermediate distribution
does not distort the probabilities of the target sample. Our key novelty in
defining the R-DPP is the use of a Poisson random variable for controlling the
probabilities of different subset sizes, leading to new determinantal formulas
such as the normalization constant for this distribution. Our algorithm has
applications in many diverse areas where determinantal point processes have
been used, such as machine learning, stochastic optimization, data
summarization, and low-rank matrix reconstruction.
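To make the target distribution concrete, here is a brute-force reference (an illustration, not the paper's algorithm) that enumerates Pr(S) proportional to det(X_S X_S^T) over all row subsets; the point of the paper is to sample this distribution without any such exponential enumeration.

```python
import numpy as np
from itertools import combinations

def volume_dpp_probs(X):
    """Enumerate Pr(S) proportional to det(X_S X_S^T), the squared volume of
    the parallelepiped spanned by the rows in S. Exponential in n; usable
    only for tiny examples."""
    n = X.shape[0]
    subsets, weights = [], []
    for k in range(n + 1):
        for S in combinations(range(n), k):
            XS = X[list(S), :]
            w = np.linalg.det(XS @ XS.T) if S else 1.0  # empty set: volume 1
            subsets.append(S)
            weights.append(w)
    weights = np.clip(np.array(weights), 0.0, None)  # clip numerical noise when |S| > d
    return subsets, weights / weights.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))                    # n = 6 rows, d = 3 columns
subsets, probs = volume_dpp_probs(X)
print(subsets[rng.choice(len(subsets), p=probs)])  # one exact draw
```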
Efficient Sampling for k-Determinantal Point Processes
Determinantal Point Processes (DPPs) are elegant probabilistic models of
repulsion and diversity over discrete sets of items. But their applicability to
large sets is hindered by expensive cubic-complexity matrix operations for
basic tasks such as sampling. In light of this, we propose a new method for
approximate sampling from discrete k-DPPs. Our method takes advantage of the
diversity property of subsets sampled from a DPP, and proceeds in two stages:
first it constructs coresets for the ground set of items; thereafter, it
efficiently samples subsets based on the constructed coresets. As opposed to
previous approaches, our algorithm aims to minimize the total variation
distance to the original distribution. Experiments on both synthetic and real
datasets indicate that our sampling algorithm works efficiently on large data
sets, and yields more accurate samples than previous approaches.
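As background for what the approximation targets, here is a small reference implementation of exact k-DPP probabilities, Pr(S) = det(L_S) / e_k(lambda), with the elementary symmetric polynomial e_k computed by the standard recurrence. This is a brute-force baseline, not the paper's coreset method.

```python
import numpy as np

def elementary_symmetric(lam, k):
    """e_k(lam) via the standard O(N*k) recurrence; the inner loop runs in
    reverse so each eigenvalue contributes at most once."""
    E = np.zeros(k + 1)
    E[0] = 1.0
    for x in lam:
        for j in range(k, 0, -1):
            E[j] += x * E[j - 1]
    return E[k]

def k_dpp_prob(L, S):
    """Exact k-DPP probability of S: det(L_S) / e_k(eigenvalues of L), k = |S|."""
    lam = np.linalg.eigvalsh(L)
    return np.linalg.det(L[np.ix_(S, S)]) / elementary_symmetric(lam, len(S))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
L = A @ A.T                       # a PSD likelihood kernel on 6 items
print(k_dpp_prob(L, [0, 2, 5]))
```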
Kronecker Determinantal Point Processes
Determinantal Point Processes (DPPs) are probabilistic models over all
subsets of a ground set of N items. They have recently gained prominence in
several applications that rely on "diverse" subsets. However, their
applicability to large problems is still limited due to the O(N^3)
complexity of core tasks such as sampling and learning. We enable efficient
sampling and learning for DPPs by introducing KronDPP, a DPP model whose kernel
matrix decomposes as a tensor product of multiple smaller kernel matrices. This
decomposition immediately enables fast exact sampling. But contrary to what one
may expect, leveraging the Kronecker product structure for speeding up DPP
learning turns out to be more difficult. We overcome this challenge, and derive
batch and stochastic optimization algorithms for efficiently learning the
parameters of a KronDPP.
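A short sketch of why the Kronecker structure helps with sampling: the spectral decomposition that standard exact DPP samplers rely on factorizes across the Kronecker factors, so it never has to be computed on the full kernel. The toy sizes below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
A1 = rng.standard_normal((3, 3)); L1 = A1 @ A1.T   # small PSD factor, 3 x 3
A2 = rng.standard_normal((4, 4)); L2 = A2 @ A2.T   # small PSD factor, 4 x 4
L = np.kron(L1, L2)                                # KronDPP-style kernel, 12 x 12

# Eigenvalues of L are all products lam1_i * lam2_j, and eigenvectors are
# Kronecker products of the factors' eigenvectors, so eigendecomposing the
# small factors suffices.
lam1, V1 = np.linalg.eigh(L1)
lam2, V2 = np.linalg.eigh(L2)
lam, V = np.kron(lam1, lam2), np.kron(V1, V2)
assert np.allclose(V @ np.diag(lam) @ V.T, L)
```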
Learning Determinantal Point Processes by Corrective Negative Sampling
Determinantal Point Processes (DPPs) have attracted significant interest from
the machine-learning community due to their ability to elegantly and tractably
model the delicate balance between quality and diversity of sets. DPPs are
commonly learned from data using maximum likelihood estimation (MLE). While
fitting observed sets well, MLE for DPPs may also assign high likelihoods to
unobserved sets that are far from the true generative distribution of the data.
To address this issue, which reduces the quality of the learned model, we
introduce a novel optimization problem, Contrastive Estimation (CE), which
encodes information about "negative" samples into the basic learning model. CE
is grounded in the successful use of negative information in machine-vision and
language modeling. Depending on the chosen negative distribution (which may be
static or evolve during optimization), CE assumes two different forms, which we
analyze theoretically and experimentally. We evaluate our new model on
real-world datasets; on a challenging dataset, CE learning delivers a
considerable improvement in predictive performance over a DPP learned without
using contrastive information.
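The abstract does not spell out the objective, but one plausible form of a contrastive objective of this kind (an assumption for illustration, not the paper's exact formulation) rewards observed sets while penalizing draws from a negative distribution:

```python
import numpy as np

def log_prob(L, Y):
    """DPP log-probability: log det(L_Y) - log det(L + I)."""
    N = L.shape[0]
    return (np.linalg.slogdet(L[np.ix_(Y, Y)])[1]
            - np.linalg.slogdet(L + np.eye(N))[1])

def contrastive_objective(L, positives, negatives):
    """Reward observed sets, penalize negative samples. This specific form is
    an assumption for illustration only."""
    pos = np.mean([log_prob(L, Y) for Y in positives])
    neg = np.mean([log_prob(L, Y) for Y in negatives])
    return pos - neg

L = np.diag([1.0, 2.0, 3.0, 4.0])
print(contrastive_objective(L, positives=[[0, 3]], negatives=[[1], [2]]))
```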
DPPy: Sampling DPPs with Python
Determinantal point processes (DPPs) are specific probability distributions
over clouds of points that are used as models and computational tools across
physics, probability, statistics, and more recently machine learning. Sampling
from DPPs is a challenge and therefore we present DPPy, a Python toolbox that
gathers known exact and approximate sampling algorithms for both finite and
continuous DPPs. The project is hosted on GitHub and equipped with extensive
documentation. Code: http://github.com/guilgautier/DPPy/
Documentation: http://dppy.readthedocs.io
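A minimal usage sketch, based on the documentation linked above (check the docs for the current interface):

```python
import numpy as np
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(0)
Phi = rng.randn(4, 10)              # random features for 10 items
L = Phi.T.dot(Phi)                  # likelihood kernel L = Phi^T Phi (PSD)

DPP = FiniteDPP('likelihood', L=L)  # define the finite DPP by its L-kernel
for _ in range(3):
    DPP.sample_exact()              # draw one exact sample
print(DPP.list_of_samples)          # the three sampled subsets
```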
Expectation-Maximization for Learning Determinantal Point Processes
A determinantal point process (DPP) is a probabilistic model of set diversity
compactly parameterized by a positive semi-definite kernel matrix. To fit a DPP
to a given task, we would like to learn the entries of its kernel matrix by
maximizing the log-likelihood of the available data. However, log-likelihood is
non-convex in the entries of the kernel matrix, and this learning problem is
conjectured to be NP-hard. Thus, previous work has instead focused on more
restricted convex learning settings: learning only a single weight for each row
of the kernel matrix, or learning weights for a linear combination of DPPs with
fixed kernel matrices. In this work we propose a novel algorithm for learning
the full kernel matrix. By changing the kernel parameterization from matrix
entries to eigenvalues and eigenvectors, and then lower-bounding the likelihood
in the manner of expectation-maximization algorithms, we obtain an effective
optimization procedure. We test our method on a real-world product
recommendation task, and achieve relative gains of up to 16.5% in test
log-likelihood compared to the naive approach of maximizing likelihood by
projected gradient ascent on the entries of the kernel matrix.
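For reference, a sketch of the naive baseline the abstract mentions: projected gradient ascent on the kernel entries, with the PSD projection done by eigenvalue clipping. Step size, iteration count, and initialization are illustrative choices.

```python
import numpy as np

def grad_log_likelihood(L, samples):
    """Gradient of the mean log-likelihood:
    (1/n) * sum_i pad(inv(L_Yi)) - inv(L + I)."""
    N = L.shape[0]
    G = -np.linalg.inv(L + np.eye(N))
    for Y in samples:
        G[np.ix_(Y, Y)] += np.linalg.inv(L[np.ix_(Y, Y)]) / len(samples)
    return G

def project_psd(M, eps=1e-8):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues."""
    lam, V = np.linalg.eigh((M + M.T) / 2)
    return V @ np.diag(np.maximum(lam, eps)) @ V.T

def projected_gradient_ascent(samples, N, steps=200, lr=0.1):
    L = np.eye(N)
    for _ in range(steps):
        L = project_psd(L + lr * grad_log_likelihood(L, samples))
    return L

L_hat = projected_gradient_ascent([[0, 2], [1, 3], [0, 4], [2, 3]], N=5)
```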
Approximately Optimal Subset Selection for Statistical Design and Modelling
We study the problem of optimal subset selection from a set of correlated
random variables. In particular, we consider the associated combinatorial
optimization problem of maximizing the determinant of a symmetric positive
definite matrix that characterizes the chosen subset. This problem arises in
many domains, such as experimental designs, regression modeling, and
environmental statistics. We establish an efficient polynomial-time algorithm
using Determinantal Point Processes to approximate the optimal solution to the
problem. We demonstrate the advantages of our methods by presenting
computational results for both synthetic and real data sets.
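As a concrete baseline for the combinatorial problem (not necessarily the paper's DPP-based method), the standard greedy heuristic grows the subset by the item that most increases the log-determinant of the chosen principal submatrix:

```python
import numpy as np

def greedy_max_det(K, k):
    """Grow a subset S one item at a time, always adding the item that most
    increases log det(K_S). A standard heuristic baseline for
    determinant maximization over a symmetric positive definite K."""
    S = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(K.shape[0]):
            if i in S:
                continue
            sign, val = np.linalg.slogdet(K[np.ix_(S + [i], S + [i])])
            if sign > 0 and val > best_val:
                best, best_val = i, val
        S.append(best)
    return S

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 7))
K = A @ A.T + 1e-3 * np.eye(7)     # symmetric positive definite
print(greedy_max_det(K, 3))
```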
Fast Greedy MAP Inference for Determinantal Point Process to Improve Recommendation Diversity
The determinantal point process (DPP) is an elegant probabilistic model of
repulsion with applications in various machine learning tasks including
summarization and search. However, maximum a posteriori (MAP) inference for
DPPs, which plays an important role in many applications, is NP-hard, and even
the popular greedy algorithm can be too computationally expensive to use
in large-scale real-time scenarios. To overcome the computational challenge, in
this paper, we propose a novel algorithm to greatly accelerate the greedy MAP
inference for DPP. In addition, our algorithm also adapts to scenarios where
repulsion is required only among a few nearby items in the result sequence.
We apply the proposed algorithm to generate relevant and diverse
recommendations. Experimental results show that our proposed algorithm is
significantly faster than state-of-the-art competitors, and provides a better
relevance-diversity trade-off on several public datasets, which is also
confirmed in an online A/B test.
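A sketch in the spirit of the accelerated greedy algorithm: maintain an incremental Cholesky factorization so each selection costs O(N * |S|) rather than recomputing determinants from scratch. This is one rendering of the idea; consult the paper for the exact algorithm and its stopping criteria.

```python
import numpy as np

def fast_greedy_map(L, k):
    """Greedy MAP with incremental Cholesky updates. d2[i] is the determinant
    ratio det(L_{S+i}) / det(L_S), i.e. the marginal gain of adding item i."""
    N = L.shape[0]
    d2 = np.diagonal(L).astype(float)
    C = np.zeros((N, k))                  # C[i, :t] is the Cholesky row c_i
    selected = []
    for t in range(k):
        j = int(np.argmax(d2))
        if d2[j] <= 1e-12:                # no item improves the determinant
            break
        e = (L[:, j] - C @ C[j, :]) / np.sqrt(d2[j])
        C[:, t] = e                       # extend every row's Cholesky vector
        d2 = d2 - e ** 2                  # downdate all marginal gains
        d2[j] = -np.inf                   # never reselect j
        selected.append(j)
    return selected

rng = np.random.default_rng(0)
Phi = rng.standard_normal((8, 20))
print(fast_greedy_map(Phi @ Phi.T, 4))
```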
Learning Nonsymmetric Determinantal Point Processes
Determinantal point processes (DPPs) have attracted substantial attention as
an elegant probabilistic model that captures the balance between quality and
diversity within sets. DPPs are conventionally parameterized by a positive
semi-definite kernel matrix, and this symmetric kernel encodes only repulsive
interactions between items. These so-called symmetric DPPs have significant
expressive power, and have been successfully applied to a variety of machine
learning tasks, including recommendation systems, information retrieval, and
automatic summarization, among many others. Efficient algorithms for learning
symmetric DPPs and sampling from these models have been reasonably well
studied. However, relatively little attention has been given to nonsymmetric
DPPs, which relax the symmetric constraint on the kernel. Nonsymmetric DPPs
allow for both repulsive and attractive item interactions, which can
significantly improve modeling power, resulting in a model that may be a better
fit for some applications. We present a method that enables a tractable algorithm,
based on maximum likelihood estimation, for learning nonsymmetric DPPs from
data composed of observed subsets. Our method imposes a particular
decomposition of the nonsymmetric kernel that enables such tractable learning
algorithms, which we analyze both theoretically and experimentally. We evaluate
our model on synthetic and real-world datasets, demonstrating improved
predictive performance compared to symmetric DPPs, which have previously shown
strong performance on modeling tasks associated with these datasets.
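One low-rank decomposition consistent with the abstract's description (the exact form below is our assumption, not taken from the paper): a PSD symmetric part plus a skew-symmetric part. This keeps every principal minor det(L_Y) nonnegative while allowing attractive interactions, and subset probabilities use the same determinantal formula as symmetric DPPs.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 8, 3
V = rng.standard_normal((N, r))
B = rng.standard_normal((N, r))
C = rng.standard_normal((N, r))
S = V @ V.T                        # symmetric PSD part (repulsion)
A = B @ C.T - C @ B.T              # skew-symmetric part (can encode attraction)
L = S + A                          # nonsymmetric kernel; L + L^T = 2 S >= 0,
                                   # so every principal minor det(L_Y) >= 0

def log_prob(L, Y):
    """log Pr(Y) = log det(L_Y) - log det(L + I), as for symmetric DPPs."""
    n = L.shape[0]
    return (np.linalg.slogdet(L[np.ix_(Y, Y)])[1]
            - np.linalg.slogdet(L + np.eye(n))[1])

print(log_prob(L, [0, 3, 5]))
```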