Learning Determinantal Point Processes
Determinantal point processes (DPPs), which arise in random matrix theory and
quantum physics, are natural models for subset selection problems where
diversity is preferred. Among many remarkable properties, DPPs offer tractable
algorithms for exact inference, including computing marginal probabilities and
sampling; however, an important open question has been how to learn a DPP from
labeled training data. In this paper we propose a natural feature-based
parameterization of conditional DPPs, and show how it leads to a convex and
efficient learning formulation. We analyze the relationship between our model
and binary Markov random fields with repulsive potentials, which are
qualitatively similar but computationally intractable. Finally, we apply our
approach to the task of extractive summarization, where the goal is to choose a
small subset of sentences conveying the most important information from a set
of documents. In this task there is a fundamental tradeoff between sentences
that are highly relevant to the collection as a whole, and sentences that are
diverse and not repetitive. Our parameterization allows us to naturally balance
these two characteristics. We evaluate our system on data from the DUC 2003/04
multi-document summarization task, achieving state-of-the-art results.
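
To make the exact-inference claim concrete: for an L-ensemble DPP over n items with a positive semi-definite kernel L, P(Y = A) = det(L_A) / det(L + I), and inclusion marginals follow from the marginal kernel K = L(L + I)^{-1} via P(A ⊆ Y) = det(K_A). A minimal numpy sketch of these standard identities (the function names are ours, for illustration only):

```python
import numpy as np

def dpp_normalizer(L):
    """Normalizer of an L-ensemble: sum over all subsets A of det(L_A) = det(L + I)."""
    return np.linalg.det(L + np.eye(L.shape[0]))

def subset_probability(L, A):
    """Exact probability P(Y = A) = det(L_A) / det(L + I)."""
    A = list(A)
    return np.linalg.det(L[np.ix_(A, A)]) / dpp_normalizer(L)

def inclusion_marginal(L, A):
    """Inclusion marginal P(A subset of Y) = det(K_A), with K = L (L + I)^{-1}."""
    K = L @ np.linalg.inv(L + np.eye(L.shape[0]))
    A = list(A)
    return np.linalg.det(K[np.ix_(A, A)])

# Example: a small random PSD kernel over 3 items.
rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
L = B @ B.T  # positive semi-definite by construction
print(subset_probability(L, [0, 2]))
print(inclusion_marginal(L, [0]))  # equals K[0, 0]
```

Both routines cost a determinant or inverse of the full kernel, i.e. cubic time in the number of items, which is exactly the tractability the abstract refers to.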
Large-Margin Determinantal Point Processes
Determinantal point processes (DPPs) offer a powerful approach to modeling
diversity in many applications where the goal is to select a diverse subset. We
study the problem of learning the parameters (the kernel matrix) of a DPP from
labeled training data. We make two contributions. First, we show how to
reparameterize a DPP's kernel matrix with multiple kernel functions, thus
enhancing modeling flexibility. Second, we propose a novel parameter estimation
technique based on the principle of large margin separation. In contrast to the
state-of-the-art method of maximum likelihood estimation, our large-margin loss
function explicitly models errors in selecting the target subsets, and it can
be customized to trade off different types of errors (precision vs. recall).
Extensive empirical studies validate our contributions, including applications
on challenging document and video summarization, where flexibility in modeling
the kernel matrix and balancing different errors is indispensable.
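
The abstract does not pin down the multiple-kernel reparameterization, but one simple form consistent with it is a nonnegative combination of PSD base kernels (which stays PSD), paired with a hinge loss on log-determinant scores. A hedged sketch, where the parameterization and the specific loss form are our assumptions rather than the paper's definitions:

```python
import numpy as np

def combine_kernels(base_kernels, log_weights):
    """Conic combination L = sum_i exp(w_i) * L_i of PSD base kernels.

    Exponentiated weights keep the combination PSD without constrained
    optimization. Illustrative only; not necessarily the paper's form.
    """
    weights = np.exp(np.asarray(log_weights))
    return sum(w * L for w, L in zip(weights, base_kernels))

def margin_violation(L, target, competitor, margin=1.0):
    """Hinge-style loss asking the labeled target subset to outscore a
    competitor subset by a margin, with score(A) = log det(L_A)."""
    def logdet(A):
        A = list(A)
        return np.linalg.slogdet(L[np.ix_(A, A)])[1]
    return max(0.0, margin + logdet(competitor) - logdet(target))
```

In a large-margin setup along these lines, letting the margin grow with a task-specific error measure on the competitor subset is one natural way to trade off precision against recall, as the abstract describes.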
Approximate Inference for Determinantal Point Processes
In this thesis we explore a probabilistic model that is well-suited to a variety of subset selection tasks: the determinantal point process (DPP). DPPs were originally developed in the physics community to describe the repulsive interactions of fermions. More recently, they have been applied to machine learning problems such as search diversification and document summarization, which can be cast as subset selection tasks. A challenge, however, is scaling such DPP-based methods to the size of the datasets of interest to this community, and developing approximations for DPP inference tasks whose exact computation is prohibitively expensive.
A DPP defines a probability distribution over all subsets of a ground set of items. Consider the inference tasks common to probabilistic models: normalizing, marginalizing, conditioning, sampling, estimating the mode, and maximizing likelihood. For DPPs, exactly computing the quantities necessary for the first four of these tasks requires time cubic in the number of items or features of the items. In this thesis, we propose a means of making these four tasks tractable even in the realm where both the number of items and the number of features are large. Specifically, we analyze the impact of randomly projecting the features down to a lower-dimensional space and show that the variational distance between the resulting DPP and the original is bounded.

In addition to expanding the circumstances in which these first four tasks are tractable, we also tackle the other two tasks, the first of which is known to be NP-hard (with no PTAS) and the second of which is conjectured to be NP-hard. For mode estimation, we build on submodular maximization techniques to develop an algorithm with a multiplicative approximation guarantee. For likelihood maximization, we exploit the generative process associated with DPP sampling to derive an expectation-maximization (EM) algorithm. We experimentally verify the practicality of all the techniques that we develop, testing them on applications such as news and research summarization, political candidate comparison, and product recommendation.
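
Mode estimation here means finding the subset A maximizing det(L_A), which is NP-hard; since log det(L_A) is submodular, greedy selection is the usual starting point. The sketch below is a plain greedy baseline in that spirit, not the thesis's algorithm (whose multiplicative guarantee relies on additional machinery):

```python
import numpy as np

def greedy_dpp_mode(L):
    """Greedily grow a subset S, at each step adding the item with the
    largest positive gain in log det(L_S). A simple baseline for DPP
    mode estimation (MAP), stopping when no item improves the score."""
    n = L.shape[0]

    def logdet(idx):
        if not idx:
            return 0.0
        return np.linalg.slogdet(L[np.ix_(idx, idx)])[1]

    S, current = [], 0.0
    remaining = set(range(n))
    while remaining:
        gains = {j: logdet(S + [j]) - current for j in remaining}
        j_best, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain <= 0:
            break
        S.append(j_best)
        remaining.remove(j_best)
        current += gain
    return S
```

Each iteration re-evaluates a log-determinant per candidate item, so this naive version is expensive; incremental Cholesky updates are the standard way to speed it up.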