Markov Properties of Discrete Determinantal Point Processes
Determinantal point processes (DPPs) are probabilistic models for repulsion.
When used to represent the occurrence of random subsets of a finite base set,
DPPs make it possible to model global negative associations in a mathematically elegant
and direct way. Discrete DPPs have become popular and computationally tractable
models for solving several machine learning tasks that require the selection of
diverse objects, and have been successfully applied in numerous real-life
problems. Despite their popularity, the statistical properties of such models
have not been adequately explored. In this note, we derive the Markov
properties of discrete DPPs and show how they can be expressed using graphical
models. Comment: 9 pages, 1 figure
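The negative association mentioned in this abstract can be illustrated with a minimal sketch. It relies on the standard fact that, for a DPP with symmetric marginal kernel K, the probability that items i and j both appear in the random subset is the 2x2 determinant K_ii K_jj - K_ij^2, which never exceeds the product of the individual inclusion probabilities. The kernel values and the helper name `pair_inclusion_probability` are hypothetical and chosen only for illustration; this is not code from the paper.

```python
import numpy as np


def pair_inclusion_probability(K, i, j):
    """P(i and j both in the random subset) for a DPP with symmetric marginal kernel K."""
    # The inclusion probability of a set is the determinant of the
    # corresponding principal submatrix of K.
    return float(np.linalg.det(K[np.ix_([i, j], [i, j])]))


# Hypothetical 3-item marginal kernel: symmetric, eigenvalues 0.8, 0.2, 0.5 (all in [0, 1]).
K = np.array([
    [0.5, 0.3, 0.0],
    [0.3, 0.5, 0.0],
    [0.0, 0.0, 0.5],
])

# Items 0 and 1 are "similar" (K[0, 1] != 0), so they repel each other.
p_similar = pair_inclusion_probability(K, 0, 1)

# Items 0 and 2 have K[0, 2] == 0, so their joint inclusion probability factorizes.
p_unrelated = pair_inclusion_probability(K, 0, 2)

# What the joint probability would be if the two items were included independently.
p_independent = K[0, 0] * K[1, 1]

print(p_similar, p_unrelated, p_independent)
```

With these illustrative values, the similar pair co-occurs less often than independence would predict, while the pair with a zero kernel entry matches the independent product.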
Advances in the Theory of Determinantal Point Processes
The theory of determinantal point processes has its roots in work in mathematical physics in the 1960s, but it is only in recent years that it has been developed beyond several specific examples. While there is a rich probabilistic theory, there are still many open questions in this area, and its applications to statistics and machine learning are still largely unexplored.
Our contributions are threefold. First, we develop the theory of determinantal point processes on a finite set. While there is a small body of literature on this topic, we offer a new perspective that allows us to unify and extend previous results.
Second, we investigate several new kernels. We describe these processes explicitly, and investigate the new discrete distribution which arises from our computations.
Finally, we show how the parameters of a determinantal point process over a finite ground set with a symmetric kernel may be computed if infinitely many samples are available. This algorithm is a vital step towards the use of determinantal point processes as a general statistical model.
Efficient Failure Pattern Identification of Predictive Algorithms
Given a (machine learning) classifier and a collection of unlabeled data, how
can we efficiently identify misclassification patterns present in this
dataset? To address this problem, we propose a human-machine collaborative
framework that consists of a team of human annotators and a sequential
recommendation algorithm. The recommendation algorithm is conceptualized as a
stochastic sampler that, in each round, queries the annotators for the true
labels of a subset of samples and obtains feedback on whether
the samples are misclassified. The sampling mechanism needs to balance between
discovering new patterns of misclassification (exploration) and confirming the
potential patterns of misclassification (exploitation). We construct a
determinantal point process, whose intensity balances the
exploration-exploitation trade-off through the weighted update of the posterior
at each round to form the generator of the stochastic sampler. The numerical
results empirically demonstrate the competitive performance of our framework on
multiple datasets at various signal-to-noise ratios. Comment: 19 pages, Accepted for UAI202
Intertwining wavelets or Multiresolution analysis on graphs through random forests
We propose a new method for performing multiscale analysis of functions
defined on the vertices of a finite connected weighted graph. Our approach
relies on a random spanning forest to downsample the set of vertices, and on
approximate solutions of a Markov intertwining relation to provide a subgraph
structure and a filter bank leading to a wavelet basis of the set of functions.
Our construction involves two parameters q and q'. The first one controls the
mean number of kept vertices in the downsampling, while the second one is a
tuning parameter between space localization and frequency localization. We
provide an explicit reconstruction formula, bounds on the reconstruction
operator norm and on the error in the intertwining relation, and a Jackson-like
inequality. These bounds suggest a way to choose the parameters q and
q'. We illustrate the method by numerical experiments. Comment: 39 pages, 12 figures