Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling
We analyze the convergence rate of the randomized Newton-like method
introduced by Qu et al. (2016) for smooth and convex objectives, which uses
random coordinate blocks of a Hessian over-approximation matrix $\mathbf{M}$ instead
of the true Hessian. The convergence analysis of the algorithm is challenging
because of its complex dependence on the structure of $\mathbf{M}$. However, we show
that when the coordinate blocks are sampled with probability proportional to
their determinant, the convergence rate depends solely on the eigenvalue
distribution of the matrix $\mathbf{M}$ and has an analytically tractable form. To do so,
we derive a fundamental new expectation formula for determinantal point
processes. We show that determinantal sampling allows us to reason about the
optimal subset size of blocks in terms of the spectrum of $\mathbf{M}$. Additionally,
we provide a numerical evaluation of our analysis, demonstrating cases where
determinantal sampling is superior to or on par with uniform sampling.
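As a rough illustration of the sampling rule analyzed above, the sketch below draws a coordinate block S of fixed size with probability proportional to det(M_S) by brute-force enumeration over all blocks; the matrix, the sizes, and the enumeration are illustrative only and are not the paper's implementation.

```python
import numpy as np
from itertools import combinations

# Illustrative sketch (not the paper's method): draw a coordinate block S of size k
# with probability proportional to det(M_S), the determinant of the principal
# submatrix of a positive-definite over-approximation M. Brute force over all
# size-k blocks, so only feasible for tiny dimensions.
rng = np.random.default_rng(0)
n, k = 8, 3
B = rng.standard_normal((n, n))
M = B @ B.T + np.eye(n)                       # symmetric positive definite M

blocks = list(combinations(range(n), k))
weights = np.array([np.linalg.det(M[np.ix_(S, S)]) for S in blocks])
probs = weights / weights.sum()               # fixed-size determinantal (k-DPP) distribution

S = blocks[rng.choice(len(blocks), p=probs)]  # sampled coordinate block
print("sampled block:", S)
```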
Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations
The randomly pivoted partial Cholesky algorithm (RPCholesky) computes a
factorized rank-k approximation of an N x N positive-semidefinite (psd) matrix.
RPCholesky requires only (k + 1) N entry evaluations and O(k^2 N) additional
arithmetic operations, and it can be implemented with just a few lines of code.
The method is particularly useful for approximating a kernel matrix.
This paper offers a thorough new investigation of the empirical and
theoretical behavior of this fundamental algorithm. For matrix approximation
problems that arise in scientific machine learning, experiments show that
RPCholesky matches or beats the performance of alternative algorithms.
Moreover, RPCholesky provably returns low-rank approximations that are nearly
optimal. The simplicity, effectiveness, and robustness of RPCholesky strongly
support its use in scientific computing and machine learning applications.
Comment: 38 pages, 4 figures.
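The simplicity claim is easy to make concrete. Below is a minimal sketch of the basic randomly pivoted Cholesky iteration as it is commonly described: pivots are drawn with probability proportional to the residual diagonal, and only the diagonal plus one column per pivot is ever evaluated. The entry-oracle interface and sizes are illustrative, not the paper's reference implementation.

```python
import numpy as np

def rp_cholesky(entry, n, k, rng=None):
    """Minimal sketch of randomly pivoted partial Cholesky.

    entry(i, j) returns A[i, j] for an n x n psd matrix A; the loop touches the
    diagonal once plus one column per pivot, i.e. (k + 1) * n entry evaluations."""
    rng = np.random.default_rng(rng)
    F = np.zeros((n, k))                                  # factor, A ~= F @ F.T
    d = np.array([entry(i, i) for i in range(n)], float)  # residual diagonal
    for t in range(k):
        p = rng.choice(n, p=d / d.sum())                  # pivot ~ residual diagonal
        col = np.array([entry(i, p) for i in range(n)], float)
        col -= F[:, :t] @ F[p, :t]                        # remove previous pivots' contribution
        F[:, t] = col / np.sqrt(col[p])
        d = np.clip(d - F[:, t] ** 2, 0.0, None)          # update, guard against round-off
    return F

# Example on an explicit psd matrix (the oracle simply indexes into A).
rng = np.random.default_rng(0)
G = rng.standard_normal((200, 12))
A = G @ G.T
F = rp_cholesky(lambda i, j: A[i, j], 200, k=10)
print(np.linalg.norm(A - F @ F.T) / np.linalg.norm(A))
```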
Learning from DPPs via Sampling: Beyond HKPV and symmetry
Determinantal point processes (DPPs) have become a significant tool for
recommendation systems, feature selection, or summary extraction, harnessing
the intrinsic ability of these probabilistic models to facilitate sample
diversity. The ability to sample from DPPs is paramount to the empirical
investigation of these models. Most exact samplers are variants of a spectral
meta-algorithm due to Hough, Krishnapur, Peres and Vir\'ag (henceforth HKPV),
which is in general time and resource intensive. For DPPs with symmetric
kernels, scalable HKPV samplers have been proposed that either first downsample
the ground set of items, or force the kernel to be low-rank, using e.g.
Nystr\"om-type decompositions.
In the present work, we contribute an approach radically different from HKPV.
Exploiting the fact that many statistical and learning objectives can be
effectively accomplished by only sampling certain key observables of a DPP
(so-called linear statistics), we invoke an expression for the Laplace
transform of such an observable as a single determinant, which holds in
complete generality. Combining traditional low-rank approximation techniques
with Laplace inversion algorithms from numerical analysis, we show how to
directly approximate the distribution function of a linear statistic of a DPP.
This distribution function can then be used in hypothesis testing or to
actually sample the linear statistic, as per requirement. Our approach is
scalable and applies to very general DPPs, beyond traditional symmetric
kernels.
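For concreteness, the single-determinant expression referred to above takes the standard form E[exp(-s S)] = det(I - (I - D)K) for a discrete DPP with marginal kernel K, linear statistic S = sum_{i in X} f(i), and D = diag(exp(-s f(i))). The sketch below checks this identity numerically on a tiny example; the kernel, sizes, and brute-force enumeration are illustrative, and the Laplace-inversion and low-rank steps of the paper are not shown.

```python
import numpy as np
from itertools import combinations

# Numerical check of the determinant identity for the Laplace transform of a
# linear statistic S = sum_{i in X} f(i) of a discrete DPP with marginal kernel K:
#     E[exp(-s * S)] = det(I - (I - D) K),   D = diag(exp(-s * f)).
# Tiny hypothetical example; the brute-force sum uses the L-ensemble form of the
# subset probabilities and is only meant to validate the formula.
n, s = 6, 0.7
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
K = (Q * rng.uniform(0.1, 0.9, size=n)) @ Q.T         # symmetric kernel, eigenvalues in (0, 1)
f = rng.uniform(size=n)

D = np.diag(np.exp(-s * f))
lhs = np.linalg.det(np.eye(n) - (np.eye(n) - D) @ K)  # single-determinant expression

L = K @ np.linalg.inv(np.eye(n) - K)                  # L-ensemble kernel
norm = np.linalg.det(np.eye(n) - K)                   # = 1 / det(I + L)
rhs = 0.0
for r in range(n + 1):
    for A in combinations(range(n), r):
        pA = (np.linalg.det(L[np.ix_(A, A)]) if A else 1.0) * norm  # P(X = A)
        rhs += pA * np.exp(-s * f[list(A)].sum())

print(lhs, rhs)  # the two numbers agree up to floating-point error
```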
Sampling from a k-DPP without looking at all items
Determinantal point processes (DPPs) are a useful probabilistic model for
selecting a small diverse subset out of a large collection of items, with
applications in summarization, stochastic optimization, active learning and
more. Given a kernel function and a subset size k, our goal is to sample k
out of n items with probability proportional to the determinant of the kernel
matrix induced by the subset (a.k.a. k-DPP). Existing k-DPP sampling
algorithms require an expensive preprocessing step which involves multiple
passes over all n items, making it infeasible for large datasets. A na\"ive
heuristic addressing this problem is to uniformly subsample a fraction of the
data and perform k-DPP sampling only on those items; however, this method
offers no guarantee that the produced sample will even approximately resemble
the target distribution over the original dataset. In this paper, we develop an
algorithm which adaptively builds a sufficiently large uniform sample of data
that is then used to efficiently generate a smaller set of k items, while
ensuring that this set is drawn exactly from the target distribution defined on
all n items. We show empirically that our algorithm produces a k-DPP sample
after observing only a small fraction of all elements, leading to several
orders of magnitude faster performance compared to the state-of-the-art.
Polynomial Tensor Sketch for Element-wise Function of Low-Rank Matrix
This paper studies how to sketch element-wise functions of low-rank matrices.
Formally, given low-rank matrix A = [Aij] and scalar non-linear function f, we
aim to find an approximate low-rank representation of the (possibly
high-rank) matrix [f(Aij)]. To this end, we propose an efficient
sketching-based algorithm whose complexity is significantly lower than the
number of entries of A, i.e., it runs without accessing all entries of [f(Aij)]
explicitly. The main idea underlying our method is to combine a polynomial
approximation of f with the existing tensor sketch scheme for approximating
monomials of entries of A. To balance the errors of the two approximation
components in an optimal manner, we propose a novel regression formula to find
polynomial coefficients given A and f. In particular, we utilize a
coreset-based regression with a rigorous approximation guarantee. Finally, we
demonstrate the applicability and superiority of the proposed scheme under
various machine learning tasks.
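As a rough sketch of the idea (omitting the tensor-sketch compression that makes the method scalable), the element-wise powers of a low-rank matrix A = U V^T factor through row-wise Kronecker products of U and V, so a polynomial surrogate for f can be applied without reading every entry of [f(Aij)]. The sizes, the choice f = exp, and the plain Taylor coefficients below are illustrative; the paper instead fits the coefficients with a coreset-based regression.

```python
import numpy as np
from math import factorial

# Sketch of the core identity behind the method (without the tensor-sketch step):
# for A = U @ V.T, the element-wise power A**j factors as Phi_j(U) @ Phi_j(V).T,
# where Phi_j takes row-wise Kronecker powers. A degree-m polynomial approximating
# f can therefore be applied to A without accessing all entries of f(A) explicitly.
def row_kron(X, Y):
    """Row-wise Kronecker (face-splitting) product: row i is kron(X[i], Y[i])."""
    n = X.shape[0]
    return (X[:, :, None] * Y[:, None, :]).reshape(n, -1)

rng = np.random.default_rng(0)
n, r, m = 200, 4, 6
U = rng.standard_normal((n, r)) / np.sqrt(r)
V = rng.standard_normal((n, r)) / np.sqrt(r)
c = [1.0 / factorial(j) for j in range(m + 1)]   # Taylor coefficients of f = exp

approx = c[0] * np.ones((n, n)) + c[1] * (U @ V.T)
Uj, Vj = U, V
for j in range(2, m + 1):
    Uj, Vj = row_kron(Uj, U), row_kron(Vj, V)    # factors of the element-wise power A**j
    approx += c[j] * (Uj @ Vj.T)

exact = np.exp(U @ V.T)                          # formed here only to measure the error
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```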
Efficient Algorithms and Error Analysis for the Modified Nystrom Method
Many kernel methods suffer from high time and space complexities and are thus
prohibitive in big-data applications. To tackle the computational challenge,
the Nystr\"om method has been extensively used to reduce time and space
complexities by sacrificing some accuracy. The Nystr\"om method speeds up
computation by constructing an approximation of the kernel matrix using only a
few columns of the matrix. Recently, a variant of the Nystr\"om method called
the modified Nystr\"om method has demonstrated significant improvement over the
standard Nystr\"om method in approximation accuracy, both theoretically and
empirically.
In this paper, we propose two algorithms that make the modified Nystr\"om
method practical. First, we devise a simple column selection algorithm with a
provable error bound. Our algorithm is more efficient and easier to implement
than the state-of-the-art algorithm, while being nearly as accurate. Second, with the
selected columns at hand, we propose an algorithm that computes the
approximation in lower time complexity than the approach in the previous work.
Furthermore, we prove that the modified Nystr\"om method is exact under certain
conditions, and we establish a lower error bound for the modified Nystr\"om
method.
Comment: 9-page paper plus appendix. In Proceedings of the 17th International
Conference on Artificial Intelligence and Statistics (AISTATS) 2014,
Reykjavik, Iceland. JMLR: W&CP volume 3
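For reference, both approximations can be written in closed form from the sampled columns C = K[:, S] and the intersection block W = K[S, S]: the standard Nyström approximation is C W^+ C^T, while the modified variant uses the Frobenius-optimal intersection matrix C^+ K (C^+)^T. The sketch below computes both naively on a small dense kernel; avoiding the full-matrix cost of that naive modified computation is what the paper's algorithms address, and the kernel, sizes, and uniform column sampling here are illustrative.

```python
import numpy as np

# Sketch contrasting the standard and modified Nystrom approximations built from
# the same column subset S of a psd kernel matrix K. Everything here is computed
# naively from the full dense K on a small example.
rng = np.random.default_rng(0)
n, c = 300, 20
X = rng.standard_normal((n, 2))
K = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))  # RBF kernel matrix

S = rng.choice(n, size=c, replace=False)      # uniform column sampling, for illustration
C = K[:, S]                                   # sampled columns
W = K[np.ix_(S, S)]                           # intersection block

K_std = C @ np.linalg.pinv(W) @ C.T           # standard Nystrom: C W^+ C^T
Cp = np.linalg.pinv(C)
K_mod = C @ (Cp @ K @ Cp.T) @ C.T             # modified Nystrom: C (C^+ K (C^+)^T) C^T

for name, Kh in (("standard", K_std), ("modified", K_mod)):
    err = np.linalg.norm(K - Kh, "fro") / np.linalg.norm(K, "fro")
    print(f"{name:9s} relative Frobenius error: {err:.4f}")
```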
Kernel quadrature with randomly pivoted Cholesky
This paper presents new quadrature rules for functions in a reproducing
kernel Hilbert space using nodes drawn by a sampling algorithm known as
randomly pivoted Cholesky. The resulting computational procedure compares
favorably to previous kernel quadrature methods, which either achieve low
accuracy or require solving a computationally challenging sampling problem.
Theoretical and numerical results show that randomly pivoted Cholesky is fast
and achieves comparable quadrature error rates to more computationally
expensive quadrature schemes based on continuous volume sampling, thinning, and
recombination. Randomly pivoted Cholesky is easily adapted to complicated
geometries with arbitrary kernels, unlocking new potential for kernel
quadrature.
Comment: 19 pages, 3 figures; NeurIPS 2023 (spotlight), camera-ready version.
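To make the quadrature step concrete: once nodes have been selected (by randomly pivoted Cholesky in the paper, or any other rule), the weights can be chosen so the rule reproduces the empirical kernel mean embedding of a reference sample. The sketch below uses uniformly chosen nodes purely as a stand-in for the node-selection step; the kernel, sizes, and small ridge term are illustrative and not the paper's construction.

```python
import numpy as np

# Sketch of the quadrature step once nodes have been chosen: weights are set so the
# rule reproduces the (empirical) kernel mean embedding of a reference sample from
# the target measure. Node selection is plain uniform sampling here, standing in
# for randomly pivoted Cholesky.
rng = np.random.default_rng(1)
N, k = 2000, 25
X = rng.uniform(-1.0, 1.0, size=(N, 1))                       # reference sample from mu
sqd = (X[:, None, 0] - X[None, :, 0]) ** 2
K = np.exp(-sqd / 0.1)                                         # Gaussian kernel matrix

S = rng.choice(N, size=k, replace=False)                       # stand-in node selection
z = K[S].mean(axis=1)                                          # empirical kernel mean at the nodes
w = np.linalg.solve(K[np.ix_(S, S)] + 1e-10 * np.eye(k), z)    # quadrature weights

f = lambda x: np.sin(3.0 * x[:, 0])                            # test integrand
print("quadrature estimate:", float(w @ f(X[S])))
print("reference sample mean:", float(f(X).mean()))
```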
Sampling from a k-DPP without looking at all items
Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, stochastic optimization, active learning and more. Given a kernel function and a subset size k, our goal is to sample k out of n items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. k-DPP). Existing k-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all n items, making it infeasible for large datasets. A naïve heuristic addressing this problem is to uniformly subsample a fraction of the data and perform k-DPP sampling only on those items; however, this method offers no guarantee that the produced sample will even approximately resemble the target distribution over the original dataset. In this paper, we develop α-DPP, an algorithm which adaptively builds a sufficiently large uniform sample of data that is then used to efficiently generate a smaller set of k items, while ensuring that this set is drawn exactly from the target distribution defined on all n items. We show empirically that our algorithm produces a k-DPP sample after observing only a small fraction of all elements, leading to several orders of magnitude faster performance compared to the state-of-the-art. Our implementation of α-DPP is provided at https://github.com/guilgautier/DPPy/
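Since the abstract points to the DPPy repository, the following is a small usage sketch of exact k-DPP sampling through DPPy's FiniteDPP interface; the calls shown are assumptions based on DPPy's documented generic sampler, not necessarily the α-DPP routine contributed by the paper.

```python
import numpy as np
from dppy.finite_dpps import FiniteDPP  # https://github.com/guilgautier/DPPy/

# Assumed DPPy usage (check the repository's documentation for the exact interface):
# build a likelihood (L-ensemble) kernel and draw an exact sample of fixed size k.
rng = np.random.default_rng(0)
n, k = 100, 10
Phi = rng.standard_normal((n, 20))
L = Phi @ Phi.T                          # psd likelihood kernel on n items

dpp = FiniteDPP('likelihood', **{'L': L})
sample = dpp.sample_exact_k_dpp(size=k)  # indices of the k selected items
print(sorted(sample))
```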