36 research outputs found

    Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling

    We analyze the convergence rate of the randomized Newton-like method introduced by Qu et al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian-over-approximation matrix $\mathbf{M}$ instead of the true Hessian. The convergence analysis of the algorithm is challenging because of its complex dependence on the structure of $\mathbf{M}$. However, we show that when the coordinate blocks are sampled with probability proportional to their determinant, the convergence rate depends solely on the eigenvalue distribution of the matrix $\mathbf{M}$, and has an analytically tractable form. To do so, we derive a fundamental new expectation formula for determinantal point processes. We show that determinantal sampling allows us to reason about the optimal subset size of blocks in terms of the spectrum of $\mathbf{M}$. Additionally, we provide a numerical evaluation of our analysis, demonstrating cases where determinantal sampling is superior to or on par with uniform sampling.

    Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations

    The randomly pivoted partial Cholesky algorithm (RPCholesky) computes a factorized rank-k approximation of an N x N positive-semidefinite (psd) matrix. RPCholesky requires only (k + 1) N entry evaluations and O(k^2 N) additional arithmetic operations, and it can be implemented with just a few lines of code. The method is particularly useful for approximating a kernel matrix. This paper offers a thorough new investigation of the empirical and theoretical behavior of this fundamental algorithm. For matrix approximation problems that arise in scientific machine learning, experiments show that RPCholesky matches or beats the performance of alternative algorithms. Moreover, RPCholesky provably returns low-rank approximations that are nearly optimal. The simplicity, effectiveness, and robustness of RPCholesky strongly support its use in scientific computing and machine learning applications. Comment: 38 pages, 4 figures
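
    The abstract notes that RPCholesky fits in a few lines of code. The following is a minimal NumPy sketch of that idea, not the authors' reference implementation: the function name `rp_cholesky`, its arguments, and the early-exit check are illustrative assumptions. It reads the diagonal once (N entries) and one column per step (k N entries), which matches the (k + 1) N entry count stated above.

```python
import numpy as np

def rp_cholesky(A, k, rng):
    """Sketch: rank-k approximation A ~ F @ F.T of a psd matrix A.

    Hypothetical helper for illustration; pivots are drawn with
    probability proportional to the diagonal of the current residual.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).copy()              # residual diagonal (N entry evaluations)
    pivots = []
    for i in range(k):
        total = d.sum()
        if total <= 0:                 # residual is numerically zero: stop early
            break
        s = int(rng.choice(n, p=d / total))
        # Residual column at the pivot (N entry evaluations of A).
        g = A[:, s] - F[:, :i] @ F[s, :i]
        F[:, i] = g / np.sqrt(g[s])
        # Update the residual diagonal; clip guards against rounding noise.
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
        d[s] = 0.0                     # a chosen pivot is never drawn again
        pivots.append(s)
    return F, pivots

# Demo on a synthetic rank-5 psd matrix (assumed test data).
rng = np.random.default_rng(0)
G = rng.standard_normal((30, 5))
A = G @ G.T
F, pivots = rp_cholesky(A, 5, rng)
rel_err = np.linalg.norm(A - F @ F.T) / np.linalg.norm(A)
F3, pivots3 = rp_cholesky(A, 3, rng)
residual_trace = np.trace(A - F3 @ F3.T)
```

    On this rank-5 example, five pivots recover the matrix up to rounding error, while three pivots leave a residual with positive but reduced trace.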

    Learning from DPPs via Sampling: Beyond HKPV and symmetry

    Determinantal point processes (DPPs) have become a significant tool for recommendation systems, feature selection, or summary extraction, harnessing the intrinsic ability of these probabilistic models to facilitate sample diversity. The ability to sample from DPPs is paramount to the empirical investigation of these models. Most exact samplers are variants of a spectral meta-algorithm due to Hough, Krishnapur, Peres and Virág (henceforth HKPV), which is in general time and resource intensive. For DPPs with symmetric kernels, scalable HKPV samplers have been proposed that either first downsample the ground set of items, or force the kernel to be low-rank, using e.g. Nyström-type decompositions. In the present work, we contribute a radically different approach than HKPV. Exploiting the fact that many statistical and learning objectives can be effectively accomplished by only sampling certain key observables of a DPP (so-called linear statistics), we invoke an expression for the Laplace transform of such an observable as a single determinant, which holds in complete generality. Combining traditional low-rank approximation techniques with Laplace inversion algorithms from numerical analysis, we show how to directly approximate the distribution function of a linear statistic of a DPP. This distribution function can then be used in hypothesis testing or to actually sample the linear statistic, as per requirement. Our approach is scalable and applies to very general DPPs, beyond traditional symmetric kernels.

    Polynomial Tensor Sketch for Element-wise Function of Low-Rank Matrix

    This paper studies how to sketch element-wise functions of low-rank matrices. Formally, given a low-rank matrix A = [Aij] and a scalar non-linear function f, we aim to find an approximate low-rank representation of the (possibly high-rank) matrix [f(Aij)]. To this end, we propose an efficient sketching-based algorithm whose complexity is significantly lower than the number of entries of A, i.e., it runs without accessing all entries of [f(Aij)] explicitly. The main idea underlying our method is to combine a polynomial approximation of f with the existing tensor sketch scheme for approximating monomials of entries of A. To balance the errors of the two approximation components in an optimal manner, we propose a novel regression formula to find polynomial coefficients given A and f. In particular, we utilize a coreset-based regression with a rigorous approximation guarantee. Finally, we demonstrate the applicability and superiority of the proposed scheme under various machine learning tasks.

    Efficient Algorithms and Error Analysis for the Modified Nystrom Method

    Many kernel methods suffer from high time and space complexities and are thus prohibitive in big-data applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nyström method speeds up computation by constructing an approximation of the kernel matrix using only a few columns of the matrix. Recently, a variant of the Nyström method called the modified Nyström method has demonstrated significant improvement over the standard Nyström method in approximation accuracy, both theoretically and empirically. In this paper, we propose two algorithms that make the modified Nyström method practical. First, we devise a simple column selection algorithm with a provable error bound. Our algorithm is more efficient and easier to implement than the state-of-the-art algorithm, and nearly as accurate. Second, with the selected columns at hand, we propose an algorithm that computes the approximation in lower time complexity than the approach in the previous work. Furthermore, we prove that the modified Nyström method is exact under certain conditions, and we establish a lower error bound for the modified Nyström method. Comment: 9-page paper plus appendix. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. JMLR: W&CP volume 3
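
    To make the column-based construction concrete, here is a small NumPy sketch contrasting the standard Nyström approximation C W^+ C^T with a modified variant that uses the intersection matrix U = C^+ K (C^+)^T. The formula for U is stated here as an assumption about the usual definition of the modified method, not something spelled out in the abstract. The function names, the uniform column choice, and the RBF test kernel are also illustrative; this is the naive form, not the paper's faster algorithm or its column selection scheme.

```python
import numpy as np

def nystrom(K, idx, rcond=1e-10):
    """Standard Nystrom sketch: K ~ C W^+ C^T from the columns in idx."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]            # intersection of selected rows/columns
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T

def modified_nystrom(K, idx, rcond=1e-10):
    """Modified Nystrom sketch (assumed form): U = C^+ K (C^+)^T.

    This naive version touches all of K, so it only illustrates the
    accuracy difference, not an efficient way to compute U.
    """
    C = K[:, idx]
    C_pinv = np.linalg.pinv(C, rcond=rcond)
    U = C_pinv @ K @ C_pinv.T
    return C @ U @ C.T

# Demo on an RBF kernel over random 3-D points (assumed test data).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)
idx = rng.choice(50, size=10, replace=False)   # uniform column selection

err_std = np.linalg.norm(K - nystrom(K, idx))
err_mod = np.linalg.norm(K - modified_nystrom(K, idx))
```

    Because this choice of U minimizes the Frobenius error over all approximations of the form C U C^T, `err_mod` should not exceed `err_std` on the same columns, up to rounding.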

    Kernel quadrature with randomly pivoted Cholesky

    This paper presents new quadrature rules for functions in a reproducing kernel Hilbert space using nodes drawn by a sampling algorithm known as randomly pivoted Cholesky. The resulting computational procedure compares favorably to previous kernel quadrature methods, which either achieve low accuracy or require solving a computationally challenging sampling problem. Theoretical and numerical results show that randomly pivoted Cholesky is fast and achieves comparable quadrature error rates to more computationally expensive quadrature schemes based on continuous volume sampling, thinning, and recombination. Randomly pivoted Cholesky is easily adapted to complicated geometries with arbitrary kernels, unlocking new potential for kernel quadrature. Comment: 19 pages, 3 figures; NeurIPS 2023 (spotlight), camera-ready version

    Sampling from a k-DPP without looking at all items

    Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, stochastic optimization, active learning and more. Given a kernel function and a subset size k, our goal is to sample k out of n items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. k-DPP). Existing k-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all n items, making them infeasible for large datasets. A naïve heuristic addressing this problem is to uniformly subsample a fraction of the data and perform k-DPP sampling only on those items; however, this method offers no guarantee that the produced sample will even approximately resemble the target distribution over the original dataset. In this paper, we develop α-DPP, an algorithm which adaptively builds a sufficiently large uniform sample of data that is then used to efficiently generate a smaller set of k items, while ensuring that this set is drawn exactly from the target distribution defined on all n items. We show empirically that our algorithm produces a k-DPP sample after observing only a small fraction of all elements, leading to several orders of magnitude faster performance compared to the state-of-the-art. Our implementation of α-DPP is provided at https://github.com/guilgautier/DPPy/
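
    The k-DPP target distribution described above can be written out directly for a tiny ground set. The sketch below is a brute-force illustration only: it enumerates every size-k subset, so it is the opposite of the scalable α-DPP algorithm and does not use the DPPy API. The function names and the random psd kernel are assumptions made for the example.

```python
import itertools
import numpy as np

def kdpp_probabilities(L, k):
    """Enumerate all size-k subsets S with P(S) proportional to det(L_S).

    Brute force, feasible only for very small n (illustration only).
    Returns the subsets, their probabilities, and the normalizing constant.
    """
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    dets = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    Z = dets.sum()
    return subsets, dets / Z, Z

def sample_kdpp(L, k, rng):
    """Draw one exact k-DPP sample by sampling from the enumerated table."""
    subsets, probs, _ = kdpp_probabilities(L, k)
    return subsets[rng.choice(len(subsets), p=probs)]

# Demo with a random psd kernel on n = 6 items (assumed test data).
rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6))
L = B @ B.T
subsets, probs, Z = kdpp_probabilities(L, 2)
S = sample_kdpp(L, 2, rng)
```

    For k = 2 the normalizing constant equals the second elementary symmetric polynomial of the eigenvalues of L, that is (tr(L)^2 - tr(L^2)) / 2, which gives a quick sanity check on the enumeration.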