
    QuicK-means: Acceleration of K-means by learning a fast transform

    K-means -- and the celebrated Lloyd algorithm -- is more than the clustering method it was originally designed to be. It has indeed proven pivotal to help increase the speed of many machine learning and data analysis techniques such as indexing, nearest-neighbor search and prediction, data compression, and Radial Basis Function networks; its beneficial use has been shown to carry over to the acceleration of kernel machines (when using the Nyström method). Here, we propose a fast extension of K-means, dubbed QuicK-means, that rests on the idea of expressing the matrix of the $K$ centroids as a product of sparse matrices, a feat made possible by recent results devoted to finding approximations of matrices as products of sparse factors. Such a decomposition squashes the complexity of the matrix-vector product between the factorized $K \times D$ centroid matrix $\mathbf{U}$ and any vector from $\mathcal{O}(KD)$ to $\mathcal{O}(A \log A + B)$, with $A = \min(K, D)$ and $B = \max(K, D)$, where $D$ is the dimension of the training data. This drastic computational saving has a direct impact on the assignment of a point to a cluster, meaning that it is tangible not only at prediction time but also at training time, provided the factorization procedure is performed during Lloyd's algorithm. We show precisely that resorting to a factorization step at each iteration does not impair the convergence of the optimization scheme and that, depending on the context, it may entail a reduction of the training time. Finally, we provide discussions and numerical simulations that show the versatility of our computationally efficient QuicK-means algorithm.
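
    As an illustration of the complexity claim, here is a minimal sketch (not the authors' implementation; the sizes, number of factors, and sparsity patterns are assumptions) of how a centroid matrix expressed as a product of sparse factors yields a fast matrix-vector product:

```python
# Toy demonstration, for a square case K = D, of the speedup from a sparse
# factorization U = S_1 S_2 ... S_q: sweeping a vector through the factors
# costs O(K log K) (each factor has O(K) nonzeros, q ~ log K factors),
# versus O(K * D) for the dense product. This mirrors the paper's
# O(A log A + B) bound with A = min(K, D), B = max(K, D).
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
K = D = 64                       # illustrative sizes (assumption)
n_factors = int(np.log2(K))      # ~log(A) sparse factors (assumption)

# Hypothetical sparse factors, each with roughly 2K nonzeros.
factors = [sp.random(K, K, density=2.0 / K, format="csr", random_state=i)
           for i in range(n_factors)]

x = rng.standard_normal(D)

# Fast product: sweep the vector through the factors right-to-left.
y_fast = x
for S in reversed(factors):
    y_fast = S @ y_fast

# Dense reference: materialize U and multiply once, at O(K * D) cost.
U = factors[0]
for S in factors[1:]:
    U = U @ S
y_dense = U.toarray() @ x

assert np.allclose(y_fast, y_dense)
```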

    Learning from DPPs via Sampling: Beyond HKPV and symmetry

    Determinantal point processes (DPPs) have become a significant tool for recommendation systems, feature selection, and summary extraction, harnessing the intrinsic ability of these probabilistic models to promote sample diversity. The ability to sample from DPPs is paramount to the empirical investigation of these models. Most exact samplers are variants of a spectral meta-algorithm due to Hough, Krishnapur, Peres and Virág (henceforth HKPV), which is in general time and resource intensive. For DPPs with symmetric kernels, scalable HKPV samplers have been proposed that either first downsample the ground set of items, or force the kernel to be low-rank, using e.g. Nyström-type decompositions. In the present work, we contribute a radically different approach from HKPV. Exploiting the fact that many statistical and learning objectives can be effectively accomplished by sampling only certain key observables of a DPP (so-called linear statistics), we invoke an expression for the Laplace transform of such an observable as a single determinant, which holds in complete generality. Combining traditional low-rank approximation techniques with Laplace inversion algorithms from numerical analysis, we show how to directly approximate the distribution function of a linear statistic of a DPP. This distribution function can then be used in hypothesis testing or to actually sample the linear statistic, as per requirement. Our approach is scalable and applies to very general DPPs, beyond traditional symmetric kernels.
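
    The single-determinant expression for the Laplace transform can be checked directly in the finite case. Below is a minimal numerical sketch (an assumed toy L-ensemble setup, not the paper's code) verifying the standard identity E[exp(-s Λ)] = det(I - K diag(1 - exp(-s f))), where Λ = Σ_{i in X} f(i), against brute-force enumeration over all subsets:

```python
# Finite-DPP check of the determinantal Laplace transform of a linear
# statistic. Sizes and kernels below are arbitrary illustrative choices.
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 8
A = rng.standard_normal((N, N))
L = A @ A.T                               # likelihood kernel (L-ensemble)
K = L @ np.linalg.inv(np.eye(N) + L)      # correlation kernel
f = rng.uniform(size=N)                   # test function defining Lambda
s = 0.7                                   # Laplace variable

# Single-determinant formula for E[exp(-s * Lambda)].
lt_det = np.linalg.det(np.eye(N) - K @ np.diag(1.0 - np.exp(-s * f)))

# Brute force: enumerate all 2^N subsets under the L-ensemble law,
# P(X = S) = det(L_S) / det(I + L).
Z = np.linalg.det(np.eye(N) + L)
lt_brute = sum(
    np.linalg.det(L[np.ix_(S, S)]) * np.exp(-s * f[S].sum())
    for r in range(N + 1)
    for S in (list(c) for c in itertools.combinations(range(N), r))
) / Z

assert np.isclose(lt_det, lt_brute)
```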

    High-order, Dispersionless "Fast-Hybrid" Wave Equation Solver. Part I: $\mathcal{O}(1)$ Sampling Cost via Incident-Field Windowing and Recentering

    This paper proposes a frequency/time hybrid integral-equation method for the time-dependent wave equation in two- and three-dimensional spatial domains. Relying on Fourier transformation in time, the method utilizes a fixed (time-independent) number of frequency-domain integral-equation solutions to evaluate, with superalgebraically small errors, time-domain solutions for arbitrarily long times. The approach relies on two main elements, namely, 1) a smooth time-windowing methodology that enables accurate band-limited representations for arbitrarily long time signals, and 2) a novel Fourier transform approach which, in a time-parallel manner and without causing spurious periodicity effects, delivers numerically dispersionless, spectrally accurate solutions. A similar hybrid technique can be obtained on the basis of Laplace transforms instead of Fourier transforms, but we do not consider the Laplace-based method in the present contribution. The algorithm can handle dispersive media, it can tackle complex physical structures, it enables parallelization in time in a straightforward manner, and it allows for time leaping, that is, solution sampling at any given time $T$ at $\mathcal{O}(1)$-bounded sampling cost, for arbitrarily large values of $T$, and without requiring evaluation of the solution at intermediate times. The proposed frequency-time hybridization strategy, which generalizes to any linear partial differential equation in the time domain for which frequency-domain solutions can be obtained (including, e.g., the time-domain Maxwell equations), and which is applicable in a wide range of scientific and engineering contexts, provides significant advantages over other available alternatives such as volumetric discretization, time-domain integral equations, and convolution-quadrature approaches.
    Comment: 33 pages, 8 figures, revised and extended manuscript (and now including direct comparisons to existing CQ and TDIE solver implementations) (Part I of II)
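
    The smooth time-windowing element (item 1 above) can be sketched as a smooth partition of unity in time. The following is an assumed illustration, not the authors' solver; the window length, transition width, and step construction are arbitrary choices:

```python
# Smooth partition of unity in time: overlapping C-infinity windows that sum
# exactly to 1, so a long signal splits into smooth, compactly supported
# pieces with rapidly decaying Fourier transforms.
import numpy as np

def smooth_step(u):
    """C-infinity step: 0 for u <= 0, 1 for u >= 1, smooth in between."""
    u = np.clip(u, 0.0, 1.0)
    a = np.where(u > 0, np.exp(-1.0 / np.where(u > 0, u, 1.0)), 0.0)
    b = np.where(u < 1, np.exp(-1.0 / np.where(u < 1, 1.0 - u, 1.0)), 0.0)
    return a / (a + b)

T, tau = 1.0, 0.3                  # window length, transition width (assumed)
t = np.linspace(0.0, 5.0, 2001)
edges = np.arange(-1, 7) * T       # breakpoints a_k = k * T

# w_k(t) = S((t - a_k)/tau) * (1 - S((t - a_{k+1})/tau)); adjacent windows
# telescope, so the family sums to exactly 1 everywhere on the time axis.
windows = [smooth_step((t - a0) / tau) * (1.0 - smooth_step((t - a1) / tau))
           for a0, a1 in zip(edges[:-1], edges[1:])]
assert np.allclose(np.sum(windows, axis=0), 1.0)

# Windowing a long signal: the pieces w_k * u reassemble the signal exactly,
# while each piece is short and smooth enough for band-limited treatment.
u = np.sin(2 * np.pi * 3 * t) * np.exp(-0.1 * t)
pieces = [w * u for w in windows]
assert np.allclose(sum(pieces), u)
```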

    Block subsampled randomized Hadamard transform for low-rank approximation on distributed architectures

    This article introduces a novel structured random matrix composed blockwise from subsampled randomized Hadamard transforms (SRHTs). The block SRHT is expected to outperform well-known dimension reduction maps, including the SRHT and Gaussian matrices, on distributed architectures whose core count is not too large compared to the dimension. We prove that a block SRHT with enough rows is an oblivious subspace embedding, i.e., an approximate isometry for an arbitrary low-dimensional subspace with high probability. Our estimate of the required number of rows is similar to that of the standard SRHT. This suggests that the two transforms should provide the same accuracy of approximation in the algorithms. The block SRHT can be readily incorporated into randomized methods, for instance to compute a low-rank approximation of a large-scale matrix. For completeness, we revisit some common randomized approaches for this problem, such as the Randomized Singular Value Decomposition and Nyström approximation, with a discussion of their accuracy and implementation on distributed architectures.
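
    A minimal sketch of a blockwise SRHT is given below. This is an assumed construction, not the article's exact definition (the paper's scaling and block structure may differ): each row block of the input, one per "core", receives an independent SRHT, and the per-block sketches are summed so that no core ever touches another core's block:

```python
# Blockwise SRHT sketching of a tall matrix A, assumed variant: partition
# the n rows into p blocks, apply an independent SRHT (random signs,
# Hadamard mix, uniform row subsampling) to each block, and sum the results.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
n, d, p, l = 1024, 20, 4, 200     # rows, cols, blocks, sketch size (assumed)
m = n // p                        # rows per block (a power of two here)
A = rng.standard_normal((n, d)) @ np.diag(np.linspace(1.0, 1e-2, d))

H = hadamard(m) / np.sqrt(m)      # orthonormal Hadamard matrix

def srht_block(block):
    """One l x m SRHT applied to a row block, scaled so E[S^T S] = I."""
    signs = rng.choice([-1.0, 1.0], size=m)
    rows = rng.choice(m, size=l, replace=False)
    return np.sqrt(m / l) * (H @ (signs[:, None] * block))[rows]

# Sum of independent per-block sketches as the blockwise sketch of A.
SA = sum(srht_block(A[i * m:(i + 1) * m]) for i in range(p))

# Sanity check: the sketch approximately preserves the Gram matrix of A.
err = np.linalg.norm(SA.T @ SA - A.T @ A) / np.linalg.norm(A.T @ A)
print(f"relative Gram-matrix error: {err:.3f}")
```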