
    Least squares approximations of measures via geometric condition numbers

    For a probability measure on a real separable Hilbert space, we are interested in "volume-based" approximations of its d-dimensional least squares error, i.e., the least squares error with respect to a best-fit d-dimensional affine subspace. Such approximations are given by averaging real-valued multivariate functions which are typically scalings of squared (d+1)-volumes of (d+1)-simplices. Specifically, we show that such averages are comparable to the square of the d-dimensional least squares error of that measure, where the comparison depends on a simple quantitative geometric property of the measure. This result is a higher-dimensional generalization of the elementary fact that the double integral of the squared distances between points is proportional to the variance of the measure. We relate our work to two recent algorithms, one for clustering affine subspaces and the other for Monte-Carlo SVD based on volume sampling.
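
    A minimal numerical sketch of the elementary d = 0 case mentioned above: for i.i.d. samples, the average squared distance between independent points is twice the variance about the mean (the squared 0-dimensional least squares error). The standard normal measure and the sample size are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 3))  # sample from an illustrative measure on R^3
Y = rng.normal(size=(100_000, 3))  # independent copy for the double integral

# Variance about the mean: the squared 0-dimensional least squares error.
var = np.mean(np.sum((X - X.mean(axis=0)) ** 2, axis=1))

# Double integral of squared pairwise distances, approximated by
# averaging over independent pairs (X_i, Y_i).
pair = np.mean(np.sum((X - Y) ** 2, axis=1))

print(pair / var)  # tends to 2 as the sample size grows
```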

    Approximation and Streaming Algorithms for Projective Clustering via Random Projections

    Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective clustering problem, given $k, q$ and a norm $\rho \in [1,\infty]$, we have to compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that $(\sum_{p\in P} d(p, \mathcal{F})^\rho)^{1/\rho}$ is minimized; here $d(p, \mathcal{F})$ represents the (Euclidean) distance of $p$ to the closest flat in $\mathcal{F}$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret $f_k^q(P,\infty)$ to be $\max_{r\in P} d(r, \mathcal{F})$. When $\rho = 1, 2$, and $\infty$ and $q = 0$, the problem corresponds to the $k$-median, $k$-means, and $k$-center clustering problems, respectively. For every $0 < \epsilon < 1$, $S \subset P$ and $\rho \ge 1$, we show that the orthogonal projection of $P$ onto a randomly chosen flat of dimension $O(((q+1)^2 \log(1/\epsilon)/\epsilon^3) \log n)$ will $\epsilon$-approximate $f_1^q(S,\rho)$. This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of $P$ to an $O(((q+1)^2 \log((q+1)/\epsilon)/\epsilon^3) \log n)$-dimensional randomly chosen subspace $\epsilon$-approximates projective clusterings for every $k$ and $\rho$ simultaneously. Note that the dimension of this subspace is independent of the number of clusters $k$. Using this dimension reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of $n$ points, we show how to compute an $\epsilon$-approximate projective clustering for every $k$ and $\rho$ simultaneously using only $O((n+d)((q+1)^2 \log((q+1)/\epsilon))/\epsilon^3 \log n)$ space. Compared to standard streaming algorithms with $\Omega(kd)$ space requirement, our approach is a significant improvement when the number of input points and their dimensions are of the same order of magnitude. Comment: Canadian Conference on Computational Geometry (CCCG 2015)
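
    A hedged sketch of the objective defined above: the functions below evaluate the cost $(\sum_{p\in P} d(p, \mathcal{F})^\rho)^{1/\rho}$ for a candidate set of flats and apply the kind of random orthogonal projection the result relies on. Representing each flat by an anchor point plus an orthonormal basis is our own illustrative choice, not prescribed by the paper.

```python
import numpy as np

def dist_to_flat(p, a, Q):
    """Euclidean distance from point p to the flat a + span(Q),
    where Q has orthonormal columns spanning the flat's directions."""
    r = p - a
    return np.linalg.norm(r - Q @ (Q.T @ r))

def projective_cost(P, flats, rho):
    """(sum_p d(p, F)^rho)^(1/rho); rho = np.inf gives max_p d(p, F)."""
    d = np.array([min(dist_to_flat(p, a, Q) for a, Q in flats) for p in P])
    return d.max() if np.isinf(rho) else float((d ** rho).sum() ** (1 / rho))

def random_orthogonal_projection(P, m, rng):
    """Project the rows of P onto a randomly chosen m-dimensional subspace."""
    G = rng.normal(size=(P.shape[1], m))
    Q, _ = np.linalg.qr(G)  # orthonormal basis of a random subspace
    return P @ Q
```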

    Enhanced negative type for finite metric trees

    Finite metric trees are known to have strict 1-negative type. In this paper we introduce a new family of inequalities that quantify the extent of the "strictness" of the 1-negative type inequalities for finite metric trees. These inequalities of "enhanced 1-negative type" are sufficiently strong to imply that any given finite metric tree must have strict p-negative type for all values of p in an open interval that contains the number 1. Moreover, these open intervals can be characterized purely in terms of the unordered distribution of edge weights that determine the path metric on the particular tree, and are therefore largely independent of the tree's internal geometry. From these calculations we are able to extract a new nonlinear technique for improving lower bounds on the maximal p-negative type of certain finite metric spaces. Some pathological examples are also considered in order to stress certain technical points. Comment: 35 pages, no figures. This is the final version of this paper sans diagrams. Please note the corrected statement of Theorem 4.16 (and hence inequality (1)). A scaling factor was omitted in Version #
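
    For context, a metric space (X, d) has p-negative type when, for all points x_1, ..., x_n and all reals eta_1, ..., eta_n summing to zero, the quadratic form sum_{i,j} d(x_i, x_j)^p eta_i eta_j is at most zero, and strict p-negative type when equality forces eta = 0. A minimal sketch checking this numerically on a small weighted star tree; the tree and its edge weights are a made-up example, not taken from the paper.

```python
import numpy as np

def negative_type_form(D, eta, p):
    """Quadratic form sum_{i,j} d(x_i, x_j)^p * eta_i * eta_j; the space has
    p-negative type iff this is <= 0 for every eta with sum(eta) = 0."""
    return eta @ (D ** p) @ eta

# Path metric of a small weighted star tree (hypothetical example):
# center vertex 0, leaves 1..3 joined by edges of weight 1, 2, 3.
w = np.array([1.0, 2.0, 3.0])
D = np.zeros((4, 4))
D[0, 1:] = D[1:, 0] = w
for i in range(1, 4):
    for j in range(1, 4):
        if i != j:
            D[i, j] = w[i - 1] + w[j - 1]  # leaf-to-leaf path goes via the center

rng = np.random.default_rng(1)
for _ in range(3):
    eta = rng.normal(size=4)
    eta -= eta.mean()  # enforce the zero-sum constraint
    print(negative_type_form(D, eta, p=1.0))  # strictly negative for eta != 0
```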

    Precision-Recall Curves Using Information Divergence Frontiers

    Despite the tremendous progress in the estimation of generative models, the development of tools for diagnosing their failures and assessing their performance has advanced at a much slower pace. Recent developments have investigated metrics that quantify which parts of the true distribution are modeled well and, conversely, what the model fails to capture, akin to precision and recall in information retrieval. In this paper, we present a general evaluation framework for generative models that measures the trade-off between precision and recall using Rényi divergences. Our framework provides a novel perspective on existing techniques and extends them to more general domains. As a key advantage, this formulation encompasses both continuous and discrete models and allows for the design of efficient algorithms that do not have to quantize the data. We further analyze the biases of the approximations used in practice. Comment: Updated to the AISTATS 2020 version
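
    A minimal sketch of the Rényi divergence D_alpha(P || Q) for discrete distributions, the divergence family the framework above is built on; the precision-recall frontier construction itself follows the paper and is not reproduced here, and the example distributions are made up.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(P || Q) = log(sum_i p_i^alpha * q_i^(1-alpha)) / (alpha - 1)
    for discrete distributions; alpha -> 1 recovers the KL divergence."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.isclose(alpha, 1.0):  # KL limit
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return float(np.log(np.sum(p ** alpha * q ** (1 - alpha))) / (alpha - 1))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
for a in (0.5, 1.0, 2.0):
    print(a, renyi_divergence(p, q, a))
```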