Least squares approximations of measures via geometric condition numbers
For a probability measure on a real separable Hilbert space, we are
interested in "volume-based" approximations of its d-dimensional least squares
error, i.e., the least squares error with respect to a best-fit d-dimensional
affine subspace. Such approximations are given by averaging real-valued
multivariate functions which are typically scalings of squared (d+1)-volumes of
(d+1)-simplices. Specifically, we show that such averages are comparable to the
square of the d-dimensional least squares error of the measure, where the
comparison depends on a simple quantitative geometric property of the measure.
This result is a higher-dimensional generalization of the elementary fact that
the double integral of the squared distances between points is proportional to
the variance of the measure. We relate our work to two recent algorithms: one for
clustering affine subspaces and the other for Monte Carlo SVD based on volume
sampling.
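The elementary fact invoked above, that the double integral of squared pairwise distances is proportional to the variance of the measure, can be checked numerically for an empirical measure. A minimal sketch (the sample size, dimension, and seed are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # empirical measure: 500 points in R^3

# Double integral of squared distances w.r.t. the empirical measure:
# average over all ordered pairs (i, j), including i == j.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
double_integral = sq_dists.mean()

# Total variance of the measure: trace of the (population) covariance,
# i.e. the mean squared distance to the barycenter.
total_variance = ((X - X.mean(0)) ** 2).sum(1).mean()

# The proportionality constant is exactly 2.
print(np.isclose(double_integral, 2 * total_variance))
```

The constant 2 comes from expanding E||X - Y||^2 = 2 E||X||^2 - 2 ||E X||^2 for independent X, Y drawn from the measure.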
Approximation and Streaming Algorithms for Projective Clustering via Random Projections
Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective
clustering problem, given $k$, $q$ and a norm $\rho \in [1,\infty]$, we have to
compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that
$\big(\sum_{p \in P} d(p,\mathcal{F})^{\rho}\big)^{1/\rho}$ is minimized; here
$d(p,\mathcal{F})$ represents the (Euclidean) distance of $p$ to the closest flat in
$\mathcal{F}$. We let $f^{\rho}_{k,q}(P)$ denote the minimal value and interpret
$f^{\infty}_{k,q}(P)$ to be $\max_{p \in P} d(p,\mathcal{F})$. When $q = 0$ and
$\rho = 1, 2$ and $\infty$, the problem corresponds to the $k$-median, $k$-means
and the $k$-center clustering problems, respectively.
For every $0 < \epsilon < 1$ and every norm $\rho$, we show that the
orthogonal projection of $P$ onto a randomly chosen flat of suitable dimension
will $\epsilon$-approximate the projective clustering cost. This result combines
the concepts of geometric coresets and subspace embeddings based on the
Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of $P$
to a randomly chosen subspace of this dimension $\epsilon$-approximates
projective clusterings for every $k$ and $\rho$ simultaneously. Note that the
dimension of this subspace is independent of the number of clusters $k$.
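The dimension-reduction ingredient here is a Johnson-Lindenstrauss-style random projection, which approximately preserves Euclidean distances. A minimal sketch of that ingredient alone (the dimensions, seed, and tolerance below are illustrative choices, not the paper's bounds):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 200, 1000, 200  # m: target dimension (illustrative, not the paper's bound)

X = rng.normal(size=(n, d))               # n points in R^d
G = rng.normal(size=(d, m)) / np.sqrt(m)  # scaled Gaussian projection matrix
Y = X @ G                                 # projected points in R^m

# Johnson-Lindenstrauss-style check: pairwise distances are roughly preserved.
i = rng.integers(0, n, size=50)
j = (i + 1 + rng.integers(0, n - 1, size=50)) % n  # guarantees i != j
orig = np.linalg.norm(X[i] - X[j], axis=1)
proj = np.linalg.norm(Y[i] - Y[j], axis=1)
ratio = proj / orig                       # concentrates around 1
```

The paper's contribution is not this embedding itself but the combination with geometric coresets that makes a single projected subspace work for all $k$ and $\rho$ at once.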
Using this dimension reduction result, we obtain new approximation and
streaming algorithms for projective clustering problems. For example, given a
stream of $n$ points, we show how to compute an $\epsilon$-approximate
projective clustering for every $k$ and $\rho$ simultaneously using only a
small amount of space. Compared to standard streaming algorithms, whose space
requirement grows with the product $nd$, our approach is a significant
improvement when the number of input points and their dimension are of the same
order of magnitude. Comment: Canadian Conference on Computational Geometry (CCCG 2015)
Enhanced negative type for finite metric trees
Finite metric trees are known to have strict 1-negative type. In this paper
we introduce a new family of inequalities that quantify the extent of the
"strictness" of the 1-negative type inequalities for finite metric trees. These
inequalities of "enhanced 1-negative type" are sufficiently strong to imply
that any given finite metric tree must have strict p-negative type for all
values of p in an open interval that contains the number 1. Moreover, these
open intervals can be characterized purely in terms of the unordered
distribution of edge weights that determine the path metric on the particular
tree, and are therefore largely independent of the tree's internal geometry.
From these calculations we are able to extract a new nonlinear technique for
improving lower bounds on the maximal p-negative type of certain finite metric
spaces. Some pathological examples are also considered in order to stress
certain technical points. Comment: 35 pages, no figures. This is the final
version of this paper sans diagrams. Please note the corrected statement of
Theorem 4.16 (and hence inequality (1)). A scaling factor was omitted in Version #
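For orientation, a finite metric space $(X, d)$ has 1-negative type when $\sum_{i,j} \xi_i \xi_j\, d(x_i, x_j) \le 0$ for all real $\xi$ with $\sum_i \xi_i = 0$, and strict 1-negative type when the inequality is strict for every nonzero such $\xi$. A minimal numerical illustration on a path, the simplest finite metric tree (the vertex positions, i.e. edge weights, and the seed are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(2)

# A path is the simplest finite metric tree: vertices on a line,
# path metric = absolute difference of positions (edge weights 1, 2, 3).
pos = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(pos[:, None] - pos[None, :])

# Draw a random nonzero xi with sum(xi) == 0 and evaluate the quadratic form;
# strict 1-negative type of the tree forces it to be strictly negative.
xi = rng.normal(size=4)
xi -= xi.mean()          # enforce the zero-sum constraint
form = xi @ D @ xi
print(form < 0)
```

The paper's "enhanced" inequalities quantify how far below zero such forms must sit, in terms of the unordered distribution of edge weights.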
Precision-Recall Curves Using Information Divergence Frontiers
Despite the tremendous progress in the estimation of generative models, the
development of tools for diagnosing their failures and assessing their
performance has advanced at a much slower pace. Recent developments have
investigated metrics that quantify which parts of the true distribution are
modeled well and, conversely, which parts the model fails to capture, akin to
precision and recall in information retrieval. In this paper, we present a
general evaluation framework for generative models that measures the trade-off
between precision and recall using R\'enyi divergences. Our framework provides
a novel perspective on existing techniques and extends them to more general
domains. As a key advantage, this formulation encompasses both continuous and
discrete models and allows for the design of efficient algorithms that do not
have to quantize the data. We further analyze the biases of the approximations
used in practice. Comment: Updated to the AISTATS 2020 version.
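As a toy illustration of the underlying quantity, the R\'enyi divergence of order $\alpha$ between two discrete distributions can be computed directly; sweeping $\alpha$ trades off precision-like and recall-like penalties. This is a hedged sketch with made-up distributions, not the paper's estimator:

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) between discrete distributions, alpha != 1."""
    return np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0)

p = np.array([0.5, 0.3, 0.2])  # "true" distribution (toy)
q = np.array([0.4, 0.4, 0.2])  # "model" distribution (toy)

# Larger alpha penalizes regions where the model q puts too little mass
# relative to p; smaller alpha penalizes the opposite failure mode.
curve = [renyi_divergence(p, q, a) for a in (0.5, 2.0, 4.0)]
```

The divergence is nonnegative for $\alpha > 0$ and vanishes exactly when the two distributions coincide, which is the degenerate point of any such frontier.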