Least squares approximations of measures via geometric condition numbers
For a probability measure on a real separable Hilbert space, we are
interested in "volume-based" approximations of its d-dimensional least squares
error, i.e., the least squares error with respect to a best-fit d-dimensional
affine subspace. Such approximations are given by averaging real-valued
multivariate functions which are typically scalings of squared (d+1)-volumes of
(d+1)-simplices. Specifically, we show that such averages are comparable to the
square of the d-dimensional least squares error of the measure, where the
comparison depends on a simple quantitative geometric property of the measure.
This result is a higher-dimensional generalization of the elementary fact that
the double integral of the squared distances between points is proportional to
the variance of the measure. We relate our work to two recent algorithms: one for
clustering affine subspaces and the other for Monte Carlo SVD based on volume
sampling.
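The elementary fact invoked above, that the double integral of squared pairwise distances is proportional to the variance of the measure, can be checked numerically for an empirical measure. A minimal sketch (the sample size, dimension, and seed are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # empirical measure: 500 points in R^3

# Double integral of squared distances w.r.t. the empirical measure:
# average over all ordered pairs (i, j), including i == j.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
double_integral = sq_dists.mean()

# Total variance of the measure: trace of the (population) covariance,
# i.e. the mean squared distance to the barycenter.
total_variance = ((X - X.mean(0)) ** 2).sum(1).mean()

# The proportionality constant is exactly 2.
print(np.isclose(double_integral, 2 * total_variance))
```

The constant 2 comes from expanding E||X - Y||^2 = 2 E||X||^2 - 2 ||E X||^2 for independent X, Y drawn from the measure.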
Approximation and Streaming Algorithms for Projective Clustering via Random Projections
Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective
clustering problem, given $k$, $q$ and a norm $\rho \in [1,\infty]$, we have to
compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that
$\big(\sum_{p \in P} d(p,\mathcal{F})^{\rho}\big)^{1/\rho}$ is minimized; here
$d(p,\mathcal{F})$ represents the (Euclidean) distance of $p$ to the closest flat in
$\mathcal{F}$. We let $f^{\rho}_{k,q}(P)$ denote the minimal value and interpret
$f^{\infty}_{k,q}(P)$ to be $\max_{p \in P} d(p,\mathcal{F})$. When $q = 0$ and
$\rho = 1, 2$ and $\infty$, the problem corresponds to the $k$-median, $k$-means
and the $k$-center clustering problems, respectively.
For every $0 < \epsilon < 1$ and every norm $\rho$, we show that the
orthogonal projection of $P$ onto a randomly chosen flat of suitable dimension
will $\epsilon$-approximate the projective clustering cost. This result combines
the concepts of geometric coresets and subspace embeddings based on the
Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of $P$
to a randomly chosen subspace of this dimension $\epsilon$-approximates
projective clusterings for every $k$ and $\rho$ simultaneously. Note that the
dimension of this subspace is independent of the number of clusters $k$.
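The dimension-reduction ingredient here is a Johnson-Lindenstrauss-style random projection, which approximately preserves Euclidean distances. A minimal sketch of that ingredient alone (the dimensions, seed, and tolerance below are illustrative choices, not the paper's bounds):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 200, 1000, 200  # m: target dimension (illustrative, not the paper's bound)

X = rng.normal(size=(n, d))               # n points in R^d
G = rng.normal(size=(d, m)) / np.sqrt(m)  # scaled Gaussian projection matrix
Y = X @ G                                 # projected points in R^m

# Johnson-Lindenstrauss-style check: pairwise distances are roughly preserved.
i = rng.integers(0, n, size=50)
j = (i + 1 + rng.integers(0, n - 1, size=50)) % n  # guarantees i != j
orig = np.linalg.norm(X[i] - X[j], axis=1)
proj = np.linalg.norm(Y[i] - Y[j], axis=1)
ratio = proj / orig                       # concentrates around 1
```

The paper's contribution is not this embedding itself but the combination with geometric coresets that makes a single projected subspace work for all $k$ and $\rho$ at once.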
Using this dimension reduction result, we obtain new approximation and
streaming algorithms for projective clustering problems. For example, given a
stream of $n$ points, we show how to compute an $\epsilon$-approximate
projective clustering for every $k$ and $\rho$ simultaneously using only a
small amount of space. Compared to standard streaming algorithms, whose space
requirement grows with the product $nd$, our approach is a significant
improvement when the number of input points and their dimension are of the same
order of magnitude. Comment: Canadian Conference on Computational Geometry (CCCG 2015)
Enhanced negative type for finite metric trees
Finite metric trees are known to have strict 1-negative type. In this paper
we introduce a new family of inequalities that quantify the extent of the
"strictness" of the 1-negative type inequalities for finite metric trees. These
inequalities of "enhanced 1-negative type" are sufficiently strong to imply
that any given finite metric tree must have strict p-negative type for all
values of p in an open interval that contains the number 1. Moreover, these
open intervals can be characterized purely in terms of the unordered
distribution of edge weights that determine the path metric on the particular
tree, and are therefore largely independent of the tree's internal geometry.
From these calculations we are able to extract a new nonlinear technique for
improving lower bounds on the maximal p-negative type of certain finite metric
spaces. Some pathological examples are also considered in order to stress
certain technical points. Comment: 35 pages, no figures. This is the final
version of this paper sans diagrams. Please note the corrected statement of
Theorem 4.16 (and hence inequality (1)). A scaling factor was omitted in Version #
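For orientation, a finite metric space $(X, d)$ has 1-negative type when $\sum_{i,j} \xi_i \xi_j\, d(x_i, x_j) \le 0$ for all real $\xi$ with $\sum_i \xi_i = 0$, and strict 1-negative type when the inequality is strict for every nonzero such $\xi$. A minimal numerical illustration on a path, the simplest finite metric tree (the vertex positions, i.e. edge weights, and the seed are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(2)

# A path is the simplest finite metric tree: vertices on a line,
# path metric = absolute difference of positions (edge weights 1, 2, 3).
pos = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(pos[:, None] - pos[None, :])

# Draw a random nonzero xi with sum(xi) == 0 and evaluate the quadratic form;
# strict 1-negative type of the tree forces it to be strictly negative.
xi = rng.normal(size=4)
xi -= xi.mean()          # enforce the zero-sum constraint
form = xi @ D @ xi
print(form < 0)
```

The paper's "enhanced" inequalities quantify how far below zero such forms must sit, in terms of the unordered distribution of edge weights.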
Precision-Recall Curves Using Information Divergence Frontiers
Despite the tremendous progress in the estimation of generative models, the
development of tools for diagnosing their failures and assessing their
performance has advanced at a much slower pace. Recent developments have
investigated metrics that quantify which parts of the true distribution are
modeled well and, conversely, which parts the model fails to capture, akin to
precision and recall in information retrieval. In this paper, we present a
general evaluation framework for generative models that measures the trade-off
between precision and recall using R\'enyi divergences. Our framework provides
a novel perspective on existing techniques and extends them to more general
domains. As a key advantage, this formulation encompasses both continuous and
discrete models and allows for the design of efficient algorithms that do not
have to quantize the data. We further analyze the biases of the approximations
used in practice. Comment: Updated to the AISTATS 2020 version.
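As a toy illustration of the underlying quantity, the R\'enyi divergence of order $\alpha$ between two discrete distributions can be computed directly; sweeping $\alpha$ trades off precision-like and recall-like penalties. This is a hedged sketch with made-up distributions, not the paper's estimator:

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) between discrete distributions, alpha != 1."""
    return np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0)

p = np.array([0.5, 0.3, 0.2])  # "true" distribution (toy)
q = np.array([0.4, 0.4, 0.2])  # "model" distribution (toy)

# Larger alpha penalizes regions where the model q puts too little mass
# relative to p; smaller alpha penalizes the opposite failure mode.
curve = [renyi_divergence(p, q, a) for a in (0.5, 2.0, 4.0)]
```

The divergence is nonnegative for $\alpha > 0$ and vanishes exactly when the two distributions coincide, which is the degenerate point of any such frontier.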