Open problems in symmetry analysis

Author: Clarkson Peter
Mansfield Elizabeth L.
Publication venue: 'American Mathematical Society (AMS)'
Publication date: 10/01/2002

Input Sparsity and Hardness for Robust Subspace Approximation

Author: Clarkson Kenneth L.
Woodruff David P.
Publication date: 20/10/2015

In the subspace approximation problem, we seek a k-dimensional subspace F of R^d that minimizes the sum of p-th powers of Euclidean distances to a given set of n points a_1, ..., a_n in R^d, for p >= 1. More generally than minimizing sum_i dist(a_i,F)^p, we may wish to minimize sum_i M(dist(a_i,F)) for some loss function M(), for example, M-Estimators, which include the Huber and Tukey loss functions. Such subspaces provide alternatives to the singular value decomposition (SVD), which is the p=2 case, finding such an F that minimizes the sum of squares of distances. For p in [1,2), and for typical M-Estimators, the minimizing F gives a solution that is more robust to outliers than that provided by the SVD. We give several algorithmic and hardness results for these robust subspace approximation problems. We think of the n points as forming an n x d matrix A, and let nnz(A) denote the number of non-zero entries of A. Our results hold for p in [1,2). We use poly(n) to denote n^{O(1)} as n -> infty. We obtain: (1) For minimizing sum_i dist(a_i,F)^p, we give an algorithm running in O(nnz(A) + (n+d)poly(k/eps) + exp(poly(k/eps))) time; (2) we show that the problem of minimizing sum_i dist(a_i,F)^p is NP-hard, even to output a (1+1/poly(d))-approximation, answering a question of Kannan and Vempala, and complementing prior results which held for p > 2; (3) for loss functions for a wide class of M-Estimators, we give a problem-size reduction: for a parameter K=(log n)^{O(log k)}, our reduction takes O(nnz(A) log n + (n+d) poly(K/eps)) time to reduce the problem to a constrained version involving matrices whose dimensions are poly(K eps^{-1} log n). We also give bicriteria solutions; (4) our techniques lead to the first O(nnz(A) + poly(d/eps)) time algorithms for (1+eps)-approximate regression for a wide class of convex M-Estimators. Comment: paper appeared in FOCS, 201
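The robustness claim in the abstract above can be illustrated with a toy case. The following is only a minimal sketch, not the paper's algorithm: it assumes k = 1 and d = 2 and finds the best line through the origin by brute-force search over angles. The helper names (`dist_to_line`, `subspace_cost`, `best_angle`, `huber`) and the sample points are hypothetical, chosen for illustration.

```python
import math

def dist_to_line(point, theta):
    """Euclidean distance from a 2-D point to the line through the origin at angle theta."""
    x, y = point
    # Magnitude of the component orthogonal to the direction (cos theta, sin theta).
    return abs(-x * math.sin(theta) + y * math.cos(theta))

def huber(r, tau=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    return 0.5 * r * r if abs(r) <= tau else tau * (abs(r) - 0.5 * tau)

def subspace_cost(points, theta, loss):
    """Objective sum_i M(dist(a_i, F)) for the 1-D subspace F at angle theta."""
    return sum(loss(dist_to_line(pt, theta)) for pt in points)

def best_angle(points, loss, steps=1800):
    """Brute-force grid search over angles in [0, pi); an assumption, not an efficient method."""
    angles = [math.pi * i / steps for i in range(steps)]
    return min(angles, key=lambda t: subspace_cost(points, t, loss))

# Eleven inliers on the x-axis plus a single far outlier on the y-axis.
points = [(float(x), 0.0) for x in range(-5, 6)] + [(0.0, 20.0)]

theta_l1 = best_angle(points, lambda r: r)         # p = 1
theta_l2 = best_angle(points, lambda r: r * r)     # p = 2, the SVD objective
theta_huber = best_angle(points, huber)            # an M-estimator loss

print(theta_l1, theta_l2, theta_huber)
```

On this data the p = 2 objective is pulled toward the outlier, while the p = 1 objective stays aligned with the inliers, which is the behaviour the abstract describes as robustness.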


Self-improving Algorithms for Coordinate-wise Maxima

Author: Clarkson Kenneth L.
Mulzer Wolfgang
Seshadhri C.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2012

Computing the coordinate-wise maxima of a planar point set is a classic and well-studied problem in computational geometry. We give an algorithm for this problem in the \emph{self-improving setting}. We have n (unknown) independent distributions \cD_1, \cD_2, ..., \cD_n of planar points. An input pointset (p_1, p_2, ..., p_n) is generated by taking an independent sample p_i from each \cD_i, so the input distribution \cD is the product \prod_i \cD_i. A self-improving algorithm repeatedly gets input sets from the distribution \cD (which is \emph{a priori} unknown) and tries to optimize its running time for \cD. Our algorithm uses the first few inputs to learn salient features of the distribution, and then becomes an optimal algorithm for distribution \cD. Let \text{OPT}_\cD denote the expected depth of an \emph{optimal} linear comparison tree computing the maxima for distribution \cD. Our algorithm eventually has an expected running time of O(\text{OPT}_\cD + n), even though it did not know \cD to begin with. Our result requires new tools to understand linear comparison trees for computing maxima. We show how to convert general linear comparison trees to very restricted versions, which can then be related to the running time of our algorithm. An interesting feature of our algorithm is an interleaved search, where the algorithm tries to determine the likeliest point to be maximal with minimal computation. This allows the running time to be truly optimal for the distribution \cD. Comment: To appear in Symposium of Computational Geometry 2012 (17 pages, 2 figures
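For readers unfamiliar with the underlying problem, the sketch below computes coordinate-wise maxima with the classical sort-and-sweep approach in O(n log n) time. It is a baseline for the problem statement only and is not the self-improving algorithm from the abstract. The function name `maxima` and the sample points are hypothetical.

```python
def maxima(points):
    """Return the points not dominated by another point.

    A point q dominates p when q is at least as large as p in both
    coordinates and q differs from p. Exact duplicates are reported once.
    """
    result = []
    best_y = float("-inf")
    # Sweep from largest x to smallest; ties in x are visited highest y first.
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        # A point is maximal exactly when its y exceeds every y seen so far.
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result

sample = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 2)]
print(maxima(sample))  # [(4, 1), (3, 4), (1, 5)]
```

The sort dominates the cost here regardless of the input distribution, which is the overhead a self-improving algorithm aims to avoid when the distribution makes the answer easy to predict.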


ROC and the bounds on tail probabilities via theorems of Dubins and F. Riesz

Author: Clarkson Eric
Denny J. L.
Shepp Larry
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 03/03/2009

For independent X and Y in the inequality P(X\leq Y+\mu), we give sharp lower bounds for unimodal distributions having finite variance, and sharp upper bounds assuming symmetric densities bounded by a finite constant. The lower bounds depend on a result of Dubins about extreme points and the upper bounds depend on a symmetric rearrangement theorem of F. Riesz. The inequality was motivated by medical imaging: find bounds on the area under the Receiver Operating Characteristic curve (ROC). Comment: Published at http://dx.doi.org/10.1214/08-AAP536 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org
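The quantity P(X\leq Y+\mu) can be estimated from samples by counting pairs, which for \mu = 0 is the usual pairwise estimate of the area under the ROC curve. The sketch below shows only that empirical estimate; it does not compute the sharp bounds from the abstract. The function name `empirical_prob` and the sample values are hypothetical.

```python
def empirical_prob(xs, ys, mu=0.0):
    """Fraction of pairs (x, y) with x <= y + mu, an estimate of P(X <= Y + mu)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # Compare every x against every y; quadratic in the sample sizes.
    count = sum(1 for x in xs for y in ys if x <= y + mu)
    return count / (len(xs) * len(ys))

xs = [1, 4, 3]   # e.g. scores for one class (assumed data)
ys = [8, 2, 5]   # e.g. scores for the other class (assumed data)
print(empirical_prob(xs, ys))        # 7 of the 9 pairs satisfy x <= y
print(empirical_prob(xs, ys, mu=2))  # all 9 pairs satisfy x <= y + 2
```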
