6 research outputs found

    Approximate Nearest Neighbor Search Amid Higher-Dimensional Flats

    We consider the Approximate Nearest Neighbor (ANN) problem where the input set consists of n k-flats in the Euclidean space R^d, for any fixed parameters k < d, and where, for each query point q, we want to return an input flat whose distance from q is at most (1+epsilon) times the shortest such distance, where epsilon > 0 is another prespecified parameter. We present an algorithm that achieves this task with n^{k+1}(log(n)/epsilon)^O(1) storage and preprocessing (where the constant of proportionality in the big-O notation depends on d), and can answer a query in O(polylog(n)) time (where the power of the logarithm depends on d and k). In particular, we need only near-quadratic storage to answer ANN queries amidst a set of n lines in any fixed-dimensional Euclidean space. As a by-product, our approach also yields an algorithm, with similar performance bounds, for answering exact nearest neighbor queries amidst k-flats with respect to any polyhedral distance function. Our results are more general, in that they also provide a tradeoff between storage and query time.

    Approximate Sparse Linear Regression

    In the Sparse Linear Regression (SLR) problem, given a d x n matrix M and a d-dimensional query q, the goal is to compute a k-sparse n-dimensional vector tau such that the error ||M tau - q|| is minimized. This problem is equivalent to the following geometric problem: given a set P of n points and a query point q in d dimensions, find the closest k-dimensional subspace to q that is spanned by a subset of k points in P. In this paper, we present data structures/algorithms and conditional lower bounds for several variants of this problem (such as finding the closest induced k-dimensional flat/simplex instead of a subspace). In particular, we present approximation algorithms for the online variants of the above problems with query time O~(n^{k-1}), which are of interest in the "low sparsity regime" where k is small, e.g., 2 or 3. For k=d, this matches, up to polylogarithmic factors, the lower bound that relies on the affinely degenerate conjecture (i.e., deciding if n points in R^d contain d+1 points contained in a hyperplane takes Omega(n^d) time). Moreover, our algorithms involve formulating and solving several geometric subproblems, which we believe to be of independent interest.
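
The geometric formulation in this abstract can be illustrated with a naive baseline. The sketch below is not the paper's data structure: it is a plain exhaustive search over all k-subsets of P (roughly n^k projections), written under the assumption that points are given as tuples of floats. The helper names `project_onto_span` and `brute_force_slr` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations
import math


def project_onto_span(vectors, q):
    """Orthogonally project q onto the linear span of `vectors`.

    Uses modified Gram-Schmidt to build an orthonormal basis;
    vectors that are (numerically) dependent on earlier ones are skipped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    proj = [0.0] * len(q)
    for b in basis:
        c = sum(qi * bi for qi, bi in zip(q, b))
        proj = [pi + c * bi for pi, bi in zip(proj, b)]
    return proj


def brute_force_slr(points, q, k):
    """Return (error, subset) for the k-subset whose span is closest to q.

    Exhaustive baseline: tries every k-subset of `points`.
    """
    best_err, best_subset = math.inf, None
    for subset in combinations(range(len(points)), k):
        proj = project_onto_span([points[i] for i in subset], q)
        err = math.dist(proj, q)
        if err < best_err:
            best_err, best_subset = err, subset
    return best_err, best_subset


if __name__ == "__main__":
    P = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    print(brute_force_slr(P, (1.0, 1.0, 0.0), k=2))
```

Such a baseline is only practical for very small n and k; it serves mainly as a correctness reference against which faster approximate methods could be checked.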

    Sparse Regression via Range Counting

    The sparse regression problem, also known as best subset selection problem, can be cast as follows: Given a set S of n points in ℝ^d, a point y ∈ ℝ^d, and an integer 2 ≤ k ≤ d, find an affine combination of at most k points of S that is nearest to y. We describe a O(n^{k-1} log^{d-k+2} n)-time randomized (1+ε)-approximation algorithm for this problem with d and ε constant. This is the first algorithm for this problem running in time o(n^k). Its running time is similar to the query time of a data structure recently proposed by Har-Peled, Indyk, and Mahabadi (ICALP'18), while not requiring any preprocessing. Up to polylogarithmic factors, it matches a conditional lower bound relying on a conjecture about affine degeneracy testing. In the special case where k = d = O(1), we provide a simple O_δ(n^{d-1+δ})-time deterministic exact algorithm, for any δ > 0. Finally, we show how to adapt the approximation algorithm for the sparse linear regression and sparse convex regression problems with the same running time, up to polylogarithmic factors.

    Approximate Nearest-Neighbor Search for Line Segments

    Approximate nearest-neighbor search is a fundamental algorithmic problem that continues to inspire study due to its essential role in numerous contexts. In contrast to most prior work, which has focused on point sets, we consider nearest-neighbor queries against a set of line segments in ℝ^d, for constant dimension d. Given a set S of n disjoint line segments in ℝ^d and an error parameter ε > 0, the objective is to build a data structure such that for any query point q, it is possible to return a line segment whose Euclidean distance from q is at most (1+ε) times the distance from q to its nearest line segment. We present a data structure for this problem with storage O((n^2/ε^d) log(Δ/ε)) and query time O(log(max(n,Δ)/ε)), where Δ is the spread of the set of segments S. Our approach is based on a covering of space by anisotropic elements, which align themselves according to the orientations of nearby segments. Comment: 20 pages (including appendix), 5 figures.
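
To make the query guarantee in this abstract concrete, here is a minimal sketch of a linear-scan baseline together with a check of the (1+ε) condition. It does not implement the anisotropic covering described in the paper. The function names (`point_segment_distance`, `nearest_segment`, `is_valid_ann_answer`) and the representation of a segment as a pair of endpoint tuples are assumptions made for this example.

```python
import math


def point_segment_distance(q, a, b):
    """Euclidean distance from point q to the segment with endpoints a, b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    aq = [qi - ai for ai, qi in zip(a, q)]
    denom = sum(x * x for x in ab)
    if denom == 0.0:
        t = 0.0  # degenerate segment: a single point
    else:
        # Parameter of the projection of q onto the supporting line,
        # clamped to [0, 1] so the closest point stays on the segment.
        t = max(0.0, min(1.0, sum(x * y for x, y in zip(aq, ab)) / denom))
    closest = [ai + t * x for ai, x in zip(a, ab)]
    return math.dist(q, closest)


def nearest_segment(q, segments):
    """Index of the exact nearest segment, found by scanning all of them."""
    return min(
        range(len(segments)),
        key=lambda i: point_segment_distance(q, *segments[i]),
    )


def is_valid_ann_answer(q, segments, i, eps):
    """True if segment i is within (1 + eps) times the nearest distance."""
    best = point_segment_distance(q, *segments[nearest_segment(q, segments)])
    return point_segment_distance(q, *segments[i]) <= (1.0 + eps) * best


if __name__ == "__main__":
    S = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 2.0), (1.0, 2.0))]
    print(nearest_segment((0.5, 0.5), S))
```

The scan costs time linear in n per query, which is the cost an ANN data structure of this kind is meant to avoid; `is_valid_ann_answer` could be used to test the output of any approximate structure against the exact answer.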