Approximate Nearest Neighbor Search Amid Higher-Dimensional Flats
We consider the Approximate Nearest Neighbor (ANN) problem where the input set consists of n k-flats in the Euclidean R^d, for any fixed parameters 0 <= k < d, and where, for each query point q, we want to return an input flat whose distance from q is at most (1+epsilon) times the shortest such distance, where epsilon > 0 is another prespecified parameter. We present an algorithm that achieves this task with n^{k+1}(log(n)/epsilon)^{O(1)} storage and preprocessing (where the constant of proportionality in the big-O notation depends on d), and can answer a query in O(polylog(n)) time (where the power of the logarithm depends on d and k). In particular, we need only near-quadratic storage to answer ANN queries amidst a set of n lines in any fixed-dimensional Euclidean space. As a by-product, our approach also yields an algorithm, with similar performance bounds, for answering exact nearest neighbor queries amidst k-flats with respect to any polyhedral distance function. Our results are more general, in that they also provide a tradeoff between storage and query time.
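For intuition, a query in this setting must approximate the exact point-to-flat distance, which is itself a small least-squares computation. A minimal sketch of that distance computation (illustrative only, not the paper's data structure):

```python
import numpy as np

def dist_to_flat(q, p0, basis):
    """Euclidean distance from point q to the k-flat p0 + span(basis).

    basis is a (k, d) array of direction vectors spanning the flat.
    Least squares finds the coefficients of the closest point on the flat.
    """
    A = np.asarray(basis, dtype=float).T                     # d x k
    b = np.asarray(q, dtype=float) - np.asarray(p0, dtype=float)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(np.linalg.norm(A @ coeffs - b))

# A line (1-flat) in R^3 through the origin along the x-axis:
print(dist_to_flat([0.0, 3.0, 4.0], [0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]]))  # 5.0
```

Scanning all n flats with this routine answers an exact query in O(n) time per query; the data structure above trades n^{k+1}(log(n)/epsilon)^{O(1)} storage for O(polylog(n)) approximate queries.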
Approximate Sparse Linear Regression
In the Sparse Linear Regression (SLR) problem, given a d x n matrix M and a d-dimensional query q, the goal is to compute a k-sparse n-dimensional vector tau such that the error ||M tau - q|| is minimized. This problem is equivalent to the following geometric problem: given a set P of n points and a query point q in d dimensions, find the closest k-dimensional subspace to q that is spanned by a subset of k points in P. In this paper, we present data structures/algorithms and conditional lower bounds for several variants of this problem (such as finding the closest induced k-dimensional flat/simplex instead of a subspace).
In particular, we present approximation algorithms for the online variants of the above problems with query time O~(n^{k-1}), which are of interest in the "low sparsity regime" where k is small, e.g., 2 or 3. For k=d, this matches, up to polylogarithmic factors, the lower bound that relies on the affinely degenerate conjecture (i.e., deciding if n points in R^d contain d+1 points lying on a hyperplane takes Omega(n^d) time). Moreover, our algorithms involve formulating and solving several geometric subproblems, which we believe to be of independent interest.
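As a point of reference for these bounds, the naive exact algorithm enumerates all k-subsets of columns and solves a least-squares problem for each, for roughly O(n^k) total work. A sketch of this brute-force baseline (not the paper's algorithm):

```python
import itertools
import numpy as np

def sparse_linear_regression(M, q, k):
    """Exact k-sparse linear regression by exhaustive search.

    Tries every k-subset of the n columns of the d x n matrix M and
    solves the least-squares problem restricted to those columns --
    the O(n^k) baseline the approximation algorithms improve on.
    Returns (best error, chosen column indices).
    """
    d, n = M.shape
    best = (float("inf"), None)
    for cols in itertools.combinations(range(n), k):
        A = M[:, list(cols)]                           # d x k submatrix
        coeffs, *_ = np.linalg.lstsq(A, q, rcond=None)
        err = float(np.linalg.norm(A @ coeffs - q))
        if err < best[0]:
            best = (err, cols)
    return best
```

In the geometric view, each k-subset of columns spans a candidate k-dimensional subspace, and the least-squares residual is the distance from q to that subspace.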
Sparse Regression via Range Counting
The sparse regression problem, also known as the best subset selection problem, can be cast as follows: given a set S of n points in R^d, a point y in R^d, and an integer 2 <= k <= d, find an affine combination of at most k points of S that is nearest to y. We describe an O(n^{k-1} log^{d-k+2} n)-time randomized (1+epsilon)-approximation algorithm for this problem with d and epsilon constant. This is the first algorithm for this problem running in time o(n^k). Its running time is similar to the query time of a data structure recently proposed by Har-Peled, Indyk, and Mahabadi (ICALP'18), while not requiring any preprocessing. Up to polylogarithmic factors, it matches a conditional lower bound relying on a conjecture about affine degeneracy testing. In the special case where k = d = O(1), we provide a simple O_delta(n^{d-1+delta})-time deterministic exact algorithm, for any delta > 0. Finally, we show how to adapt the approximation algorithm to the sparse linear regression and sparse convex regression problems with the same running time, up to polylogarithmic factors.
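The affine variant stated above can likewise be brute-forced: the affine hull of k chosen points reduces to an ordinary subspace after translating by one of them. A sketch of this O(n^k)-time baseline (illustrative only, not the paper's o(n^k) algorithm):

```python
import itertools
import numpy as np

def sparse_affine_regression(S, y, k):
    """Brute-force nearest affine combination of at most k points of S to y.

    For each k-subset {p_1, ..., p_k}, the affine hull equals
    p_k + span(p_i - p_k), so scoring a subset is an ordinary
    least-squares projection. Returns (best error, point indices).
    """
    S = np.asarray(S, dtype=float)
    best = (float("inf"), None)
    for idx in itertools.combinations(range(len(S)), k):
        pts = S[list(idx)]
        base = pts[-1]
        A = (pts[:-1] - base).T            # d x (k-1) direction matrix
        b = y - base
        if A.size:
            z, *_ = np.linalg.lstsq(A, b, rcond=None)
            err = float(np.linalg.norm(A @ z - b))
        else:
            err = float(np.linalg.norm(b))
        if err < best[0]:
            best = (err, idx)
    return best
```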
Approximate Nearest-Neighbor Search for Line Segments
Approximate nearest-neighbor search is a fundamental algorithmic problem that continues to inspire study due to its essential role in numerous contexts. In contrast to most prior work, which has focused on point sets, we consider nearest-neighbor queries against a set of line segments in R^d, for constant dimension d. Given a set S of n disjoint line segments in R^d and an error parameter epsilon > 0, the objective is to build a data structure such that for any query point q, it is possible to return a line segment whose Euclidean distance from q is at most (1+epsilon) times the distance from q to its nearest line segment. We present a data structure for this problem with O((n^2/epsilon^d) log(Delta/epsilon)) storage and O(log(max(n, Delta)/epsilon)) query time, where Delta is the spread of the set of segments S. Our approach is based on a covering of space by anisotropic elements, which align themselves according to the orientations of nearby segments.
Comment: 20 pages (including appendix), 5 figures
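The exact problem this data structure approximates can be solved naively in O(n) time per query by scanning all segments. A small sketch of that baseline:

```python
import math

def point_segment_dist(q, a, b):
    """Euclidean distance from point q to segment ab, in any fixed dimension."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    aq = [qi - ai for ai, qi in zip(a, q)]
    denom = sum(c * c for c in ab)
    # Clamp the projection parameter to [0, 1] so we stay on the segment.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(aq, ab)) / denom))
    return math.dist(q, [ai + t * ci for ai, ci in zip(a, ab)])

def nearest_segment(q, segments):
    """Exact linear scan: the O(n)-per-query baseline the data structure beats."""
    return min(segments, key=lambda s: point_segment_dist(q, *s))
```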