
    Searching for the Closest-Pair in a Query Translate

    We consider a range-search variant of the closest-pair problem. Let Gamma be a fixed shape in the plane. We are interested in storing a given set of n points in the plane in some data structure such that for any specified translate of Gamma, the closest pair of points contained in the translate can be reported efficiently. We present results on this problem for two important settings: when Gamma is a polygon (possibly with holes) and when Gamma is a general convex body whose boundary is smooth. When Gamma is a polygon, we present a data structure using O(n) space and O(log n) query time, which is asymptotically optimal. When Gamma is a general convex body with a smooth boundary, we give a near-optimal data structure using O(n log n) space and O(log^2 n) query time. Our results settle some open questions posed by Xue et al. at SoCG 2018.

    New Bounds for Range Closest-Pair Problems

    Given a dataset S of points in R^2, the range closest-pair (RCP) problem aims to preprocess S into a data structure such that when a query range X is specified, the closest pair in S ∩ X can be reported efficiently. The RCP problem can be viewed as a range-search version of the classical closest-pair problem, and finds applications in many areas. Due to its non-decomposability, the RCP problem is much more challenging than many traditional range-search problems. This paper revisits the RCP problem, and proposes new data structures for various query types including quadrants, strips, rectangles, and halfplanes. Both worst-case and average-case analyses (in the sense that the data points are drawn uniformly and independently from the unit square) are applied to these new data structures, which result in new bounds for the RCP problem. Some of the new bounds significantly improve the previous results, while the others are entirely new.
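To make the problem statement concrete, here is a minimal brute-force baseline for a range closest-pair query. It is not one of the data structures the abstract refers to: it does no preprocessing and spends quadratic time per query. The names `range_closest_pair` and `rectangle` are hypothetical helpers introduced only for this illustration.

```python
from itertools import combinations
from math import dist, inf


def range_closest_pair(points, in_range):
    """Brute-force baseline: closest pair among the points inside the query range.

    `in_range` is a predicate deciding membership in the query range X.
    Returns ((p, q), distance), or (None, inf) if fewer than two points qualify.
    """
    inside = [p for p in points if in_range(p)]
    best_pair, best_d = None, inf
    for p, q in combinations(inside, 2):
        d = dist(p, q)
        if d < best_d:
            best_pair, best_d = (p, q), d
    return best_pair, best_d


def rectangle(x1, y1, x2, y2):
    """Axis-parallel rectangle query range (hypothetical helper)."""
    return lambda p: x1 <= p[0] <= x2 and y1 <= p[1] <= y2


S = [(0, 0), (1, 0), (5, 5), (5, 7), (9, 9)]
pair, d = range_closest_pair(S, rectangle(4, 4, 10, 10))
print(pair, d)
```

The non-decomposability mentioned in the abstract shows up here: the closest pair in a range cannot in general be assembled from the closest pairs of two sub-ranges, because the best pair may have one point in each.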

    Hardness of Approximate Nearest Neighbor Search

    We prove conditional near-quadratic running time lower bounds for approximate Bichromatic Closest Pair with Euclidean, Manhattan, Hamming, or edit distance. Specifically, unless the Strong Exponential Time Hypothesis (SETH) is false, for every $\delta>0$ there exists a constant $\epsilon>0$ such that computing a $(1+\epsilon)$-approximation to the Bichromatic Closest Pair requires $n^{2-\delta}$ time. In particular, this implies a near-linear query time for Approximate Nearest Neighbor search with polynomial preprocessing time. Our reduction uses the Distributed PCP framework of [ARW'17], but obtains improved efficiency using Algebraic Geometry (AG) codes. Efficient PCPs from AG codes have been constructed in other settings before [BKKMS'16, BCGRS'17], but our construction is the first to yield new hardness results.

    A Comparison of Distributed Spatial Data Management Systems for Processing Distance Join Queries

    Due to the ubiquitous use of spatial data applications and the large amounts of spatial data that these applications generate, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Two of the most studied distance join queries are the K Closest Pair Query (KCPQ) and the ε Distance Join Query (εDJQ). The KCPQ finds the K closest pairs of points from two datasets, and the εDJQ finds all pairs of points from two datasets that are within a distance threshold ε of each other. Distributed cluster-based computing systems can be classified into Hadoop-based and Spark-based systems. Based on this classification, in this paper we compare two of the most current and leading distributed spatial data management systems, namely SpatialHadoop and LocationSpark, by evaluating the performance of existing and newly proposed parallel and distributed distance join query algorithms in different situations with big real-world datasets. As a general conclusion, while SpatialHadoop is a more mature and robust system, LocationSpark is the winner with respect to total execution time.
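The two query definitions above can be written down directly as a single-machine sketch. This is only a statement of what each query returns, not the distributed algorithms evaluated in the paper. The function names are hypothetical, Euclidean distance is assumed, and the threshold test is assumed to be inclusive (distance ≤ ε).

```python
import heapq
from math import dist


def k_closest_pairs(P, Q, k):
    """KCPQ: the k pairs (p, q), with p in P and q in Q, of smallest distance.

    Returns a list of (distance, p, q) tuples in increasing distance order.
    """
    pairs = ((dist(p, q), p, q) for p in P for q in Q)
    return heapq.nsmallest(k, pairs)


def eps_distance_join(P, Q, eps):
    """eps-DJQ: every pair (p, q) whose distance is at most eps (assumed inclusive)."""
    return [(p, q) for p in P for q in Q if dist(p, q) <= eps]


P = [(0, 0), (10, 10)]
Q = [(1, 0), (10, 12), (50, 50)]
print(k_closest_pairs(P, Q, 2))
print(eps_distance_join(P, Q, 2.0))
```

Both functions examine all |P|·|Q| pairs, which is exactly the cost that spatial partitioning and indexing in systems such as those compared above aim to avoid.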

    On the Quantum Complexity of Closest Pair and Related Problems

    The closest pair problem is a fundamental problem of computational geometry: given a set of $n$ points in a $d$-dimensional space, find a pair with the smallest distance. A classical algorithm taught in introductory courses solves this problem in $O(n\log n)$ time in constant dimensions (i.e., when $d=O(1)$). This paper asks and answers the question of the problem's quantum time complexity. Specifically, we give an $\tilde{O}(n^{2/3})$ algorithm in constant dimensions, which is optimal up to a polylogarithmic factor by the lower bound on the quantum query complexity of element distinctness. The key to our algorithm is an efficient history-independent data structure that supports quantum interference. In $\mathrm{polylog}(n)$ dimensions, no known quantum algorithms perform better than brute force search, with a quadratic speedup provided by Grover's algorithm. To give evidence that the quadratic speedup is nearly optimal, we initiate the study of quantum fine-grained complexity and introduce the Quantum Strong Exponential Time Hypothesis (QSETH), which is based on the assumption that Grover's algorithm is optimal for CNF-SAT when the clause width is large. We show that the naïve Grover approach to closest pair in higher dimensions is optimal up to an $n^{o(1)}$ factor unless QSETH is false. We also study the bichromatic closest pair problem and the orthogonal vectors problem, with broadly similar results. Comment: 46 pages, 3 figures, presentation improve
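For reference, the classical algorithm the abstract alludes to can be sketched as follows for the planar case (d = 2). This is the textbook divide-and-conquer method, not the quantum algorithm of the paper. The function names are hypothetical, and points are assumed to be (x, y) tuples.

```python
import heapq
from math import dist


def closest_pair(points):
    """Return (distance, p, q) for a closest pair of 2D points in O(n log n) time."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    best, _ = _solve(sorted(points))  # sort once by x (ties broken by y)
    return best


def _solve(px):
    """px is sorted by x. Returns (best triple, the same points sorted by y)."""
    n = len(px)
    if n <= 3:
        # Base case: compare all pairs directly.
        cands = [(dist(px[i], px[j]), px[i], px[j])
                 for i in range(n) for j in range(i + 1, n)]
        return min(cands, key=lambda c: c[0]), sorted(px, key=lambda p: p[1])

    mid = n // 2
    mid_x = px[mid][0]
    left_best, left_y = _solve(px[:mid])
    right_best, right_y = _solve(px[mid:])
    best = min(left_best, right_best, key=lambda c: c[0])

    # Merge the two y-sorted halves in linear time.
    py = list(heapq.merge(left_y, right_y, key=lambda p: p[1]))

    # Only points within best distance of the dividing line can form a closer pair.
    strip = [p for p in py if abs(p[0] - mid_x) < best[0]]
    for i, p in enumerate(strip):
        # A standard packing argument bounds the neighbours to check by 7.
        for q in strip[i + 1:i + 8]:
            if q[1] - p[1] >= best[0]:
                break
            d = dist(p, q)
            if d < best[0]:
                best = (d, p, q)
    return best, py


print(closest_pair([(0, 0), (3, 4), (10, 10), (3, 5)]))
```

Returning the y-sorted list from each recursive call is what keeps the combine step linear; re-sorting the strip at every level would instead give O(n log^2 n).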

    Efficient geometric algorithms for preference top-k queries, stochastic line arrangements, and proximity problems

    University of Minnesota Ph.D. dissertation. June 2017. Major: Computer Science. Advisor: Ravi Janardan. 1 computer file (PDF); x, 150 pages. Problems arising in diverse real-world applications can often be modeled by geometric objects such as points, lines, and polygons. The goal of this dissertation research is to design efficient algorithms for such geometric problems and provide guarantees on their performance via rigorous theoretical analysis. Three related problems are discussed in this thesis. The first problem revisits the well-known problem of answering preference top-k queries, which arise in a wide range of applications in databases and computational geometry. Given a set of n points, each with d real-valued attributes, the goal is to organize the points into a suitable data structure so that user preference queries can be answered efficiently. A query consists of a d-dimensional vector w, representing a user's preference for each attribute, and an integer k, representing the number of data points to be retrieved. The answer to a query is the k highest-scoring points relative to w, where the score of a point, p, is designed to reflect how well it captures, in aggregate, the user's preferences for the different attributes. This thesis contributes efficient exact solutions in low dimensions (2D and 3D), and a new sampling-based approximation algorithm in higher dimensions. The second problem extends the fundamental geometric concept of a line arrangement to stochastic data. A line arrangement in the plane is a partition of the plane into vertices, edges, and faces. Surprisingly, diverse problems, including the preference top-k query and k-order Voronoi Diagram, essentially boil down to answering questions about the set of k-topmost lines at some abscissa. This thesis considers line arrangements in a new setting, where each line has an associated existence probability representing uncertainty that is inherent in real-world data. An upper bound is derived on the expected number of changes in the set of k-topmost lines, taken over the entire x-axis, and a worst-case upper bound is given for k = 1. An efficient algorithm to compute the most likely k-topmost lines in the arrangement is also given. Applications of this problem, including the most likely Voronoi Diagram in R^1 and the stochastic preference top-k query, are discussed. The third problem discussed is geometric proximity search in both the stochastic setting and the query-retrieval setting. Under the stochastic setting, the thesis considers two fundamental problems, namely, the stochastic closest pair problem and the k most likely nearest neighbor search. In both problems, the data points are assumed to lie on a tree embedded in R^2 and distances are measured along the tree (a so-called tree space). For the former, efficient solutions are given to compute the probability that the closest pair distance of a realization of the input is at least l and to compute the expected closest pair distance. For the latter, the thesis generalizes the concept of most likely Voronoi Diagram from R^1 to tree space and bounds its combinatorial complexity. A data structure for the diagram and an algorithm to construct it are also given. For the query-retrieval version, which is considered in R^2, the goal is to retrieve the closest pair within a user-specified query range. The contributions here include efficient data structures and algorithms that have fast query time while using linear or near-linear space for a variety of query shapes. In addition, a generic framework is presented, which returns a closest pair that is no farther apart than the closest pair in a suitably shrunken version of the query range.
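As a small illustration of the preference top-k query described in this abstract, the sketch below answers a query by scanning all points. The abstract does not spell out the scoring function, so a linear score (the dot product of w and p) is assumed here; the function name `top_k` is hypothetical, and the thesis's data structures are not reproduced.

```python
import heapq


def top_k(points, w, k):
    """Return the k highest-scoring points under the assumed linear score w . p."""
    def score(p):
        return sum(wi * pi for wi, pi in zip(w, p))
    return heapq.nlargest(k, points, key=score)


points = [(1, 9), (5, 5), (9, 1), (2, 2)]
print(top_k(points, (1, 0), 2))  # preference only for the first attribute
print(top_k(points, (0, 1), 1))  # preference only for the second attribute
```

A scan like this costs O(n log k) per query with no preprocessing; the point of a dedicated data structure is to answer many queries with different w faster than that.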