
    A limit process for partial match queries in random quadtrees and 2-d trees

    We consider the problem of recovering items matching a partially specified pattern in multidimensional trees (quadtrees and $k$-d trees). We assume the traditional model where the data consist of independent and uniform points in the unit square. For this model, in a structure on $n$ points, it is known that the number of nodes $C_n(\xi)$ to visit in order to report the items matching a random query $\xi$, independent and uniformly distributed on $[0,1]$, satisfies $\mathbf{E}[C_n(\xi)] \sim \kappa n^{\beta}$, where $\kappa$ and $\beta$ are explicit constants. We develop an approach based on the analysis of the cost $C_n(s)$ of any fixed query $s \in [0,1]$, and give precise estimates for the variance and limit distribution of the cost $C_n(x)$. Our results permit us to describe a limit process for the costs $C_n(x)$ as $x$ varies in $[0,1]$; one of the consequences is that $\mathbf{E}[\max_{x \in [0,1]} C_n(x)] \sim \gamma n^{\beta}$; this settles a question of Devroye [Pers. Comm., 2000]. Comment: Published at http://dx.doi.org/10.1214/12-AAP912 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: text overlap with arXiv:1107.223
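
The cost $C_n(s)$ of a fixed partial match query can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not the paper's method: it builds a 2-d tree by successive insertion, alternating the split coordinate with depth (first coordinate at the root), and counts the nodes visited when reporting all points whose first coordinate equals $s$. The names `Node`, `insert`, `build_tree`, and `partial_match_cost` are hypothetical.

```python
import random


class Node:
    """One node of a 2-d tree: a stored point and two subtrees."""
    __slots__ = ("point", "left", "right")

    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(root, point):
    """Insert a point; the split axis alternates with depth (axis 0 at the root)."""
    if root is None:
        return Node(point)
    node, depth = root, 0
    while True:
        axis = depth % 2
        if point[axis] < node.point[axis]:
            if node.left is None:
                node.left = Node(point)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(point)
                return root
            node = node.right
        depth += 1


def build_tree(points):
    """Build a 2-d tree by inserting the points in the given order."""
    root = None
    for p in points:
        root = insert(root, p)
    return root


def partial_match_cost(root, s):
    """Number of nodes visited by the query 'first coordinate == s'."""
    visited = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        visited += 1
        if depth % 2 == 0:
            # Split on the specified coordinate: only one side can match.
            child = node.left if s < node.point[0] else node.right
            stack.append((child, depth + 1))
        else:
            # Split on the unspecified coordinate: both sides must be searched.
            stack.append((node.left, depth + 1))
            stack.append((node.right, depth + 1))
    return visited


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(10_000)]
    tree = build_tree(pts)
    print(partial_match_cost(tree, 0.5))
```

Averaging `partial_match_cost` over many random trees for a fixed `s`, or taking the maximum over a grid of `s` values, gives empirical counterparts of the quantities discussed in the abstract.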

    Geometric transformations in octrees using shears

    Existing algorithms for performing geometric transformations on octrees fall into two families: inverse transformation and address computation. Algorithms in the inverse transformation family essentially resample the target octree from the source one and can handle all affine transformations. Algorithms in the address computation family handle only translations, but are commonly accepted as faster because they perform no intersection tests and instead directly calculate the transformed address of each black node in the source tree. This work introduces a new translation algorithm that performs better than previous ones when very small displacements are involved. This property is particularly useful in applications such as simulation, robotics or computer animation. Postprint (published version)
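
To illustrate what "address computation" means for a translation, here is a naive sketch; it is not the algorithm introduced in the paper. It assumes black nodes are given as axis-aligned cubes `(x, y, z, size)` in integer voxel coordinates, with `size` a power of two and each coordinate a multiple of `size`. The function name `translate_black_nodes` and this representation are assumptions made for the example.

```python
def translate_black_nodes(nodes, delta, world_size):
    """Translate black octree nodes by an integer voxel displacement.

    Each node's new address is computed directly from its old one. A shifted
    cube that is no longer aligned to its own size (or that crosses the world
    boundary) is split into its eight children, which are handled the same way.
    Cubes that fall completely outside the world are dropped. Sibling nodes are
    not merged back afterwards, so the output is valid but not condensed.
    """
    dx, dy, dz = delta
    stack = [(x + dx, y + dy, z + dz, s) for x, y, z, s in nodes]
    result = []
    while stack:
        x, y, z, s = stack.pop()
        coords = (x, y, z)
        # Entirely outside the world cube: discard.
        if any(c >= world_size or c + s <= 0 for c in coords):
            continue
        inside = all(0 <= c and c + s <= world_size for c in coords)
        aligned = all(c % s == 0 for c in coords)
        if inside and aligned:
            result.append((x, y, z, s))
        else:
            # Unit cubes are always aligned, so s >= 2 whenever we get here.
            h = s // 2
            for ox in (0, h):
                for oy in (0, h):
                    for oz in (0, h):
                        stack.append((x + ox, y + oy, z + oz, h))
    return sorted(result)
```

In this naive version, a displacement that is a multiple of a node's size moves the node as a whole, while a one-voxel displacement fragments it down to unit cubes, which suggests why small displacements are the expensive case for simple address computation.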

    A limit field for orthogonal range searches in two-dimensional random point search trees

    We consider the cost of general orthogonal range queries in random quadtrees. The cost of a given query is encoded into a (random) function of four variables which characterize the coordinates of two opposite corners of the query rectangle. We prove that, when suitably shifted and rescaled, the random cost function converges uniformly in probability towards a random field that is characterized as the unique solution to a distributional fixed-point equation. We also state similar results for 2-d trees. Our results imply for instance that the worst case query satisfies the same asymptotic estimates as a typical query, and thereby resolve an old question of Chanzy, Devroye and Zamora-Cura [\emph{Acta Inf.}, 37:355--383, 2000]. Comment: 24 pages, 8 figures
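
The "function of four variables" can be pictured as follows. This is a minimal sketch, assuming a point quadtree in which every node stores one point that splits its region into four quadrants, and taking the cost of a query rectangle with corners `(x1, y1)` and `(x2, y2)` to be the number of nodes whose region meets the rectangle. The names `QNode`, `quad_insert`, and `range_cost` are hypothetical.

```python
class QNode:
    """Point-quadtree node; kids are indexed by (east) + 2 * (north)."""

    def __init__(self, point):
        self.point = point
        self.kids = [None, None, None, None]


def quad_insert(root, point):
    """Insert a point by descending into the quadrant that contains it."""
    if root is None:
        return QNode(point)
    node = root
    while True:
        q = (point[0] >= node.point[0]) + 2 * (point[1] >= node.point[1])
        if node.kids[q] is None:
            node.kids[q] = QNode(point)
            return root
        node = node.kids[q]


def range_cost(root, x1, y1, x2, y2):
    """Number of nodes visited by the range query [x1, x2] x [y1, y2]."""
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        visited += 1
        px, py = node.point
        for q in range(4):
            east, north = q & 1, q >> 1
            # A western quadrant (x < px) meets the query iff x1 < px;
            # an eastern one (x >= px) meets it iff x2 >= px. Same for y.
            hits_x = (x2 >= px) if east else (x1 < px)
            hits_y = (y2 >= py) if north else (y1 < py)
            if hits_x and hits_y:
                stack.append(node.kids[q])
    return visited
```

Evaluating `range_cost` on one fixed tree while the four corner coordinates vary gives one realization of the cost function described in the abstract.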

    Triangulating the Square and Squaring the Triangle: Quadtrees and Delaunay Triangulations are Equivalent

    We show that Delaunay triangulations and compressed quadtrees are equivalent structures. More precisely, we give two algorithms: the first computes a compressed quadtree for a planar point set, given the Delaunay triangulation; the second finds the Delaunay triangulation, given a compressed quadtree. Both algorithms run in deterministic linear time on a pointer machine. Our work builds on and extends previous results by Krznaric and Levcopoulos and Buchin and Mulzer. Our main tool for the second algorithm is the well-separated pair decomposition (WSPD), a structure that has been used previously to find Euclidean minimum spanning trees in higher dimensions (Eppstein). We show that knowing the WSPD (and a quadtree) suffices to compute a planar Euclidean minimum spanning tree (EMST) in linear time. With the EMST at hand, we can find the Delaunay triangulation in linear time. As a corollary, we obtain deterministic versions of many previous algorithms related to Delaunay triangulations, such as splitting planar Delaunay triangulations, preprocessing imprecise points for faster Delaunay computation, and transdichotomous Delaunay triangulations. Comment: 37 pages, 13 figures, full version of a paper that appeared in SODA 201
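
For readers unfamiliar with the central structure, a compressed quadtree is a quadtree in which chains of cells with only one non-empty quadrant are contracted, so that every internal node has at least two children. The sketch below is a simple recursive construction for distinct points in a square cell; it is not the linear-time algorithm of the paper, and the names `build_compressed_quadtree` and `count_nodes` as well as the dictionary layout are assumptions.

```python
def build_compressed_quadtree(points, x=0.0, y=0.0, size=1.0):
    """Build a compressed quadtree over distinct points in [x, x+size) x [y, y+size).

    Leaves are {"point": p}; internal nodes are {"cell": (x, y, size), "children": [...]}.
    """
    if not points:
        return None
    if len(points) == 1:
        return {"point": points[0]}
    while True:
        h = size / 2
        buckets = {}
        for p in points:
            key = (p[0] >= x + h, p[1] >= y + h)
            buckets.setdefault(key, []).append(p)
        if len(buckets) > 1:
            break
        # All points lie in one quadrant: shrink the cell instead of
        # creating a node with a single child (this is the compression).
        (east, north), = buckets.keys()
        x += h * east
        y += h * north
        size = h
    children = [
        build_compressed_quadtree(pts, x + h * east, y + h * north, h)
        for (east, north), pts in sorted(buckets.items())
    ]
    return {"cell": (x, y, size), "children": children}


def count_nodes(tree):
    """Return (internal_nodes, leaves) of a tree built above."""
    if tree is None:
        return (0, 0)
    if "point" in tree:
        return (0, 1)
    internal, leaves = 1, 0
    for child in tree["children"]:
        i, l = count_nodes(child)
        internal += i
        leaves += l
    return (internal, leaves)
```

Because every internal node has at least two children, a tree on n points has at most n - 1 internal nodes regardless of how tightly the points cluster, which is what makes linear-size processing possible.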

    Down the Rabbit Hole: Robust Proximity Search and Density Estimation in Sublinear Space

    For a set of $n$ points in $\mathbb{R}^d$, and parameters $k$ and $\varepsilon$, we present a data structure that answers $(1+\varepsilon, k)$-ANN queries in logarithmic time. Surprisingly, the space used by the data structure is $\widetilde{O}(n/k)$; that is, the space used is sublinear in the input size if $k$ is sufficiently large. Our approach provides a novel way to summarize geometric data, such that meaningful proximity queries on the data can be carried out using this sketch. Using this, we provide a sublinear-space data structure that can estimate the density of a point set under various measures, including: (i) the sum of distances of the $k$ closest points to the query point, and (ii) the sum of squared distances of the $k$ closest points to the query point. Our approach generalizes to other distance-based estimations of densities of a similar flavor. We also study the problem of approximating some of these quantities when using sampling. In particular, we show that a sample of size $\widetilde{O}(n/k)$ is sufficient, in some restricted cases, to estimate the above quantities. Remarkably, the sample size has only linear dependency on the dimension.
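
The two density measures can be stated precisely with an exact, linear-space baseline. This is only a brute-force reference computation for what the sublinear-space structure approximates, not the data structure from the paper; the function name `knn_density_measures` is hypothetical.

```python
import heapq
import math


def knn_density_measures(points, query, k):
    """Exact values of the two measures for one query point.

    Returns (sum of distances, sum of squared distances) over the k points
    closest to `query`, found by scanning all n points.
    """
    nearest = heapq.nsmallest(k, (math.dist(p, query) for p in points))
    return sum(nearest), sum(d * d for d in nearest)
```

For example, `knn_density_measures([(0, 0), (3, 4), (6, 8), (1, 0)], (0, 0), 3)` uses the distances 0, 1 and 5 and returns a sum of 6 and a sum of squares of 26.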

    Multi-Source Spatial Entity Linkage

    Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities, describes them with different attributes, and sometimes provides contradicting information. Hence, we introduce the spatial entity linkage problem, which identifies the pairs of spatial entities that belong to the same physical spatial entity. Our proposed solution (QuadSky) starts with a time-efficient spatial blocking technique (QuadFlex), compares pairwise the spatial entities in the same block, ranks the pairs using Pareto optimality with the SkyRank algorithm, and finally classifies the pairs with our novel SkyEx-* family of algorithms, which yield 0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of 777,452 pairs. Moreover, we provide a theoretical guarantee and formalize the SkyEx-FES algorithm, which explores only 27% of the skylines without any loss in F-measure. Furthermore, our fully unsupervised algorithm SkyEx-D approximates the optimal result with an F-measure loss of just 0.01. Finally, QuadSky provides the best trade-off between precision and recall, and the best F-measure compared to the existing baselines and clustering techniques, and approximates the results of supervised learning solutions.
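
Ranking pairs "using Pareto optimality" can be illustrated by peeling off successive skylines. The sketch below is a generic construction, not the published SkyRank or SkyEx code: it assumes each candidate pair is described by a vector of similarity scores where higher is better, and the names `dominates` and `skyline_layers` are hypothetical.

```python
def dominates(a, b):
    """True if vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(vectors):
    """Group indices of `vectors` into successive Pareto-optimal layers.

    Layer 0 holds the vectors that no other vector dominates; layer 1 holds
    the vectors that become non-dominated once layer 0 is removed, and so on.
    """
    remaining = list(range(len(vectors)))
    layers = []
    while remaining:
        layer = [
            i for i in remaining
            if not any(dominates(vectors[j], vectors[i]) for j in remaining if j != i)
        ]
        layers.append(layer)
        chosen = set(layer)
        remaining = [i for i in remaining if i not in chosen]
    return layers
```

A classifier in this spirit would label the pairs in the first few layers as matches and the rest as non-matches; how the cut-off layer is chosen is exactly what the abstract attributes to the SkyEx-* algorithms, and it is not reproduced here.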