A limit process for partial match queries in random quadtrees and 2-d trees
We consider the problem of recovering items matching a partially specified
pattern in multidimensional trees (quadtrees and $k$-d trees). We assume the
traditional model where the data consist of independent and uniform points in
the unit square. For this model, in a structure on $n$ points, it is known that
the number of nodes $C_n(\xi)$ to visit in order to report the items matching
a random query $\xi$, independent and uniformly distributed on $[0,1]$,
satisfies $\mathbf{E}[C_n(\xi)] \sim \kappa n^{\beta}$, where $\kappa$ and $\beta$
are explicit constants. We develop an approach based on the analysis of
the cost $C_n(s)$ of any fixed query $s \in [0,1]$, and give precise estimates
for the variance and limit distribution of the cost $C_n(s)$. Our results
permit us to describe a limit process for the costs $C_n(s)$ as $s$ varies in
$[0,1]$; one of the consequences is that
$\mathbf{E}[\max_{s \in [0,1]} C_n(s)] \sim \gamma n^{\beta}$; this settles a question of
Devroye [Pers. Comm., 2000].
Comment: Published at http://dx.doi.org/10.1214/12-AAP912 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org). arXiv admin note: text
overlap with arXiv:1107.223
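To make the cost being analysed concrete, here is a minimal sketch of a partial match query in a point quadtree built from uniform random points. It is an illustration only, not code from the paper: the names `Node`, `insert`, `partial_match_cost` and `random_quadtree` are hypothetical, and the query is assumed to fix the first coordinate at a value `s` while leaving the second unspecified.

```python
import random


class Node:
    """Point quadtree node: one stored point and four child quadrants."""
    __slots__ = ("x", "y", "kids")

    def __init__(self, x, y):
        self.x, self.y = x, y
        # Child index = 2 * (north?) + (east?), relative to the stored point.
        self.kids = [None] * 4


def insert(root, x, y):
    """Insert (x, y) and return the (possibly new) root."""
    if root is None:
        return Node(x, y)
    node = root
    while True:
        i = 2 * (y >= node.y) + (x >= node.x)
        if node.kids[i] is None:
            node.kids[i] = Node(x, y)
            return root
        node = node.kids[i]


def partial_match_cost(root, s):
    """Count the nodes visited when reporting all points whose x equals s."""
    cost = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        cost += 1
        east = int(s >= node.x)  # the line x = s meets only one of west/east
        # y is unspecified, so both the south and north quadrant on that side
        # must be explored.
        stack.append(node.kids[0 + east])
        stack.append(node.kids[2 + east])
    return cost


def random_quadtree(n, seed=0):
    """Build a quadtree from n independent uniform points in the unit square."""
    rng = random.Random(seed)
    root = None
    for _ in range(n):
        root = insert(root, rng.random(), rng.random())
    return root
```

Calling `partial_match_cost(random_quadtree(n), s)` for increasing `n` and several values of `s` gives an empirical view of how the cost depends on both the size of the tree and the position of the query line.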
Geometric transformations in octrees using shears
Existing algorithms for performing geometric transformations on octrees
fall into two families: inverse transformation and address
computation. Those in the inverse transformation family
essentially resample the target octree from the source one and can
handle all affine transformations. Those in the address
computation family handle only translations, but are commonly
accepted as faster than the former because they perform no intersection
tests and instead directly calculate the transformed address of each black
node in the source tree. This work introduces a new translation
algorithm that is shown to perform better than previous ones when very
small displacements are involved. This property is particularly useful
in applications such as simulation, robotics or computer animation.
Postprint (published version)
A limit field for orthogonal range searches in two-dimensional random point search trees
We consider the cost of general orthogonal range queries in random quadtrees.
The cost of a given query is encoded into a (random) function of four variables
which characterize the coordinates of two opposite corners of the query
rectangle. We prove that, when suitably shifted and rescaled, the random cost
function converges uniformly in probability towards a random field that is
characterized as the unique solution to a distributional fixed-point equation.
We also state similar results for 2-d trees. Our results imply for instance
that the worst case query satisfies the same asymptotic estimates as a typical
query, and thereby resolve an old question of Chanzy, Devroye and Zamora-Cura
[Acta Inf., 37:355-383, 2000].
Comment: 24 pages, 8 figures
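As an illustration of the query cost discussed in this abstract, the following is a minimal sketch of an orthogonal range query in a 2-d tree that also counts the nodes it visits. It is not taken from the paper: the dictionary-based node layout and the names `insert` and `range_query` are assumptions made for the example, and the tree is built by plain successive insertion with the splitting coordinate alternating by depth.

```python
def insert(node, p, depth=0):
    """Insert point p = (x, y) into a 2-d tree; the split axis alternates."""
    if node is None:
        return {"p": p, "lo": None, "hi": None}
    axis = depth % 2
    side = "hi" if p[axis] >= node["p"][axis] else "lo"
    node[side] = insert(node[side], p, depth + 1)
    return node


def range_query(node, lo, hi, depth=0):
    """Report points in the closed rectangle [lo, hi].

    Returns (found, visited), where visited is the number of nodes inspected,
    i.e. the cost of the query.
    """
    if node is None:
        return [], 0
    p = node["p"]
    axis = depth % 2
    found = [p] if all(lo[i] <= p[i] <= hi[i] for i in range(2)) else []
    visited = 1
    # The "lo" subtree holds coordinates strictly below the split value.
    if lo[axis] < p[axis]:
        f, v = range_query(node["lo"], lo, hi, depth + 1)
        found += f
        visited += v
    # The "hi" subtree holds coordinates at or above the split value.
    if hi[axis] >= p[axis]:
        f, v = range_query(node["hi"], lo, hi, depth + 1)
        found += f
        visited += v
    return found, visited
```

The two corner arguments `lo` and `hi` correspond to the four variables that describe the query rectangle, so evaluating `visited` over many rectangles gives a sampled view of the cost function on a given tree.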
Triangulating the Square and Squaring the Triangle: Quadtrees and Delaunay Triangulations are Equivalent
We show that Delaunay triangulations and compressed quadtrees are equivalent
structures. More precisely, we give two algorithms: the first computes a
compressed quadtree for a planar point set, given the Delaunay triangulation;
the second finds the Delaunay triangulation, given a compressed quadtree. Both
algorithms run in deterministic linear time on a pointer machine. Our work
builds on and extends previous results by Krznaric and Levcopoulos and Buchin
and Mulzer. Our main tool for the second algorithm is the well-separated pair
decomposition (WSPD), a structure that has been used previously to find
Euclidean minimum spanning trees in higher dimensions (Eppstein). We show that
knowing the WSPD (and a quadtree) suffices to compute a planar Euclidean
minimum spanning tree (EMST) in linear time. With the EMST at hand, we can find
the Delaunay triangulation in linear time.
As a corollary, we obtain deterministic versions of many previous algorithms
related to Delaunay triangulations, such as splitting planar Delaunay
triangulations, preprocessing imprecise points for faster Delaunay computation,
and transdichotomous Delaunay triangulations.
Comment: 37 pages, 13 figures, full version of a paper that appeared in SODA
201
Down the Rabbit Hole: Robust Proximity Search and Density Estimation in Sublinear Space
For a set of $n$ points in $\mathbb{R}^d$, and parameters $k$ and $\varepsilon$, we present
a data structure that answers $(1+\varepsilon,k)$-ANN queries in logarithmic time.
Surprisingly, the space used by the data structure is $\widetilde{O}(n/k)$; that
is, the space used is sublinear in the input size if $k$ is sufficiently large.
Our approach provides a novel way to summarize geometric data, such that
meaningful proximity queries on the data can be carried out using this sketch.
Using this, we provide a sublinear space data-structure that can estimate the
density of a point set under various measures, including:
(i) the sum of distances of the $k$ closest points to the query point, and
(ii) the sum of squared distances of the $k$ closest points to the query point.
Our approach generalizes to other distance based estimation of densities of
similar flavor. We also study the problem of approximating some of these
quantities when using sampling. In particular, we show that a sample of size
$\widetilde{O}(n/k)$ is sufficient, in some restricted cases, to estimate the above
quantities. Remarkably, the sample size has only linear dependency on the
dimension.
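For reference, the two density measures listed in this abstract can be computed exactly by a brute-force scan, which is the linear-space baseline that a sublinear structure would approximate. The sketch below is only that baseline, not the data structure from the paper; the function name `knn_distance_sums` is hypothetical and Euclidean distance is assumed.

```python
import heapq
import math


def knn_distance_sums(points, q, k):
    """Exact density measures at query q from its k closest points.

    Returns (sum of distances, sum of squared distances) over the k points
    of `points` nearest to q in Euclidean distance. Scans every point, so it
    uses linear space and linear time per query.
    """
    nearest = heapq.nsmallest(k, (math.dist(p, q) for p in points))
    return sum(nearest), sum(d * d for d in nearest)
```

For example, `knn_distance_sums([(0, 0), (3, 4), (6, 8), (1, 0)], (0, 0), 3)` uses the distances 0, 1 and 5. Small values of either sum indicate that the query point sits in a dense region of the point set.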
Multi-Source Spatial Entity Linkage
Besides the traditional cartographic data sources, spatial information can
also be derived from location-based sources. However, even though different
location-based sources refer to the same physical world, each one has only
partial coverage of the spatial entities, describes them with different
attributes, and sometimes provides contradicting information. Hence, we
introduce the spatial entity linkage problem, which identifies the pairs of
spatial entities that refer to the same physical spatial entity. Our proposed
solution (QuadSky) starts with a time-efficient spatial blocking technique
(QuadFlex), compares pairwise the spatial entities in the same block, ranks the
pairs using Pareto optimality with the SkyRank algorithm, and finally,
classifies the pairs with our novel SkyEx-* family of algorithms that yield
0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs
and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of
777,452 pairs. Moreover, we provide a theoretical guarantee and formalize the
SkyEx-FES algorithm that explores only 27% of the skylines without any loss in
F-measure. Furthermore, our fully unsupervised algorithm SkyEx-D approximates
the optimal result with an F-measure loss of just 0.01. Finally, QuadSky
provides the best trade-off between precision and recall, and the best
F-measure compared to the existing baselines and clustering techniques, and
approximates the results of supervised learning solutions.
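The idea of ranking candidate pairs by Pareto optimality can be illustrated with a generic skyline-peeling routine. This is a sketch of plain Pareto-front layering, not the SkyRank or SkyEx algorithms themselves, whose details are not given in the abstract; the names `dominates` and `skyline_layers` are hypothetical, and each pair is assumed to be described by a tuple of similarity scores where larger is better.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better once."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline_layers(vectors):
    """Group indices of `vectors` into successive Pareto fronts (skylines).

    Layer 0 holds the vectors dominated by no other vector; layer 1 holds
    those dominated only by vectors in layer 0; and so on.
    """
    remaining = list(range(len(vectors)))
    layers = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(vectors[j], vectors[i])
                            for j in remaining if j != i)]
        layers.append(front)
        in_front = set(front)
        remaining = [i for i in remaining if i not in in_front]
    return layers
```

A classifier in this style would then pick a cut-off layer and label every pair in the layers above it as a match. How that cut-off is chosen is exactly what the abstract attributes to the SkyEx-* family and is not reproduced here.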