    Multidimensional Range Queries on Modern Hardware

    Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and conventional wisdom has held that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data held on disks, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs, and data-parallel instruction sets. In this paper, we study whether and to what extent modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries.
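
    To make the terms in this abstract concrete, the following is a minimal sketch of a multidimensional range query answered by a plain sequential scan, together with the query's selectivity (the fraction of the dataset it returns). The names `scan_range_query` and `selectivity` are hypothetical and chosen for illustration only; this is not the parallel, SIMD-based scan evaluated in the paper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
# A query box: one inclusive (low, high) interval per dimension.
Box = Sequence[Tuple[float, float]]


def scan_range_query(points: List[Point], box: Box) -> List[int]:
    """Return the indices of all points inside the axis-aligned box.

    Hypothetical helper: a single-threaded full scan that tests every
    point against every dimension's interval.
    """
    hits = []
    for i, p in enumerate(points):
        if all(lo <= x <= hi for x, (lo, hi) in zip(p, box)):
            hits.append(i)
    return hits


def selectivity(points: List[Point], box: Box) -> float:
    """Fraction of the dataset returned by the query (0.0 to 1.0)."""
    return len(scan_range_query(points, box)) / len(points)


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
    query = [(0.5, 2.5), (0.5, 2.5)]
    print(scan_range_query(data, query))
    print(selectivity(data, query))
```

    A scan touches every point regardless of selectivity, whereas an index tries to skip regions that cannot match; the abstract's question is at which selectivity the skipping pays off on modern hardware.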

    Incidences between points and lines in three dimensions

    We give a fairly elementary and simple proof that shows that the number of incidences between $m$ points and $n$ lines in $\mathbb{R}^3$, so that no plane contains more than $s$ lines, is $O\left(m^{1/2}n^{3/4} + m^{2/3}n^{1/3}s^{1/3} + m + n\right)$ (in the precise statement, the constant of proportionality of the first and third terms depends, in a rather weak manner, on the relation between $m$ and $n$). This bound, originally obtained by Guth and Katz \cite{GK2} as a major step in their solution of Erdős's distinct distances problem, is also a major new result in incidence geometry, an area that has picked up considerable momentum in the past six years. Its original proof uses fairly involved machinery from algebraic and differential geometry, so it is highly desirable to simplify the proof, in the interest of better understanding the geometric structure of the problem, and providing new tools for tackling similar problems. This has recently been undertaken by Guth \cite{Gu14}. The present paper presents a different and simpler derivation, with better bounds than those in \cite{Gu14}, and without the restrictive assumptions made there. Our result has a potential for applications to other incidence problems in higher dimensions.

    On Range Searching with Semialgebraic Sets II

    Let $P$ be a set of $n$ points in $\mathbb{R}^d$. We present a linear-size data structure for answering range queries on $P$ with constant-complexity semialgebraic sets as ranges, in time close to $O(n^{1-1/d})$. It essentially matches the performance of similar structures for simplex range searching, and, for $d \ge 5$, significantly improves earlier solutions by the first two authors obtained in 1994. This almost settles a long-standing open problem in range searching. The data structure is based on the polynomial-partitioning technique of Guth and Katz [arXiv:1011.4105], which shows that for a parameter $r$, $1 < r \le n$, there exists a $d$-variate polynomial $f$ of degree $O(r^{1/d})$ such that each connected component of $\mathbb{R}^d \setminus Z(f)$ contains at most $n/r$ points of $P$, where $Z(f)$ is the zero set of $f$. We present an efficient randomized algorithm for computing such a polynomial partition, which is of independent interest and is likely to have additional applications.

    Bandwidth selection for kernel estimation in mixed multi-dimensional spaces

    Kernel estimation techniques, such as mean shift, suffer from one major drawback: the kernel bandwidth selection. The bandwidth can be fixed for the whole data set or can vary at each point. Automatic bandwidth selection becomes a real challenge in the case of multidimensional heterogeneous features. This paper presents a solution to this problem. It is an extension of \cite{Comaniciu03a}, which was based on the fundamental property of normal distributions regarding the bias of the normalized density gradient. The selection is done iteratively for each type of feature, by looking for the stability of local bandwidth estimates across a predefined range of bandwidths. A pseudo balloon mean shift filtering and partitioning are introduced. The validity of the method is demonstrated in the context of color image segmentation based on a 5-dimensional space.
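
    As an illustration of why the bandwidth matters, here is a minimal sketch of one-dimensional mean shift with a Gaussian kernel and a single fixed bandwidth. The function name `mean_shift_mode` and its parameters are assumptions made for this example; it does not implement the iterative bandwidth selection or the pseudo balloon variant that the abstract describes.

```python
import math
from typing import Sequence


def mean_shift_mode(data: Sequence[float], start: float, bandwidth: float,
                    tol: float = 1e-6, max_iter: int = 500) -> float:
    """Climb from `start` to a nearby mode of the kernel density estimate.

    Hypothetical helper: fixed-bandwidth mean shift in 1-D with a
    Gaussian kernel. Each step moves x to the kernel-weighted mean
    of the data, and stops once the move is smaller than `tol`.
    """
    x = start
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
        new_x = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


if __name__ == "__main__":
    # Two well-separated clusters, around 0 and around 5.
    samples = [-0.1, 0.0, 0.1, 4.9, 5.0, 5.1]
    # A small bandwidth keeps the two clusters as separate modes.
    print(mean_shift_mode(samples, start=0.3, bandwidth=0.5))
    print(mean_shift_mode(samples, start=4.7, bandwidth=0.5))
    # A large bandwidth merges them into one mode near the overall mean.
    print(mean_shift_mode(samples, start=0.3, bandwidth=10.0))
```

    With the small bandwidth the two starting points settle on different modes, while the large bandwidth pulls any start toward a single mode between the clusters, which is the sensitivity that motivates automatic selection.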