
Multidimensional Range Queries on Modern Hardware

Author: Leser Ulf
Schäfer Patrick
Sprenger Stefan
Publication date: 14/05/2018

Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and common wisdom held that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data held on disk, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs, and data-parallel instruction sets. In this paper, we study whether, and by how much, modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers, and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries.
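To make the scan-versus-index comparison concrete, here is a minimal single-threaded sketch of the two access paths for an axis-aligned box query, assuming points are tuples of floats and the query is given by hypothetical per-dimension bounds `lo` and `hi`. It contrasts a full scan with a textbook kd-tree range search; it is not the authors' main-memory, multi-core, SIMD-adapted implementation, and all function names are illustrative. Selectivity here means the fraction of points the query returns.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical node layout: (splitting point, split axis, left subtree, right subtree).
Node = Optional[Tuple[Point, int, "Node", "Node"]]


def in_box(p: Point, lo: Sequence[float], hi: Sequence[float]) -> bool:
    """True if p lies inside the closed axis-aligned box [lo, hi]."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def scan(points: List[Point], lo, hi) -> List[Point]:
    """Full scan: test every point, regardless of the query's selectivity."""
    return [p for p in points if in_box(p, lo, hi)]


def build_kdtree(points: List[Point], depth: int = 0) -> Node:
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (
        ordered[mid],
        axis,
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


def kd_range(node: Node, lo, hi, out: List[Point]) -> List[Point]:
    """Collect points in [lo, hi], pruning subtrees that cannot meet the box."""
    if node is None:
        return out
    point, axis, left, right = node
    if in_box(point, lo, hi):
        out.append(point)
    # Left subtree holds coordinates <= point[axis]; right holds >= point[axis].
    if lo[axis] <= point[axis]:
        kd_range(left, lo, hi, out)
    if point[axis] <= hi[axis]:
        kd_range(right, lo, hi, out)
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    data = [tuple(rng.random() for _ in range(3)) for _ in range(10_000)]
    lo, hi = (0.1, 0.2, 0.3), (0.4, 0.5, 0.6)
    by_scan = scan(data, lo, hi)
    by_tree = kd_range(build_kdtree(data), lo, hi, [])
    assert sorted(by_scan) == sorted(by_tree)  # both paths return the same set
    print(f"selectivity: {len(by_scan) / len(data):.2%}")
```

The scan's cost does not depend on selectivity, while the tree prunes whole subtrees; timing this pure-Python sketch says nothing about the paper's hardware-level findings, which concern parallel, cache- and SIMD-aware implementations.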

arXiv.org e-Print Archive

Crossref

Incidences between points and lines in three dimensions

Author: Sharir Micha
Solomon Noam
Publication date: 01/01/2015

We give a fairly elementary and simple proof that shows that the number of incidences between $m$ points and $n$ lines in $\mathbb{R}^3$, so that no plane contains more than $s$ lines, is

$$O\left(m^{1/2}n^{3/4} + m^{2/3}n^{1/3}s^{1/3} + m + n\right)$$

(in the precise statement, the constant of proportionality of the first and third terms depends, in a rather weak manner, on the relation between $m$ and $n$). This bound, originally obtained by Guth and Katz [GK2] as a major step in their solution of Erdős's distinct distances problem, is also a major new result in incidence geometry, an area that has picked up considerable momentum in the past six years. Its original proof uses fairly involved machinery from algebraic and differential geometry, so it is highly desirable to simplify the proof, in the interest of better understanding the geometric structure of the problem, and providing new tools for tackling similar problems. This has recently been undertaken by Guth [Gu14]. The present paper presents a different and simpler derivation, with better bounds than those in [Gu14], and without the restrictive assumptions made there. Our result has a potential for applications to other incidence problems in higher dimensions.
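The quantity being bounded can be made concrete with a brute-force counter. This minimal sketch assumes integer coordinates (so the collinearity test is exact) and represents each line by a hypothetical (point, direction) pair: a point $q$ lies on the line through $p$ with direction $d$ exactly when $(q - p) \times d = 0$. It runs in $O(mn)$ time and only illustrates what an incidence is; it has nothing to do with the proof technique.

```python
from typing import List, Tuple

Vec = Tuple[int, int, int]
Line = Tuple[Vec, Vec]  # (a point on the line, a nonzero direction vector)


def cross(u: Vec, v: Vec) -> Vec:
    """Cross product of two 3D integer vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def on_line(q: Vec, line: Line) -> bool:
    """q is incident to the line iff q - p is parallel to the direction d."""
    p, d = line
    w = (q[0] - p[0], q[1] - p[1], q[2] - p[2])
    return cross(w, d) == (0, 0, 0)


def count_incidences(points: List[Vec], lines: List[Line]) -> int:
    """Number of (point, line) pairs in which the point lies on the line."""
    return sum(1 for q in points for line in lines if on_line(q, line))


# Example: the 3x3x3 integer grid, the 9 vertical lines through it,
# and the main diagonal through (0,0,0), (1,1,1), (2,2,2).
grid = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
lines = [((x, y, 0), (0, 0, 1)) for x in range(3) for y in range(3)]
lines.append(((0, 0, 0), (1, 1, 1)))
print(count_incidences(grid, lines))  # 30: 9 lines x 3 points, plus 3 on the diagonal
```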

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

On Range Searching with Semialgebraic Sets II

Author: Agarwal Pankaj K.
Matousek Jiri
Sharir Micha
Publication date: 01/01/2012

Let $P$ be a set of $n$ points in $\mathbb{R}^d$. We present a linear-size data structure for answering range queries on $P$ with constant-complexity semialgebraic sets as ranges, in time close to $O(n^{1-1/d})$. It essentially matches the performance of similar structures for simplex range searching, and, for $d \ge 5$, significantly improves earlier solutions by the first two authors obtained in 1994. This almost settles a long-standing open problem in range searching. The data structure is based on the polynomial-partitioning technique of Guth and Katz [arXiv:1011.4105], which shows that for a parameter $r$, $1 < r \le n$, there exists a $d$-variate polynomial $f$ of degree $O(r^{1/d})$ such that each connected component of $\mathbb{R}^d \setminus Z(f)$ contains at most $n/r$ points of $P$, where $Z(f)$ is the zero set of $f$. We present an efficient randomized algorithm for computing such a polynomial partition, which is of independent interest and is likely to have additional applications.
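In one dimension the polynomial-partitioning statement becomes elementary, which may help build intuition: $Z(f)$ is a finite set of roots, the components of $\mathbb{R} \setminus Z(f)$ are open intervals, and placing a root at every $m$-th sorted point, with $m = \lfloor n/r \rfloor + 1$, leaves at most $m - 1 \le n/r$ points in each interval while $\deg f = \lfloor n/m \rfloor < r = r^{1/d}$ for $d = 1$. The sketch below covers only this trivial special case, under that assumption; it is not the paper's randomized algorithm for general $d$, and the function names are hypothetical.

```python
import bisect
from typing import List, Sequence


def partition_roots_1d(points: Sequence[float], r: int) -> List[float]:
    """Roots of a univariate f such that every open interval outside Z(f)
    contains at most n/r of the points (duplicates in `points` are fine)."""
    n = len(points)
    if not 1 < r <= n:
        raise ValueError("requires 1 < r <= n")
    ordered = sorted(points)
    m = n // r + 1            # spacing between consecutive roots, in sorted order
    return ordered[m - 1::m]  # floor(n/m) < r roots, so deg f < r


def evaluate(roots: Sequence[float], x: float) -> float:
    """f(x) = product of (x - a) over all roots a."""
    value = 1.0
    for a in roots:
        value *= x - a
    return value


def component_sizes(points: Sequence[float], roots: List[float]) -> List[int]:
    """Count points in each open interval between consecutive roots;
    points on Z(f) belong to no component and are skipped."""
    zero_set = set(roots)
    sizes = [0] * (len(roots) + 1)
    for x in points:
        if x not in zero_set:
            sizes[bisect.bisect_left(roots, x)] += 1
    return sizes


pts = list(range(100))
roots = partition_roots_1d(pts, 10)
print(roots)                             # [10, 21, 32, 43, 54, 65, 76, 87, 98]
print(max(component_sizes(pts, roots)))  # 10, i.e. n/r
```

In higher dimensions no such explicit construction is available, which is why the degree bound $O(r^{1/d})$ and an efficient algorithm for finding $f$ are the substantive parts of the result.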

arXiv.org e-Print Archive

CiteSeerX

Crossref

Bandwidth selection for kernel estimation in mixed multi-dimensional spaces

Author: Aurélie Bugeau
Patrick Pérez
Projet Vista
Publication date: 01/01/2007

Kernel estimation techniques, such as mean shift, suffer from one major drawback: the choice of the kernel bandwidth. The bandwidth can be fixed for the whole data set or can vary at each point. Automatic bandwidth selection becomes a real challenge in the case of multidimensional heterogeneous features. This paper presents a solution to this problem. It is an extension of [Comaniciu03a], which was based on the fundamental property of normal distributions regarding the bias of the normalized density gradient. The selection is done iteratively for each type of feature, by looking for the stability of local bandwidth estimates across a predefined range of bandwidths. A pseudo-balloon mean shift filtering and partitioning are introduced. The validity of the method is demonstrated in the context of color image segmentation based on a 5-dimensional space.
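To show why the bandwidth matters, the following sketch runs plain mean shift with a Gaussian kernel and a single fixed bandwidth `h` on a small hypothetical 2D dataset. It is a baseline for intuition only, not the paper's method: it performs neither the iterative per-feature-type bandwidth selection nor the pseudo-balloon variant, and the function name and parameters are illustrative. A small `h` converges to the mode of the nearest cluster, while a very large `h` merges both clusters into a single mode.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean_shift_mode(start: Sequence[float], data: Sequence[Point], h: float,
                    tol: float = 1e-8, max_iter: int = 1000) -> List[float]:
    """Follow the mean-shift vector from `start` until it (nearly) stops moving.

    Each step replaces x with the Gaussian-weighted mean of the data, using
    weights exp(-||x - p||^2 / (2 h^2)) for one fixed bandwidth h. Assumes x
    stays close enough to the data that the weights do not all underflow to 0.
    """
    x = list(start)
    for _ in range(max_iter):
        weights = [math.exp(-math.dist(x, p) ** 2 / (2 * h * h)) for p in data]
        total = sum(weights)
        new_x = [sum(w * p[k] for w, p in zip(weights, data)) / total
                 for k in range(len(x))]
        moved = math.dist(new_x, x)
        x = new_x
        if moved < tol:
            break
    return x


# Two well-separated square clusters centred at (0.5, 0.5) and (10.5, 10.5).
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
small = mean_shift_mode((0.2, 0.2), data, h=1.0)
large = mean_shift_mode((0.2, 0.2), data, h=100.0)
print([round(c, 3) for c in small])  # [0.5, 0.5]: mode of the nearby cluster
print([round(c, 3) for c in large])  # [5.5, 5.5]: clusters merged into one mode
```

A single global `h` cannot suit features with very different scales, such as color and position in the paper's 5-dimensional segmentation space, which is the gap that per-feature and locally varying bandwidths address.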

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1