An efficient parallel algorithm for high dimensional similarity join

By Khaled Alsabti, Sanjay Ranka and Vineet Singh

Abstract

Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other. The ε-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions increases compared to previous data structures. We present a cost model of the ε-k-d-B tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the ε-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low-skew datasets. Further, its cost is proportional to the overall cost of the similarity join.
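
To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm: the join is a brute-force baseline rather than an ε-k-d-B tree traversal, the Euclidean metric is an assumption (the abstract does not name a metric), and the function names `similarity_join` and `equi_depth_boundaries` are hypothetical. The second function shows only the plain equi-depth idea along a single dimension, not the weighted variant.

```python
from math import dist  # Euclidean distance, Python 3.8+


def similarity_join(points, eps):
    """Brute-force baseline: index pairs (i, j), i < j, within distance eps.

    Assumption: Euclidean distance; the abstract does not fix a metric.
    """
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                pairs.append((i, j))
    return pairs


def equi_depth_boundaries(values, num_buckets):
    """Cut points that put roughly the same number of values in each bucket.

    Illustrates the plain equi-depth histogram idea on one dimension only.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // num_buckets] for k in range(1, num_buckets)]


points = [(0.0, 0.0), (0.05, 0.05), (1.0, 1.0)]
close_pairs = similarity_join(points, eps=0.1)

first_coords = [1, 2, 3, 4, 5, 6, 7, 8]
cuts = equi_depth_boundaries(first_coords, num_buckets=4)
```

The brute-force join compares every pair and so costs quadratic time in the number of points, which is the cost a tree-based index is meant to avoid. Equal-count buckets give each processor about the same number of points, but not necessarily the same amount of join work when the data are skewed, which is the situation the abstract says the weighted variant handles better.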

Publisher: Society Press
Year: 1998
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.9274
Provided by: CiteSeerX