6 research outputs found

Improved Bounds and Schemes for the Declustering Problem

Author: Doerr Benjamin
Hebbinghaus Nils
Werth Sören
Publication venue
Publication date: 01/01/2006
Field of study

The declustering problem is to allocate given data on parallel working storage devices in such a manner that typical requests find their data evenly distributed on the devices. Using deep results from discrepancy theory, we improve previous work of several authors concerning range queries to higher-dimensional data. We give a declustering scheme with an additive error of $O_d(\log^{d-1} M)$ independent of the data size, where $d$ is the dimension, $M$ the number of storage devices and $d-1$ does not exceed the smallest prime power in the canonical decomposition of $M$ into prime powers. In particular, our schemes work for arbitrary $M$ in dimensions two and three. For general $d$, they work for all $M \geq d-1$ that are powers of two. Concerning lower bounds, we show that a recent proof of a $\Omega_d(\log^{\frac{d-1}{2}} M)$ bound contains an error. We close the gap in the proof and thus establish the bound.

Comment: 19 pages, 1 figur
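
The applicability condition in the abstract above (that d-1 must not exceed the smallest prime power in the factorization of M) can be checked mechanically. The following is a minimal sketch of that check only, not of the declustering scheme itself; the function names `smallest_prime_power` and `scheme_applies` are hypothetical and do not come from the paper.

```python
def smallest_prime_power(m):
    """Return the smallest p**e among the prime powers in the
    canonical factorization of m (requires m >= 2)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    best = None
    p = 2
    while p * p <= m:
        if m % p == 0:
            q = 1
            # Strip every factor p from m, accumulating p**e in q.
            while m % p == 0:
                m //= p
                q *= p
            best = q if best is None else min(best, q)
        p += 1
    if m > 1:
        # Whatever remains is a single prime with exponent 1.
        best = m if best is None else min(best, m)
    return best


def scheme_applies(d, m):
    """Check the condition quoted in the abstract: d - 1 does not exceed
    the smallest prime power in the factorization of m."""
    return d - 1 <= smallest_prime_power(m)
```

Because the smallest prime power of any M >= 2 is at least 2, the check passes for every M when d is 2 or 3, which matches the abstract's remark about dimensions two and three. For example, `scheme_applies(4, 6)` fails because 6 = 2 * 3 has smallest prime power 2, while `scheme_applies(5, 16)` passes because 16 is itself a prime power.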

arXiv.org e-Print Archive

Elsevier - Publisher Connector

MPG.PuRe

Asymptotically optimal declustering schemes for 2-dim range queries

Author: Sinha Rakesh K.
Bhatia Randeep
Chen Chung-Min
Publication venue: Elsevier Science B.V.
Publication date: 01/01/2003
Field of study

Declustering techniques have been widely adopted in parallel storage systems (e.g. disk arrays) to speed up bulk retrieval of multidimensional data. A declustering scheme distributes data items among multiple disks, thus enabling parallel data access and reducing query response time. We measure the performance of any declustering scheme as its worst-case additive deviation from the ideal scheme. The goal thus is to design declustering schemes with as small an additive error as possible. We describe a number of declustering schemes with additive error O(log M) for 2-dimensional range queries, where M is the number of disks. These are the first results giving an O(log M) upper bound for all values of M. Our second result is a lower bound on the additive error. It is known that, except for a few stringent cases, the additive error of any 2-dimensional declustering scheme is at least one. We strengthen this lower bound to Ω((log M)^((d−1)/2)) for d-dimensional schemes and to Ω(log M) for 2-dimensional schemes, thus proving that the 2-dimensional schemes described in this paper are (asymptotically) optimal. These results are obtained by establishing a connection to geometric discrepancy. We also present simulation results to evaluate the performance of these schemes in practice.
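
To make the "worst-case additive deviation from the ideal scheme" concrete, here is a small brute-force sketch. It assumes an n x n grid of tiles, M disks, and axis-parallel range queries. The response time of a query is taken to be the largest number of its tiles stored on any single disk, and the ideal is ceil(|Q| / M). The disk-modulo assignment `(r + c) % M` is used purely as a simple stand-in for illustration; it is not one of the O(log M) schemes described in the abstract, and the names `additive_error` and `disk_modulo` are hypothetical.

```python
from math import ceil


def additive_error(n, m, assign):
    """Worst-case additive error of `assign` over all range queries
    on an n x n grid with m disks (brute force, small n only)."""
    worst = 0
    for r0 in range(n):
        for r1 in range(r0, n):
            for c0 in range(n):
                counts = [0] * m  # tiles per disk for the current query
                for c1 in range(c0, n):
                    # Grow the query by one column and update the counts.
                    for r in range(r0, r1 + 1):
                        counts[assign(r, c1)] += 1
                    size = (r1 - r0 + 1) * (c1 - c0 + 1)
                    ideal = ceil(size / m)
                    worst = max(worst, max(counts) - ideal)
    return worst


def disk_modulo(m):
    """Illustrative assignment: tile (r, c) goes to disk (r + c) mod m."""
    return lambda r, c: (r + c) % m
```

Calling `additive_error(4, 4, disk_modulo(4))` enumerates every range query on a 4 x 4 grid; a 2 x 2 query already places two tiles on one disk while the ideal is one access, so the error is at least one, in line with the abstract's remark that an additive error of zero is unattainable outside a few special cases.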

Elsevier - Publisher Connector

Crossref

Boston University Institutional Repository (OpenBU)

MultiMap: Preserving disk locality for multidimensional datasets

Author
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date
Field of study

Crossref

(Almost) Optimal Parallel Block Access for Range Queries

Author: Atallah Mikhail J.
Prabhakar Sunil
Publication venue: Purdue University (bepress)
Publication date: 01/06/1999
Field of study

Range queries are an important class of queries for several applications including relational and spatial databases, visualization, and GIS applications. For large datasets, the performance of range queries is limited by disk I/O. Performance improvements are achieved by tiling the multi-dimensional data and distributing it among multiple disks or nodes. Consequently, in order to process a range query, it is necessary to access only those tiles or blocks that intersect with the query. Given $k$ disks, a query that accesses $m$ blocks needs a number of parallel block accesses that is at least $\lceil m/k \rceil$ (which is known to be unachievable except for a few special cases [1]). Though several schemes for the allocation of tiles to disks have been developed, no scheme with guaranteed worst-case performance is known. We establish that any range query on a $2^q \times 2^q$-block grid of blocks can be performed using $k = 2^t$ disks ($t \leq q$), in at most $\lceil m/k \rceil + O(\log k)$ parallel block accesses. We also give two natural generalizations of the scheme to higher dimensions. We achieve this result by judiciously distributing the blocks among the $k$ nodes or disks. Experimental data show that the algorithm achieves very close to $\lceil m/k \rceil$ performance (on average less than 0.5 away from $\lceil m/k \rceil$, with a worst-case of 3). Although several declustering schemes for range queries have been developed, prior to our work no additive non-trivial performance bounds were known. Our scheme guarantees performance within a (small) additive deviation from $\lceil m/k \rceil$. This guarantee is true for any number of dimensions. Subsequent to this work, Bhatia et al. [4] have proved that such a performance bound is essentially optimal for this kind of scheme, and have also extended our results to the case where the number of disks is a product of the form $k_1 \times k_2 \times \cdots \times k_t$ where the $k_i$'s need

Portions of this work were supported by sponsors of th

CiteSeerX

Purdue E-Pubs

Analysis and Comparison of Replicated Declustering Schemes

Author: Ali Saman Tosun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date
Field of study

Crossref