6 research outputs found

    Improved Bounds and Schemes for the Declustering Problem

    The declustering problem is to allocate given data on parallel working storage devices in such a manner that typical requests find their data evenly distributed on the devices. Using deep results from discrepancy theory, we improve previous work of several authors concerning range queries to higher-dimensional data. We give a declustering scheme with an additive error of $O_d(\log^{d-1} M)$ independent of the data size, where $d$ is the dimension, $M$ the number of storage devices and $d-1$ does not exceed the smallest prime power in the canonical decomposition of $M$ into prime powers. In particular, our schemes work for arbitrary $M$ in dimensions two and three. For general $d$, they work for all $M \geq d-1$ that are powers of two. Concerning lower bounds, we show that a recent proof of an $\Omega_d(\log^{\frac{d-1}{2}} M)$ bound contains an error. We close the gap in the proof and thus establish the bound. Comment: 19 pages, 1 figure

    Asymptotically optimal declustering schemes for 2-dim range queries

    Declustering techniques have been widely adopted in parallel storage systems (e.g. disk arrays) to speed up bulk retrieval of multidimensional data. A declustering scheme distributes data items among multiple disks, thus enabling parallel data access and reducing query response time. We measure the performance of any declustering scheme as its worst-case additive deviation from the ideal scheme. The goal thus is to design declustering schemes with as small an additive error as possible. We describe a number of declustering schemes with additive error O(log M) for 2-dimensional range queries, where M is the number of disks. These are the first results giving an O(log M) upper bound for all values of M. Our second result is a lower bound on the additive error. It is known that, except for a few stringent cases, the additive error of any 2-dimensional declustering scheme is at least one. We strengthen this lower bound to Ω((log M)^((d−1)/2)) for d-dimensional schemes and to Ω(log M) for 2-dimensional schemes, thus proving that the 2-dimensional schemes described in this paper are (asymptotically) optimal. These results are obtained by establishing a connection to geometric discrepancy. We also present simulation results to evaluate the performance of these schemes in practice.
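The additive-error measure described in the abstract above can be illustrated with a brute-force sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `disk_modulo` and `additive_error` are hypothetical, the allocation rule `(i + j) mod M` is a simple textbook-style scheme rather than one of the schemes the authors propose, and the "ideal" cost of a query touching m tiles on M disks is taken to be ceil(m / M).

```python
from itertools import product
from math import ceil


def disk_modulo(i: int, j: int, num_disks: int) -> int:
    """Illustrative allocation (an assumption): tile (i, j) goes to disk (i + j) mod M."""
    return (i + j) % num_disks


def additive_error(n: int, num_disks: int, scheme=disk_modulo) -> int:
    """Worst-case deviation from ceil(m / M) over all range queries on an n x n tile grid."""
    worst = 0
    # Enumerate every axis-aligned rectangle by its top-left and bottom-right tiles.
    for r0, c0 in product(range(n), repeat=2):
        for r1, c1 in product(range(r0, n), range(c0, n)):
            load = [0] * num_disks
            for i in range(r0, r1 + 1):
                for j in range(c0, c1 + 1):
                    load[scheme(i, j, num_disks)] += 1
            m = (r1 - r0 + 1) * (c1 - c0 + 1)
            # The busiest disk determines the number of parallel accesses.
            worst = max(worst, max(load) - ceil(m / num_disks))
    return worst
```

Calling `additive_error(4, 4)` enumerates all rectangles on a 4 x 4 grid; the exhaustive search is only practical for small grids and is meant to make the definition concrete, not to reproduce the paper's simulations.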

    MultiMap: Preserving disk locality for multidimensional datasets

    (Almost) Optimal Parallel Block Access for Range Queries

    Range queries are an important class of queries for several applications including relational and spatial databases, visualization, and GIS applications. For large datasets, the performance of range queries is limited by disk I/O. Performance improvements are achieved by tiling the multi-dimensional data and distributing it among multiple disks or nodes. Consequently, in order to process a range query, it is necessary to access only those tiles or blocks that intersect with the query. Given k disks, a query that accesses m blocks needs a number of parallel block accesses that is at least ⌈m/k⌉, which is known to be unachievable except for a few special cases [1]. Though several schemes for the allocation of tiles to disks have been developed, no scheme with guaranteed worst-case performance is known. We establish that any range query on a 2^q × 2^q-block grid of blocks can be performed using k = 2^t disks (t ≤ q), in at most ⌈m/k⌉ + O(log k) parallel block accesses. We also give two natural generalizations of the scheme to higher dimensions. We achieve this result by judiciously distributing the blocks among the k nodes or disks. Experimental data show that the algorithm achieves very close to ⌈m/k⌉ performance (on average less than 0.5 away from ⌈m/k⌉, with a worst case of 3). Although several declustering schemes for range queries have been developed, prior to our work no additive non-trivial performance bounds were known. Our scheme guarantees performance within a (small) additive deviation from ⌈m/k⌉. This guarantee is true for any number of dimensions. Subsequent to this work, Bhatia et al. [4] have proved that such a performance bound is essentially optimal for this kind of scheme, and have also extended our results to the case where the number of disks is a product of the form k_1 × k_2 × ... × k_t where the k_i's need * Portions of this work were supported by sponsors of th

    Analysis and Comparison of Replicated Declustering Schemes
