Study of Scalable Declustering Algorithms for Parallel Grid Files
Efficient storage and retrieval of large multidimensional datasets is
an important concern for large-scale scientific computations such as
long-running time-dependent simulations which periodically generate
snapshots of the state.
The main challenge for efficiently handling such datasets
is to minimize response time for multidimensional range queries.
The grid file is one of the well-known access methods for
multidimensional and spatial data.
We investigate effective and scalable declustering techniques
for grid files with the primary goal of minimizing response time
and the secondary goal of maximizing the fairness of data distribution.
The main contributions of this paper are (1) analytic and experimental
evaluation of existing index-based declustering techniques and their
extensions for grid files, and (2) development of a proximity-based
declustering algorithm called minimax which is experimentally
shown to scale and to consistently achieve better response time
compared to available algorithms while maintaining perfect disk distribution.
(Also cross-referenced as UMIACS-TR-96-4)
Scalability analysis of declustering methods for multidimensional range queries
Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions.
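As a rough illustration of the two methods named in this abstract, the sketch below uses their commonly cited textbook definitions: Disk Modulo assigns a grid cell to the disk given by the sum of its coordinates modulo the number of disks, and Fieldwise Xor uses the bitwise XOR of the coordinates modulo the number of disks. The function names, the inclusive (lo, hi) query format, and the response-time measure (largest number of cells any one disk must read) are assumptions made for this example; the paper's own formulas are not reproduced here.

```python
from collections import Counter
from functools import reduce
from itertools import product
from operator import xor


def disk_modulo(cell, num_disks):
    """Disk Modulo: sum of the cell coordinates, modulo the disk count."""
    return sum(cell) % num_disks


def fieldwise_xor(cell, num_disks):
    """Fieldwise Xor: bitwise XOR of the cell coordinates, modulo the disk count."""
    return reduce(xor, cell) % num_disks


def response_time(query, num_disks, assign):
    """Largest number of cells that a single disk serves for a range query.

    `query` is a list of inclusive (lo, hi) ranges, one per dimension
    (a hypothetical format chosen for this sketch).
    """
    cells = product(*(range(lo, hi + 1) for lo, hi in query))
    load = Counter(assign(cell, num_disks) for cell in cells)
    return max(load.values())


def optimal_response_time(query, num_disks):
    """Lower bound: ceil(number of cells / number of disks)."""
    num_cells = 1
    for lo, hi in query:
        num_cells *= hi - lo + 1
    return -(-num_cells // num_disks)


if __name__ == "__main__":
    small_query = [(0, 1), (0, 1)]  # a 2 x 2 range query
    for name, method in (("DM", disk_modulo), ("FX", fieldwise_xor)):
        print(name, response_time(small_query, 4, method),
              "optimal:", optimal_response_time(small_query, 4))
```

With 4 disks, the 2 x 2 query above touches 4 cells, so one access per disk would be ideal, yet both methods place two of the cells on the same disk. A 4 x 4 query on the same 4 disks is spread evenly by both. This is only a toy instance of the kind of gap from the optimum that the abstract's scalability analysis is concerned with.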
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.
Comment: 44 pages