68 research outputs found

Scalability analysis of declustering methods for multidimensional range queries

Author: Bongki Moon
Joel H. Saltz
Publication date: 01/01/1998

Abstract: Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions.

CiteSeerX
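
Note on the two methods named in the abstract above: the listing does not define them, so the following is only a minimal sketch based on their commonly cited definitions. Disk Modulo assigns a grid cell to the disk given by the sum of its coordinates modulo the number of disks, and Fieldwise Xor uses the bitwise XOR of the coordinates instead. The function names, the inclusive-corner query format, and the "response time = busiest disk's cell count" measure are assumptions made for illustration, not the paper's formulas.

```python
from collections import Counter
from functools import reduce
from itertools import product
from operator import xor


def disk_modulo(cell, num_disks):
    """Disk Modulo: sum of the cell coordinates, modulo the number of disks."""
    return sum(cell) % num_disks


def fieldwise_xor(cell, num_disks):
    """Fieldwise Xor: bitwise XOR of the cell coordinates, modulo the number of disks."""
    return reduce(xor, cell) % num_disks


def response_time(assign, lower, upper, num_disks):
    """Number of cells the busiest disk must read for a range query.

    `lower` and `upper` are the inclusive corner cells of the query
    (an assumed query format, chosen for this sketch).
    """
    ranges = [range(lo, hi + 1) for lo, hi in zip(lower, upper)]
    load = Counter(assign(cell, num_disks) for cell in product(*ranges))
    return max(load.values())


def optimal_response_time(lower, upper, num_disks):
    """Lower bound: query cells spread perfectly evenly over all disks."""
    cells = 1
    for lo, hi in zip(lower, upper):
        cells *= hi - lo + 1
    return -(-cells // num_disks)  # ceiling division


if __name__ == "__main__":
    query = ((0, 0), (3, 3))  # a 4 x 4 range query
    for disks in (4, 16):
        print(
            disks,
            response_time(disk_modulo, *query, disks),
            response_time(fieldwise_xor, *query, disks),
            optimal_response_time(*query, disks),
        )
```

For this 4 x 4 query, both methods reach the optimum of 4 cells per disk with 4 disks, but with 16 disks both still leave 4 cells on the busiest disk while the even-spread bound is 1. That is one small instance of the limited scalability the abstract refers to.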

An R*-Tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System

Author: Ganpaa Gayatri
Publication venue: ScholarWorks@UNO
Publication date: 20/01/2006

The growing importance of geospatial databases has made it essential to perform complex spatial queries efficiently. To achieve acceptable performance levels, database systems have been increasingly required to make use of parallelism. The spatial join is a computationally expensive operator, so an efficient implementation is desirable. The work presented in this document attempts to improve the performance of spatial join queries by distributing the data set across several nodes of a cluster and executing queries across these nodes in parallel. This document discusses a new parallel algorithm that implements the spatial join in an efficient manner. This algorithm is compared to an existing parallel spatial-join algorithm, the clone join. Both algorithms have been implemented on a Beowulf cluster and compared using real datasets. An extensive experimental analysis reveals that the proposed algorithm exhibits superior performance in both declustering time and the execution time of the join query.

University of New Orleans

A Survey on Array Storage, Query Languages, and Systems

Author: Cheng Yu
Rusu Florin
Publication date: 19/02/2013

Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention. Comment: 44 pages

arXiv.org e-Print Archive

CiteSeerX

Titan: A High-Performance Remote-Sensing Database

Author: Acharya Anurag
Chang Chialin
Moon Bongki
Saltz Joel
Shock Carter
Sussman Alan
Publication date: 01/01/1997

There are two major challenges for a high-performance remote-sensing database. First, it must provide low-latency retrieval of very large volumes of spatio-temporal data. This requires effective declustering and placement of a multi-dimensional dataset onto a large disk farm. Second, the order of magnitude reduction in data size due to post-processing makes it imperative, from a performance perspective, that the post-processing be done on the machine that holds the data. This requires careful coordination of computation and data retrieval. This paper describes the design, implementation and evaluation of Titan, a parallel shared-nothing database designed for handling remote-sensing data. The computational platform for Titan is a 16-processor IBM SP-2 with four fast disks attached to each processor. Titan is currently operational and contains about 24 GB of data from the Advanced Very High Resolution Radiometer (AVHRR) on the NOAA-7 satellite. The experimental results show that Titan provides good performance for global queries, and interactive response times for local queries. (Also cross-referenced as UMIACS-TR-96-67)

CiteSeerX

Digital Repository at the University of Maryland

A Logical Model and Data Placement Strategies for MEMS Storage Devices

Author: Kim Min-Soo
Kim Yi-Reun
Song Il-Yeol
Whang Kyu-Young
Publication venue: Institute of Electronics, Information and Communications Engineers (IEICE)
Publication date: 29/07/2008

MEMS storage devices are new non-volatile secondary storage devices that have outstanding advantages over magnetic disks. MEMS storage devices, however, differ substantially from magnetic disks in structure and access characteristics. They have thousands of heads called probe tips and provide the following two major access facilities: (1) flexibility: freely selecting a set of probe tips for accessing data, (2) parallelism: simultaneously reading and writing data with the set of probe tips selected. Due to these characteristics, it is nontrivial to find data placements that fully utilize the capability of MEMS storage devices. In this paper, we propose a simple logical model called the Region-Sector (RS) model that abstracts major characteristics affecting data retrieval performance, such as flexibility and parallelism, from the physical MEMS storage model. We also suggest heuristic data placement strategies based on the RS model and derive new data placements for relational data and two-dimensional spatial data by using those strategies. Experimental results show that the proposed data placements improve the data retrieval performance by up to 4.0 times for relational data and by up to 4.8 times for two-dimensional spatial data of approximately 320 Mbytes compared with those of existing data placements. Further, these improvements are expected to be more marked as the database size grows. Comment: 37 pages

arXiv.org e-Print Archive

Crossref

Study of Scalable Declustering Algorithms for Parallel Grid Files

Author: Acharya Anurag
Moon Bongki
Saltz Joel
Publication date: 15/10/1998

Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well-known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of data distribution. The main contributions of this paper are (1) analytic and experimental evaluation of existing index-based declustering techniques and their extensions for grid files, and (2) development of a proximity-based declustering algorithm called minimax, which is experimentally shown to scale and to consistently achieve better response time compared to available algorithms while maintaining perfect disk distribution. (Also cross-referenced as UMIACS-TR-96-4)

Digital Repository at the University of Maryland

Scalability Analysis of Declustering Methods for Cartesian Product Files

Author: Moon Bongki
Saltz Joel
Publication date: 15/10/1998

Efficient storage and retrieval of multi-attribute datasets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multi-attribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files over multiple disks to obtain high performance for disk accesses. Though the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods and are corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure which can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions. (Also cross-referenced as UMIACS-TR-96-5)

Digital Repository at the University of Maryland

Partitioning similarity graphs: A framework for declustering problems

Author: Duen-Ren Liu
Shashi Shekhar
Publication venue: Elsevier BV

Crossref

High Performance Spatial Indexing for Parallel I/O and Centralized Architectures

Author: Kamel Ibrahim
Publication date: 15/10/1998

Recently, spatial databases have attracted increasing interest in the database field. Because of the volume of the data with which they deal, the performance of spatial database systems is important. The R-tree is an efficient spatial access method. It is a generalization of the B-tree in multidimensional space. This thesis investigates how to improve the performance of R-trees. We consider both parallel I/O and centralized architectures. For a parallel I/O environment we propose an R-tree design for a server with one CPU and multiple disks. On this architecture, the nodes of the R-tree are distributed between the different disks with cross-disk pointers ("Multiplexed" R-tree). When a new node is created we have to decide on which disk it will be stored. We propose and examine several criteria for choosing a disk for a new node. The most successful one, termed "Proximity Index" or PI, estimates the similarity of the new node to other R-tree nodes already on a disk and chooses the disk with the least degree of similarity. For a centralized environment, we propose a new packing technique for R-trees for static databases. We use space-filling curves, and specifically the Hilbert curve, to achieve better ordering of rectangles and eventually to achieve better packing. For dynamic databases we introduce the Hilbert R-tree, in which every node has a well defined set of sibling nodes; we can thus use the concept of local rotation [47]. By adjusting the split policy, the Hilbert R-tree can achieve a degree of space utilization as high as is desired. (Also cross-referenced as UMIACS-TR-94-131)

Digital Repository at the University of Maryland
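
Note on the Hilbert-curve packing mentioned in the abstract above: the listing gives no algorithmic detail, so the sketch below is only one plausible reading. It sorts rectangles by the Hilbert value of their centers and fills leaf nodes in that order. The integer grid, the use of rectangle centers, the `capacity` parameter, and both function names are assumptions made for illustration; the index computation is the standard iterative Hilbert mapping, not code from the thesis.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve covering a
    2**order by 2**order grid (standard iterative mapping)."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def pack_leaves(rects, capacity, order):
    """Group rectangles (xmin, ymin, xmax, ymax) into leaf nodes.

    Rectangles are sorted by the Hilbert value of their integer center
    (an assumed choice) and cut into runs of `capacity` entries.
    Returns a list of (bounding_box, entries) pairs.
    """
    def center_key(r):
        return hilbert_index(order, (r[0] + r[2]) // 2, (r[1] + r[3]) // 2)

    ordered = sorted(rects, key=center_key)
    leaves = []
    for i in range(0, len(ordered), capacity):
        group = ordered[i:i + capacity]
        bbox = (
            min(r[0] for r in group),
            min(r[1] for r in group),
            max(r[2] for r in group),
            max(r[3] for r in group),
        )
        leaves.append((bbox, group))
    return leaves
```

Because consecutive Hilbert values correspond to neighboring grid cells, rectangles that end up in the same leaf tend to be spatially close, which keeps leaf bounding boxes small.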