Scalability analysis of declustering methods for multidimensional range queries

Authors: Bongki Moon; Joel H. Saltz
Publication date: 01/01/1998

Abstract: Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions.
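As a rough illustration of the two methods named in this abstract, the sketch below uses their commonly cited definitions: Disk Modulo places a bucket on the disk given by the sum of its coordinates modulo the number of disks, and Fieldwise Xor uses the bitwise XOR of the coordinates instead. The function names, the `response_time` helper and the example queries are assumptions made for illustration; they are not taken from the paper, and the paper's analytic formulas are not reproduced here.

```python
from collections import Counter
from itertools import product


def disk_modulo(bucket, num_disks):
    """Disk Modulo: sum of the bucket coordinates, modulo the number of disks."""
    return sum(bucket) % num_disks


def fieldwise_xor(bucket, num_disks):
    """Fieldwise Xor: bitwise XOR of the bucket coordinates, modulo the number of disks."""
    disk = 0
    for coordinate in bucket:
        disk ^= coordinate
    return disk % num_disks


def response_time(method, ranges, num_disks):
    """Largest number of buckets a range query reads from any single disk.

    `ranges` holds one range of bucket coordinates per dimension (hypothetical
    helper; it simply enumerates every bucket the query touches).
    """
    load = Counter(method(bucket, num_disks) for bucket in product(*ranges))
    return max(load.values())


# A 4 x 4 range query on 4 disks: 16 buckets, so 4 per disk is the best possible.
large_query = [range(0, 4), range(0, 4)]
dm_large = response_time(disk_modulo, large_query, num_disks=4)
fx_large = response_time(fieldwise_xor, large_query, num_disks=4)

# A 2 x 2 range query on 16 disks: 4 buckets, so 1 per disk would be ideal.
small_query = [range(0, 2), range(0, 2)]
dm_small = response_time(disk_modulo, small_query, num_disks=16)
fx_small = response_time(fieldwise_xor, small_query, num_disks=16)
```

In this toy setting both methods balance the 4 x 4 query perfectly, but for the 2 x 2 query each still reads two buckets from one disk even with 16 disks available, which is a small instance of the limited scalability the abstract refers to.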

CiteSeerX

Scalability Analysis of Declustering Methods for Cartesian Product Files

Authors: Moon Bongki; Saltz Joel
Publication date: 15/10/1998

Efficient storage and retrieval of multi-attribute datasets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multi-attribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files over multiple disks to obtain high performance for disk accesses. Though the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods and are corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure which can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions. (Also cross-referenced as UMIACS-TR-96-5

Digital Repository at the University of Maryland

A Virtual Data Grid for LIGO

Authors: Allen Bruce; Deelman Ewa; Kesselman Carl; Lazzarini Albert; Prince Thomas A.; Romano Joe; Williams Roy
Publication date: 01/01/2001

GriPhyN (Grid Physics Network) is a large US collaboration to build grid services for large physics experiments, one of which is LIGO, a gravitational-wave observatory. This paper explains the physics and computing challenges of LIGO, and the tools that GriPhyN will build to address them. A key component needed to implement the data pipeline is a virtual data service: a system to dynamically create data products requested during the various stages. The data may already have been processed in a certain way, it may be in a file on a storage system, it may be cached, or it may need to be created through computation. The full elaboration of this system will allow complex data pipelines to be set up as virtual data objects, with existing data being transformed in diverse ways.

Caltech Authors

MPG.PuRe

Systems Technology Laboratory (STL) compendium of utilities

Authors: Decker W. J.; Green A. L.; Mcgarry F. E.; Merwarth P. D.; Pajerski R. S.; Smith E. J.; Stark M. E.; Taylor W. A.

Multipurpose programs, routines and operating systems are described. Data conversion and character string comparison subroutines are included. Graphics packages and file maintenance programs are also included.

NASA Technical Reports Server

Improved Bounds and Schemes for the Declustering Problem

Authors: Doerr Benjamin; Hebbinghaus Nils; Werth Sören
Publication date: 01/01/2006

The declustering problem is to allocate given data on parallel working storage devices in such a manner that typical requests find their data evenly distributed on the devices. Using deep results from discrepancy theory, we improve previous work of several authors concerning range queries to higher-dimensional data. We give a declustering scheme with an additive error of $O_d(\log^{d-1} M)$ independent of the data size, where $d$ is the dimension, $M$ the number of storage devices and $d-1$ does not exceed the smallest prime power in the canonical decomposition of $M$ into prime powers. In particular, our schemes work for arbitrary $M$ in dimensions two and three. For general $d$, they work for all $M \geq d-1$ that are powers of two. Concerning lower bounds, we show that a recent proof of an $\Omega_d(\log^{\frac{d-1}{2}} M)$ bound contains an error. We close the gap in the proof and thus establish the bound.

Comment: 19 pages, 1 figure

arXiv.org e-Print Archive

Elsevier - Publisher Connector

MPG.PuRe

Data partitioning and load balancing in parallel disk systems

Authors: Scheuermann Peter; Weikum Gerhard; Zabback Peter
Publication venue: Sonstige Einrichtungen
Publication date: 01/01/1996

Parallel disk systems provide opportunities for exploiting I/O parallelism in two possible ways, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent file system that optimizes striping by taking into account the requirements of the applications, and performs load balancing by judicious file allocation and dynamic redistribution of the data when access patterns change. Our system uses simple but effective heuristics that incur little overhead. We present performance experiments based on synthetic workloads and real-life traces.

NASA Technical Reports Server

Storage and Querying of Large Persistent Arrays

Author: C S Arun
Publication date: 01/01/2011

Scientific and analytical applications today are increasingly data-intensive. Many such applications deal with data that is multidimensional in nature. Traditionally, relational database systems have been used by many data-intensive applications, and the relational paradigm has proved to be both natural and efficient. However, for multidimensional data, when the number of dimensions becomes large, relational databases are inefficient in terms of both storage and query response time. In this thesis, we explore linearised storage, and indexed and skiplist-based retrieval on persistent arrays. The application programs are provided with a logical view of a multidimensional array. The techniques have been implemented in a home-grown database management system called MuBase.
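The abstract does not say which linearisation scheme the thesis uses. As a minimal sketch, assuming plain row-major order (an assumption, not a description of MuBase), a multidimensional index can be mapped to a single storage offset and back as follows; the function names are hypothetical.

```python
def linearise(index, shape):
    """Map a multidimensional index to a row-major offset (assumed scheme)."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    offset = 0
    for position, extent in zip(index, shape):
        if not 0 <= position < extent:
            raise IndexError(f"index {position} out of range for extent {extent}")
        # Horner-style accumulation: the last dimension varies fastest.
        offset = offset * extent + position
    return offset


def delinearise(offset, shape):
    """Invert linearise: recover the multidimensional index from an offset."""
    index = []
    for extent in reversed(shape):
        offset, position = divmod(offset, extent)
        index.append(position)
    return tuple(reversed(index))


shape = (4, 5, 6)
cell_offset = linearise((1, 2, 3), shape)
cell_index = delinearise(cell_offset, shape)
```

With such a mapping, the application can keep addressing cells by their logical coordinates while the storage layer works with a one-dimensional sequence of offsets.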

Research Archive of Indian Institute of Technology Hyderabad

Benchmarking BigSQL Systems

Author: Aluko Victor Olugbenga
Publication date: 01/01/2018

We live in the era of BigData. We now have BigData systems which are able to manage data in volumes of hundreds of terabytes and petabytes. These BigData systems handle data sizes which are too large for traditional database systems to handle. Some of these BigData systems now provide SQL syntax for interacting with their store. These systems, referred to as BigSQL systems, possess certain features which make them unique in how they store and manage data. A study of the performance and characteristics of these BigSQL systems is necessary in order to better understand them. This thesis provides that study. We perform standardized benchmark experiments against selected BigSQL systems and then analyze the performance of these systems based on the results of the experiments. The output of this thesis will provide an understanding of the features and behaviors of the selected BigSQL systems.

DSpace at Tartu University Library