Data Partitioning and Load Balancing in Parallel Disk Systems
Parallel disk systems provide opportunities for exploiting I/O parallelism in two ways, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent, self-reliant file system that aims to optimize striping by taking the requirements of the applications into account, and that performs load balancing by judicious file allocation and dynamic redistribution of the data when access patterns change. Our system uses simple but effective heuristics that incur only a small overhead. We present performance experiments based on synthetic workloads and real-life traces.
Keywords: parallel disk systems, performance tuning, file striping, data allocation, load balancing, disk cooling.
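The load-balancing idea in this abstract (allocate files by their access rate, or "heat", and rebalance when patterns change) can be sketched roughly as follows. The function name, the heat model, and the greedy strategy are illustrative assumptions, not the paper's actual heuristics:

```python
# Hypothetical greedy heat-based file allocation: each file carries a "heat"
# (its access rate), and files are placed hottest-first on the currently
# coolest disk. An illustration of the idea, not the paper's algorithm.

def allocate(files, num_disks):
    """files: list of (name, heat) pairs; returns ({name: disk}, per-disk load)."""
    load = [0.0] * num_disks
    placement = {}
    # Placing the hottest files first keeps large heats from piling up late.
    for name, heat in sorted(files, key=lambda f: -f[1]):
        coolest = min(range(num_disks), key=load.__getitem__)
        placement[name] = coolest
        load[coolest] += heat
    return placement, load

placement, load = allocate([("a", 9.0), ("b", 7.0), ("c", 4.0), ("d", 3.0)], 2)
# The two disks end up with loads 12.0 and 11.0 -- nearly balanced.
```

A "disk cooling" step would extend this by periodically migrating hot files off the most loaded disk as measured heats drift.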
Multi-Terabyte EIDE Disk Arrays running Linux RAID5
High-energy physics experiments are currently recording large amounts of data
and in a few years will be recording prodigious quantities of data. New methods
must be developed to handle this data and make analysis at universities
possible. Grid Computing is one method; however, the data must be cached at the
various Grid nodes. We examine some storage techniques that exploit recent
developments in commodity hardware. Disk arrays using RAID level 5 (RAID-5)
include both parity and striping. Striping improves access speed. The
parity protects data in the event of a single disk failure, but not in the case
of multiple disk failures.
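The parity mechanism the abstract describes fits in a few lines of code: in RAID-5 the parity block is the bytewise XOR of the data blocks in a stripe, so any single lost block equals the XOR of the survivors. A minimal sketch (not tied to the Linux md or 3ware implementations):

```python
# Bytewise XOR parity, the core of RAID-5 single-failure recovery.
def parity(blocks):
    """XOR a list of equal-length byte blocks into one parity block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks on three disks
p = parity(stripe)                     # parity block on a fourth disk
# Simulate losing the second disk: XOR of survivors plus parity recovers it.
recovered = parity([stripe[0], stripe[2], p])
```

Two simultaneous failures leave two unknowns in one XOR equation, which is why RAID-5 cannot recover from multiple disk losses.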
We report on tests of dual-processor Linux Software RAID-5 arrays and
Hardware RAID-5 arrays using a 12-disk 3ware controller, in conjunction with
250 and 300 GB disks, for use in offline high-energy physics data analysis. The
price of IDE disks is now less than $1/GB. These RAID-5 disk arrays can be
scaled to sizes affordable to small institutions and used when fast random
access at low cost is important.
Comment: Talk from the 2004 Computing in High Energy and Nuclear Physics (CHEP04), Interlaken, Switzerland, 27th September - 1st October 2004; 4 pages, LaTeX, uses CHEP2004.cls. ID 47, Poster Session 2, Track
Efficient Striping Techniques for Variable Bit Rate Continuous Media File Servers
The performance of striped disk arrays is governed by two parameters: the stripe unit size and the degree of striping. In this paper, we describe techniques for determining the stripe unit size and degree of striping for disk arrays storing variable bit rate continuous media data. We present an analytical model that uses the server configuration and the workload characteristics to predict the load on the most heavily loaded disk in redundant and non-redundant arrays. We then use the model to determine the optimal stripe unit size for different workloads. We also use the model to study the effect of various system parameters on the optimal stripe unit size. To determine the degree of striping, we first demonstrate that striping a continuous media stream across all disks in the array causes the number of clients supported to increase sub-linearly as the number of disks increases. To maximize the number of clients supported in large arrays, we propose a technique that partitions a disk array and stripes each media stream across a single partition. Since load imbalance can occur in such partitioned arrays, we present an analytical model to compute the imbalance across partitions in the array. We then use the model to determine a partition size that minimizes the load imbalance, and hence maximizes the number of clients supported by the array.
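The partitioning idea in this abstract can be illustrated with a toy assignment loop: split an n-disk array into partitions of p disks each and stripe each incoming stream over the currently least-loaded partition. The names and the load accounting below are hypothetical, not the paper's analytical model:

```python
# Illustrative partitioned striping: each stream is striped across one
# partition of the array rather than all disks, and streams go to the
# least-loaded partition to limit imbalance. Model is a simplification.

def assign_streams(num_disks, partition_size, stream_loads):
    num_parts = num_disks // partition_size
    load = [0.0] * num_parts
    assignment = []
    for s in stream_loads:
        target = min(range(num_parts), key=load.__getitem__)
        assignment.append(target)
        load[target] += s
    return assignment, load

# 12 disks split into three 4-disk partitions, six streams of varying load:
assignment, part_load = assign_streams(12, 4, [5.0, 5.0, 5.0, 2.0, 2.0, 2.0])
```

The paper's contribution is choosing the partition size p analytically so that the residual imbalance across partitions is minimized; the sketch above only shows where that imbalance arises.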
Efficient memory management in VOD disk array servers usingPer-Storage-Device buffering
We present a buffering technique that reduces video-on-demand server memory requirements by more than one order of magnitude. This technique, Per-Storage-Device Buffering (PSDB), is based on the allocation of a fixed number of buffers per storage device, as opposed to existing solutions based on per-stream buffer allocation. The combination of this technique with disk array servers is studied in detail, as well as the influence of Variable Bit Rate streams. We also present an interleaved data placement strategy, Constant Time Length Declustering, that results in optimal performance in the service of VBR streams. PSDB is evaluated by extensive simulation of a disk array server model that incorporates a simulation-based admission test. This research was supported in part by the National R&D Program of Spain, Project Number TIC97-0438.
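The order-of-magnitude claim follows from simple arithmetic: per-stream buffering scales memory with the number of concurrent clients, while PSDB scales it with the number of disks. The figures below are invented for illustration only:

```python
# Memory for per-stream buffering vs. PSDB (all numbers illustrative).
BUF_SIZE = 256 * 1024          # one buffer, in bytes
BUFS_EACH = 2                  # double buffering per stream / per device

def per_stream_memory(num_streams):
    return num_streams * BUFS_EACH * BUF_SIZE

def psdb_memory(num_disks):
    return num_disks * BUFS_EACH * BUF_SIZE

mem_stream = per_stream_memory(500)   # 500 concurrent clients
mem_psdb = psdb_memory(16)            # 16-disk array
ratio = mem_stream / mem_psdb         # 31.25x less memory with PSDB here
```

Whenever the client count exceeds the disk count by 10x or more, the saving crosses an order of magnitude, matching the abstract's claim.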
An Attempt to Detect the Galactic Bulge at 12 microns with IRAS
Surface brightness maps at 12 microns, derived from observations with the
Infrared Astronomical Satellite (IRAS), are used to estimate the integrated
flux at this wavelength from the Galactic bulge as a function of galactic
latitude along the minor axis. A simple model was used to remove Galactic disk
emission (e.g. unresolved stars and dust) from the IRAS measurements. The
resulting estimates are compared with predictions for the 12 micron bulge
surface brightness based on observations of complete samples of optically
identified M giants in several minor axis bulge fields. No evidence is found
for any significant component of 12 micron emission in the bulge other than that
expected from the optically identified M star sample plus normal, lower
luminosity stars. Known large amplitude variables and point sources from the
IRAS catalogue contribute only a small fraction to the total 12 micron flux.
Comment: Accepted for publication in ApJ; 13 pages of text including tables in MS WORD97 generated postscript; 3 figures in postscript by Sigma Plo
CORE: Augmenting Regenerating-Coding-Based Recovery for Single and Concurrent Failures in Distributed Storage Systems
Data availability is critical in distributed storage systems, especially when
node failures are prevalent in real life. A key requirement is to minimize the
amount of data transferred among nodes when recovering the lost or unavailable
data of failed nodes. This paper explores recovery solutions based on
regenerating codes, which are shown to provide fault-tolerant storage and
minimum recovery bandwidth. Existing optimal regenerating codes are designed
for single node failures. We build a system called CORE, which augments
existing optimal regenerating codes to support a general number of failures
including single and concurrent failures. We theoretically show that CORE
achieves the minimum possible recovery bandwidth for most cases. We implement
CORE and evaluate our prototype atop a Hadoop HDFS cluster testbed with up to
20 storage nodes. We demonstrate that our CORE prototype conforms to our
theoretical findings and achieves recovery bandwidth saving when compared to
the conventional recovery approach based on erasure codes.
Comment: 25 pages
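The bandwidth saving this abstract refers to can be sketched numerically. Under the standard minimum-storage regenerating (MSR) model from the literature that CORE builds on, conventional (n, k) erasure-code repair downloads the whole file of size M, while an MSR code contacting d helpers downloads d*M/(k*(d-k+1)). The parameters below are illustrative, not CORE's evaluated configuration:

```python
# Repair bandwidth: conventional erasure-code repair vs. an MSR regenerating
# code (formulas from the standard MSR literature; parameters illustrative).

def conventional_bw(M, k):
    # Rebuild by reading k blocks of M/k bytes each, i.e. the whole file.
    return M

def msr_bw(M, k, d):
    # Each of the d helper nodes ships M / (k * (d - k + 1)) bytes.
    return d * M / (k * (d - k + 1))

M, n, k = 1024.0, 10, 6
d = n - 1                      # all nine surviving nodes act as helpers
saving = 1 - msr_bw(M, k, d) / conventional_bw(M, k)   # 0.625, a 62.5% saving
```

The saving grows with d, which is why regenerating codes pay off most when many healthy nodes can participate in a repair.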
Robo-line storage: Low latency, high capacity storage systems over geographically distributed networks
Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are of the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a prototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a cornerstone of any coherent program in high performance computing and communications.