Data Partitioning and Load Balancing in Parallel Disk Systems
Parallel disk systems provide opportunities for exploiting I/O parallelism in two ways, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent, self-reliant file system that aims to optimize striping by taking the requirements of the applications into account, and that performs load balancing by judicious file allocation and dynamic redistribution of the data when access patterns change. Our system uses simple but effective heuristics that incur only a small overhead. We present performance experiments based on synthetic workloads and real-life traces.
Keywords: parallel disk systems, performance tuning, file striping, data allocation, load balancing, disk cooling.
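The load-balancing idea in this abstract (allocate files by their access rate, or "heat", and rebalance when patterns change) can be sketched roughly as follows. The function name, the heat model, and the greedy strategy are illustrative assumptions, not the paper's actual heuristics:

```python
# Hypothetical greedy heat-based file allocation: each file carries a "heat"
# (its access rate), and files are placed hottest-first on the currently
# coolest disk. An illustration of the idea, not the paper's algorithm.

def allocate(files, num_disks):
    """files: list of (name, heat) pairs; returns ({name: disk}, per-disk load)."""
    load = [0.0] * num_disks
    placement = {}
    # Placing the hottest files first keeps large heats from piling up late.
    for name, heat in sorted(files, key=lambda f: -f[1]):
        coolest = min(range(num_disks), key=load.__getitem__)
        placement[name] = coolest
        load[coolest] += heat
    return placement, load

placement, load = allocate([("a", 9.0), ("b", 7.0), ("c", 4.0), ("d", 3.0)], 2)
# The two disks end up with loads 12.0 and 11.0 -- nearly balanced.
```

A "disk cooling" step would extend this by periodically migrating hot files off the most loaded disk as measured heats drift.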
Multi-Terabyte EIDE Disk Arrays running Linux RAID5
High-energy physics experiments are currently recording large amounts of data
and in a few years will be recording prodigious quantities of data. New methods
must be developed to handle this data and make analysis at universities
possible. Grid Computing is one method; however, the data must be cached at the
various Grid nodes. We examine some storage techniques that exploit recent
developments in commodity hardware. Disk arrays using RAID level 5 (RAID-5)
include both parity and striping. Striping improves access speed. The
parity protects data in the event of a single disk failure, but not in the case
of multiple disk failures.
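The parity mechanism the abstract describes fits in a few lines of code: in RAID-5 the parity block is the bytewise XOR of the data blocks in a stripe, so any single lost block equals the XOR of the survivors. A minimal sketch (not tied to the Linux md or 3ware implementations):

```python
# Bytewise XOR parity, the core of RAID-5 single-failure recovery.
def parity(blocks):
    """XOR a list of equal-length byte blocks into one parity block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks on three disks
p = parity(stripe)                     # parity block on a fourth disk
# Simulate losing the second disk: XOR of survivors plus parity recovers it.
recovered = parity([stripe[0], stripe[2], p])
```

Two simultaneous failures leave two unknowns in one XOR equation, which is why RAID-5 cannot recover from multiple disk losses.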
We report on tests of dual-processor Linux Software RAID-5 arrays and
Hardware RAID-5 arrays using a 12-disk 3ware controller, in conjunction with
250 and 300 GB disks, for use in offline high-energy physics data analysis. The
price of IDE disks is now less than $1/GB. These RAID-5 disk arrays can be
scaled to sizes affordable to small institutions and used when fast random
access at low cost is important.
Comment: Talk from the 2004 Computing in High Energy and Nuclear Physics (CHEP04), Interlaken, Switzerland, 27th September - 1st October 2004; 4 pages, LaTeX, uses CHEP2004.cls. ID 47, Poster Session 2, Track
Efficient Striping Techniques for Variable Bit Rate Continuous Media File Servers
The performance of striped disk arrays is governed by two parameters: the stripe unit size and the degree of striping. In this paper, we describe techniques for determining the stripe unit size and degree of striping for disk arrays storing variable bit rate continuous media data. We present an analytical model that uses the server configuration and the workload characteristics to predict the load on the most heavily loaded disk in redundant and non-redundant arrays. We then use the model to determine the optimal stripe unit size for different workloads. We also use the model to study the effect of various system parameters on the optimal stripe unit size. To determine the degree of striping, we first demonstrate that striping a continuous media stream across all disks in the array causes the number of clients supported to increase sub-linearly as the number of disks increases. To maximize the number of clients supported in large arrays, we propose a technique that partitions a disk array and stripes each media stream across a single partition. Since load imbalance can occur in such partitioned arrays, we present an analytical model to compute the imbalance across partitions in the array. We then use the model to determine a partition size that minimizes the load imbalance, and hence maximizes the number of clients supported by the array.
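The partitioning idea in this abstract can be illustrated with a toy assignment loop: split an n-disk array into partitions of p disks each and stripe each incoming stream over the currently least-loaded partition. The names and the load accounting below are hypothetical, not the paper's analytical model:

```python
# Illustrative partitioned striping: each stream is striped across one
# partition of the array rather than all disks, and streams go to the
# least-loaded partition to limit imbalance. Model is a simplification.

def assign_streams(num_disks, partition_size, stream_loads):
    num_parts = num_disks // partition_size
    load = [0.0] * num_parts
    assignment = []
    for s in stream_loads:
        target = min(range(num_parts), key=load.__getitem__)
        assignment.append(target)
        load[target] += s
    return assignment, load

# 12 disks split into three 4-disk partitions, six streams of varying load:
assignment, part_load = assign_streams(12, 4, [5.0, 5.0, 5.0, 2.0, 2.0, 2.0])
```

The paper's contribution is choosing the partition size p analytically so that the residual imbalance across partitions is minimized; the sketch above only shows where that imbalance arises.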
Efficient memory management in VOD disk array servers usingPer-Storage-Device buffering
We present a buffering technique that reduces video-on-demand server memory requirements by more than one order of magnitude. This technique, Per-Storage-Device Buffering (PSDB), is based on the allocation of a fixed number of buffers per storage device, as opposed to existing solutions based on per-stream buffer allocation. The combination of this technique with disk array servers is studied in detail, as well as the influence of Variable Bit Rate streams. We also present an interleaved data placement strategy, Constant Time Length Declustering, that results in optimal performance in the service of VBR streams. PSDB is evaluated by extensive simulation of a disk array server model that incorporates a simulation-based admission test. This research was supported in part by the National R&D Program of Spain, Project Number TIC97-0438.
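The order-of-magnitude claim follows from simple arithmetic: per-stream buffering scales memory with the number of concurrent clients, while PSDB scales it with the number of disks. The figures below are invented for illustration only:

```python
# Memory for per-stream buffering vs. PSDB (all numbers illustrative).
BUF_SIZE = 256 * 1024          # one buffer, in bytes
BUFS_EACH = 2                  # double buffering per stream / per device

def per_stream_memory(num_streams):
    return num_streams * BUFS_EACH * BUF_SIZE

def psdb_memory(num_disks):
    return num_disks * BUFS_EACH * BUF_SIZE

mem_stream = per_stream_memory(500)   # 500 concurrent clients
mem_psdb = psdb_memory(16)            # 16-disk array
ratio = mem_stream / mem_psdb         # 31.25x less memory with PSDB here
```

Whenever the client count exceeds the disk count by 10x or more, the saving crosses an order of magnitude, matching the abstract's claim.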
An Attempt to Detect the Galactic Bulge at 12 microns with IRAS
Surface brightness maps at 12 microns, derived from observations with the
Infrared Astronomical Satellite (IRAS), are used to estimate the integrated
flux at this wavelength from the Galactic bulge as a function of galactic
latitude along the minor axis. A simple model was used to remove Galactic disk
emission (e.g. unresolved stars and dust) from the IRAS measurements. The
resulting estimates are compared with predictions for the 12 micron bulge
surface brightness based on observations of complete samples of optically
identified M giants in several minor axis bulge fields. No evidence is found
for any significant component of 12 micron emission in the bulge other than that
expected from the optically identified M star sample plus normal, lower
luminosity stars. Known large amplitude variables and point sources from the
IRAS catalogue contribute only a small fraction to the total 12 micron flux.
Comment: Accepted for publication in ApJ; 13 pages of text including tables in MS WORD97 generated postscript; 3 figures in postscript by Sigma Plo
CORE: Augmenting Regenerating-Coding-Based Recovery for Single and Concurrent Failures in Distributed Storage Systems
Data availability is critical in distributed storage systems, especially when
node failures are prevalent in real life. A key requirement is to minimize the
amount of data transferred among nodes when recovering the lost or unavailable
data of failed nodes. This paper explores recovery solutions based on
regenerating codes, which are shown to provide fault-tolerant storage and
minimum recovery bandwidth. Existing optimal regenerating codes are designed
for single node failures. We build a system called CORE, which augments
existing optimal regenerating codes to support a general number of failures
including single and concurrent failures. We theoretically show that CORE
achieves the minimum possible recovery bandwidth for most cases. We implement
CORE and evaluate our prototype atop a Hadoop HDFS cluster testbed with up to
20 storage nodes. We demonstrate that our CORE prototype conforms to our
theoretical findings and achieves recovery bandwidth saving when compared to
the conventional recovery approach based on erasure codes.
Comment: 25 pages
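The bandwidth saving this abstract refers to can be sketched numerically. Under the standard minimum-storage regenerating (MSR) model from the literature that CORE builds on, conventional (n, k) erasure-code repair downloads the whole file of size M, while an MSR code contacting d helpers downloads d*M/(k*(d-k+1)). The parameters below are illustrative, not CORE's evaluated configuration:

```python
# Repair bandwidth: conventional erasure-code repair vs. an MSR regenerating
# code (formulas from the standard MSR literature; parameters illustrative).

def conventional_bw(M, k):
    # Rebuild by reading k blocks of M/k bytes each, i.e. the whole file.
    return M

def msr_bw(M, k, d):
    # Each of the d helper nodes ships M / (k * (d - k + 1)) bytes.
    return d * M / (k * (d - k + 1))

M, n, k = 1024.0, 10, 6
d = n - 1                      # all nine surviving nodes act as helpers
saving = 1 - msr_bw(M, k, d) / conventional_bw(M, k)   # 0.625, a 62.5% saving
```

The saving grows with d, which is why regenerating codes pay off most when many healthy nodes can participate in a repair.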
Robo-line storage: Low latency, high capacity storage systems over geographically distributed networks
Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are of the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a prototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a cornerstone of any coherent program in high performance computing and communications.