Self-Repairing Disk Arrays

Author: Amer Ahmed
Long Darrell D. E.
Pâris Jehan-François
Schwarz Thomas J. E.
Publication date: 02/01/2015

As the price of magnetic storage continues to decrease, the cost of replacing failed disks becomes increasingly dominated by the cost of the service call itself. We propose to eliminate these calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime. To evaluate the feasibility of this approach, we have simulated the behavior of two-dimensional disk arrays with n parity disks and n(n-1)/2 data disks under realistic failure and repair assumptions. Our conclusion is that having n(n+1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years. We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures.

Comment: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347)
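
The disk counts quoted in the abstract can be tabulated for a given n. The following is a minimal sketch of that arithmetic only, not the authors' simulator; the function name `array_layout` and the returned field names are assumptions made for illustration.

```python
def array_layout(n: int) -> dict:
    """Disk counts for the two-dimensional array described in the abstract.

    Uses only the formulas stated there: n parity disks, n(n-1)/2 data
    disks, and n(n+1)/2 spare disks. Field names are hypothetical.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to have any data disks")
    parity = n
    data = n * (n - 1) // 2
    spares = n * (n + 1) // 2
    total = parity + data + spares
    return {
        "parity": parity,
        "data": data,
        "spares": spares,
        "total": total,
        # Fraction of all installed disks that hold user data.
        "data_fraction": data / total,
    }


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, array_layout(n))
```

Since n + n(n-1)/2 = n(n+1)/2, the proposed spare count equals the number of active (data plus parity) disks, so the array ships with twice as many disks as it uses at any one time.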

arXiv.org e-Print Archive

CiteSeerX

EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures

Author: Jai Menon
Jehoshua Bruck
Jim Brady
Mario Blaum
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995

We present a novel method, which we call EVENODD, for tolerating up to two disk failures in RAID architectures. EVENODD adds only two redundant disks and consists of simple exclusive-OR computations. This redundant storage is optimal, in the sense that two failed disks cannot be retrieved with fewer than two redundant disks. A major advantage of EVENODD is that it only requires parity hardware, which is typically present in standard RAID-5 controllers. Hence, EVENODD can be implemented on standard RAID-5 controllers without any hardware changes. The most commonly used scheme that employs optimal redundant storage (i.e., two extra disks) is based on Reed-Solomon (RS) error-correcting codes. This scheme requires computation over finite fields and results in a more complex implementation. For example, we show that the complexity of implementing EVENODD in a disk array with 15 disks is about 50% of that required when using the RS scheme. The new scheme is not limited to RAID architectures: it can be used in any system requiring large symbols and relatively short codes, for instance, in multitrack magnetic recording. To this end, we also present a decoding algorithm for one column (track) in error.
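
The exclusive-OR computation that the abstract refers to can be illustrated with ordinary single-parity recovery, as done in RAID-5. The sketch below is not the EVENODD construction, which uses two redundant disks to survive double failures; it only shows the XOR primitive, and the helper names `xor_blocks` and `rebuild_missing` are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover a single lost data block from the survivors and the parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)
    # Pretend the disk holding data[1] failed.
    recovered = rebuild_missing([data[0], data[2]], parity)
    print(recovered == data[1])
```

A single parity block of this kind recovers one lost block; tolerating two failures requires a second, independent redundant disk, which is the case the abstract addresses.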

CiteSeerX

Caltech Authors

Robo-line storage: Low latency, high capacity storage systems over geographically distributed networks

Author: Anderson Thomas E.
Katz Randy H.
Ousterhout John K.
Patterson David A.

Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are of the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a prototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a cornerstone of any coherent program in high performance computing and communications.

NASA Technical Reports Server

Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

Author: Estrada-Galiñanes Vero
Felber Pascal
Miller Ethan
Pâris Jehan-François
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/10/2018

Data centres that use consumer-grade disk drives and distributed peer-to-peer systems are unreliable environments to archive data without enough redundancy. Most redundancy schemes are not completely effective for providing high availability, durability and integrity in the long term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large scale storage system. Our motivation is to design flexible and practical erasure codes with high fault-tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault-tolerance even further without the need for additional storage. As a result, an entangled storage system can provide high availability and durability, and offers additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical codes. Remarkably, they excel at code locality; hence, they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios.

Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

arXiv.org e-Print Archive

Crossref

Orbital assembly and maintenance study

Author: Gorman D.
Grant C.
Kyrias G.
Lord C.
Rombach J.
Salis M.
Skidmore R.
Thomas R.

The requirements, conceptual design, tradeoffs, procedures, and techniques for orbital assembly of the support structure of the microwave power transmission system and the radio astronomy telescope are described. Thermal and stress analyses, packaging, alignment, and subsystems requirements are included along with manned vs. automated and transportation tradeoffs. Technical and operational concepts for the manned and automated maintenance of satellites were investigated and further developed, and the results are presented.

NASA Technical Reports Server