Search CORE

365 research outputs found

Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

Author: Estrada-Galiñanes Vero
Felber Pascal
Miller Ethan
Pâris Jehan-François
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/10/2018
Field of study

Data centres that use consumer-grade disks drives and distributed peer-to-peer systems are unreliable environments to archive data without enough redundancy. Most redundancy schemes are not completely effective for providing high availability, durability and integrity in the long-term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large scale storage system. Our motivation is to design flexible and practical erasure codes with high fault-tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault-tolerance even further without the need of additional storage. As a result, an entangled storage system can provide high availability, durability and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical codes. Remarkably, they excel at code locality, hence, they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios.Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN

arXiv.org e-Print Archive

Crossref

Robo-line storage: Low latency, high capacity storage systems over geographically distributed networks

Author: Anderson Thomas E.
Katz Randy H.
Ousterhout John K.
Patterson David A.
Publication venue
Publication date
Field of study

Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are on the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a protototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a Cornerstone of any coherent program in high performance computing and communications

NASA Technical Reports Server

Energy-aware data prefetching for multi-speed disks

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006
Field of study

Crossref

How migrating 0.0001% of address space saves 12% of energy in hybrid storage

Author: Hartel Pieter H.
Khatib Mohammed G.
Zwaag Berend Jan van der
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007
Field of study

We present a simple, operating-\ud system independent method to reduce the num-\ud ber of seek operations and consequently reduce\ud the energy consumption of a hybrid storage\ud device consisting of a hard disk and a ﬂash\ud memory. Trace-driven simulations show that\ud migrating a tiny amount of the address space\ud (0.0001%) from disk to ﬂash already results\ud in a signiﬁcant storage energy reduction (12%)\ud at virtually no extra cost. We show that the\ud amount of energy saving depends on which part\ud of the address space is migrated, and we present\ud two indicators for this, namely sequentiality and\ud request frequency. Our simulations show that\ud both are suitable as criterion for energy-saving\ud ﬁle placement methods in hybrid storage. We\ud address potential wear problems in the ﬂash\ud subsystem by presenting a simple way to pro-\ud long its expected lifetime.\u

University of Twente Research Information

Cold Storage Data Archives: More Than Just a Bunch of Tapes

Author: Appuswamy Raja
Memishi Bunjamin
Paradies Marcus
Publication venue
Publication date: 01/01/2019
Field of study

The abundance of available sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics already exceeds the storage hardware globally fabricated per year. To that end, cold storage data archives are the---often overlooked---spearheads of modern big data analytics in scientific, data-intensive application domains. While high-performance data analytics has received much attention from the research community, the growing number of problems in designing and deploying cold storage archives has only received very little attention. In this paper, we take the first step towards bridging this gap in knowledge by presenting an analysis of four real-world cold storage archives from three different application domains. In doing so, we highlight (i) workload characteristics that differentiate these archives from traditional, performance-sensitive data analytics, (ii) design trade-offs involved in building cold storage systems for these archives, and (iii) deployment trade-offs with respect to migration to the public cloud. Based on our analysis, we discuss several other important research challenges that need to be addressed by the data management community

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Recommended from our members

A File Allocation Strategy for Energy-Efficient Disk Storage Systems

Author: Otoo Ekow J
Otoo Ekow J.
Pinar Ali
Rotem Doron
Tsao Shi-Chiang
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 27/06/2008
Field of study

Exponential data growth is a reality for most enterprise and scientific data centers.Improvements in price/performance and storage densities of disks have made it both easy and affordable to maintain most of the data in large disk storage farms. The provisioning of disk storage farms however, is at the expense of high energy consumption due to the large number of spinning disks. The power for spinning the disks and the associated cooling costs is a significant fraction of the total power consumption of a typical data center. Given the trend of rising global fuel and energy prices and the high rate of data growth, the challenge is to implement appropriateconfigurations of large scale disk storage systems that meet performancerequirements for information retrieval across data centers. We present part of the solution to this challenge with an energy efficient file allocation strategy on a large scale disk storage system. Given performance characteristics of thedisks, and a profile of the workload in terms of frequencies of file requests and their sizes, the basic idea is to allocate files to disks such that the disks can be configured into two sets of active (constantly spinning), and passive (capable of being spun up or down) disk pools. The goal is to minimize the number of active disks subject to I/O performance constraints. We present an algorithm for solving this problem with guaranteed bounds from the optimal solution. Our algorithm runs in O(n) time where n is the number of files allocated. It uses a mapping of our file allocation problem to a generalization of the bin packing problem known as 2-dimensional vector packing. Detailed simulation results are also provided

UNT Digital Library

Priori information and sliding window based prediction algorithm for energy-efficient storage systems in cloud

Author: Chai Yunpeng
Chen Yinong
Du Zhihui
Fan Wenjun
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

One of the major challenges in cloud computing and data centers is the energy conservation and emission reduction. Accurate prediction algorithms are essential for building energy efficient storage systems in cloud computing. In this paper, we first propose a Three-State Disk Model (3SDM), which can describe the service quality and energy consumption states of a storage system accurately. Based on this model, we develop a method for achieving energy conservation without losing quality by skewing the workload among the disks to transmit the disk states of a storage system. The efficiency of this method is highly dependent on the accuracy of the information predicting the blocks to be accessed and the blocks not be accessed in the near future. We develop a priori information and sliding window based prediction (PISWP) algorithm by taking advantage of the priori information about human behavior and selecting suitable size of sliding window. The PISWP method targets at streaming media applications, but we also check its efficiency on other two applications, news in webpage and new tool released. Disksim, an established storage system simulator, is applied in our experiments to verify the effect of our method for various users’ traces. The results show that this prediction method can bring a high degree energy saving for storage systems in cloud computing environment

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

NSSDC Conference on Mass Storage Systems and Technologies for Space and Earth Science Applications, volume 1

Author: Blasso L. G.
Hariharan P. C.
Kobler Benjamin
Publication venue
Publication date
Field of study

Papers and viewgraphs from the conference are presented. This conference served as a broad forum for the discussion of a number of important issues in the field of mass storage systems. Topics include magnetic disk and tape technologies, optical disks and tape, software storage and file management systems, and experiences with the use of a large, distributed storage system. The technical presentations describe, among other things, integrated mass storage systems that are expected to be available commercially. Also included is a series of presentations from Federal Government organizations and research institutions covering their mass storage requirements for the 1990's

NASA Technical Reports Server

RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)

Author: Thomasian Alexander
Publication venue
Publication date: 06/01/2024
Field of study

RAID proposal advocated replacing large disks with arrays of PC disks, but as the capacity of small disks increased 100-fold in 1990s the production of large disks was discontinued. Storage dependability is increased via replication or erasure coding. Cloud storage providers store multiple copies of data obviating for need for further redundancy. Varitaions of RAID based on local recovery codes, partial MDS reduce recovery cost. NAND flash Solid State Disks - SSDs have low latency and high bandwidth, are more reliable, consume less power and have a lower TCO than Hard Disk Drives, which are more viable for hyperscalers.Comment: Submitted to ACM Computing Surveys. arXiv admin note: substantial text overlap with arXiv:2306.0876

arXiv.org e-Print Archive