
    RAIDX: RAID Extended for Heterogeneous Arrays

    The computer hard drive market has diversified with the establishment of solid state disks (SSDs) as an alternative to magnetic hard disks (HDDs). Each technology has its advantages: SSDs are faster than HDDs, but HDDs are cheaper. Our goal is to construct a parallel storage system with HDDs and SSDs such that the parallel system is as fast as the SSDs. Achieving this goal is challenging since the slow HDDs store more data and become bottlenecks, while the SSDs remain idle. RAIDX is a parallel storage system designed for disks of different speeds, capacities and technologies. The RAIDX hardware consists of an array of disks; the RAIDX software consists of data structures and algorithms that allow the disks to be viewed as a single storage unit whose capacity equals the sum of the capacities of its disks, whose failure rate is lower than the failure rate of its individual disks, and whose speed is close to that of its faster disks. RAIDX achieves its performance goals with the aid of a novel parallel data organization technique that allows storage data to be moved on the fly without impacting the upper-level file system. We show that storage data accesses satisfy the locality of reference principle, whereby only a small fraction of storage data is accessed frequently. RAIDX has a monitoring program that identifies frequently accessed blocks and a migration program that moves those blocks to faster disks. The faster disks act as caches that store the sole copy of frequently accessed data. Experimental evaluation has shown that an HDD+SSD RAIDX array is as fast as an all-SSD array when the workload shows locality of reference.
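
    A minimal sketch of the monitor-and-migrate idea described above, assuming a hypothetical two-tier (HDD/SSD) array with per-block access counters; the class, names and promotion policy are illustrative, not RAIDX's actual data structures.

        from collections import Counter

        class TieredArray:
            """Toy hot-block migrator: count accesses per logical block and
            keep the hottest blocks on the fast (SSD) tier."""

            def __init__(self, ssd_capacity_blocks):
                self.access_counts = Counter()   # logical block -> access count
                self.on_ssd = set()              # blocks currently on the fast tier
                self.ssd_capacity = ssd_capacity_blocks

            def record_access(self, block):
                self.access_counts[block] += 1

            def migrate(self):
                # Promote the most frequently accessed blocks that fit on the SSD
                # tier; everything else stays on (or returns to) the HDDs.  A real
                # system would remap blocks without involving the upper-level
                # file system.
                hottest = {b for b, _ in
                           self.access_counts.most_common(self.ssd_capacity)}
                to_promote = hottest - self.on_ssd
                to_demote = self.on_ssd - hottest
                self.on_ssd = hottest
                return to_promote, to_demote

        array = TieredArray(ssd_capacity_blocks=2)
        for blk in [7, 7, 7, 3, 3, 9]:
            array.record_access(blk)
        array.migrate()   # promotes blocks 7 and 3; nothing to demote yet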

    Stochastic Analysis on RAID Reliability for Solid-State Drives

    Solid-state drives (SSDs) have been widely deployed in desktops and data centers. However, SSDs suffer from bit errors, and the bit error rate is time dependent since it increases as an SSD wears down. Traditional storage systems mainly use parity-based RAID to provide reliability guarantees by striping redundancy across multiple devices, but the effectiveness of RAID in SSDs remains debatable as parity updates aggravate the wearing and bit error rates of SSDs. In particular, an open problem is how different parity distributions over multiple devices, such as the even distribution suggested by conventional wisdom or the uneven distributions proposed in recent RAID schemes for SSDs, may influence the reliability of an SSD RAID array. To address this fundamental problem, we propose the first analytical model to quantify the reliability dynamics of an SSD RAID array. Specifically, we develop a "non-homogeneous" continuous time Markov chain model and derive the transient reliability solution. We validate our model via trace-driven simulations and conduct numerical analysis to provide insights into the reliability dynamics of SSD RAID arrays under different parity distributions and subject to different bit error rates and array configurations. Designers can use our model to decide the appropriate parity distribution based on their reliability requirements.
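
    A small numerical sketch of the transient-analysis idea: a time-dependent generator matrix captures wear-dependent failure rates, and the Kolmogorov forward equations are integrated to obtain the reliability at time t. The three-state chain, the rate functions and all numbers below are illustrative assumptions, not the paper's actual model.

        import numpy as np
        from scipy.integrate import solve_ivp

        N_DEVICES, MU = 4, 2.0          # array size and rebuild rate (made up)

        def lam(t):
            # Wear-dependent per-device failure rate: grows as the SSDs age.
            return 0.01 * (1.0 + 0.1 * t)

        def Q(t):
            # Generator for states {0: all healthy, 1: one device failed,
            # 2: data loss (absorbing)} of a RAID-5-like array.
            q = np.zeros((3, 3))
            q[0, 1] = N_DEVICES * lam(t)
            q[1, 0] = MU
            q[1, 2] = (N_DEVICES - 1) * lam(t)
            for i in range(3):
                q[i, i] = -q[i].sum()
            return q

        def forward(t, pi):
            # Kolmogorov forward equations: d pi/dt = pi * Q(t).
            return pi @ Q(t)

        pi0 = np.array([1.0, 0.0, 0.0])              # start with no failures
        sol = solve_ivp(forward, (0.0, 100.0), pi0, t_eval=[100.0])
        reliability = 1.0 - sol.y[2, -1]             # P(no data loss by t = 100)
        print(f"R(100) ~ {reliability:.4f}")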

    Off-line Deduplication Method for Solid-State Disk Based on Hot and Cold Data

    Solid-state disk (SSD) deduplication refers to the identification and deletion of duplicate data stored in an SSD. Deduplication improves the reliability of SSDs. At present, common SSD data deduplication is performed online with Field Programmable Gate Array (FPGA) acceleration, whose disadvantage is the complex structure of the FPGA. An off-line deduplication method for SSDs based on hot and cold data was proposed in this study to simplify the structure of SSD deduplication, reduce its cost, and improve the efficiency of deduplication and the access performance of SSDs. First, the wear-leveling algorithm in the SSD was used to divide the data into cold and hot data, and the corresponding fingerprints were generated for the cold data. Second, the fingerprints were compared, and cold data with the same fingerprint were deleted. Finally, the cold and hot data were exchanged after deduplication. Results demonstrate that the duplicate recognition rate of the proposed method is 5% to 38%, which is close to that of the online deduplication method. In terms of access performance, the performance of SSDs using the proposed method is improved by 20% compared with that of traditional SSDs and is close to the access performance of SSDs using online deduplication. This study provides a reference for improving the reliability of existing SSDs.
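
    A minimal sketch of the off-line, cold-data-only fingerprinting step described above; the SHA-1 fingerprint and the LBA remapping table are assumptions made for illustration, not the paper's implementation.

        import hashlib

        def deduplicate_cold_blocks(cold_blocks):
            """Fingerprint each cold block and keep one physical copy per
            fingerprint.  `cold_blocks` maps a logical block address (LBA)
            to its data; returns a mapping duplicate LBA -> canonical LBA,
            whose physical copies can then be freed."""
            seen = {}    # fingerprint -> first LBA holding that content
            remap = {}   # duplicate LBA -> canonical LBA
            for lba, data in cold_blocks.items():
                fp = hashlib.sha1(data).digest()
                if fp in seen:
                    remap[lba] = seen[fp]
                else:
                    seen[fp] = lba
            return remap

        # Blocks 0 and 2 hold identical cold data, so block 2 is remapped to 0.
        blocks = {0: b"A" * 4096, 1: b"B" * 4096, 2: b"A" * 4096}
        print(deduplicate_cold_blocks(blocks))   # {2: 0}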

    Data Management Strategies for Relative Quality of Service in Virtualised Storage Systems

    The amount of data managed by organisations continues to grow relentlessly. Driven by the high costs of maintaining multiple local storage systems, there is a well established trend towards storage consolidation using multi-tier Virtualised Storage Systems (VSSs). At the same time, storage infrastructures are increasingly subject to stringent Quality of Service (QoS) demands. Within a VSS, it is challenging to match desired QoS with delivered QoS, considering the latter can vary dramatically both across and within tiers. Manual efforts to achieve this match require extensive and ongoing human intervention. Automated efforts are based on workload analysis, which ignores the business importance of infrequently accessed data. This thesis presents our design, implementation and evaluation of data maintenance strategies in an enhanced version of the popular Linux Extended 3 Filesystem which features support for the elegant specification of QoS metadata while maintaining compatibility with stock kernels. Users and applications specify QoS requirements using a chmod-like interface. System administrators are provided with a character device kernel interface that allows for profiling of the QoS delivered by the underlying storage. We propose a novel score-based metric, together with associated visualisation resources, to evaluate the degree of QoS matching achieved by any given data layout. We also design and implement new inode and datablock allocation and migration strategies which exploit this metric in seeking to match the QoS attributes set by users and/or applications on files and directories with the QoS actually delivered by each of the filesystem's block groups. To create realistic test filesystems we have included QoS metadata support in the Impressions benchmarking framework. The effectiveness of the resulting data layout in terms of QoS matching is evaluated using a special kernel module that is capable of inspecting detailed filesystem data on-the-fly. We show that our implementations of the proposed inode and datablock allocation strategies are capable of dramatically improving data placement with respect to QoS requirements when compared to the default allocators.
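
    A toy version of a score-based QoS-matching metric of the kind described above, assuming a single numeric QoS level per file and per block group; the 0-255 scale and the absolute-difference penalty are illustrative choices, not the thesis's actual metric.

        def qos_match_score(placements):
            """Each entry pairs the QoS level requested for a file (e.g. set
            through a chmod-like interface) with the QoS level profiled for
            the block group that actually holds it, both on a 0-255 scale.
            Lower scores mean better matching."""
            penalty = sum(abs(requested - delivered)
                          for requested, delivered in placements)
            return penalty / max(len(placements), 1)

        # Two files placed on block groups close to their requested QoS and
        # one important file (200) that landed on a slow group (50).
        print(qos_match_score([(10, 10), (128, 120), (200, 50)]))   # ~52.7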

    A differentiated proposal of three dimension I/O performance characterization model focusing on storage environments

    The I/O bottleneck remains a central issue in high-performance environments. Cloud computing, high-performance computing (HPC) and big data environments share many underlying difficulties in delivering data at the rate requested by high-performance applications. This increases the possibility of bottlenecks being created throughout the application feeding process by the hardware devices located in the storage system layer. In recent years, many researchers have proposed solutions to improve the I/O architecture using different approaches. Some take advantage of hardware devices, while others focus on sophisticated software approaches. However, due to the complexity of dealing with high-performance environments, creating solutions to improve I/O performance in both software and hardware is challenging and gives researchers many opportunities. Classifying these improvements along different dimensions allows researchers to understand how they have been built over the years and how they have progressed. In addition, it allows future efforts to be directed to research topics that have developed at a lower rate, balancing the overall development process. This research presents a three-dimension characterization model for classifying research works on I/O performance improvements for large-scale storage computing facilities. The classification model can also be used as a guideline framework to summarize research, providing an overview of the current scenario. We also used the proposed model to perform a systematic literature mapping covering ten years of research on I/O performance improvements in storage environments. This study classified hundreds of distinct works, identifying which hardware, software, and storage systems received the most attention over the years, which proposal elements were most researched, and where these elements were evaluated. To justify the importance of this model and the development of solutions targeting I/O performance improvements, we evaluated a subset of these improvements using a real and complete experimentation environment, the Grid5000. Analysis over different scenarios using a synthetic I/O benchmark demonstrates how throughput and latency behave when performing different I/O operations using distinct storage technologies and approaches.
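
    A small sketch of how a three-dimension classification could be applied in practice to produce the kind of systematic-mapping summary described above; the dimension names and example values are placeholders, not the axes actually defined by the model.

        from collections import Counter
        from typing import NamedTuple

        class Work(NamedTuple):
            # Placeholder dimensions for classifying an I/O-improvement work.
            hardware: str          # e.g. "SSD", "HDD", "NVRAM"
            software: str          # e.g. "file system", "scheduler", "cache"
            storage_system: str    # e.g. "parallel FS", "key-value store"

        def summarize(works):
            """Count how often each value appears along each dimension."""
            return {dim: Counter(getattr(w, dim) for w in works)
                    for dim in Work._fields}

        corpus = [Work("SSD", "cache", "parallel FS"),
                  Work("SSD", "scheduler", "parallel FS"),
                  Work("HDD", "file system", "key-value store")]
        print(summarize(corpus))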

    Towards Design and Analysis For High-Performance and Reliable SSDs

    NAND Flash-based Solid State Disks have many attractive technical merits, such as low power consumption, light weight, shock resistance, the ability to sustain hotter operating regimes, and extraordinarily high performance for random read access, which makes SSDs immensely popular and widely employed in different types of environments including portable devices, personal computers, large data centers, and distributed data systems. However, current SSDs still suffer from several critical inherent limitations, such as the inability to update in place, asymmetric read and write performance, slow garbage collection processes, limited endurance, and degraded write performance with the adoption of MLC and TLC techniques. To alleviate these limitations, we propose optimizations at both the outside application layer and the SSDs' internal layer. Since SSDs are a good compromise between performance and price, they are widely deployed as second-layer caches sitting between DRAM and hard disks to boost system performance. Due to special properties of SSDs such as the internal garbage collection processes and limited lifetime, optimizations designed for traditional cache devices like DRAM and SRAM might not work consistently for SSD-based caches. Therefore, at the outside application layer, our work focuses on integrating the special properties of SSDs into the optimization of SSD caches. Moreover, our work also alleviates the increased Flash write latency and ECC complexity caused by the adoption of MLC and TLC technologies by analyzing real-world workloads.
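
    Not the dissertation's design, but a sketch of the general idea of folding SSD properties into cache management: an admission filter only writes a block to the SSD cache after it has missed a few times, trading some hit rate for far fewer wear-inducing cache-fill writes. The threshold and the LRU eviction policy are assumptions.

        from collections import Counter, OrderedDict

        class WearAwareSSDCache:
            def __init__(self, capacity, admit_after=2):
                self.capacity = capacity
                self.admit_after = admit_after
                self.misses = Counter()        # misses seen per block
                self.cache = OrderedDict()     # block -> data, in LRU order

            def access(self, block, fetch_from_disk):
                if block in self.cache:
                    self.cache.move_to_end(block)       # cache hit
                    return self.cache[block]
                data = fetch_from_disk(block)           # miss: read the hard disk
                self.misses[block] += 1
                if self.misses[block] >= self.admit_after:
                    if len(self.cache) >= self.capacity:
                        self.cache.popitem(last=False)  # evict the LRU block
                    self.cache[block] = data            # one write to the SSD
                return data

        disk = {b: f"data-{b}" for b in range(10)}
        cache = WearAwareSSDCache(capacity=2)
        for b in [1, 1, 2, 1]:                  # block 1 is admitted on its 2nd miss
            cache.access(b, disk.__getitem__)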

    The Case for Medium-Sized Regional Data Centres

    Cloud computing is widely associated with major capital investment in mega data centres, housing expensive blade servers and storage area networks. In this paper we argue that a modular approach to building local or regional data centres using commodity hardware and open source hardware can produce a cost effective solution that better addresses the goals of cloud computing, and provides a scalable architecture that meets the service requirements of a high quality data centre. In support of this goal, we provide data that supports three research hypotheses:
    1. that central processor unit (CPU) resources are not normally limiting;
    2. that disk I/O transactions (TPS) are more often limiting, but this can be mitigated by maximizing the TPS-CPU ratio;
    3. that customer CPU loads are generally static and small.
    Our results indicate that the modular, commodity-hardware-based architecture is near optimal. This is a very significant result, as it opens the door to alternative business models for the provision of data centres that significantly reduce the need for major up-front capital investment.
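
    A toy calculation related to hypothesis 2 above, assuming hypothetical monitoring samples of disk transactions per second (TPS) and CPU utilisation; the metric and the numbers are illustrative only.

        def tps_cpu_ratio(samples):
            """Average disk TPS delivered per unit of CPU utilisation.  A
            configuration with a higher ratio pushes more disk I/O through
            before the CPU becomes the limiting resource."""
            usable = [(tps, cpu) for tps, cpu in samples if cpu > 0]
            return sum(tps / cpu for tps, cpu in usable) / max(len(usable), 1)

        # (TPS, CPU utilisation) samples for two candidate node configurations.
        print(tps_cpu_ratio([(1200, 0.30), (1100, 0.25)]))   # commodity node
        print(tps_cpu_ratio([(1500, 0.80), (1400, 0.85)]))   # CPU-heavy node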

    Energy Efficiency of Storage Systems in Cluster Computing

    Energy efficiency is an important part of the development of any technology, and cluster computing is no exception. As energy prices rise, the cost of running a cluster can easily exceed the cost of buying one. A euro saved is a euro earned. This thesis examines and compares different hardware-level approaches and software-level configurations used in clusters to store data. Solid state drives are not commonly used in clusters, and one of the goals of this thesis is to study whether this relatively new technology is suitable for use in clusters. The main goal is to understand what affects the energy efficiency of a cluster from a data storage point of view. To reach these goals, the performance and energy consumption of a cluster are measured and analysed with different system configurations. These results can further be used to optimise existing clusters. The thesis is divided into two parts. In the literature study part, issues related to energy efficiency, data storage models, block devices, file systems and I/O schedulers are studied. In the experimental part, the test environment is introduced in detail and the results are reported and analysed. The tests are conducted using the CMS software with real LHC data to simulate heavy physics computing. During these tests, both hard disk and solid state drives are used with three different data storage schemes: a distributed approach with GlusterFS (a distributed file system) on the compute nodes, a centralised approach with a dedicated file server, and a local approach with drives in the compute nodes of the cluster. The test results reveal that no significant gain is achieved by using solid state drives. Another key result is that a cluster can suffer a major performance loss if the file system and I/O scheduler are not properly selected. The conclusion of this thesis is that, although there is no fundamental reason why solid state drives should not be used in clusters, their use is not justifiable considering their multifold price and low capacity compared to hard disk drives. As the development of solid state drives progresses, a new study will be in order. If prices decline and storage capacity increases, solid state drives could replace mechanical drives.
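
    A minimal sketch of the kind of figure of merit such a comparison needs: useful work per unit of energy. The function and the numbers below are illustrative assumptions, not measurements from the thesis.

        def jobs_per_kwh(jobs_completed, avg_power_watts, runtime_seconds):
            """Energy efficiency as completed jobs per kilowatt-hour."""
            energy_kwh = avg_power_watts * runtime_seconds / 3_600_000.0
            return jobs_completed / energy_kwh

        # The same physics workload on two hypothetical storage configurations.
        print(jobs_per_kwh(500, avg_power_watts=1800, runtime_seconds=7200))  # HDD
        print(jobs_per_kwh(500, avg_power_watts=1750, runtime_seconds=6900))  # SSD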

    Enterprise storage report for the 1990's

    Data processing has become an increasingly vital function, if not the most vital function, in most businesses today. No longer only a mainframe domain, the data processing enterprise also includes the midrange and workstation platforms, either local or remote. This expanded view of the enterprise has encouraged more and more businesses to take a strategic, long-range view of information management rather than the short-term tactical approaches of the past. Some of the significant aspects of data storage in the enterprise for the 1990's are highlighted.

    Performance Analysis of NAND Flash Memory Solid-State Disks

    As their prices decline, their storage capacities increase, and their endurance improves, NAND Flash Solid-State Disks (SSD) provide an increasingly attractive alternative to Hard Disk Drives (HDD) for portable computing systems and PCs. HDDs have been an integral component of computing systems for several decades as long-term, non-volatile storage in the memory hierarchy. Today's typical hard disk drive is a highly complex electro-mechanical system which is the result of decades of research, development, and fine-tuned engineering. Compared to an HDD, flash memory provides a simpler interface, one without the complexities of mechanical parts. On the other hand, today's typical solid-state disk drive is still a complex storage system with its own peculiarities and system problems. Due to the lack of publicly available SSD models, we have developed our own NAND flash SSD models and integrated them into DiskSim, which is extensively used in academia for studying storage system architectures. With our flash memory simulator, we model various solid-state disk architectures for a typical portable computing environment, quantify their performance under real user PC workloads and explore the potential for further improvements. We find the following:
    * The real limitation to NAND flash memory performance is not its low per-device bandwidth but its internal core interface.
    * NAND flash memory media transfer rates do not need to scale up to those of HDDs for good performance.
    * SSD organizations that exploit concurrency at both the system and device level improve performance significantly.
    * These system- and device-level concurrency mechanisms are, to a significant degree, orthogonal: the performance increase due to one does not come at the expense of the other, as each exploits a different facet of the concurrency exhibited within the PC workload.
    * SSD performance can be further improved by implementing flash-oriented queuing algorithms, access reordering, and bus ordering algorithms which exploit the flash memory interface and its timing differences between read and write requests.
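
    A toy illustration of why flash-oriented queuing helps: NAND page reads are much cheaper than page programs, so serving queued reads ahead of writes cuts read latency. The latencies and the scheduler below are rough assumptions, not the simulator's model.

        READ_US, WRITE_US = 25, 200   # rough per-page read/program times (microseconds)

        def mean_read_latency(ops, read_priority=True):
            """ops: a batch of "read"/"write" requests queued at time 0.
            Returns the mean completion time (in microseconds) of the reads."""
            order = sorted(ops, key=lambda op: op == "write") if read_priority else ops
            clock, read_done = 0, []
            for op in order:
                clock += READ_US if op == "read" else WRITE_US
                if op == "read":
                    read_done.append(clock)
            return sum(read_done) / len(read_done)

        batch = ["write", "read", "write", "read"]
        print(mean_read_latency(batch, read_priority=False))  # FIFO: 337.5 us
        print(mean_read_latency(batch, read_priority=True))   # reads first: 37.5 us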