28 research outputs found

    Elevating commodity storage with the SALSA host translation layer

    To satisfy increasing storage demands in both capacity and performance, industry has turned to multiple storage technologies, including Flash SSDs and SMR disks. These devices employ a translation layer that conceals the idiosyncrasies of their media and enables random access. Device translation layers are, however, inherently constrained: resources on the drive are scarce, they cannot be adapted to application requirements, and they lack visibility across multiple devices. As a result, the performance and durability of many storage devices are severely degraded. In this paper, we present SALSA: a translation layer that executes on the host and allows unmodified applications to better utilize commodity storage. SALSA supports a wide range of single- and multi-device optimizations and, because it is implemented in software, can adapt to specific workloads. We describe SALSA's design, and demonstrate its significant benefits using microbenchmarks and case studies based on three applications: MySQL, the Swift object store, and a video server. Comment: Presented at the 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
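    The core mechanism described above, a host-resident translation layer that redirects random logical writes into sequential device writes through a logical-to-physical mapping table, can be sketched as follows. This is a minimal illustrative sketch, not SALSA's actual design or code; the class, its methods, and the flat in-memory mapping table are all assumptions made for exposition.

```python
# Minimal sketch of a host-side translation layer: logical writes are
# appended sequentially and a mapping table tracks where each logical
# block currently lives. Illustrative only; not SALSA's implementation.

class HostTranslationLayer:
    """Log-structured logical-to-physical block remapping."""

    def __init__(self, num_physical_blocks: int):
        self.capacity = num_physical_blocks
        self.l2p = {}            # logical block address -> physical block address
        self.write_head = 0      # next sequential physical block

    def write(self, lba: int, data: bytes) -> int:
        """Redirect a (possibly random) logical write to the sequential
        write head, making it friendly to Flash and SMR media."""
        if self.write_head >= self.capacity:
            raise RuntimeError("device full: garbage collection needed")
        pba = self.write_head
        self.write_head += 1
        self.l2p[lba] = pba      # any old location now holds stale data
        self._device_write(pba, data)
        return pba

    def read(self, lba: int) -> bytes:
        return self._device_read(self.l2p[lba])

    def _device_write(self, pba: int, data: bytes):
        pass                     # a real layer issues the device I/O here

    def _device_read(self, pba: int) -> bytes:
        return b""               # a real layer issues the device I/O here
```

    Because the old physical location of an overwritten block simply becomes stale, a real implementation pairs this mapping with garbage collection to reclaim stale space, which is where a software layer can apply workload-specific policy.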

    Shingled Magnetic Recording disks for Mass Storage Systems

    Disk drives have seen a dramatic increase in storage density over the last five decades, but continuing that growth seems difficult if not impossible because of physical limitations. One way to increase storage density is to use a shingled magnetic recording (SMR) disk. Shingled writing is a promising technique that trades the ability to update data in place for narrower tracks and thus a much higher data density. It is particularly appealing as it can be adopted while utilizing essentially the same physical recording mechanisms currently in use. Because of its manner of writing, an SMR disk cannot update a written track without overwriting neighboring tracks, potentially requiring the rewrite of all tracks to the end of the band, where a band is terminated by an area left unwritten so that its final track is not overlapped. Random reads are still possible on such devices, but the handling of writes becomes particularly critical. In this manuscript, we first look at a variety of potential workloads, drawn from real-world traces, and evaluate their impact on SMR disk models. Later, we evaluate the behavior of SMR disks when used in an array configuration or when faced with heavily interleaved workloads. Specifically, we demonstrate the dramatically different effects that different workloads can have upon the opposing approaches of remapping and restoring blocks, and how write-heavy workloads can (under the right conditions, and contrary to intuition) result in a performance advantage for an SMR disk.
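    The rewrite behavior described above admits a simple worked cost model: an in-place update of one track forces a read-modify-write of every track from there to the end of its band. The sketch below computes that cost; the uniform-update assumption and the function names are illustrative, not drawn from the paper's models.

```python
# Hedged sketch of the "restore" (read-modify-write) cost the abstract
# alludes to: updating one track in a shingled band rewrites every track
# from there to the end of the band. Illustrative cost model only.

def restore_cost(band_size: int, track: int) -> int:
    """Tracks rewritten to update `track` (0-indexed) in place
    within a band of `band_size` shingled tracks."""
    assert 0 <= track < band_size
    return band_size - track  # the track itself plus all downstream tracks

def expected_restore_cost(band_size: int) -> float:
    """Average rewrite cost for a uniformly random single-track update."""
    return sum(restore_cost(band_size, t) for t in range(band_size)) / band_size

if __name__ == "__main__":
    # For a 20-track band: (20 + 19 + ... + 1) / 20 = 10.5 tracks rewritten
    # per random update -- the overhead that remapping approaches avoid.
    print(expected_restore_cost(20))  # 10.5
```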

    The Applications of Workload Characterization in The World of Massive Data Storage

    University of Minnesota Ph.D. dissertation. August 2015. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); x, 116 pages. The digital world is expanding exponentially because of the growth of various applications in domains including scientific fields, enterprise environments and internet services. Importantly, these applications have drastically different storage requirements, including parallel I/O performance and storage capacity. Various technologies have been developed to better satisfy these different storage requirements. I/O middleware software, parallel file systems and storage arrays are developed to improve I/O performance by increasing I/O parallelism at different levels. New storage media and data recording technologies such as shingled magnetic recording (SMR) have also been developed to increase storage capacity. This work focuses on improving existing technologies and designing new schemes based on I/O workload characterization in the corresponding storage environments. The contributions of this work can be summarized in four pieces: two on improving parallel I/O performance and two on increasing storage capacity. First, we design a comprehensive parallel I/O workload characterization and generation framework (called PIONEER) which can be used to synthesize a particular parallel I/O workload with desired I/O characteristics or to precisely emulate a High Performance Computing (HPC) application of interest. Second, we propose a non-intrusive I/O middleware (called IO-Engine) to automatically improve a given parallel I/O workload in Lustre, a widely used parallel file system for HPC. IO-Engine can explore the correlations between different software layers in the deep I/O path, as well as workload patterns at runtime, to transparently transform the workload patterns and tune related I/O parameters in the system. Third, we design several novel static address mapping schemes for shingled write disks (SWDs) to minimize the write amplification overhead in hard drives adopting SMR technology. Fourth, we propose a track-level shingled translation layer (T-STL) for SWDs with a hybrid update strategy (in-place update plus out-of-place update). T-STL uses a dynamic address mapping scheme and performs garbage collection by migrating selected disk tracks. This scheme provides larger storage capacity and better overall performance at the same effective storage percentages when compared to the static address mapping schemes.
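    To make the fourth contribution concrete, the sketch below shows the general skeleton of a track-level translation layer with dynamic mapping: out-of-place track updates plus garbage collection. It is an illustration of the technique only; T-STL's actual hybrid in-place/out-of-place policy and its track-migration heuristics are not reproduced here, and all names are invented.

```python
# Illustrative skeleton of a track-level shingled translation layer:
# a dynamic logical-to-physical track map with out-of-place updates and
# garbage collection. Not the dissertation's actual T-STL design.

class TrackSTL:
    def __init__(self, physical_tracks: int):
        self.t2p = {}                              # logical -> physical track
        self.free = list(range(physical_tracks))   # free physical tracks
        self.stale = set()                         # tracks holding stale data

    def write_track(self, logical: int):
        """Out-of-place update: redirect the logical track to a free
        physical track and mark its old location stale."""
        if not self.free:
            self.collect_garbage()
        if not self.free:
            raise RuntimeError("disk full: nothing left to reclaim")
        old = self.t2p.get(logical)
        if old is not None:
            self.stale.add(old)
        self.t2p[logical] = self.free.pop(0)

    def collect_garbage(self):
        """Reclaim stale physical tracks. A real design would migrate
        selected live tracks so the reclaimed space stays sequential."""
        self.free.extend(sorted(self.stale))
        self.stale.clear()
```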

    Doctor of Philosophy

    In the past few years, we have seen a tremendous increase in digital data being generated. By 2011, storage vendors had shipped 905 PB of purpose-built backup appliances. By 2013, the number of objects stored in Amazon S3 had reached 2 trillion. Facebook had stored 20 PB of photos by 2010. All of these require an efficient storage solution. To improve space efficiency, compression and deduplication are widely used. Compression works by identifying repeated strings and replacing them with more compact encodings, while deduplication partitions data into fixed-size or variable-size chunks and removes duplicate blocks. While we have seen great improvements in space efficiency from these two approaches, there are still some limitations. First, traditional compressors are limited in their ability to detect redundancy across a large range, since they search for redundant data at a fine-grained level (string level). For deduplication, metadata embedded in an input file changes more frequently than the data itself, and this introduces unnecessary unique chunks, leading to poor deduplication. Cloud storage systems suffer from unpredictable and inefficient performance because of interference among different types of workloads. This dissertation proposes techniques to improve the effectiveness of traditional compressors and deduplication in improving space efficiency, and a new I/O scheduling algorithm to improve performance predictability and efficiency for cloud storage systems. The common idea is to utilize similarity. To improve the effectiveness of compression and deduplication, similarity in content is used to transform an input file into a compression- or deduplication-friendly format. We propose Migratory Compression, a generic data transformation that identifies similar data at a coarse-grained level (block level) and then groups similar blocks together. It can be used as a preprocessing stage for any traditional compressor. We find that metadata has a huge impact in reducing the benefit of deduplication. To isolate the impact of metadata, we propose to separate metadata from data. Three approaches are presented for use cases with different constraints. For the commonly used tar format, we propose Migratory Tar: a data transformation and also a new tar format that deduplicates better. We also present a case study in which we use deduplication to reduce storage consumption for storing disk images, while at the same time achieving high performance in image deployment. Finally, we apply the same principle of utilizing similarity in I/O scheduling to prevent interference between random and sequential workloads, leading to efficient, consistent, and predictable performance for sequential workloads and high disk utilization.
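    A minimal sketch of the Migratory Compression idea follows: compute a cheap resemblance feature per fixed-size block, reorder the blocks so similar ones sit adjacent, and feed the reordered stream to an ordinary compressor. The block size, the toy max-hash feature, and the function names are assumptions for illustration; real systems typically use stronger resemblance features (e.g. super-features).

```python
# Minimal sketch of block-level similarity grouping as a preprocessing
# stage for a traditional compressor. Illustrative feature and block size.

import zlib

BLOCK = 4096

def feature(block: bytes) -> int:
    """Toy resemblance feature: a max-hash over sampled 8-byte shingles.
    Stands in for the stronger features used by real systems."""
    return max(zlib.crc32(block[i:i + 8])
               for i in range(0, max(len(block) - 8, 1), 64))

def migratory_compress(data: bytes) -> bytes:
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    order = sorted(range(len(blocks)), key=lambda i: feature(blocks[i]))
    # The permutation `order` must be stored alongside the payload so the
    # original block order can be restored after decompression (omitted).
    return zlib.compress(b"".join(blocks[i] for i in order), level=9)
```

    Grouping similar blocks shrinks the distance between redundant byte strings, so even a compressor with a small match window can exploit redundancy that was originally spread across the whole file.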

    Manufactured Home Energy Audit user's manual


    Performance and scalability in sensor data storage

    Modern artificial intelligence and machine learning applications build on analysis and training using large datasets. New research and development does not always start with an existing big dataset, but instead accumulates data over time. A single storage solution will not necessarily cover the required scale over the lifetime of the research, especially when scaling up from common workgroup storage technologies. The storage infrastructure at ZenRobotics has grown using standard workgroup technologies. The current approach is starting to show its limits, while storage growth is predicted to continue and accelerate. Successful capacity planning and expansion require a better understanding of how storage is used and how it grows. We have examined the current storage architecture and the stored data from different perspectives in order to gain a better understanding of the situation. By performing a number of experiments, we determine key properties of the employed technologies. Together, these factors allow us to make informed decisions about future storage solutions. Current usage patterns are in many ways inefficient, and changes are needed in order to work with larger volumes of data. Some changes would allow the current architecture to scale somewhat further, but in order to scale horizontally rather than just vertically, scalability must be designed into the future system architecture from the start.
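    As a small aid to the capacity-planning argument above, the sketch below fits an exponential growth model to historical usage samples and projects when existing capacity runs out. The data, the model, and all names are illustrative assumptions, not measurements or methods from the thesis.

```python
# Illustrative capacity projection: least-squares fit of log(usage)
# against time, then solve for when usage reaches capacity.
# Assumes strictly growing usage samples at regular (monthly) intervals.

import math

def months_until_full(usage_tb: list, capacity_tb: float) -> float:
    """Fit usage ~ a * exp(b * t) and return months remaining
    (measured from the last sample) until capacity is reached."""
    n = len(usage_tb)
    xs = range(n)
    ys = [math.log(u) for u in usage_tb]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = math.exp(mean_y - b * mean_x)
    return math.log(capacity_tb / a) / b - (n - 1)

# e.g. ~10% monthly growth observed over six months, 100 TB available:
print(months_until_full([40, 44, 48.4, 53.2, 58.6, 64.4], 100.0))  # ~4.6
```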

    Manufactured Home Energy Audit (MHEA) User's Manual (Version 7)


    The Utah Statesman, October 25, 2000

    Weekly student newspaper of Utah State University in Logan.

    The Murray Ledger and Times, May 18, 1996
