
29 research outputs found

Towards Design and Analysis For High-Performance and Reliable SSDs

Author: Xia Qianbin
Publication venue: VCU Scholars Compass
Publication date: 01/01/2017

NAND Flash-based Solid State Disks have many attractive technical merits, such as low power consumption, light weight, shock resistance, the ability to sustain hotter operating regimes, and extraordinarily high performance for random read access, which make SSDs immensely popular and widely employed in different types of environments including portable devices, personal computers, large data centers, and distributed data systems. However, current SSDs still suffer from several critical inherent limitations, such as the inability to update in place, asymmetric read and write performance, slow garbage collection processes, limited endurance, and degraded write performance with the adoption of MLC and TLC techniques. To alleviate these limitations, we propose optimizations from both the outside applications layer and the SSDs' internal layer. Because SSDs are a good compromise between performance and price, they are widely deployed as second-layer caches sitting between DRAM and hard disks to boost system performance. Due to the special properties of SSDs, such as the internal garbage collection processes and limited lifetime, optimizations designed for traditional cache devices like DRAM and SRAM might not work consistently for SSD-based caches. Therefore, for the outside applications layer, our work focuses on integrating the special properties of SSDs into the optimizations of SSD caches. Moreover, our work also involves alleviating the increased Flash write latency and ECC complexity due to the adoption of MLC and TLC technologies by analyzing real-world workloads

Coset Coding to Extend the Lifetime of Non-Volatile Memory

Author: Jacobvitz Adam

Modern computing systems increasingly integrate both Phase Change Memory (PCM) and Flash memory technologies, yet the lifetime of these technologies is limited by the number of times cells are written. Due to their limited lifetime, PCM and Flash may wear out before other parts of the system. The objective of this dissertation is to increase the lifetime of memory locations composed of either PCM or Flash cells using coset coding. For PCM, we extend memory lifetime by using coset coding to reduce the number of bit-flips per write compared to un-coded writes. Flash program/erase cycles degrade page lifetime; we extend the lifetime of Flash memory cells by using coset coding to re-program a page multiple times without erasing. We then show how coset coding can be integrated into Flash solid state drives. We ran simulations to evaluate the effectiveness of using coset coding to extend PCM and Flash lifetime. We simulated writes to PCM and found that in our simulations coset coding can be used to increase PCM lifetime by up to 3x over writing un-coded data directly to the memory location. We extended the lifetime of Flash using coset coding to re-write pages without an intervening erase and were able to re-write a single Flash page using coset coding more times than when writing un-coded data or using prior coding work for the same area overhead. We also found in our simulations that using coset coding in a Flash SSD results in higher lifetime for a given area overhead compared to un-coded writes. Dissertatio
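To make the bit-flip reduction idea concrete, here is a minimal sketch of the simplest two-coset scheme (store either the data or its complement, plus one flag bit, whichever differs least from what the cells already hold). This is an illustrative stand-in, not the coding scheme of the dissertation; the function names and the 8-bit word width are assumptions.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def encode(stored: int, data: int, width: int = 8) -> int:
    """Pick the codeword for `data` that flips the fewest stored bits.

    The codeword has width + 1 bits: the payload plus a flag bit
    (bit index `width`) that records whether the payload is inverted.
    `stored` is the (width + 1)-bit word currently held in the cells.
    """
    mask = (1 << width) - 1
    plain = data & mask                        # flag = 0, payload as-is
    inverted = (~data & mask) | (1 << width)   # flag = 1, payload complemented
    return min((plain, inverted), key=lambda word: hamming(stored, word))


def decode(word: int, width: int = 8) -> int:
    """Recover the original data from a codeword produced by encode()."""
    mask = (1 << width) - 1
    payload = word & mask
    return (~payload & mask) if (word >> width) & 1 else payload
```

Because the two candidate codewords are bitwise complements over all width + 1 bits, their distances to the stored word always sum to width + 1, so the chosen one flips at most half of the bits (rounded down). Larger coset codes offer more candidates per data word at the cost of more redundancy.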

Performance Characterization of NVMe Flash Devices with Zoned Namespaces (ZNS)

Author: Bjørling Matias, Chandrasekaran Balakrishnan, Doekemeijer Krijn, Tehrany Nick, Trivedi Animesh
Publication date: 29/10/2023

The recent emergence of NVMe flash devices with Zoned Namespace support, ZNS SSDs, represents a significant new advancement in flash storage. ZNS SSDs introduce a new storage abstraction of append-only zones with a set of new I/O (i.e., append) and management (zone state machine transition) commands. With the new abstraction and commands, ZNS SSDs offer more control to the host software stack than a non-zoned SSD for flash management, which is known to be complex (because of garbage collection, scheduling, block allocation, parallelism management, overprovisioning). ZNS SSDs are, consequently, gaining adoption in a variety of applications (e.g., file systems, key-value stores, and databases), particularly latency-sensitive big-data applications. Despite this enthusiasm, there has yet to be a systematic characterization of ZNS SSD performance with its zoned storage model abstractions and I/O operations. This work addresses this crucial shortcoming. We report on the performance features of a commercially available ZNS SSD (13 key observations), explain how these features can be incorporated into publicly available state-of-the-art ZNS emulators, and recommend guidelines for ZNS SSD application developers. All artifacts (code and data sets) of this study are publicly available at https://github.com/stonet-research/NVMeBenchmarks. Comment: Paper to appear in the https://clustercomp.org/2023/program
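The append-only zone abstraction mentioned in this abstract can be pictured with a toy in-memory model. The sketch below is a deliberate simplification: it tracks only three states (empty, open, full), whereas real ZNS devices define more states and commands. The class and method names are hypothetical and do not correspond to any driver or emulator API.

```python
from enum import Enum


class ZoneState(Enum):
    EMPTY = "empty"
    OPEN = "open"
    FULL = "full"


class Zone:
    """Toy model of one append-only zone with a write pointer."""

    def __init__(self, start_lba: int, capacity: int):
        self.start_lba = start_lba
        self.capacity = capacity
        self.write_pointer = start_lba
        self.state = ZoneState.EMPTY

    def append(self, num_blocks: int) -> int:
        """Append blocks at the write pointer and return the LBA they landed on."""
        if num_blocks <= 0:
            raise ValueError("num_blocks must be positive")
        if self.state is ZoneState.FULL:
            raise OSError("zone is full; reset it before writing again")
        end = self.start_lba + self.capacity
        if self.write_pointer + num_blocks > end:
            raise OSError("append would cross the zone boundary")
        assigned = self.write_pointer
        self.write_pointer += num_blocks
        # State transition: the zone becomes FULL once the pointer hits the end.
        self.state = ZoneState.FULL if self.write_pointer == end else ZoneState.OPEN
        return assigned

    def reset(self) -> None:
        """Rewind the write pointer; in this model the zone's data is discarded."""
        self.write_pointer = self.start_lba
        self.state = ZoneState.EMPTY
```

The point of the model is that the host never chooses a write address inside a zone: it learns the address from the return value of the append, and the only way to reuse space is to reset the whole zone.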

arXiv.org e-Print Archive

A Scalable Flash-Based Hardware Architecture for the Hierarchical Temporal Memory Spatial Pooler

Author: Streat Lennard G
Publication venue: RIT Scholar Works
Publication date: 01/05/2016

Hierarchical temporal memory (HTM) is a biomimetic machine learning algorithm focused upon modeling the structural and algorithmic properties of the neocortex. It is comprised of two components, realizing pattern recognition of spatial and temporal data, respectively. HTM research has gained momentum in recent years, leading to both hardware and software exploration of its algorithmic formulation. Previous work on HTM has centered on addressing performance concerns; however, the memory-bound operation of HTM presents significant challenges to scalability. In this work, a scalable flash-based storage processor unit, Flash-HTM (FHTM), is presented along with a detailed analysis of its potential scalability. FHTM leverages SSD flash technology to implement the HTM cortical learning algorithm spatial pooler. The ability of FHTM to scale with increasing model complexity is addressed with respect to design footprint, memory organization, and power efficiency. Additionally, a mathematical model of the hardware is evaluated against the MNIST dataset, yielding 91.98% classification accuracy. A fully custom layout is developed to validate the design in a TSMC 180 nm process. The area and power footprints of the spatial pooler are 30.538 mm² and 5.171 mW, respectively. Storage processor units have the potential to be viable platforms to support implementations of HTM at scale

RIT Scholar Works

Signal Processing for Caching Networks and Non-volatile Memories

Author: Luo Tianqiong
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2018

The recent information explosion has created a pressing need for faster and more reliable data storage and transmission schemes. This thesis focuses on two systems: caching networks and non-volatile storage systems. It proposes network protocols to improve the efficiency of information delivery and signal processing schemes to reduce errors at the physical layer as well. This thesis first investigates caching and delivery strategies for content delivery networks. Caching has been investigated as a useful technique to reduce the network burden by prefetching some contents during off-peak hours. Coded caching [1] proposed by Maddah-Ali and Niesen is the foundation of our algorithms and it has been shown to be a useful technique which can reduce peak traffic rates by encoding transmissions so that different users can extract different information from the same packet. Content delivery networks store information distributed across multiple servers, so as to balance the load and avoid unrecoverable losses in case of node or disk failures. On one hand, distributed storage limits the capability of combining content from different servers into a single message, causing performance losses in coded caching schemes. But, on the other hand, the inherent redundancy existing in distributed storage systems can be used to improve the performance of those schemes through parallelism. This thesis proposes a scheme combining distributed storage of the content in multiple servers and an efficient coded caching algorithm for delivery to the users. This scheme is shown to reduce the peak transmission rate below that of state-of-the-art algorithms
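The idea that different users can extract different information from the same packet is easiest to see in the standard two-user, two-file textbook case of coded caching. The sketch below illustrates only that basic case, not the distributed-storage scheme proposed in the thesis; the file contents and variable names are made up for the example.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))


# Two equally sized files, each split into two halves.
file_a = b"AAAAaaaa"
file_b = b"BBBBbbbb"
half = len(file_a) // 2
a1, a2 = file_a[:half], file_a[half:]
b1, b2 = file_b[:half], file_b[half:]

# Placement phase (off-peak): each user caches one half of every file.
cache_user1 = {"A1": a1, "B1": b1}
cache_user2 = {"A2": a2, "B2": b2}

# Delivery phase: user 1 requests file A, user 2 requests file B.
# One coded packet of half a file serves both requests.
coded = xor_bytes(a2, b1)

# User 1 cancels B1 from the packet to obtain the missing half A2.
user1_file = cache_user1["A1"] + xor_bytes(coded, cache_user1["B1"])
# User 2 cancels A2 from the packet to obtain the missing half B1.
user2_file = xor_bytes(coded, cache_user2["A2"]) + cache_user2["B2"]
```

Without coding, the server would have to send A2 and B1 separately, one full file's worth of data in total; the XOR packet halves that peak load.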

Improving Storage Performance with Non-Volatile Memory-based Caching Systems

Author: Fan Ziqi
Publication date: 01/04/2017

University of Minnesota Ph.D. dissertation. April 2017. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); ix, 104 pages. With the rapid development of new types of non-volatile memory (NVRAM), e.g., 3D Xpoint, NVDIMM, and STT-MRAM, these technologies have been or will be integrated into current computer systems to work together with traditional DRAM. Compared with DRAM, which can cause data loss when the power fails or the system crashes, NVRAM's non-volatile nature makes it a better candidate as caching material. In the meantime, storage performance needs to keep up to process and accommodate the rapidly generated amounts of data around the world (a.k.a. the big data problem). Throughout my Ph.D. research, I have been focusing on building novel NVRAM-based caching systems to provide cost-effective ways to improve storage system performance. To show the benefits of designing novel NVRAM-based caching systems, I target four representative storage devices and systems: solid state drives (SSDs), hard disk drives (HDDs), disk arrays, and high-performance computing (HPC) parallel file systems (PFSs). For SSDs, to mitigate their wear-out problem and extend their lifespan, we propose two NVRAM-based buffer cache policies which can work together in different layers to maximally reduce SSD write traffic: a main memory buffer cache design named Hierarchical Adaptive Replacement Cache (H-ARC) and an internal SSD write buffer design named Write Traffic Reduction Buffer (WRB). H-ARC considers four factors (dirty, clean, recency, and frequency) to reduce write traffic and improve cache hit ratios in the host. WRB reduces block erasures and write traffic further inside an SSD by effectively exploiting temporal and spatial localities.
For HDDs, to exploit their fast sequential access speed to improve I/O throughput, we propose a buffer cache policy, named I/O-Cache, that regroups and synchronizes long sets of consecutive dirty pages to take advantage of HDDs' fast sequential access speed and the non-volatile property of NVRAM. In addition, our new policy can dynamically separate the whole cache into a dirty cache and a clean cache, according to the characteristics of the workload, to decrease storage writes. For disk arrays, although numerous cache policies have been proposed, most are either targeted at main memory buffer caches or manage NVRAM as write buffers and separately manage DRAM as read caches. To the best of our knowledge, cooperative hybrid volatile and non-volatile memory buffer cache policies specifically designed for storage systems using newer NVRAM technologies have not been well studied. Based on our elaborate study of storage server block I/O traces, we propose a novel cooperative HybrId NVRAM and DRAM Buffer cACHe polIcy for storage arrays, named Hibachi. Hibachi treats read cache hits and write cache hits differently to maximize cache hit rates and judiciously adjusts the clean and the dirty cache sizes to capture workloads' tendencies. In addition, it converts random writes to sequential writes for high disk write throughput and further exploits storage server I/O workload characteristics to improve read performance. For modern complex HPC systems (e.g., supercomputers), data generated during checkpointing are bursty and so dominate HPC I/O traffic that relying solely on PFSs will slow down the whole HPC system. In order to increase HPC checkpointing speed, we propose an NVRAM-based burst buffer coordination system for PFSs, named collaborative distributed burst buffer (CDBB). 
Inspired by our observations of HPC application execution patterns and experiments on HPC clusters, we design CDBB to coordinate all the available burst buffers, based on their priorities and states, to help overburdened burst buffers and maximize resource utilization
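The regrouping of consecutive dirty pages described for I/O-Cache in this abstract can be sketched as a small helper that finds runs of adjacent page numbers, so that a flush turns into one sequential write. This is only an illustration of that single step under simplifying assumptions (pages identified by integer numbers, no recency or cache-partitioning logic); it is not the dissertation's policy, and the function names are hypothetical.

```python
def consecutive_runs(dirty_pages):
    """Group dirty page numbers into sorted runs of consecutive pages."""
    runs = []
    for page in sorted(set(dirty_pages)):
        if runs and page == runs[-1][-1] + 1:
            runs[-1].append(page)   # extends the current run
        else:
            runs.append([page])     # starts a new run
    return runs


def longest_run(dirty_pages):
    """Return the longest run, i.e. the best candidate for a sequential flush."""
    runs = consecutive_runs(dirty_pages)
    return max(runs, key=len) if runs else []
```

For example, with dirty pages 7, 3, 4, 5, 10, and 11, the runs are [3, 4, 5], [7], and [10, 11], and flushing the first run writes three pages with a single sequential access.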