2,306 research outputs found

    A Modern Primer on Processing in Memory

    Full text link
    Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck as many important applications are increasingly data-intensive, and memory bandwidth and energy do not scale well, (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems, (3) data movement, especially off-chip to on-chip, is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are especially severely-felt in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many technology scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of higher cost. The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, proliferation of different main memory standards and chips, specialized for different purposes (e.g., graphics, low-power, high bandwidth, low latency), and the necessity of designing new solutions to serious reliability and security issues, such as the RowHammer phenomenon, are an evidence of this trend. This chapter discusses recent research that aims to practically enable computation close to data, an approach we call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between the computation units and memory is reduced or eliminated.Comment: arXiv admin note: substantial text overlap with arXiv:1903.0398

    ์ดˆ๊ณ ์šฉ๋Ÿ‰ ์†”๋ฆฌ๋“œ ์Šคํ…Œ์ด๋“œ ๋“œ๋ผ์ด๋ธŒ๋ฅผ ์œ„ํ•œ ์‹ ๋ขฐ์„ฑ ํ–ฅ์ƒ ๋ฐ ์„ฑ๋Šฅ ์ตœ์ ํ™” ๊ธฐ์ˆ 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2021.8. ๊น€์ง€ํ™.The development of ultra-large NAND flash storage devices (SSDs) is recently made possible by NAND flash memory semiconductor process scaling and multi-leveling techniques, and NAND package technology, which enables continuous increasing of storage capacity by mounting many NAND flash memory dies in an SSD. As the capacity of an SSD increases, the total cost of ownership of the storage system can be reduced very effectively, however due to limitations of ultra-large SSDs in reliability and performance, there exists some obstacles for ultra-large SSDs to be widely adopted. In order to take advantage of an ultra-large SSD, it is necessary to develop new techniques to improve these reliability and performance issues. In this dissertation, we propose various optimization techniques to solve the reliability and performance issues of ultra-large SSDs. In order to overcome the optimization limitations of the existing approaches, our techniques were designed based on various characteristic evaluation results of NAND flash devices and field failure characteristics analysis results of real SSDs. We first propose a low-stress erase technique for the purpose of reducing the characteristic deviation between wordlines (WLs) in a NAND flash block. By reducing the erase stress on weak WLs, it effectively slows down NAND degradation and improves NAND endurance. From the NAND evaluation results, the conditions that can most effectively guard the weak WLs are defined as the gerase mode. In addition, considering the user workload characteristics, we propose a technique to dynamically select the optimal gerase mode that can maximize the lifetime of the SSD. Secondly, we propose an integrated approach that maximizes the efficiency of copyback operations to improve performance while not compromising data reliability. Based on characterization using real 3D TLC flash chips, we propose a novel per-block error propagation model under consecutive copyback operations. Our model significantly increases the number of successive copybacks by exploiting the aging characteristics of NAND blocks. Furthermore, we devise a resource-efficient error management scheme that can handle successive copybacks where pages move around multiple blocks with different reliability. By utilizing proposed copyback operation for internal data movement, SSD performance can be effectively improved without any reliability issues. Finally, we propose a new recovery scheme, called reparo, for a RAID storage system with ultra-large SSDs. Unlike the existing RAID recovery schemes, reparo repairs a failed SSD at the NAND die granularity without replacing it with a new SSD, thus avoiding most of the inter-SSD data copies during a RAID recovery step. When a NAND die of an SSD fails, reparo exploits a multi-core processor of the SSD controller to identify failed LBAs from the failed NAND die and to recover data from the failed LBAs. Furthermore, reparo ensures no negative post-recovery impact on the performance and lifetime of the repaired SSD. In order to evaluate the effectiveness of the proposed techniques, we implemented them in a storage device prototype, an open NAND flash storage device development environment, and a real SSD environment. And their usefulness was verified using various benchmarks and I/O traces collected the from real-world applications. The experiment results show that the reliability and performance of the ultra-large SSD can be effectively improved through the proposed techniques.๋ฐ˜๋„์ฒด ๊ณต์ •์˜ ๋ฏธ์„ธํ™”, ๋‹ค์น˜ํ™” ๊ธฐ์ˆ ์— ์˜ํ•ด์„œ ์ง€์†์ ์œผ๋กœ ์šฉ๋Ÿ‰์ด ์ฆ๊ฐ€ํ•˜๊ณ  ์žˆ๋Š” ๋‹จ์œ„ ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ ๋ฉ”๋ชจ๋ฆฌ์™€ ํ•˜๋‚˜์˜ ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ ๊ธฐ๋ฐ˜ ์Šคํ† ๋ฆฌ์ง€ ์‹œ์Šคํ…œ ๋‚ด์— ์ˆ˜ ๋งŽ์€ ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ ๋ฉ”๋ชจ๋ฆฌ ๋‹ค์ด๋ฅผ ์‹ค์žฅํ•  ์ˆ˜ ์žˆ๊ฒŒํ•˜๋Š” ๋‚ธ๋“œ ํŒจํ‚ค์ง€ ๊ธฐ์ˆ ๋กœ ์ธํ•ด ํ•˜๋“œ๋””์Šคํฌ๋ณด๋‹ค ํ›จ์”ฌ ๋” ํฐ ์ดˆ๊ณ ์šฉ๋Ÿ‰์˜ ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ ์ €์žฅ์žฅ์น˜์˜ ๊ฐœ๋ฐœ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ–ˆ๋‹ค. ํ”Œ๋ž˜์‰ฌ ์ €์žฅ์žฅ์น˜์˜ ์šฉ๋Ÿ‰์ด ์ฆ๊ฐ€ํ•  ์ˆ˜๋ก ์Šคํ† ๋ฆฌ์ง€ ์‹œ์Šคํ…œ์˜ ์ด ์†Œ์œ ๋น„์šฉ์„ ์ค„์ด๋Š”๋ฐ ๋งค์šฐ ํšจ๊ณผ์ ์ธ ์žฅ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ์œผ๋‚˜, ์‹ ๋ขฐ์„ฑ ๋ฐ ์„ฑ๋Šฅ์˜ ์ธก๋ฉด์—์„œ์˜ ํ•œ๊ณ„๋กœ ์ธํ•ด์„œ ์ดˆ๊ณ ์šฉ๋Ÿ‰ ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ ์ €์žฅ์žฅ์น˜๊ฐ€ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š”๋ฐ ์žˆ์–ด์„œ ์žฅ์• ๋ฌผ๋กœ ์ž‘์šฉํ•˜๊ณ  ์žˆ๋‹ค. ์ดˆ๊ณ ์šฉ๋Ÿ‰ ์ €์žฅ์žฅ์น˜์˜ ์žฅ์ ์„ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ด๋Ÿฌํ•œ ์‹ ๋ขฐ์„ฑ ๋ฐ ์„ฑ๋Šฅ์„ ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•œ ์ƒˆ๋กœ์šด ๊ธฐ๋ฒ•์˜ ๊ฐœ๋ฐœ์ด ํ•„์š”ํ•˜๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ดˆ๊ณ ์šฉ๋Ÿ‰ ๋‚ธ๋“œ๊ธฐ๋ฐ˜ ์ €์žฅ์žฅ์น˜(SSD)์˜ ๋ฌธ์ œ์ ์ธ ์„ฑ๋Šฅ ๋ฐ ์‹ ๋ขฐ์„ฑ์„ ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•œ ๋‹ค์–‘ํ•œ ์ตœ์ ํ™” ๊ธฐ์ˆ ์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด ๊ธฐ๋ฒ•๋“ค์˜ ์ตœ์ ํ™” ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด์„œ, ์šฐ๋ฆฌ์˜ ๊ธฐ์ˆ ์€ ์‹ค์ œ ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ ์†Œ์ž์— ๋Œ€ํ•œ ๋‹ค์–‘ํ•œ ํŠน์„ฑ ํ‰๊ฐ€ ๊ฒฐ๊ณผ์™€ SSD์˜ ํ˜„์žฅ ๋ถˆ๋Ÿ‰ ํŠน์„ฑ ๋ถ„์„๊ฒฐ๊ณผ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ณ ์•ˆ๋˜์—ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด์„œ ๋‚ธ๋“œ์˜ ํ”Œ๋ž˜์‰ฌ ํŠน์„ฑ๊ณผ SSD, ๊ทธ๋ฆฌ๊ณ  ํ˜ธ์ŠคํŠธ ์‹œ์Šคํ…œ์˜ ๋™์ž‘ ํŠน์„ฑ์„ ๊ณ ๋ คํ•œ ์„ฑ๋Šฅ ๋ฐ ์‹ ๋ขฐ์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ์ตœ์ ํ™” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์‹œํ•œ๋‹ค. ์ฒซ์งธ๋กœ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ ๋ถˆ๋ก๋‚ด์˜ ํŽ˜์ด์ง€๋“ค๊ฐ„์˜ ํŠน์„ฑํŽธ์ฐจ๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด์„œ ๋™์ ์ธ ์†Œ๊ฑฐ ์ŠคํŠธ๋ ˆ์Šค ๊ฒฝ๊ฐ ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์€ ๋‚ธ๋“œ ๋ธ”๋ก์˜ ๋‚ด๊ตฌ์„ฑ์„ ๋Š˜๋ฆฌ๊ธฐ ์œ„ํ•ด์„œ ํŠน์„ฑ์ด ์•ฝํ•œ ํŽ˜์ด์ง€๋“ค์— ๋Œ€ํ•ด์„œ ๋” ์ ์€ ์†Œ๊ฑฐ ์ŠคํŠธ๋ ˆ์Šค๊ฐ€ ์ธ๊ฐ€ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋‚ธ๋“œ ํ‰๊ฐ€ ๊ฒฐ๊ณผ๋กœ ๋ถ€ํ„ฐ ์†Œ๊ฑฐ ์ŠคํŠธ๋ ˆ์Šค ๊ฒฝ๊ฐ ๋ชจ๋ธ์„ ๊ตฌ์ถ•ํ•œ๋‹ค. ๋˜ํ•œ ์‚ฌ์šฉ์ž ์›Œํฌ๋กœ๋“œ ํŠน์„ฑ์„ ๊ณ ๋ คํ•˜์—ฌ, ์†Œ๊ฑฐ ์ŠคํŠธ๋ ˆ์Šค ๊ฒฝ๊ฐ ๊ธฐ๋ฒ•์˜ ํšจ๊ณผ๊ฐ€ ์ตœ๋Œ€ํ™” ๋  ์ˆ˜ ์žˆ๋Š” ์ตœ์ ์˜ ๊ฒฝ๊ฐ ์ˆ˜์ค€์„ ๋™์ ์œผ๋กœ ํŒ๋‹จํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด์„œ ๋‚ธ๋“œ ๋ธ”๋ก์„ ์—ดํ™”์‹œํ‚ค๋Š” ์ฃผ์š” ์›์ธ์ธ ์†Œ๊ฑฐ ๋™์ž‘์„ ํšจ์œจ์ ์œผ๋กœ ์ œ์–ดํ•จ์œผ๋กœ์จ ์ €์žฅ์žฅ์น˜์˜ ์ˆ˜๋ช…์„ ํšจ๊ณผ์ ์œผ๋กœ ํ–ฅ์ƒ์‹œํ‚จ๋‹ค. ๋‘˜์งธ๋กœ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ณ ์šฉ๋Ÿ‰ SSD์—์„œ์˜ ๋‚ด๋ถ€ ๋ฐ์ดํ„ฐ ์ด๋™์œผ๋กœ ์ธํ•œ ์„ฑ๋Šฅ ์ €ํ•˜๋ฌธ์ œ๋ฅผ ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ์˜ ์ œํ•œ๋œ ์นดํ”ผ๋ฐฑ(copyback) ๋ช…๋ น์„ ํ™œ์šฉํ•˜๋Š” ์ ์‘ํ˜• ๊ธฐ๋ฒ•์ธ rCPB์„ ์ œ์•ˆํ•œ๋‹ค. rCPB์€ Copyback ๋ช…๋ น์˜ ํšจ์œจ์„ฑ์„ ๊ทน๋Œ€ํ™” ํ•˜๋ฉด์„œ๋„ ๋ฐ์ดํ„ฐ ์‹ ๋ขฐ์„ฑ์— ๋ฌธ์ œ๊ฐ€ ์—†๋„๋ก ๋‚ธ๋“œ์˜ ๋ธ”๋Ÿญ์˜ ๋…ธํ™”ํŠน์„ฑ์„ ๋ฐ˜์˜ํ•œ ์ƒˆ๋กœ์šด copyback ์˜ค๋ฅ˜ ์ „ํŒŒ ๋ชจ๋ธ์„ ๊ธฐ๋ฐ˜์œผ๋กœํ•œ๋‹ค. ์ด์—๋”ํ•ด, ์‹ ๋ขฐ์„ฑ์ด ๋‹ค๋ฅธ ๋ธ”๋Ÿญ๊ฐ„์˜ copyback ๋ช…๋ น์„ ํ™œ์šฉํ•œ ๋ฐ์ดํ„ฐ ์ด๋™์„ ๋ฌธ์ œ์—†์ด ๊ด€๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด์„œ ์ž์› ํšจ์œจ์ ์ธ ์˜ค๋ฅ˜ ๊ด€๋ฆฌ ์ฒด๊ณ„๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด์„œ ์‹ ๋ขฐ์„ฑ์— ๋ฌธ์ œ๋ฅผ ์ฃผ์ง€ ์•Š๋Š” ์ˆ˜์ค€์—์„œ copyback์„ ์ตœ๋Œ€ํ•œ ํ™œ์šฉํ•˜์—ฌ ๋‚ด๋ถ€ ๋ฐ์ดํ„ฐ ์ด๋™์„ ์ตœ์ ํ™” ํ•จ์œผ๋กœ์จ SSD์˜ ์„ฑ๋Šฅํ–ฅ์ƒ์„ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ดˆ๊ณ ์šฉ๋Ÿ‰ SSD์—์„œ ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ์˜ ๋‹ค์ด(die) ๋ถˆ๋Ÿ‰์œผ๋กœ ์ธํ•œ ๋ ˆ์ด๋“œ(redundant array of independent disks, RAID) ๋ฆฌ๋นŒ๋“œ ์˜ค๋ฒ„ํ—ค๋“œ๋ฅผ ์ตœ์†Œํ™” ํ•˜๊ธฐ์œ„ํ•œ ์ƒˆ๋กœ์šด RAID ๋ณต๊ตฌ ๊ธฐ๋ฒ•์ธ reparo๋ฅผ ์ œ์•ˆํ•œ๋‹ค. Reparo๋Š” SSD์— ๋Œ€ํ•œ ๊ต์ฒด์—†์ด SSD์˜ ๋ถˆ๋Ÿ‰ die์— ๋Œ€ํ•ด์„œ๋งŒ ๋ณต๊ตฌ๋ฅผ ์ˆ˜ํ–‰ํ•จ์œผ๋กœ์จ ๋ณต๊ตฌ ์˜ค๋ฒ„ํ—ค๋“œ๋ฅผ ์ตœ์†Œํ™”ํ•œ๋‹ค. ๋ถˆ๋Ÿ‰์ด ๋ฐœ์ƒํ•œ die์˜ ๋ฐ์ดํ„ฐ๋งŒ ์„ ๋ณ„์ ์œผ๋กœ ๋ณต๊ตฌํ•จ์œผ๋กœ์จ ๋ณต๊ตฌ ๊ณผ์ •์˜ ๋ฆฌ๋นŒ๋“œ ํŠธ๋ž˜ํ”ฝ์„ ์ตœ์†Œํ™”ํ•˜๋ฉฐ, SSD ๋‚ด๋ถ€์˜ ๋ณ‘๋ ฌ๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ถˆ๋Ÿ‰ die ๋ณต๊ตฌ ์‹œ๊ฐ„์„ ํšจ๊ณผ์ ์œผ๋กœ ๋‹จ์ถ•ํ•œ๋‹ค. ๋˜ํ•œ die ๋ถˆ๋Ÿ‰์œผ๋กœ ์ธํ•œ ๋ฌผ๋ฆฌ์  ๊ณต๊ฐ„๊ฐ์†Œ์˜ ๋ถ€์ž‘์šฉ์„ ์ตœ์†Œํ™” ํ•จ์œผ๋กœ์จ ๋ณต๊ตฌ ์ดํ›„์˜ ์„ฑ๋Šฅ ์ €ํ•˜ ๋ฐ ์ˆ˜๋ช…์˜ ๊ฐ์†Œ ๋ฌธ์ œ๊ฐ€ ์—†๋„๋ก ํ•œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ๊ธฐ๋ฒ•๋“ค์€ ์ €์žฅ์žฅ์น˜ ํ”„๋กœํ† ํƒ€์ž… ๋ฐ ๊ณต๊ฐœ ๋‚ธ๋“œ ํ”Œ๋ž˜์‰ฌ ์ €์žฅ์žฅ์น˜ ๊ฐœ๋ฐœํ™˜๊ฒฝ, ๊ทธ๋ฆฌ๊ณ  ์‹ค์žฅ SSDํ™˜๊ฒฝ์— ๊ตฌํ˜„๋˜์—ˆ์œผ๋ฉฐ, ์‹ค์ œ ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์„ ๋ชจ์‚ฌํ•œ ๋‹ค์–‘ํ•œ ๋ฒคํŠธ๋งˆํฌ ๋ฐ ์‹ค์ œ I/O ํŠธ๋ ˆ์ด์Šค๋“ค์„ ์ด์šฉํ•˜์—ฌ ๊ทธ ์œ ์šฉ์„ฑ์„ ๊ฒ€์ฆํ•˜์˜€๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ, ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•๋“ค์„ ํ†ตํ•ด์„œ ์ดˆ๊ณ ์šฉ๋Ÿ‰ SSD์˜ ์‹ ๋ขฐ์„ฑ ๋ฐ ์„ฑ๋Šฅ์„ ํšจ๊ณผ์ ์œผ๋กœ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ์Œ์„ ํ™•์ธํ•˜์˜€๋‹ค.I Introduction 1 1.1 Motivation 1 1.2 Dissertation Goals 3 1.3 Contributions 5 1.4 Dissertation Structure 8 II Background 11 2.1 Overview of 3D NAND Flash Memory 11 2.2 Reliability Management in NAND Flash Memory 14 2.3 UL SSD architecture 15 2.4 Related Work 17 2.4.1 NAND endurance optimization by utilizing page characteristics difference 17 2.4.2 Performance optimizations using copyback operation 18 2.4.3 Optimizations for RAID Rebuild 19 2.4.4 Reliability improvement using internal RAID 20 III GuardedErase: Extending SSD Lifetimes by Protecting Weak Wordlines 22 3.1 Reliability Characterization of a 3D NAND Flash Block 22 3.1.1 Large Reliability Variations Among WLs 22 3.1.2 Erase Stress on Flash Reliability 26 3.2 GuardedErase: Design Overview and its Endurance Model 28 3.2.1 Basic Idea 28 3.2.2 Per-WL Low-Stress Erase Mode 31 3.2.3 Per-Block Erase Modes 35 3.3 Design and Implementation of LongFTL 39 3.3.1 Overview 39 3.3.2 Weak WL Detector 40 3.3.3 WAF Monitor 42 3.3.4 GErase Mode Selector 43 3.4 Experimental Results 46 3.4.1 Experimental Settings 46 3.4.2 Lifetime Improvement 47 3.4.3 Performance Overhead 49 3.4.4 Effectiveness of Lowest Erase Relief Ratio 50 IV Improving SSD Performance Using Adaptive Restricted- Copyback Operations 52 4.1 Motivations 52 4.1.1 Data Migration in Modern SSD 52 4.1.2 Need for Block Aging-Aware Copyback 53 4.2 RCPB: Copyback with a Limit 55 4.2.1 Error-Propagation Characteristics 55 4.2.2 RCPB Operation Model 58 4.3 Design and Implementation of rcFTL 59 4.3.1 EPM module 60 4.3.2 Data Migration Mode Selection 64 4.4 Experimental Results 65 4.4.1 Experimental Setup 65 4.4.2 Evaluation Results 66 V Reparo: A Fast RAID Recovery Scheme for Ultra- Large SSDs 70 5.1 SSD Failures: Causes and Characteristics 70 5.1.1 SSD Failure Types 70 5.1.2 SSD Failure Characteristics 72 5.2 Impact of UL SSDs on RAID Reliability 74 5.3 RAID Recovery using Reparo 77 5.3.1 Overview of Reparo 77 5.4 Cooperative Die Recovery 82 5.4.1 Identifier: Parallel Search of Failed LBAs 82 5.4.2 Handler: Per-Core Space Utilization Adjustment 83 5.5 Identifier Acceleration Using P2L Mapping Information 89 5.5.1 Page-level P2L Entrustment to Neighboring Die 90 5.5.2 Block-level P2L Entrustment to Neighboring Die 92 5.5.3 Additional Considerations for P2L Entrustment 94 5.6 Experimental Results 95 5.6.1 Experimental Settings 95 5.6.2 Experimental Results 97 VI Conclusions 109 6.1 Summary 109 6.2 Future Work 111 6.2.1 Optimization with Accurate WAF Prediction 111 6.2.2 Maximizing Copyback Threshold 111 6.2.3 Pre-failure Detection 112๋ฐ•

    Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

    Full text link
    Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, where different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures for NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that the ideal PIM architecture for NN models depends on a model's distinct attributes, due to the inherent architectural design choices.Comment: This is an extended and updated version of a paper published in IEEE Micro, pp. 1-14, 29 Aug. 2022. arXiv admin note: text overlap with arXiv:2109.1432

    Understanding and Optimizing Flash-based Key-value Systems in Data Centers

    Get PDF
    Flash-based key-value systems are widely deployed in todayโ€™s data centers for providing high-speed data processing services. These systems deploy flash-friendly data structures, such as slab and Log Structured Merge(LSM) tree, on flash-based Solid State Drives(SSDs) and provide efficient solutions in caching and storage scenarios. With the rapid evolution of data centers, there appear plenty of challenges and opportunities for future optimizations. In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspective of workloads, software, and hardware as data centers evolve. We first propose an on-line compression scheme, called SlimCache, considering the unique characteristics of key-value workloads, to virtually enlarge the cache space, increase the hit ratio, and improve the cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters with additional hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, the 99th percentile tail latency, convergence time, real-time system throughput, and the iteration process, etc. Last but not least, we conduct an in-depth, comprehensive measurement work on flash-optimized key-value stores with recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in the current key-value store design and present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also contributes to providing system implications for auto-tuning the key-value system on flash-based SSDs and optimizing it on revolutionary 3D XPoint based SSDs

    TACKLING PERFORMANCE AND SECURITY ISSUES FOR CLOUD STORAGE SYSTEMS

    Get PDF
    Building data-intensive applications and emerging computing paradigm (e.g., Machine Learning (ML), Artificial Intelligence (AI), Internet of Things (IoT) in cloud computing environments is becoming a norm, given the many advantages in scalability, reliability, security and performance. However, under rapid changes in applications, system middleware and underlying storage device, service providers are facing new challenges to deliver performance and security isolation in the context of shared resources among multiple tenants. The gap between the decades-old storage abstraction and modern storage device keeps widening, calling for software/hardware co-designs to approach more effective performance and security protocols. This dissertation rethinks the storage subsystem from device-level to system-level and proposes new designs at different levels to tackle performance and security issues for cloud storage systems. In the first part, we present an event-based SSD (Solid State Drive) simulator that models modern protocols, firmware and storage backend in detail. The proposed simulator can capture the nuances of SSD internal states under various I/O workloads, which help researchers understand the impact of various SSD designs and workload characteristics on end-to-end performance. In the second part, we study the security challenges of shared in-storage computing infrastructures. Many cloud providers offer isolation at multiple levels to secure data and instance, however, security measures in emerging in-storage computing infrastructures are not studied. We first investigate the attacks that could be conducted by offloaded in-storage programs in a multi-tenancy cloud environment. To defend against these attacks, we build a lightweight Trusted Execution Environment, IceClave to enable security isolation between in-storage programs and internal flash management functions. We show that while enforcing security isolation in the SSD controller with minimal hardware cost, IceClave still keeps the performance benefit of in-storage computing by delivering up to 2.4x better performance than the conventional host-based trusted computing approach. In the third part, we investigate the performance interference problem caused by other tenants' I/O flows. We demonstrate that I/O resource sharing can often lead to performance degradation and instability. The block device abstraction fails to expose SSD parallelism and pass application requirements. To this end, we propose a software/hardware co-design to enforce performance isolation by bridging the semantic gap. Our design can significantly improve QoS (Quality of Service) by reducing throughput penalties and tail latency spikes. Lastly, we explore more effective I/O control to address contention in the storage software stack. We illustrate that the state-of-the-art resource control mechanism, Linux cgroups is insufficient for controlling I/O resources. Inappropriate cgroup configurations may even hurt the performance of co-located workloads under memory intensive scenarios. We add kernel support for limiting page cache usage per cgroup and achieving I/O proportionality

    ACCELERATING STORAGE APPLICATIONS WITH EMERGING KEY VALUE STORAGE DEVICES

    Get PDF
    With the continuous data explosion in the big data era, traditional software and hardware stack are facing unprecedented challenges on how to operate on such data scale. Thus, designing new architectures and efficient systems for data oriented applications has become increasingly critical. This motivates us to re-think of the conventional storage system design and re-architect both software and hardware to meet the challenges of scale. Besides the fast growth of data volume, the increasing demand on storage applications such as video streaming, data analytics are pushing high performance flash based storage devices to replace the traditional spinning disks. Such all-flash era increase the data reliability concerns due to the endurance problem of flash devices. Key-value stores (KVS) are important storage infrastructure to handle the fast growing unstructured data and have been widely deployed in a variety of scale-out enterprise applications such as online retail, big data analytic, social networks, etc. How to efficiently manage data redundancy for key-value stores to provide data reliability, how to efficiently support range query for key-value stores to accelerate analytic oriented applications under emerging key-value store system architecture become an important research problem. In this research, we focus on how to design new software hardware architectures for the keyvalue store applications to provide reliability and improve query performance. In order to address the different issues identified in this dissertation, we propose to employ a logical key management layer, a thin layer above the KV devices that maps logical keys into phsyical keys on the devices. We show how such a layer can enable multiple solutions to improve the performance and reliability of KVSSD based storage systems. First, we present KVRAID, a high performance, write efficient erasure coding management scheme on emerging key-value SSDs. The core innovation of KVRAID is to propose a logical key management layer that maps logical keys to physical keys to efficiently pack similar size KV objects and dynamically manage the membership of erasure coding groups. Unlike existing schemes which manage erasure codes on the block level, KVRAID manages the erasure codes on the KV object level. In order to achieve better storage efficiency for variable sized objects, KVRAID predefines multiple fixed sizes (slabs) according to the object size distribution for the erasure code. KVRAID uses a logical to physical key conversion to pack the KV objects of similar size into a parity group. KVRAID uses a lazy deletion mechanism with a garbage collector for object updates. Our experiments show that in 100% put case, KVRAID outperforms software block RAID by 18x in case of throughput and reduces 15x write amplification (WAF) with only ~5% CPU utilization. In a mixed update/get workloads, KVRAID achieves ~4x better throughput with ~23% CPU utilization and reduces the storage overhead and WAF by 3.6x and 11.3x in average respectively. Second, we present KVRangeDB, an ordered log structure tree based key index that supports range queries on a hash-based KVSSD. In addition, we propose to pack smaller application records into a larger physical record on the device through the logical key management layer. We compared the performance of KVRangeDB against RocksDB implementation on KVSSD and stateof- art software KV-store Wisckey on block device, on three types of real world applications of cloud-serving workloads, TABLEFS filesystem and time-series databases. For cloud serving applications, KVRangeDB achieves 8.3x and 1.7x better 99.9% write tail latency respectively compared to RocksDB implementation on KV-SSD and Wisckey on block SSD. On the query side, KVrangeDB only performs worse for those very long scans, but provides fast point queries and closed range queries. The experiments on TABLEFS demonstrate that using KVRangeDB for metadata indexing can boost the performance by a factor of ~6.3x in average and reduce ~3.9x CPU cost for four metadata-intensive workloads compared to RocksDB implementation on KVSSD. Compared toWisckey, KVRangeDB improves performance by ~2.6x in average and reduces ~1.7x CPU usage. Third, we propose a generic FPGA accelerator for emerging Minimum Storage Regenerating (MSR) codes encoding/decoding which maximizes the computation parallelism and minimizes the data movement between off-chip DRAM and the on-chip SRAM buffers. To demonstrate the efficiency of our proposed accelerator, we implemented the encoding/decoding algorithms for a specific MSR code called Zigzag code on Xilinx VCU1525 acceleration card. Our evaluation shows our proposed accelerator can achieve ~2.4-3.1x better throughput and ~4.2-5.7x better power efficiency compared to the state-of-art multi-core CPU implementation and ~2.8-3.3x better throughput and ~4.2-5.3x better power efficiency compared to a modern GPU accelerato

    Dual-function nanoparticles enzymatically conjugated with a custom-made polyurethane hydrogel for chronic wound treatment

    Get PDF
    Hydrogels are attractive drug delivery systems with the potential to protect their cargo and control its release. In particular, hydrogels based on synthetic polymers are gaining increasing interest by virtue of their controllable chemistry, ease of modification, and reproducibility. Moreover, the presence of specific side chains and pending functional groups in the polymer structure allows for the conjugation of drugs and other compounds resulting in improved control over drug release. Enzymes that catalyse reactions in a very specific way could also be used to control the conjugation of compounds to the polymeric chains to improve reproducibility and biocompatibility of the conjugation process. This contribution describes an innovative system for drug delivery comprising a bioartificial supramolecular hydrogel based on a customised polyurethane and ฮฑ- cyclodextrins, and nanoparticles, for application in the treatment of chronic wounds. The system has the potential to reduce inflammation and eradicate infection by virtue of dual-function nanoparticles which incorporate cobalt as antimicrobial agent, and phenolated lignin as antioxidant. The nanoparticles are enzymatically conjugated to the hydrogel by means of the amine side groups exposed along the backbone of the ad-hoc synthesised polyurethane. The oxidase enzyme laccase is exploited to oxidize the phenol groups of lignin, to allow their interaction with the amines on the hydrogel. The effects of nanoparticles conjugation to the hydrogel are studied through gelification tests, stability tests, and rheology. Moreover, the release of nanoparticles from the hydrogel and their effects on patientsโ€™ wound fluids and against relevant bacterial strains are analysed in vitro
    • โ€ฆ
    corecore