5,873 research outputs found

    SSD์˜ ๊ธด ๊ผฌ๋ฆฌ ์ง€์—ฐ์‹œ๊ฐ„ ๋ฌธ์ œ ์™„ํ™”๋ฅผ ์œ„ํ•œ ๊ฐ•ํ™”ํ•™์Šต์˜ ์ ์šฉ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€,2020. 2. ์œ ์Šน์ฃผ.NAND flash memory is widely used in a variety of systems, from realtime embedded systems to high-performance enterprise server systems. Flash memory has (1) erase-before-write (write-once) and (2) endurance problems. To handle the erase-before-write feature, apply a flash-translation layer (FTL). Currently, the page-level mapping method is mainly used to reduce the latency increase caused by the write-once and block erase characteristics of flash memory. Garbage collection (GC) is one of the leading causes of long-tail latency, which increases more than 100 times the average latency at 99th percentile. Therefore, real-time systems or quality-critical systems cannot satisfy given requirements such as QoS restrictions. As flash memory capacity increases, GC latency also tends to increase. This is because the block size (the number of pages included in one block) of the flash memory increases as the capacity of the flash memory increases. GC latency is determined by valid page copy and block erase time. Therefore, as block size increases, GC latency also increases. Especially, the block size gets increased from 2D to 3D NAND flash memory, e.g., 256 pages/block in 2D planner NAND flash memory and 768 pages/block in 3D NAND flash memory. Even in 3D NAND flash memory, the block size is expected to continue to increase. Thus, the long write latency problem incurred by GC can become more serious in 3D NAND flash memory-based storage. In this dissertation, we propose three versions of the novel GC scheduling method based on reinforcement learning. The purpose of this method is to reduce the long tail latency caused by GC by utilizing the idle time of the storage system. Also, we perform a quantitative analysis for the RL-assisted GC solution. RL-assisted GC scheduling technique was proposed which learns the storage access behavior online and determines the number of GC operations to exploit the idle time. We also presented aggressive methods, which helps in further reducing the long tail latency by aggressively performing fine-grained GC operations. We also proposed a technique that dynamically manages key states in RL-assisted GC to reduce the long-tail latency. This technique uses many fine-grained pieces of information as state candidates and manages key states that suitably represent the characteristics of the workload using a relatively small amount of memory resource. Thus, the proposed method can reduce the long-tail latency even further. In addition, we presented a Q-value prediction network that predicts the initial Q-value of a newly inserted state in the Q-table cache. The integrated solution of the Q-table cache and Q-value prediction network can exploit the short-term history of the system with a low-cost Q-table cache. It is also equipped with a small network called Q-value prediction network to make use of the long-term history and provide good Q-value initialization for the Q-table cache. The experiments show that our proposed method reduces by 25%-37% the long tail latency compared to the state-of-the-art method.๋‚ธ๋“œ ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ๋Š” ์‹ค์‹œ๊ฐ„ ์ž„๋ฒ ๋””๋“œ ์‹œ์Šคํ…œ์œผ๋กœ๋ถ€ํ„ฐ ๊ณ ์„ฑ๋Šฅ์˜ ์—”ํ„ฐํ”„๋ผ์ด์ฆˆ ์„œ๋ฒ„ ์‹œ์Šคํ…œ๊นŒ์ง€ ๋‹ค์–‘ํ•œ ์‹œ์Šคํ…œ์—์„œ ๋„๋ฆฌ ์‚ฌ์šฉ ๋˜๊ณ  ์žˆ๋‹ค. ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ๋Š” (1) erase-before-write (write-once)์™€ (2) endurance ๋ฌธ์ œ๋ฅผ ๊ฐ–๊ณ  ์žˆ๋‹ค. Erase-before-write ํŠน์„ฑ์„ ๋‹ค๋ฃจ๊ธฐ ์œ„ํ•ด flash-translation layer (FTL)์„ ์ ์šฉ ํ•œ๋‹ค. ํ˜„์žฌ ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์˜ write-once ํŠน์„ฑ๊ณผ block eraseํŠน์„ฑ์œผ๋กœ ์ธํ•œ latency ์ฆ๊ฐ€๋ฅผ ๊ฐ์†Œ ์‹œํ‚ค๊ธฐ ์œ„ํ•˜์—ฌ page-level mapping๋ฐฉ์‹์ด ์ฃผ๋กœ ์‚ฌ์šฉ ๋œ๋‹ค. Garbage collection (GC)์€ 99th percentile์—์„œ ํ‰๊ท  ์ง€์—ฐ์‹œ๊ฐ„์˜ 100๋ฐฐ ์ด์ƒ ์ฆ๊ฐ€ํ•˜๋Š” long tail latency๋ฅผ ์œ ๋ฐœ์‹œํ‚ค๋Š” ์ฃผ์š” ์›์ธ ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ๋”ฐ๋ผ์„œ ์‹ค์‹œ๊ฐ„ ์‹œ์Šคํ…œ์ด๋‚˜ quality-critical system์—์„œ๋Š” Quality of Service (QoS) ์ œํ•œ๊ณผ ๊ฐ™์€ ์ฃผ์–ด์ง„ ์š”๊ตฌ ์กฐ๊ฑด์„ ๋งŒ์กฑ ์‹œํ‚ฌ ์ˆ˜ ์—†๋‹ค. ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์˜ ์šฉ๋Ÿ‰์ด ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ GC latency๋„ ์ฆ๊ฐ€ํ•˜๋Š” ๊ฒฝํ–ฅ์„ ๋ณด์ธ๋‹ค. ์ด๊ฒƒ์€ ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์˜ ์šฉ๋Ÿ‰์ด ์ฆ๊ฐ€ ํ•จ์— ๋”ฐ๋ผ ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์˜ ๋ธ”๋ก ํฌ๊ธฐ (ํ•˜๋‚˜์˜ ๋ธ”๋ก์ด ํฌํ•จํ•˜๊ณ  ์žˆ๋Š” ํŽ˜์ด์ง€์˜ ์ˆ˜)๊ฐ€ ์ฆ๊ฐ€ ํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. GC latency๋Š” valid page copy์™€ block erase ์‹œ๊ฐ„์— ์˜ํ•ด ๊ฒฐ์ • ๋œ๋‹ค. ๋”ฐ๋ผ์„œ, ๋ธ”๋ก ํฌ๊ธฐ๊ฐ€ ์ฆ๊ฐ€ํ•˜๋ฉด, GC latency๋„ ์ฆ๊ฐ€ ํ•œ๋‹ค. ํŠนํžˆ, ์ตœ๊ทผ 2D planner ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์—์„œ 3D vertical ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ ๊ตฌ์กฐ๋กœ ์ „ํ™˜๋จ์— ๋”ฐ๋ผ ๋ธ”๋ก ํฌ๊ธฐ๋Š” ์ฆ๊ฐ€ ํ•˜์˜€๋‹ค. ์‹ฌ์ง€์–ด 3D vertical ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์—์„œ๋„ ๋ธ”๋ก ํฌ๊ธฐ๊ฐ€ ์ง€์†์ ์œผ๋กœ ์ฆ๊ฐ€ ํ•˜๊ณ  ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ 3D vertical ํ”Œ๋ž˜์‹œ ๋ฉ”๋ชจ๋ฆฌ์—์„œ long tail latency ๋ฌธ์ œ๋Š” ๋”์šฑ ์‹ฌ๊ฐํ•ด ์ง„๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ ์šฐ๋ฆฌ๋Š” ๊ฐ•ํ™”ํ•™์Šต(Reinforcement learning, RL)์„ ์ด์šฉํ•œ ์„ธ ๊ฐ€์ง€ ๋ฒ„์ „์˜ ์ƒˆ๋กœ์šด GC scheduling ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ œ์•ˆ๋œ ๊ธฐ์ˆ ์˜ ๋ชฉ์ ์€ ์Šคํ† ๋ฆฌ์ง€ ์‹œ์Šคํ…œ์˜ idle ์‹œ๊ฐ„์„ ํ™œ์šฉํ•˜์—ฌ GC์— ์˜ํ•ด ๋ฐœ์ƒ๋œ long tail latency๋ฅผ ๊ฐ์†Œ ์‹œํ‚ค๋Š” ๊ฒƒ์ด๋‹ค. ๋˜ํ•œ, ์šฐ๋ฆฌ๋Š” RL-assisted GC ์†”๋ฃจ์…˜์„ ์œ„ํ•œ ์ •๋Ÿ‰ ๋ถ„์„ ํ•˜์˜€๋‹ค. ์šฐ๋ฆฌ๋Š” ์Šคํ† ๋ฆฌ์ง€์˜ access behavior๋ฅผ ์˜จ๋ผ์ธ์œผ๋กœ ํ•™์Šตํ•˜๊ณ , idle ์‹œ๊ฐ„์„ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋Š” GC operation์˜ ์ˆ˜๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” RL-assisted GC scheduling ๊ธฐ์ˆ ์„ ์ œ์•ˆ ํ•˜์˜€๋‹ค. ์ถ”๊ฐ€์ ์œผ๋กœ ์šฐ๋ฆฌ๋Š” ๊ณต๊ฒฉ์ ์ธ ๋ฐฉ๋ฒ•์„ ์ œ์‹œ ํ•˜์˜€๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ์ž‘์€ ๋‹จ์œ„์˜ GC operation๋“ค์„ ๊ณต๊ฒฉ์ ์œผ๋กœ ์ˆ˜ํ–‰ ํ•จ์œผ๋กœ์จ, long tail latency๋ฅผ ๋”์šฑ ๊ฐ์†Œ ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋„๋ก ๋„์›€์„ ์ค€๋‹ค. ๋˜ํ•œ ์šฐ๋ฆฌ๋Š” long tail latency๋ฅผ ๋”์šฑ ๊ฐ์†Œ์‹œํ‚ค๊ธฐ ์œ„ํ•˜์—ฌ RL-assisted GC์˜ key state๋“ค์„ ๋™์ ์œผ๋กœ ๊ด€๋ฆฌํ•  ์ˆ˜ ์žˆ๋Š” Q-table cache ๊ธฐ์ˆ ์„ ์ œ์•ˆ ํ•˜์˜€๋‹ค. ์ด ๊ธฐ์ˆ ์€ state ํ›„๋ณด๋กœ ๋งค์šฐ ๋งŽ์€ ์ˆ˜์˜ ์„ธ๋ฐ€ํ•œ ์ •๋ณด๋“ค์„ ์‚ฌ์šฉ ํ•˜๊ณ , ์ƒ๋Œ€์ ์œผ๋กœ ์ž‘์€ ๋ฉ”๋ชจ๋ฆฌ ๊ณต๊ฐ„์„ ์ด์šฉํ•˜์—ฌ workload์˜ ํŠน์„ฑ์„ ์ ์ ˆํ•˜๊ฒŒ ํ‘œํ˜„ ํ•  ์ˆ˜ ์žˆ๋Š” key state๋“ค์„ ๊ด€๋ฆฌ ํ•œ๋‹ค. ๋”ฐ๋ผ์„œ, ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ long tail latency๋ฅผ ๋”์šฑ ๊ฐ์†Œ ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. ์ถ”๊ฐ€์ ์œผ๋กœ, ์šฐ๋ฆฌ๋Š” Q-table cache์— ์ƒˆ๋กญ๊ฒŒ ์ถ”๊ฐ€๋˜๋Š” state์˜ ์ดˆ๊ธฐ๊ฐ’์„ ์˜ˆ์ธกํ•˜๋Š” Q-value prediction network (QP Net)๋ฅผ ์ œ์•ˆ ํ•˜์˜€๋‹ค. Q-table cache์™€ QP Net์˜ ํ†ตํ•ฉ ์†”๋ฃจ์…˜์€ ์ € ๋น„์šฉ์˜ Q-table cache๋ฅผ ์ด์šฉํ•˜์—ฌ ๋‹จ๊ธฐ๊ฐ„์˜ ๊ณผ๊ฑฐ ์ •๋ณด๋ฅผ ํ™œ์šฉ ํ•  ์ˆ˜ ์žˆ๋‹ค. ๋˜ํ•œ ์ด๊ฒƒ์€ QP Net์ด๋ผ๊ณ  ๋ถ€๋ฅด๋Š” ์ž‘์€ ์‹ ๊ฒฝ๋ง์„ ์ด์šฉํ•˜์—ฌ ํ•™์Šตํ•œ ์žฅ๊ธฐ๊ฐ„์˜ ๊ณผ๊ฑฐ ์ •๋ณด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ Q-table cache์— ์ƒˆ๋กญ๊ฒŒ ์‚ฝ์ž…๋˜๋Š” state์— ๋Œ€ํ•ด ์ข‹์€ Q-value ์ดˆ๊ธฐ๊ฐ’์„ ์ œ๊ณตํ•œ๋‹ค. ์‹คํ—˜๊ฒฐ๊ณผ๋Š” ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์ด state-of-the-art ๋ฐฉ๋ฒ•์— ๋น„๊ตํ•˜์—ฌ 25%-37%์˜ long tail latency๋ฅผ ๊ฐ์†Œ ์‹œ์ผฐ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค.Chapter 1 Introduction 1 Chapter 2 Background 6 2.1 System Level Tail Latency 6 2.2 Solid State Drive 10 2.2.1 Flash Storage Architecture and Garbage Collection 10 2.3 Reinforcement Learning 13 Chapter 3 Related Work 17 Chapter 4 Small Q-table based Solution to Reduce Long Tail Latency 23 4.1 Problem and Motivation 23 4.1.1 Long Tail Problem in Flash Storage Access Latency 23 4.1.2 Idle Time in Flash Storage 24 4.2 Design and Implementation 26 4.2.1 Solution Overview 26 4.2.2 RL-assisted Garbage Collection Scheduling 27 4.2.3 Aggressive RL-assisted Garbage Collection Scheduling 33 4.3 Evaluation 35 4.3.1 Evaluation Setup 35 4.3.2 Results and Discussion 39 Chapter 5 Q-table Cache to Exploit a Large Number of States at Small Cost 52 5.1 Motivation 52 5.2 Design and Implementation 56 5.2.1 Solution Overview 56 5.2.2 Dynamic Key States Management 61 5.3 Evaluation 67 5.3.1 Evaluation Setup 67 5.3.2 Results and Discussion 67 Chapter 6 Combining Q-table cache and Neural Network to Exploit both Long and Short-term History 73 6.1 Motivation and Problem 73 6.1.1 More State Information can Further Reduce Long Tail Latency 73 6.1.2 Locality Behavior of Workload 74 6.1.3 Zero Initialization Problem 75 6.2 Design and Implementation 77 6.2.1 Solution Overview 77 6.2.2 Q-table Cache for Action Selection 80 6.2.3 Q-value Prediction 83 6.3 Evaluation 87 6.3.1 Evaluation Setup 87 6.3.2 Storage-Intensive Workloads 89 6.3.3 Latency Comparison: Overall 92 6.3.4 Q-value Prediction Network Effects on Latency 97 6.3.5 Q-table Cache Analysis 110 6.3.6 Immature State Analysis 113 6.3.7 Miscellaneous Analysis 116 6.3.8 Multi Channel Analysis 121 Chapter 7 Conculsion and Future Work 138 7.1 Conclusion 138 7.2 Future Work 140 Bibliography 143 ๊ตญ๋ฌธ์ดˆ๋ก 154Docto

    Off-line Deduplication Method for Solid-State Disk Based on Hot and Cold Data

    Get PDF
    Solid-state disk (SSD) deduplication refers to the identification and deletion of duplicate data stored in an SSD. The reliability of SSDs is improved by deduplication. At present, the common data deduplication of SSDs is based on online data deduplication with Field Programmable Gate Array (FPGA) acceleration. The disadvantage is that FPGA, which has a complex structure. An off-line deduplication method for the SSD based on hot and cold data was proposed in this study to simplify the structure of an SSD deduplication, reduce the cost, and improve the efficiency of deduplication and access performance of SSDs. First, the wear-leveling algorithm was employed in the SSD to divide the data into cold and hot. Then, the corresponding fingerprint was generated for the cold data. Second, the fingerprint was compared, and the cold data with the same fingerprint were deleted. Finally, the cold and hot data were exchanged after deduplication. Results demonstrate that the duplicate recognition rate of the proposed method is 5% - 38%, which is close to that of the online deduplication method. In terms of access performance, the performance of SSDs using the proposed method is improved by 20% compared with that of traditional SSDs and is near the access performance of SSDs using online deduplication. This study provides certain reference for improving the reliability of existing SSDs

    Internet of Things-aided Smart Grid: Technologies, Architectures, Applications, Prototypes, and Future Research Directions

    Full text link
    Traditional power grids are being transformed into Smart Grids (SGs) to address the issues in existing power system due to uni-directional information flow, energy wastage, growing energy demand, reliability and security. SGs offer bi-directional energy flow between service providers and consumers, involving power generation, transmission, distribution and utilization systems. SGs employ various devices for the monitoring, analysis and control of the grid, deployed at power plants, distribution centers and in consumers' premises in a very large number. Hence, an SG requires connectivity, automation and the tracking of such devices. This is achieved with the help of Internet of Things (IoT). IoT helps SG systems to support various network functions throughout the generation, transmission, distribution and consumption of energy by incorporating IoT devices (such as sensors, actuators and smart meters), as well as by providing the connectivity, automation and tracking for such devices. In this paper, we provide a comprehensive survey on IoT-aided SG systems, which includes the existing architectures, applications and prototypes of IoT-aided SG systems. This survey also highlights the open issues, challenges and future research directions for IoT-aided SG systems

    Flash memory management with cooperation, adaptation and assistance

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    IoT for Efficient Data Collection from Real World Resources

    Get PDF
    The Internet of Things is providing new ways of experiencing and reacting to the physical world through the ability of advanced electronic devices that collect data. At the same time, as new application scenarios are envisioned, with the assistance of information generated by sensors, new problems and obstacles will arise. This requires new development to meet business and technical requirements, such as interoperability between heterogeneous devices and confidence (such as validity, security and trust) over smart devices. With the increase of these complex requirements it becomes crucial to develop an infrastructure aimed at tackling such requirements mentioned. IoT middleware โ€“ a software layer that bridges the gap between devices and information systems. Thus, this work aims to study the mechanisms and methodology for data collection, devices interoperability and data filtering, closer to the data sources, in order to optimize the collection and pre-analysis of data that can then be used by various applications such as the ones in manufacturing industry

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Get PDF

    TuRaN: True Random Number Generation Using Supply Voltage Underscaling in SRAMs

    Full text link
    Prior works propose SRAM-based TRNGs that extract entropy from SRAM arrays. SRAM arrays are widely used in a majority of specialized or general-purpose chips that perform the computation to store data inside the chip. Thus, SRAM-based TRNGs present a low-cost alternative to dedicated hardware TRNGs. However, existing SRAM-based TRNGs suffer from 1) low TRNG throughput, 2) high energy consumption, 3) high TRNG latency, and 4) the inability to generate true random numbers continuously, which limits the application space of SRAM-based TRNGs. Our goal in this paper is to design an SRAM-based TRNG that overcomes these four key limitations and thus, extends the application space of SRAM-based TRNGs. To this end, we propose TuRaN, a new high-throughput, energy-efficient, and low-latency SRAM-based TRNG that can sustain continuous operation. TuRaN leverages the key observation that accessing SRAM cells results in random access failures when the supply voltage is reduced below the manufacturer-recommended supply voltage. TuRaN generates random numbers at high throughput by repeatedly accessing SRAM cells with reduced supply voltage and post-processing the resulting random faults using the SHA-256 hash function. To demonstrate the feasibility of TuRaN, we conduct SPICE simulations on different process nodes and analyze the potential of access failure for use as an entropy source. We verify and support our simulation results by conducting real-world experiments on two commercial off-the-shelf FPGA boards. We evaluate the quality of the random numbers generated by TuRaN using the widely-adopted NIST standard randomness tests and observe that TuRaN passes all tests. TuRaN generates true random numbers with (i) an average (maximum) throughput of 1.6Gbps (1.812Gbps), (ii) 0.11nJ/bit energy consumption, and (iii) 278.46us latency
    • โ€ฆ
    corecore