424 research outputs found

    낸드 플래시 메모리 신뢰도 향상을 위한 신호 처리 방법 연구

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 2. 성원용.The capacity of NAND flash memory has been continuously increased by aggressive technology scaling and multi-level cell (MLC) data coding. However, it becomes more challenging to maintain the current growth rate of the memory density mainly because of degraded signal quality of sub-20 nm NAND flash memory. This dissertation develops signal processing techniques to improve the signal reliability of MLC NAND flash memory. In the first part of this dissertation, we develop two threshold voltage distribution estimation algorithms to compensate the effect of program-erase (PE) cycling and charge loss in MLC NAND flash memory. The sensing directed estimation (SDE) utilizes the output of multi-level memory sensing to estimate the means and the variances of the threshold voltage distribution that is modeled as a Gaussian mixture. In order to reduce memory sensing overheads for the SDE algorithm, we develop a decision directed estimation (DDE) that uses error corrected bit patterns for more frequent updates of the model parameters. We also present a combined estimation scheme that employs both the SDE and the DDE approaches to minimize the number of memory sensing operations while maintaining the estimation accuracy. The effectiveness of the SDE and the DDE algorithms is evaluated by using both simulated and real NAND flash memory, and it is demonstrated that the proposed algorithms can estimate the statistical information of threshold voltage distribution accurately. The cell-to-cell interference (CCI) is one of the major sources of bit errors in sub-20 nm NAND flash memory and becomes more severe as the size of memory cell decreases. In the second part of this dissertation, we develop a CCI cancellation algorithm that is similar to interference cancellers employed in conventional communication systems. We first provide the experimental characterization of the CCI by measuring the coupling coefficients from actual NAND flash memory with a 26 nm process technology. Then, we present a CCI cancellation algorithm that consists of the coupling coefficient estimation and the CCI removal steps. To reduce the number of memory sensing operations, the optimal quantization schemes for the proposed CCI canceller are also studied. This dissertation also develops soft-information computation schemes in order to apply soft-decision error correction to NAND flash memory. The probability density function (PDF) of the CCI removed signal is quite different from that of the original threshold voltage, which can be modeled as a Gaussian mixture. Thus, computing soft-information, such as LLR (log likelihood ratio), with the CCI removed signal is not straightforward. We propose two soft-information computation schemes that combine CCI cancellation and soft-decision error correction. In the first approach, we derive a mathematical formulation for the PDF of the CCI removed signal and directly compute the LLR values by using it. In the second approach, CCI cancellation and soft-information computation are jointly conducted. Based on the intensive simulations, it is demonstrated that the reliability of NAND flash memory is significantly improved by applying the proposed signal processing algorithms as well as soft-decision error correction.1 Introduction 14 2 NAND Flash Memory Basics 19 2.1 Basics of NAND Flash Memory 19 2.1.1 NAND Flash Memory Structure 19 2.1.2 Multi-Page Programming 20 2.1.3 Cell-to-Cell Interference 22 2.1.4 Data Retention 23 2.2 Threshold Voltage Distribution of NAND Flash Memory and Signal Modeling 25 2.2.1 Threshold Voltage Distribution and Gaussian Approximation 25 2.2.2 Modeling of Threshold Voltage Signal 27 3 Threshold Voltage Distribution Estimation 31 3.1 Introduction 31 3.2 Sensing Directed Estimation of Threshold Voltage Distribution 33 3.2.1 Cost Function 34 3.2.2 Gradient Descent Method based Parameter Search 36 3.2.3 Levenberg-Marquardt Method based Parameter Search 38 3.2.4 Experimental Results 41 3.3 Decision Directed Estimation of Threshold Voltage Distribution 50 3.3.1 Basic Idea 51 3.3.2 Applying to Two-Bit MLC NAND Flash Memory 54 3.3.3 Combined Threshold Voltage Distribution Estimation 57 3.3.4 Error Analysis 58 3.3.5 Experimental Results 64 3.4 Concluding Remarks 70 4 Cell-to-Cell Interference Cancellation 71 4.1 Introduction 71 4.2 Direct Measurement of Coupling Coefficients 73 4.2.1 Measurement Procedure 74 4.2.2 Experimental Results 77 4.3 Least Squares Method based Coupling Coefficient Estimation 83 4.4 Multi-Level Memory Sensing Schemes for CCI Cancellation 87 4.5 Experimental Results 91 4.5.1 CCI Cancellation with Simulated NAND Flash Memory 91 4.5.2 CCI Cancellation with Real NAND Flash Memory 96 4.6 Concluding Remarks 97 5 Soft-Decision Error Correction in NAND Flash Memory 99 5.1 Introduction 99 5.2 Soft-Decision Error Correction without CCI Cancellation 101 5.3 Soft-Decision Error Correction with CCI Cancellation 104 5.3.1 Soft-Information Computation using PDF of CCI Removed Signal 104 5.3.2 Joint CCI Cancellation and Soft-Information Computation 110 5.3.3 Experimental Results 115 5.4 Concluding Remarks 118 6 Conclusion 120Docto

    Flash Memory Devices

    Get PDF
    Flash memory devices have represented a breakthrough in storage since their inception in the mid-1980s, and innovation is still ongoing. The peculiarity of such technology is an inherent flexibility in terms of performance and integration density according to the architecture devised for integration. The NOR Flash technology is still the workhorse of many code storage applications in the embedded world, ranging from microcontrollers for automotive environment to IoT smart devices. Their usage is also forecasted to be fundamental in emerging AI edge scenario. On the contrary, when massive data storage is required, NAND Flash memories are necessary to have in a system. You can find NAND Flash in USB sticks, cards, but most of all in Solid-State Drives (SSDs). Since SSDs are extremely demanding in terms of storage capacity, they fueled a new wave of innovation, namely the 3D architecture. Today “3D” means that multiple layers of memory cells are manufactured within the same piece of silicon, easily reaching a terabit capacity. So far, Flash architectures have always been based on "floating gate," where the information is stored by injecting electrons in a piece of polysilicon surrounded by oxide. On the contrary, emerging concepts are based on "charge trap" cells. In summary, flash memory devices represent the largest landscape of storage devices, and we expect more advancements in the coming years. This will require a lot of innovation in process technology, materials, circuit design, flash management algorithms, Error Correction Code and, finally, system co-design for new applications such as AI and security enforcement

    A (Nearly) Free Lunch:Extending NAND Flash Lifetime by Exploiting Neglected Physical Properties

    Get PDF
    NAND flash is a key storage technology in modern computing systems. Without it, many devices would probably not exist today or would at least not benefit from as many features. The very large success of this technology motivated massive efforts to scale it down in order to increase its density further. However, NAND flash is currently facing physical limitations that prevent it reaching smaller cell sizes without severely reducing its storage reliability and lifetime. Accordingly, in the present thesis we aim at relieving some constraints from device manufacturing by addressing flash irregularities at a higher level. For example, we acknowledge the fact that process variation plus other factors render some regions of a flash device more sensitive than others. This difference usually leads to sensitive regions exhausting their lifetime early, which then causes the device to become unusable, while the rest of the device is still healthy, yet not exploitable. Consequently, we propose to postpone this exhaustion point with new strategies that require minimal resources to be implemented and effectively extend flash devices lifetime. Sometimes, our strategies involve unconventional methods to access the flash that are not supported by specification document and, therefore, should not be used lightly. Hence, we also present thorough characterization experiments on actual NAND flash chips to validate these methods and model their effect on a flash device. Finally, we evaluate the performance of our methods by implementing a trace-driven flash device simulator and execute a large set of realistic disk traces. Overall, we exploit properties that are either neglected or not understood to propose methods that are nearly free to implement and systematically extend NAND flash lifetime. We are convinced that future NAND flash architectures will regularly bring radical physical changes, which will inevitably come together with a new set of physical properties to investigate and to exploit

    Digital Circuit Design Using Floating Gate Transistors

    Get PDF
    Floating gate (flash) transistors are used exclusively for memory applications today. These applications include SD cards of various form factors, USB flash drives and SSDs. In this thesis, we explore the use of flash transistors to implement digital logic circuits. Since the threshold voltage of flash transistors can be modified at a fine granularity during programming, several advantages are obtained by our flash-based digital circuit design approach. For one, speed binning at the factory can be controlled with precision. Secondly, an IC can be re-programmed in the field, to negate effects such as aging, which has been a significant problem in recent times, particularly for mission-critical applications. Thirdly, unlike a regular MOSFET, which has one threshold voltage level, a flash transistor can have multiple threshold voltage levels. The benefit of having multiple threshold voltage levels in a flash transistor is that it allows the ability to encode more symbols in each device, unlike a regular MOSFET. This allows us to implement multi-valued logic functions natively. In this thesis, we evaluate different flash-based digital circuit design approaches and compare their performance with a traditional CMOS standard cell-based design approach. We begin by evaluating our design approach at the cell level to optimize the design’s delay, power energy and physical area characteristics. The flash-based approach is demonstrated to be better than the CMOS standard cell approach, for these performance metrics. Afterwards, we present the performance of our design approach at the block level. We describe a synthesis flow to decompose a circuit block into a network of interconnected flash-based circuit cells. We also describe techniques to optimize the resulting network of flash-based circuit cells using don’t cares. Our optimization approach distinguishes itself from other optimization techniques that use don’t cares, since it a) targets a flash-based design flow, b) optimizes clusters of logic nodes at once instead of one node at a time, c) attempts to reduce the number of cubes instead of reducing the number of literals in each cube and d) performs optimization on the post-technology mapped netlist which results in a direct improvement in result quality, as compared to pre-technology mapping logic optimization that is typically done in the literature. The resulting network characteristics (delay, power, energy and physical area) are presented. These results are compared with a standard cell-based realization of the same block (obtained using commercial tools) and we demonstrate significant improvements in all the design metrics. We also study flash-based FPGA designs (both static and dynamic), and present the tradeoff of delay, power dissipation and energy consumption of the various designs. Our work differs from previously proposed flash-based FPGAs, since we embed the flash transistors (which store the configuration bits) directly within the logic and interconnect fabrics. We also present a detailed description of how the programming of the configuration bits is accomplished, for all the proposed designs

    Digital Circuit Design Using Floating Gate Transistors

    Get PDF
    Floating gate (flash) transistors are used exclusively for memory applications today. These applications include SD cards of various form factors, USB flash drives and SSDs. In this thesis, we explore the use of flash transistors to implement digital logic circuits. Since the threshold voltage of flash transistors can be modified at a fine granularity during programming, several advantages are obtained by our flash-based digital circuit design approach. For one, speed binning at the factory can be controlled with precision. Secondly, an IC can be re-programmed in the field, to negate effects such as aging, which has been a significant problem in recent times, particularly for mission-critical applications. Thirdly, unlike a regular MOSFET, which has one threshold voltage level, a flash transistor can have multiple threshold voltage levels. The benefit of having multiple threshold voltage levels in a flash transistor is that it allows the ability to encode more symbols in each device, unlike a regular MOSFET. This allows us to implement multi-valued logic functions natively. In this thesis, we evaluate different flash-based digital circuit design approaches and compare their performance with a traditional CMOS standard cell-based design approach. We begin by evaluating our design approach at the cell level to optimize the design’s delay, power energy and physical area characteristics. The flash-based approach is demonstrated to be better than the CMOS standard cell approach, for these performance metrics. Afterwards, we present the performance of our design approach at the block level. We describe a synthesis flow to decompose a circuit block into a network of interconnected flash-based circuit cells. We also describe techniques to optimize the resulting network of flash-based circuit cells using don’t cares. Our optimization approach distinguishes itself from other optimization techniques that use don’t cares, since it a) targets a flash-based design flow, b) optimizes clusters of logic nodes at once instead of one node at a time, c) attempts to reduce the number of cubes instead of reducing the number of literals in each cube and d) performs optimization on the post-technology mapped netlist which results in a direct improvement in result quality, as compared to pre-technology mapping logic optimization that is typically done in the literature. The resulting network characteristics (delay, power, energy and physical area) are presented. These results are compared with a standard cell-based realization of the same block (obtained using commercial tools) and we demonstrate significant improvements in all the design metrics. We also study flash-based FPGA designs (both static and dynamic), and present the tradeoff of delay, power dissipation and energy consumption of the various designs. Our work differs from previously proposed flash-based FPGAs, since we embed the flash transistors (which store the configuration bits) directly within the logic and interconnect fabrics. We also present a detailed description of how the programming of the configuration bits is accomplished, for all the proposed designs

    Special Topics in Information Technology

    Get PDF
    This open access book presents thirteen outstanding doctoral dissertations in Information Technology from the Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy. Information Technology has always been highly interdisciplinary, as many aspects have to be considered in IT systems. The doctoral studies program in IT at Politecnico di Milano emphasizes this interdisciplinary nature, which is becoming more and more important in recent technological advances, in collaborative projects, and in the education of young researchers. Accordingly, the focus of advanced research is on pursuing a rigorous approach to specific research topics starting from a broad background in various areas of Information Technology, especially Computer Science and Engineering, Electronics, Systems and Control, and Telecommunications. Each year, more than 50 PhDs graduate from the program. This book gathers the outcomes of the thirteen best theses defended in 2020-21 and selected for the IT PhD Award. Each of the authors provides a chapter summarizing his/her findings, including an introduction, description of methods, main achievements and future work on the topic. Hence, the book provides a cutting-edge overview of the latest research trends in Information Technology at Politecnico di Milano, presented in an easy-to-read format that will also appeal to non-specialists

    Memory Management for Emerging Memory Technologies

    Get PDF
    The Memory Wall, or the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now of the order of hundreds of CPU cycles. Additionally, the DRAM main memory is experiencing power, performance and capacity constraints that limit process technology scaling. On the other hand, the workloads running on such systems are themselves changing due to virtualization and cloud computing demanding more performance of the data centers. Not only do these workloads have larger working set sizes, but they are also changing the way memory gets used, resulting in higher sharing and increased bandwidth demands. New Non-Volatile Memory technologies (NVM) are emerging as an answer to the current main memory issues. This thesis looks at memory management issues as the emerging memory technologies get integrated into the memory hierarchy. We consider the problems at various levels in the memory hierarchy, including sharing of CPU LLC, traffic management to future non-volatile memories behind the LLC, and extending main memory through the employment of NVM. The first solution we propose is “Adaptive Replacement and Insertion" (ARI), an adaptive approach to last-level CPU cache management, optimizing the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving miss rate relative to conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from SPEC2006 suite on average by 32.9% while also decreasing misses on average by 4.7%. In a PCM based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling. Our second proposal is “Variable-Timeslice Thread Scheduling" (VATS), an OS kernel-level approach to CPU cache sharing. With modern, large, last-level caches (LLC), the time to fill the LLC is greater than the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate the threads in time by increasing their allotted time quanta, and allowing larger periods of time between interfering threads. Our approach, compared to conventional scheduling, mitigates up to 100% of the performance loss caused by CPU LLC interference. The system throughput is boosted by up to 15%. As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design with Flash transistors. TCAM is successfully used in network routing but can also be utilized in the OS Virtual Memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9× and a power improvement of 1.64× compared to a CMOS approach. In order to lower the cost of Main Memory in systems with huge memory demand, it is becoming practical to extend the DRAM in the system with the less-expensive NVMe Flash, for a much lower system cost. However, given the relatively high Flash devices access latency, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems. We show that it is possible to gain about 50% of the lost performance due to swapping into the NVM and thus enable the utilization of such hybrid systems for memory-hungry applications, lowering the memory cost while keeping the performance comparable to the DRAM-only system

    Memory Management for Emerging Memory Technologies

    Get PDF
    The Memory Wall, or the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now of the order of hundreds of CPU cycles. Additionally, the DRAM main memory is experiencing power, performance and capacity constraints that limit process technology scaling. On the other hand, the workloads running on such systems are themselves changing due to virtualization and cloud computing demanding more performance of the data centers. Not only do these workloads have larger working set sizes, but they are also changing the way memory gets used, resulting in higher sharing and increased bandwidth demands. New Non-Volatile Memory technologies (NVM) are emerging as an answer to the current main memory issues. This thesis looks at memory management issues as the emerging memory technologies get integrated into the memory hierarchy. We consider the problems at various levels in the memory hierarchy, including sharing of CPU LLC, traffic management to future non-volatile memories behind the LLC, and extending main memory through the employment of NVM. The first solution we propose is “Adaptive Replacement and Insertion" (ARI), an adaptive approach to last-level CPU cache management, optimizing the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving miss rate relative to conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from SPEC2006 suite on average by 32.9% while also decreasing misses on average by 4.7%. In a PCM based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling. Our second proposal is “Variable-Timeslice Thread Scheduling" (VATS), an OS kernel-level approach to CPU cache sharing. With modern, large, last-level caches (LLC), the time to fill the LLC is greater than the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate the threads in time by increasing their allotted time quanta, and allowing larger periods of time between interfering threads. Our approach, compared to conventional scheduling, mitigates up to 100% of the performance loss caused by CPU LLC interference. The system throughput is boosted by up to 15%. As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design with Flash transistors. TCAM is successfully used in network routing but can also be utilized in the OS Virtual Memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9× and a power improvement of 1.64× compared to a CMOS approach. In order to lower the cost of Main Memory in systems with huge memory demand, it is becoming practical to extend the DRAM in the system with the less-expensive NVMe Flash, for a much lower system cost. However, given the relatively high Flash devices access latency, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems. We show that it is possible to gain about 50% of the lost performance due to swapping into the NVM and thus enable the utilization of such hybrid systems for memory-hungry applications, lowering the memory cost while keeping the performance comparable to the DRAM-only system
    corecore