218 research outputs found

    March CRF: an Efficient Test for Complex Read Faults in SRAM Memories

    No full text
    In this paper we study Complex Read Faults in SRAMs, a combination of various malfunctions that affect the read operation in nanoscale memories. All the memory elements involved in the read operation are studied, underlining the causes of the realistic faults concerning this operation, and the requirements to cover these fault models are given. We show that the different causes of read failure are independent and may coexist in nanoscale SRAMs, summing their effects and provoking Complex Read Faults (CRFs). We show that the test methodology to cover these new read faults consists of test patterns that match the requirements for covering all of the different simple read fault models. We propose a low-complexity (≈2N) test, March CRF, that effectively covers all realistic Complex Read Faults.
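    For readers unfamiliar with march tests, the sketch below shows the general shape of a march-test engine of the kind March CRF instantiates. The element sequence used here is a generic 2N example chosen purely for illustration; the abstract does not list the actual March CRF elements, so this is not the paper's algorithm.

```python
# Minimal sketch of a march-test engine (illustrative only; the element
# sequence below is a generic ~2N example, not the actual March CRF test).

def run_march(memory, elements):
    """Apply march elements to a word-oriented memory model.

    memory   : list of ints (the simulated SRAM words)
    elements : list of (order, ops); order is 'up' or 'down',
               ops is a sequence like ('w0', 'r0', 'w1', 'r1')
    Returns the addresses where a read mismatched its expected value.
    """
    failures = []
    n = len(memory)
    for order, ops in elements:
        addresses = range(n) if order == 'up' else range(n - 1, -1, -1)
        for addr in addresses:
            for op in ops:
                kind, value = op[0], int(op[1])
                if kind == 'w':
                    memory[addr] = value
                elif memory[addr] != value:   # read and compare
                    failures.append(addr)
    return failures

if __name__ == "__main__":
    # A generic 2N-style sequence: write 0 everywhere, then read 0 everywhere.
    generic_2n = [('up', ('w0',)), ('up', ('r0',))]
    print(run_march([1] * 16, generic_2n))   # fault-free model -> []
```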

    On the use of error detecting and correcting codes to boost security in caches against side channel attacks

    Get PDF
    Microprocessor memory is sensitive to cold boot attacks. In this kind of attack, memory remanence is exploited to dump the memory contents after the microprocessor has undergone a hard reboot. If a cryptographic algorithm was executing at that moment, the memory contents can be downloaded into a backup memory and specialized tools can be used to extract the secret keys. In main memory, data can be protected with efficient encryption techniques, but in caches this is not possible without seriously degrading performance. Recently, an interleaved scrambling technique (IST) was presented to improve the security of caches against cold boot attacks. While IST is effective against this particular kind of attack, a weakness remains against side channel attacks, in particular power analysis. Reliability of data in caches is guaranteed by means of error detecting and correcting codes. In this work it is shown how these codes can be used to improve not only the reliability but also the security of data. In particular, a self-healing technique is selected to make the IST technique robust against side channel attacks based on power analysis. Postprint (author's final draft).
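    As a concrete illustration of how an error detecting and correcting code detects and repairs a flipped bit, here is a minimal Hamming(7,4) single-error-correcting sketch. It is the textbook construction, not the specific code or self-healing scheme proposed in the paper.

```python
# Minimal Hamming(7,4) single-error-correcting code sketch (illustrative only;
# not the specific ECC or self-healing scheme proposed in the paper).

def hamming74_encode(d):
    """d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """c: 7-bit codeword (possibly with one flipped bit) -> corrected 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + (s2 << 1) + (s3 << 2)   # 1-based position of the erroneous bit
    if syndrome:
        c[syndrome - 1] ^= 1                # correct the single-bit error
    return [c[2], c[4], c[5], c[6]]

if __name__ == "__main__":
    data = [1, 0, 1, 1]
    codeword = hamming74_encode(data)
    codeword[4] ^= 1                        # inject a single-bit fault
    assert hamming74_decode(codeword) == data
```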

    Design techniques for dense embedded memory in advanced CMOS technologies

    Get PDF
    University of Minnesota Ph.D. dissertation. February 2012. Major: Electrical Engineering. Advisor: Chris H. Kim. 1 computer file (PDF); viii, 116 pages.

    On-die cache memory is a key component in advanced processors, since it can boost micro-architectural performance at a moderate power penalty. Demand for denser memories is only going to increase as the number of cores in a microprocessor grows with technology scaling, and a commensurate increase in the amount of cache memory is needed to fully utilize the larger and more powerful processing units. 6T SRAMs have been the embedded memory of choice for modern microprocessors due to their logic compatibility, high speed, and refresh-free operation. However, the relatively large cell size and the conflicting requirements of read and write make aggressive scaling of 6T SRAMs challenging at sub-22 nm nodes. In this dissertation, circuit techniques and simulation methodologies are presented to demonstrate the potential of alternative options such as gain-cell eDRAMs and spin-torque-transfer magnetic RAMs (STT-MRAMs) for high-density embedded memories.

    Three unique test chip designs are presented to enhance the retention time and access speed of gain-cell eDRAMs. The proposed bit-cells utilize preferential boosting, beneficial coupling, and aggregated cell leakage to expand the signal window between data '1' and '0'. The power-delay design space can be further improved with various assist schemes that harness the innate properties of gain-cell eDRAMs. Experimental results from the test chips demonstrate that the proposed gain-cell eDRAMs achieve faster overall system performance and lower static power dissipation than SRAMs in a generic 65 nm low-power (LP) CMOS process.

    A magnetic tunnel junction (MTJ) scaling scenario and an efficient HSPICE simulation methodology are proposed for exploring the scalability of STT-MRAMs under variation effects from 65 nm to 8 nm. A constant JC0*RA/VDD scaling method is adopted to achieve optimal read and write performance, and the thermal stability required for 10-year retention is achieved by adjusting the free-layer thickness and projecting improvements in crystalline anisotropy. Studies based on the proposed methodology show that in-plane STT-MRAM will outperform SRAM from the 15 nm node onward, while its perpendicular counterpart requires further innovation in MTJ material properties to overcome its poor write performance from the 22 nm node onward.
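    The 10-year retention target quoted above maps to a thermal stability factor through a standard relation, t ≈ τ0·exp(Δ) with Δ = E_b/(k_B·T). The sketch below uses typical assumed values (attempt period τ0 ≈ 1 ns, an illustrative array size and failure budget), not the dissertation's own parameters, to show the rough magnitudes involved.

```python
# Back-of-the-envelope check of the thermal-stability requirement for STT-MRAM
# retention (standard single-bit estimate; not the dissertation's model or data).
import math

tau0 = 1e-9                              # attempt period, ~1 ns (typical assumption)
t_retention = 10 * 365.25 * 24 * 3600    # 10-year retention target, in seconds

# Mean time to a thermally induced flip: t = tau0 * exp(Delta), Delta = Eb / (kB*T)
delta_bit = math.log(t_retention / tau0)
print(f"Minimum thermal stability factor for one bit: {delta_bit:.1f}")   # ~40

# For an N-bit array with an acceptable failure probability p_fail, the requirement
# tightens to roughly Delta >= ln(N * t / (tau0 * p_fail)). Illustrative numbers:
N, p_fail = 32 * 2**20, 1e-4             # e.g., a 32 Mb array, 0.01% failure budget
delta_array = math.log(N * t_retention / (tau0 * p_fail))
print(f"Approximate requirement for the whole array: {delta_array:.1f}")  # ~67
```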

    Workload-Adaptation in Memory Controllers

    Get PDF

    A DRAM-Based Neural Network Accelerator Architecture for Binary Neural Networks

    Get PDF
    Ph.D. dissertation, Department of Computer Science and Engineering, College of Engineering, Seoul National University Graduate School, February 2021. Advisor: Sungjoo Yoo.

    In convolutional neural network applications, most of the computation comes from the multiply-and-accumulate operations of the convolution and fully-connected layers. From the hardware perspective (i.e., in the gate-level circuits), these operations are performed as a large number of dot products between feature-map and kernel vectors. Since feature maps and kernels are 3D or 4D tensors, the vectors derived from them are reused many times in these matrix multiplications. As DNN throughput increases, the power consumption and the performance bottleneck caused by data movement become more critical. More importantly, power consumption due to off-chip memory accesses dominates the total power, since an off-chip memory access consumes several hundred times more power than the computation itself. Accelerator throughput is on the order of several hundred GOPS to several TOPS, but memory bandwidth is limited to 25.6 GB/s (DDR4) or 34 GB/s (LPDDR4). Reducing the network size and/or the amount of data movement alleviates both the data-movement power and the performance bottleneck. Among algorithmic approaches, quantization is widely used; Binary Neural Networks (BNNs) reduce precision dramatically, down to 1 bit. Their accuracy is lower than that of FP16 networks, but it is continuously improving through various studies. On the architecture side, dataflow control can reduce redundant data movement by increasing data reuse. These two methods are widely applied in accelerators because they require no additional computation during inference.

    In this dissertation, I present 1) a DRAM-based accelerator architecture and 2) a DRAM refresh method that mitigates the performance loss caused by refresh. The two methods are orthogonal, so they can be integrated into a single DRAM chip and operate independently. First, we propose a DRAM-based accelerator architecture capable of massive, wide vector dot-product operations. In the field of CNN accelerators to which BNNs can be applied, computing-in-memory (CIM) structures that use the memory cell array for vector dot products are being actively studied. Since DRAM stores all of the neural network data, computing there is advantageous for reducing the amount of data transfer. The proposed architecture operates by utilizing the basic operations of DRAM. The second method reduces the performance degradation and power consumption caused by DRAM refresh. Since DRAM cannot read or write data while a periodic refresh is in progress, system performance decreases. The proposed refresh method tests the retention characteristics inside the DRAM chip during self-refresh and extends the refresh period according to those characteristics. Since it operates independently inside the DRAM, it can be applied to any system that uses DRAM, including deep neural network accelerators. We also surveyed system integration, including the software stack, for using the in-DRAM accelerator from a deep learning framework; as a result, we expect in-DRAM accelerators to be controllable with the memory controller implementation verified in the earlier experiments. In addition, we added a performance simulation capability for the in-DRAM accelerator to PyTorch: when a neural network runs in PyTorch, it reports the computation latency and data-movement latency of the layers executed on the in-DRAM accelerator. Being able to predict the hardware performance while co-designing the network is a significant advantage.
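    Since the accelerator targets binary neural networks, the core operation it maps onto DRAM rows is a bit-wise dot product. The sketch below shows the standard XNOR-and-popcount formulation of that dot product; it is illustrative only and not the dissertation's in-DRAM implementation.

```python
# Minimal sketch of the XNOR-popcount dot product used in binary neural networks,
# i.e., the kind of bit-wise operation an in-DRAM BNN accelerator computes.
# Illustrative only; not the dissertation's implementation.

def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors packed as n-bit integers (bit 1 -> +1, bit 0 -> -1)."""
    matches = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")  # XNOR, then popcount
    return 2 * matches - n  # each matching bit contributes +1, each mismatch -1

if __name__ == "__main__":
    n = 8
    a = 0b10110010  # packed activations
    w = 0b11010011  # packed weights
    # Reference result computed element-wise on the unpacked {-1,+1} values
    unpack = lambda x: [1 if (x >> i) & 1 else -1 for i in range(n)]
    ref = sum(ai * wi for ai, wi in zip(unpack(a), unpack(w)))
    assert binary_dot(a, w, n) == ref
    print(binary_dot(a, w, n))  # -> 2
```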
    Abstract in Korean (translated): In convolutional neural network (CNN) applications, most of the computation consists of the multiply-and-accumulate operations of the convolution and fully-connected layers. At the gate-logic level, these are executed as a large number of vector dot products that repeatedly reuse the input and kernel vectors. Deep neural network computation is better served by a large number of small, simple compute units than by general-purpose units. Once an accelerator's throughput rises above a certain level, its performance is limited by the data transfers required for the computation: moving data off-chip from memory consumes several hundred times more energy than the computation in the compute units, and while the compute units can deliver hundreds of giga- to several tera-operations per second, the memory can transfer only tens of gigabytes per second. The way to address both the power and the performance problems caused by data transfer is to reduce the amount of data transferred. Among algorithmic approaches, quantizing the network data to a lower precision is widely used; binary neural networks (BNNs) lower the precision to as little as 1 bit. Although accuracy is lower than at 16-bit precision, it is continuously improving through various studies. Architecturally, transferred data can be reused to reduce repeated transfers of the same data. Both methods can be applied without additional computation during inference and are therefore widely adopted in accelerators. This dissertation proposes a DRAM-based accelerator architecture and a technique that mitigates the performance loss caused by DRAM refresh; the two can be integrated into a single DRAM chip and operate independently. The first is a DRAM-based accelerator capable of massive vector dot-product operations. In the field of CNN accelerators to which BNNs can be applied, computing-in-memory (CIM) structures that use the memory cell array for vector dot products are being actively studied; DRAM in particular holds all of the neural network data, which is advantageous for reducing data transfer. We propose performing the computation with the basic operations of DRAM, without changing the DRAM cell-array structure. The second method extends the DRAM refresh period to mitigate performance degradation and power consumption. Every time DRAM performs a refresh, data cannot be read or written, so the performance of the system or accelerator drops. We propose testing the retention characteristics inside the DRAM chip and extending the refresh period accordingly. Because it operates independently inside the DRAM, it can be applied to any system that uses DRAM, including deep neural network accelerators. We also surveyed system integration, including the software stack, so that the proposed accelerator can be used easily in widely used deep learning frameworks such as PyTorch. As a result, it is expected that the in-DRAM accelerator can be controlled by adding the memory controller verified in the DRAM refresh experiments, together with a custom compiler, to the existing TVM compiler and the FPGA-based TVM/VTA accelerator. In addition, a simulation capability was added to PyTorch so that the performance of the in-DRAM accelerator and the neural network can be predicted at the design stage: when a neural network runs in PyTorch, the computation latency and data-movement latency of the layers executed on the DRAM accelerator can be inspected.

    Contents: Chapter 1 Introduction; Chapter 2 Background (2.1 Neural Network Operation, 2.2 Data Movement Overhead, 2.3 Binary Neural Networks, 2.4 Computing-in-Memory, 2.5 Memory Bottleneck due to Refresh); Chapter 3 In-DRAM Neural Network Accelerator (3.1 Backgrounds: DRAM Hierarchy, DRAM Basic Operation, DRAM Commands with Timing Parameters, Bit-wise Operation in DRAM; 3.2 Motivations; 3.3 Proposed Architecture: Operation Examples of Row Operator, Convolutions on DRAM Chip; 3.4 Data Flow: Input Broadcasting in DRAM, Input Data Movement with M2V, Internal Data Movement with SiD, Data Partitioning for Parallel Operation; 3.5 Experiments: Performance Estimation, Configuration of In-DRAM Accelerator, Improving the Accuracy of BNN, Comparison with the Existing Works; 3.6 Discussion: Performance Comparison with ASIC Accelerators, Challenges of the Proposed Architecture; 3.7 Conclusion); Chapter 4 Reducing DRAM Refresh Power Consumption by Runtime Profiling of Retention Time and Dual-row Activation (4.1 Introduction; 4.2 Background; 4.3 Related Works; 4.4 Observations; 4.5 Solution Overview; 4.6 Runtime Profiling: Basic Operation, Profiling Multiple Rows in Parallel, Temperature, Data Backup and Error Check; 4.7 Dual-row Activation; 4.8 Experiments: Experimental Setup, Refresh Period Improvement, Power Reduction; 4.9 Conclusion); Chapter 5 System Integration (5.1 Integrate the Proposed Methods; 5.2 Software Stack); Chapter 6 Conclusion; Bibliography; Abstract in Korean.
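    To see why the dissertation's second contribution, extending the refresh period, matters, the sketch below estimates the fraction of time a DRAM rank is unavailable due to refresh. It uses typical DDR4 datasheet timings (tREFI = 7.8 µs, tRFC ≈ 350 ns for an 8 Gb device) as illustrative values, not the dissertation's measured results.

```python
# Rough estimate of the throughput lost to periodic refresh and of the benefit of
# extending the refresh period. Typical DDR4 values are assumed (tREFI = 7.8 us,
# tRFC = 350 ns for an 8 Gb device); these are illustrative, not measured results.

T_REFI = 7.8e-6     # average interval between REF commands (seconds)
T_RFC  = 350e-9     # time the rank is blocked per REF command (seconds)

for multiplier in (1, 2, 4):                 # 1x = baseline, 2x/4x = extended period
    overhead = T_RFC / (T_REFI * multiplier)
    print(f"refresh period x{multiplier}: {overhead * 100:.2f}% of time unavailable")
    # baseline is roughly 4.5%; doubling or quadrupling the period scales it down
```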