885 research outputs found

    What broke where for distributed and parallel applications: a whodunit story

    Detecting, diagnosing, and mitigating performance problems in today's large-scale distributed and parallel systems is a difficult task. These systems are composed of many complex software and hardware components. When a system experiences a performance or correctness problem, developers struggle to understand the root cause and fix it in a timely manner. In my thesis, I address these three components of performance problems in computer systems. First, we focus on diagnosing performance problems in large-scale parallel applications running on supercomputers. We developed techniques to localize the performance problem for root-cause analysis. Parallel applications, most of which are complex scientific simulations running on supercomputers, can create up to millions of parallel tasks that run on different machines and communicate using the message-passing paradigm. We developed a highly scalable and accurate automated debugging tool called PRODOMETER. It first creates a logical progress dependency graph of the tasks to highlight how the problem spread through the system and manifested as a system-wide performance issue. It then uses this progress dependency graph to identify the task where the problem originated. Finally, PRODOMETER pinpoints the code region corresponding to the origin of the bug. Second, we developed a tool-chain that detects performance anomalies using machine-learning techniques and achieves a very low false-positive rate. Our input-aware performance anomaly detection system consists of a scalable data collection framework that gathers performance-related metrics from code regions of different granularity, an offline model creation and prediction-error characterization technique, and a threshold-based anomaly detection engine for production runs. Our system requires few training runs and can handle unknown inputs and parameter combinations by dynamically calibrating the anomaly detection threshold according to the characteristics of the input data and of the models' prediction error. Third, we developed a performance problem mitigation scheme for erasure-coded distributed storage systems. Repairing failed blocks in an erasure-coded distributed storage system takes a very long time in network-constrained data centers. The reason is that, during a repair, a large amount of data from multiple nodes is gathered into a single node, where a mathematical operation reconstructs the missing part. This process severely congests the links toward the destination that will host the newly recreated data. We proposed a novel distributed repair technique, called Partial-Parallel-Repair (PPR), that performs this reconstruction in parallel on multiple nodes and eliminates network bottlenecks, which greatly speeds up the repair process. Fourth, we study how, for a class of applications, performance can be improved (or performance problems mitigated) by selectively approximating some of the computations. For many applications, the main computation happens inside a loop that can be logically divided into a few temporal segments, which we call phases. We found that while approximating the initial phases might severely degrade the quality of the results, approximating the computation in the later phases has very little impact on the final quality of the result. Based on this observation, we developed an optimization framework that, for a given quality-loss budget, finds the best approximation settings for each phase in the execution.
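The contrast between gathering everything at one node and repairing in parallel can be illustrated with a toy example. The sketch below is not the PPR implementation: it assumes the simplest possible erasure code (a single XOR parity block), and the function names `centralized_repair` and `partial_parallel_repair` are hypothetical. It only shows that combining partial results pairwise yields the same rebuilt block while no node receives more than one block per round.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def centralized_repair(surviving):
    """Every surviving block is sent to one node, which XORs them all.

    Returns the rebuilt block and the number of blocks that arrive at
    that single node.
    """
    rebuilt = reduce(xor_blocks, surviving)
    return rebuilt, len(surviving)


def partial_parallel_repair(surviving):
    """Toy tree reduction (assumption: XOR parity, one block per node).

    In each round, neighbouring nodes combine their partial results in
    parallel, so a node receives at most one block per round.
    Returns the rebuilt block and the number of rounds.
    """
    partials = list(surviving)
    rounds = 0
    while len(partials) > 1:
        combined = [
            xor_blocks(partials[i], partials[i + 1])
            for i in range(0, len(partials) - 1, 2)
        ]
        if len(partials) % 2:  # an unpaired node carries its partial forward
            combined.append(partials[-1])
        partials = combined
        rounds += 1
    return partials[0], rounds


if __name__ == "__main__":
    data = [bytes([1, 2]), bytes([16, 32]), bytes([255, 0]), bytes([15, 240])]
    parity = reduce(xor_blocks, data)
    # Pretend data[2] was lost; the other blocks and the parity survive.
    surviving = [data[0], data[1], data[3], parity]
    print(centralized_repair(surviving))
    print(partial_parallel_repair(surviving))
```

With four surviving blocks, the centralized variant funnels four transfers into one node, while the tree variant finishes in two rounds of parallel transfers. Production erasure codes such as Reed-Solomon use finite-field arithmetic rather than plain XOR, so this is only an analogy for the traffic pattern.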

    Mobile Open Systems Technologies For The Utilities Industries

    This chapter considers the provision of mobile computing support for field engineers in the electricity industry. Section 11.2 describes field engineers' current working practices and from these derives a set of general requirements for a mobile computing environment to support utilities workers. A key requirement identified is the need for field engineers to access real-time multimedia information in the field, and the remainder of the chapter focuses on this requirement. Sections 11.3 and 11.4 present a survey of enabling technologies to support distributed systems operating in both local and wide area wireless environments. The impact of these technologies on the provision of mobile computing support is assessed in section 11.5. Section 11.6 describes a software architecture which attempts to address the requirements highlighted in section 11.2 and in particular is designed to support real-time access to data in the field. Finally, section 11.7 considers the degree to which utilities workers' requirements can be met by the surveyed technologies and considers the likely impact of remote data access on field engineers' working practices.

    The Cord Weekly (September 20, 1995)


    DRAM-based neural network accelerator architecture for binary neural networks

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2021. 유승주.

    In convolutional neural network (CNN) applications, most computation comes from the multiplication and accumulation performed in the convolution and fully-connected layers. From the hardware perspective (i.e., in the gate-level circuits), these operations are performed as many dot products between feature-map and kernel vectors. Since the feature map and kernel are matrices, the vectors converted from the 3D or 4D matrices are reused many times in the matrix multiplications. As the throughput of DNN accelerators increases, the power consumption and performance bottleneck caused by data movement become more critical. More importantly, off-chip memory accesses dominate total power, since an off-chip memory access consumes several hundred times more power than the computation. Accelerator throughput ranges from several hundred GOPS to several TOPS, but memory bandwidth is less than 25.6 or 34 GB/s (with DDR4 or LPDDR4). Reducing the network size and/or the amount of data moved alleviates both the data-movement power and the performance bottleneck. Among the algorithmic approaches, quantization is widely used. Binary neural networks (BNNs) dramatically reduce precision down to 1 bit. Their accuracy is much lower than that of FP16, but it is continuously improving through various studies. On the data-flow side, redundant data movement can be reduced by increasing data reuse. These two methods are widely applied in accelerators because they need no additional computation during inference. In this dissertation, I present 1) a DRAM-based accelerator architecture and 2) a DRAM refresh method that reduces the performance loss due to DRAM refresh. The two methods are orthogonal, so they can be integrated into one DRAM chip and operate independently. First, we propose a DRAM-based accelerator architecture capable of massive, large vector dot-product operations. In the field of CNN accelerators to which BNNs can be applied, computing-in-memory (CIM) structures that use the cell-array structure of memory for vector dot products are being actively studied. Since DRAM stores all the neural network data, it is well placed to reduce the amount of data transferred. The proposed architecture operates using the basic operations of the DRAM. The second method reduces the performance degradation and power consumption caused by DRAM refresh. Since DRAM cannot read or write data while performing a periodic refresh, system performance decreases. The proposed refresh method tests the refresh characteristics inside the DRAM chip during self-refresh and lengthens the refresh period according to those characteristics. Since it operates independently inside the DRAM, it can be applied to all systems that use DRAM, including deep neural network accelerators. We surveyed system integration with a software stack for using the in-DRAM accelerator in a deep learning framework. As a result, we expect that in-DRAM accelerators can be controlled with the memory controller implementation method verified in the previous experiment. We have also added a performance simulation function for the in-DRAM accelerator to PyTorch. When a neural network runs in PyTorch, it reports the computation latency and data movement latency of the layers running on the in-DRAM accelerator. Being able to predict hardware performance while co-designing the network is a significant advantage.

    Korean abstract (translated): In convolutional neural network (CNN) applications, most of the computation is the multiplication and accumulation that occur in the convolution and fully-connected layers. At the gate-logic level, these are executed as a large number of vector dot products that repeatedly reuse the input and kernel vectors. Deep neural network computation is better served by a large number of small units capable of simple operations than by general-purpose compute units. Once accelerator performance rises above a certain level, it is limited by the data transfer that the computation requires. The energy consumed in transferring data off-chip from memory is hundreds of times larger than the energy the compute units use for the computation. Moreover, while compute units can deliver hundreds of giga- to several tera-operations per second, memory transfers data at tens of gigabytes per second. The way to solve the power and performance problems of data transfer at the same time is to reduce the size of the data transferred. Among algorithms, quantizing the network's data to represent it at low precision is widely used. Binary neural networks (BNNs) push precision to the extreme of 1 bit. Network accuracy is lower than with 16-bit precision, but it is steadily improving through various studies. Structurally, transferred data can also be reused to reduce repeated transfers of the same data. These two methods are widely applied in accelerators because they can be applied without extra computation during inference. This dissertation proposes a DRAM-based accelerator architecture and a technique that mitigates the performance loss caused by DRAM refresh. The two methods can be integrated into a single DRAM chip and can operate independently. The first is a study of a DRAM-based accelerator capable of large numbers of vector dot-product operations. In the field of CNN accelerators to which BNNs can be applied, computing-in-memory (CIM) architectures that use the memory cell-array structure for vector dot products are being actively studied. In particular, because DRAM holds all of the neural network's data, it is well placed to reduce the amount of data transferred. We propose a method that computes using the basic operations of DRAM without changing the structure of the DRAM cell array. The second is a method that lengthens the DRAM refresh period to reduce performance degradation and power consumption. Whenever DRAM performs a refresh, data cannot be read or written, so the performance of the system or accelerator drops. We propose a method that tests the DRAM's refresh characteristics inside the DRAM chip and lengthens the refresh period. Because it operates independently inside the DRAM, it is applicable to every system that uses DRAM, deep neural network accelerators included. In addition, we investigated system integration methods, including the software stack, so that the proposed accelerator can be used easily in widely used deep learning frameworks such as PyTorch. As a result, we expect that the in-DRAM accelerator can be controlled by adding the memory controller verified in the DRAM refresh experiments and a custom compiler to the existing TVM compiler and the FPGA-implemented TVM/VTA accelerator. Furthermore, we added a simulation function to PyTorch so that performance can be predicted at the design stage of the in-DRAM accelerator and the neural network. When a neural network is run in PyTorch, the computation latency and data movement time of the layers that run on the DRAM accelerator can be inspected.

    Contents:
    Abstract
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Background
        2.1 Neural Network Operation
        2.2 Data Movement Overhead
        2.3 Binary Neural Networks
        2.4 Computing-in-Memory
        2.5 Memory Bottleneck due to Refresh
    Chapter 3 In-DRAM Neural Network Accelerator
        3.1 Backgrounds
            3.1.1 DRAM hierarchy
            3.1.2 DRAM Basic Operation
            3.1.3 DRAM Commands with Timing Parameters
            3.1.4 Bit-wise Operation in DRAM
        3.2 Motivations
        3.3 Proposed architecture
            3.3.1 Operation Examples of Row Operator
            3.3.2 Convolutions on DRAM Chip
        3.4 Data Flow
            3.4.1 Input Broadcasting in DRAM
            3.4.2 Input Data Movement With M2V
            3.4.3 Internal Data Movement With SiD
            3.4.4 Data Partitioning for Parallel Operation
        3.5 Experiments
            3.5.1 Performance Estimation
            3.5.2 Configuration of In-DRAM Accelerator
            3.5.3 Improving the Accuracy of BNN
            3.5.4 Comparison with the Existing Works
        3.6 Discussion
            3.6.1 Performance Comparison with ASIC Accelerators
            3.6.2 Challenges of The Proposed Architecture
        3.7 Conclusion
    Chapter 4 Reducing DRAM Refresh Power Consumption by Runtime Profiling of Retention Time and Dual-row Activation
        4.1 Introduction
        4.2 Background
        4.3 Related Works
        4.4 Observations
        4.5 Solution overview
        4.6 Runtime profiling
            4.6.1 Basic Operation
            4.6.2 Profiling Multiple Rows in Parallel
            4.6.3 Temperature, Data Backup and Error Check
        4.7 Dual-row Activation
        4.8 Experiments
            4.8.1 Experimental Setup
            4.8.2 Refresh Period Improvement
            4.8.3 Power Reduction
        4.9 Conclusion
    Chapter 5 System Integration
        5.1 Integrate The Proposed Methods
        5.2 Software Stack
    Chapter 6 Conclusion
    Bibliography
    Abstract in Korean
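The reason 1-bit precision suits simple bit-wise hardware can be shown with the standard XNOR-popcount formulation of a binary dot product. The sketch below is a generic illustration, not the in-DRAM mechanism of the thesis, which the abstract does not specify. It assumes values in {+1, -1} encoded as bits {1, 0}; the helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an integer (bit i is 1 for +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {+1, -1} vectors stored as bitmasks.

    XNOR marks the positions where the signs agree (product +1); the
    remaining positions contribute -1, so the sum is 2 * matches - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to n bits
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    reference = sum(x * y for x, y in zip(a, b))
    print(binary_dot(pack_bits(a), pack_bits(b), len(a)), reference)
```

Each multiply-accumulate collapses into one XNOR and a population count, which is why a binary network needs far less arithmetic hardware and moves far fewer bits than an FP16 one.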