
    Improving Phase Change Memory Performance with Data Content Aware Access

    A prominent characteristic of the write operation in Phase-Change Memory (PCM) is that its latency and energy are sensitive both to the data to be written and to the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy than overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells in only one direction, i.e., using either SET or RESET operations, but not both. In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros or all-ones) by comprehensively considering the number of set bits in the data to be written and the energy-latency trade-offs of SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses. We exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and the NAS Parallel Benchmarks. Results demonstrate that DATACON significantly improves system performance and reduces memory system energy consumption compared to the best performance-oriented state-of-the-art techniques.
    Comment: 18 pages, 21 figures, accepted at the ACM SIGPLAN International Symposium on Memory Management (ISMM)
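    The first step boils down to comparing the cost of the one-directional programming each known content type would require. The sketch below is a minimal illustration of that decision, assuming made-up per-bit SET/RESET latency and energy constants and a simple weighted cost; it is not the paper's implementation.

```python
# Illustrative sketch of DATACON's first step: decide whether a write is
# cheaper when it overwrites all-zeros or all-ones content. The constants
# and the weighted cost function are assumptions, not values from the paper.
SET_LATENCY_NS, SET_ENERGY_PJ = 150.0, 30.0     # SET: the slower operation (assumed)
RESET_LATENCY_NS, RESET_ENERGY_PJ = 50.0, 15.0  # RESET: faster, cheaper (assumed)
LATENCY_WEIGHT, ENERGY_WEIGHT = 1.0, 0.5        # assumed latency/energy trade-off weights

def popcount(data: bytes) -> int:
    """Count the set bits in the data block to be written."""
    return sum(bin(b).count("1") for b in data)

def preferred_overwrite_content(data: bytes) -> str:
    """Return which known content ('all_zeros' or 'all_ones') this write
    should overwrite to minimize its estimated cost."""
    ones = popcount(data)
    zeros = 8 * len(data) - ones
    # Over all-zeros content, only the 1-bits must be programmed (SET only);
    # over all-ones content, only the 0-bits must be programmed (RESET only).
    cost_over_zeros = ones * (LATENCY_WEIGHT * SET_LATENCY_NS +
                              ENERGY_WEIGHT * SET_ENERGY_PJ)
    cost_over_ones = zeros * (LATENCY_WEIGHT * RESET_LATENCY_NS +
                              ENERGY_WEIGHT * RESET_ENERGY_PJ)
    return "all_zeros" if cost_over_zeros <= cost_over_ones else "all_ones"
```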

    Model-Based Performance Prediction for Concurrent Software on Multicore Architectures

    Model-based performance prediction is a well-known concept for ensuring software quality. Current approaches are based on a single-metric model, which leads to inaccurate predictions for modern architectures. This thesis presents a multi-strategy approach that extends performance prediction models to support multicore architectures. We implemented the strategies in Palladio and significantly increased its performance prediction power.
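    To make the single-metric limitation concrete, the toy comparison below contrasts a single-resource prediction with one that adds a contention term for concurrently active cores. It is purely illustrative and does not reflect Palladio's actual prediction models; the contention coefficient and the split into CPU and memory demand are assumptions.

```python
# Toy illustration of why a single-metric model can mispredict on multicore
# hardware: it ignores contention for shared resources such as the memory bus.
def single_metric_prediction(cpu_demand_ms: float) -> float:
    # Classic single-resource view: response time equals the CPU demand.
    return cpu_demand_ms

def contention_aware_prediction(cpu_demand_ms: float,
                                mem_demand_ms: float,
                                active_cores: int) -> float:
    # One possible multicore refinement: memory-bound work slows down as
    # more cores contend for the shared memory subsystem (coefficient assumed).
    contention_factor = 1.0 + 0.2 * (active_cores - 1)
    return cpu_demand_ms + mem_demand_ms * contention_factor

# With 8 active cores the two models diverge substantially:
# single_metric_prediction(10.0)            -> 10.0 ms
# contention_aware_prediction(10.0, 4.0, 8) -> 19.6 ms
```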

    Doctor of Philosophy

    In recent years, a number of trends have started to emerge, both in microprocessor and application characteristics. As per Moore's law, the number of cores on a chip will keep doubling every 18-24 months. The International Technology Roadmap for Semiconductors (ITRS) reports that wires will continue to scale poorly, exacerbating the cost of on-chip communication. Cores will have to navigate an on-chip network to access data that may be scattered across many cache banks. The number of pins on the package, and hence the available off-chip bandwidth, will at best increase at a sublinear rate and, at worst, stagnate. A number of disruptive memory technologies, e.g., phase change memory (PCM), have begun to emerge and will be integrated into the memory hierarchy sooner rather than later, leading to non-uniform memory access (NUMA) hierarchies. This will make the cost of accessing main memory even higher. In previous years, most of the focus has been on deciding the memory hierarchy level where data must be placed (L1 or L2 caches, main memory, disk, etc.). However, in modern and future generations, each level is getting bigger and its design is subject to a number of constraints (wire delays, power budget, etc.). It is becoming very important to make an intelligent decision about where data must be placed within a level. For example, in a large non-uniform cache access (NUCA) cache, we must figure out the optimal bank. Similarly, in a multi-DIMM (dual inline memory module) non-uniform memory access (NUMA) main memory, we must figure out the DIMM that is the optimal home for every data page. Studies have indicated that heterogeneous main memory hierarchies that incorporate multiple memory technologies are on the horizon. We must develop solutions for data management that take this heterogeneity into account. For these memory organizations, we must again identify the appropriate home for data. In this dissertation, we attempt to verify the following thesis statement: "Can low-complexity hardware and OS mechanisms manage data placement within each memory hierarchy level to optimize metrics such as performance and/or throughput?" We argue for a hardware-software codesign approach to tackle the above-mentioned problems at different levels of the memory hierarchy. The proposed methods utilize techniques such as page coloring and shadow addresses and are able to handle a large number of problems, ranging from managing wire delays in large, shared NUCA caches to distributing shared capacity among different cores. We then examine data-placement issues in NUMA main memory for a many-core processor with a moderate number of on-chip memory controllers. Using codesign approaches, we achieve efficient data placement by modifying the operating system's (OS) page allocation algorithm for a wide variety of main memory architectures.
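    Page coloring is the key OS-level lever mentioned above: the physical frame chosen for a page determines which cache bank or memory controller its data maps to, so the page allocator can steer placement. The sketch below shows a color-aware allocator under assumed parameters (number of colors, color function); it is an illustration of the general technique, not the dissertation's algorithm.

```python
# Toy sketch of OS-level page coloring for data placement. The "color" of a
# physical frame is derived from the frame-number bits that index the target
# cache bank or memory controller, so choosing a frame chooses where data lives.
NUM_COLORS = 16  # e.g., number of NUCA banks or memory controllers (assumed)

def page_color(pfn: int) -> int:
    """Low-order page-frame-number bits select the bank/controller (assumed mapping)."""
    return pfn % NUM_COLORS

class ColorAwareAllocator:
    def __init__(self, free_frames):
        # Bucket free physical frames by color for O(1) colored allocation.
        self.free_by_color = {c: [] for c in range(NUM_COLORS)}
        for pfn in free_frames:
            self.free_by_color[page_color(pfn)].append(pfn)

    def alloc_page(self, preferred_color: int) -> int:
        """Prefer a frame whose color maps to the requester's local bank or
        controller; otherwise fall back to the nearest color with free frames."""
        for offset in range(NUM_COLORS):
            color = (preferred_color + offset) % NUM_COLORS
            if self.free_by_color[color]:
                return self.free_by_color[color].pop()
        raise MemoryError("no free physical frames")
```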

    Aging-Aware Request Scheduling for Non-Volatile Main Memory

    Modern computing systems are embracing non-volatile memory (NVM) to implement high-capacity and low-cost main memory. Elevated operating voltages of NVM accelerate the aging of CMOS transistors in the peripheral circuitry of each memory bank. Aggressive device scaling increases power density and temperature, which further accelerates aging, challenging the reliable operation of NVM-based main memory. We propose HEBE, an architectural technique to mitigate the circuit aging-related problems of NVM-based main memory. HEBE is built on three contributions. First, we propose a new analytical model that can dynamically track the aging in the peripheral circuitry of each memory bank based on the bank's utilization. Second, we develop an intelligent memory request scheduler that exploits this aging model at run time to de-stress the peripheral circuitry of a memory bank only when its aging exceeds a critical threshold. Third, we introduce an isolation transistor to decouple parts of a peripheral circuit operating at different voltages, allowing the decoupled logic blocks to undergo long-latency de-stress operations independently and off the critical path of memory read and write accesses, improving performance. We evaluate HEBE with workloads from the SPEC CPU2017 benchmark suite. Our results show that HEBE significantly improves both the performance and the lifetime of NVM-based main memory.
    Comment: To appear in ASP-DAC 202
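    The first two contributions amount to a utilization-driven aging estimate per bank plus a scheduler that only de-stresses a bank once that estimate crosses a critical threshold. The sketch below captures that control loop with placeholder constants and a simplified linear aging/recovery model; it is not the paper's analytical model or scheduler.

```python
# Minimal sketch of aging-aware scheduling in the spirit of HEBE: track a
# per-bank aging estimate from bank utilization and trigger a de-stress
# operation only when a bank crosses a critical threshold. All constants and
# the linear aging/recovery model are placeholders, not the paper's model.
AGING_THRESHOLD = 1.0                 # critical normalized aging level (assumed)
AGING_PER_ACTIVE_CYCLE = 1e-9         # assumed stress accumulated per busy cycle
RECOVERY_PER_DESTRESS = 0.2           # assumed recovery from one de-stress operation

class BankAgingTracker:
    def __init__(self, num_banks: int):
        self.aging = [0.0] * num_banks

    def on_bank_active(self, bank: int, busy_cycles: int) -> None:
        # Higher utilization -> more stress accumulated by the bank's
        # peripheral circuitry.
        self.aging[bank] += busy_cycles * AGING_PER_ACTIVE_CYCLE

    def pick_bank_to_destress(self):
        # De-stress only a bank whose aging exceeds the critical threshold,
        # worst-aged first, so healthy banks keep serving requests.
        worst = max(range(len(self.aging)), key=lambda b: self.aging[b])
        return worst if self.aging[worst] > AGING_THRESHOLD else None

    def on_destress_done(self, bank: int) -> None:
        self.aging[bank] = max(0.0, self.aging[bank] - RECOVERY_PER_DESTRESS)
```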

    Stacked Memory Architectures for Improving Performance and Capacity

    Ph.D. dissertation, Seoul National University, Graduate School of Convergence Science and Technology, Department of Convergence Science (Intelligent Convergence Systems major), February 2019. Advisor: ์•ˆ์ •ํ˜ธ.
    The advance of DRAM manufacturing technology is slowing down, whereas the density and performance needs of DRAM continue to increase. This demand has motivated the industry to explore emerging non-volatile memory (e.g., 3D XPoint) and high-density DRAM (e.g., Managed DRAM Solution). Since such memory technologies increase density at the cost of longer latency, lower bandwidth, or both, it is essential to use them together with fast memory (e.g., conventional DRAM) to which hot pages are transferred at runtime. Nonetheless, we observe that page transfers to fast memory often block memory channels from servicing memory requests from applications for a long period. This in turn significantly increases the high-percentile response time of latency-sensitive applications. In this thesis, we propose a high-density managed DRAM architecture, dubbed 3D-XPath, for applications demanding both low latency and high memory capacity. 3D-XPath DRAM stacks conventional DRAM dies with the high-density DRAM dies explored in this thesis and connects these DRAM dies with 3D-XPath. In particular, 3D-XPath allows unused memory channels to service memory requests from applications when the primary channels supposed to handle those requests are blocked by page transfers, considerably reducing the high-percentile response time. It can also improve the throughput of applications that frequently copy memory blocks between kernel and user memory spaces. Our evaluation shows that 3D-XPath DRAM decreases the high-percentile response time of latency-sensitive applications by ∼30% while improving the throughput of I/O-intensive applications by ∼39%, compared with DRAM without 3D-XPath.
    Recent computer systems are evolving toward the integration of more CPU cores into a single socket, which requires higher memory bandwidth and capacity. Increasing the number of channels per socket is a common solution to the bandwidth demand; to better utilize these additional channels, the data bus width is reduced and the burst length is increased. However, the longer burst length increases DRAM access latency. On the memory capacity side, process scaling has been the answer for decades, but cell capacitance now limits how small a cell can be. 3D-stacked memory solves this problem by stacking dies on top of other dies. On a real multicore machine, we observed that the multiple memory controllers are not always fully utilized when running the SPEC CPU 2006 rate benchmarks. To bring these idle channels into play, we propose a memory channel sharing architecture that boosts the peak bandwidth of a single memory channel and reduces burst latency on 3D-stacked memory. With channel sharing, the total performance of multi-programmed and multi-threaded workloads improves by up to 4.3% and 3.6%, respectively, and the average read latency is reduced by up to 8.22% and 10.18%.
    The advance of DRAM manufacturing technology is slowing down while the density and performance requirements of DRAM keep increasing. This demand has led to the emergence of new non-volatile memories (e.g., 3D-XPoint) and high-density DRAM (e.g., managed asymmetric-latency DRAM solutions). Because these high-density memory technologies increase density at the cost of longer latency, lower bandwidth, or both, they perform poorly on their own, so they are commonly deployed alongside a small amount of fast memory (e.g., conventional DRAM) to which hot pages are swapped at runtime. During such swaps, the page transfers to fast memory keep ordinary application memory requests from being serviced for a long time, which greatly increases the high-percentile response time of latency-sensitive applications and thus the variance of response times. To solve this problem, this dissertation proposes 3D-XPath, a high-density managed DRAM architecture for applications that require both low latency and high memory capacity. A DRAM with 3D-XPath stacks slow, high-density DRAM dies together with conventional DRAM dies in a single chip, and the DRAM dies are connected through the proposed 3D-XPath hardware. 3D-XPath lets a hot-page swap be handled over a lightly used memory channel without blocking application memory requests while the swap is in progress, improving the high-percentile response time of data-intensive applications. Using the proposed hardware, the throughput of applications that frequently copy memory blocks between the OS kernel and user space can additionally be improved. Compared with DRAM without 3D-XPath, 3D-XPath DRAM improves the throughput of I/O-intensive applications by up to 39% while reducing the high-percentile response time of latency-sensitive applications by up to 30%. In addition, recent computer systems are evolving toward integrating more CPU cores, which require more memory bandwidth and capacity, into a single socket. Increasing the number of channels per socket is the common answer to the bandwidth demand, and the latest DRAM interfaces reduce the data bus width and increase the burst length to better utilize the additional channels; the longer burst length, however, increases DRAM access latency. Furthermore, modern applications demand more memory capacity. Increasing capacity through process scaling has worked for decades, but below 20 nm it has become difficult to increase memory density through further scaling, so stacked memory is used to increase capacity instead. In this situation, we observed on a real, recent multicore machine running SPEC CPU 2006 applications that not all memory controllers in the system are fully utilized at all times. To use these idle channels, this dissertation proposes a memory channel sharing architecture and the corresponding hardware block, which raise the peak bandwidth of a single memory channel and reduce the burst latency of 3D-stacked memory. Through this channel sharing, the performance of multi-programmed and multi-threaded workloads improves by up to 4.3% and 3.6%, respectively, and the average read latency decreases by up to 8.22% and 10.18%.
    Contents:
        Introduction
            1.1 3D-XPath: High-Density Managed DRAM Architecture with Cost-effective Alternative Paths for Memory Transactions
            1.2 Boosting Bandwidth - Dynamic Channel Sharing on 3D Stacked Memory
            1.3 Research Contribution
            1.4 Outline
        3D-stacked Heterogeneous Memory Architecture with Cost-effective Extra Block Transfer Paths
            2.1 Background
                2.1.1 Heterogeneous Main Memory Systems
                2.1.2 Specialized DRAM
                2.1.3 3D-stacked Memory
            2.2 High-Density DRAM Architecture
                2.2.1 Key Design Challenges
                2.2.2 Plausible High-Density DRAM Designs
            2.3 3D-stacked DRAM with Alternative Paths for Memory Transactions
                2.3.1 3D-XPath Architecture
                2.3.2 3D-XPath Management
            2.4 Experimental Methodology
            2.5 Evaluation
                2.5.1 OLDI Workloads
                2.5.2 Non-OLDI Workloads
                2.5.3 Sensitivity Analysis
            2.6 Related Work
        Boosting Bandwidth - Dynamic Channel Sharing on 3D Stacked Memory
            3.1 Background: Memory Operations
                3.1.1 Memory Controller
                3.1.2 DRAM Column Access Sequence
            3.2 Related Work
            3.3 Channel Sharing Enabled Memory System
                3.3.1 Hardware Requirements
                3.3.2 Operation Sequence
            3.4 Analysis
                3.4.1 Experiment Environment
                3.4.2 Performance
                3.4.3 Overhead
        Conclusion
        References
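    The key routing decision 3D-XPath enables is choosing which channel carries a hot-page transfer so that application requests on the primary channel are not blocked behind it. The sketch below illustrates that decision under assumed per-channel queue depths and an assumed busy threshold; it is an illustration of the idea, not the thesis's hardware design.

```python
# Illustrative routing decision in the spirit of 3D-XPath: send a hot-page
# transfer over a lightly loaded alternative channel when the primary channel
# is busy with application requests. The queue-depth threshold and the notion
# of per-channel request queues are assumptions made for this sketch.
BUSY_QUEUE_DEPTH = 8  # pending requests before a channel counts as "busy" (assumed)

def choose_channel_for_page_transfer(primary: int, queue_depths) -> int:
    """Return the index of the memory channel that should carry a page transfer."""
    if queue_depths[primary] < BUSY_QUEUE_DEPTH:
        return primary  # primary channel has headroom; no detour needed
    # Otherwise take the least-loaded alternative path so application requests
    # on the primary channel are not blocked behind the long page transfer.
    return min(range(len(queue_depths)), key=lambda ch: queue_depths[ch])

# Example: primary channel 0 is saturated, channel 2 is nearly idle.
# choose_channel_for_page_transfer(0, [12, 6, 1, 9]) -> 2
```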