218 research outputs found

    MemZip: exploring unconventional benefits from memory compression

    Get PDF
    pre-printMemory compression has been proposed and deployed in the past to grow the capacity of a memory system and reduce page fault rates. Compression also has secondary benefits: it can reduce energy and bandwidth demands. However, most prior mechanisms have been designed to focus on the capacity metric and few prior works have attempted to explicitly reduce energy or bandwidth. Further, mechanisms that focus on the capacity metric also require complex logic to locate the requested data in memory. In this paper, we design a highly simple compressed memory architecture that does not target the capacity metric. Instead, it focuses on complexity, energy, bandwidth, and reliability. It relies on rank subsetting and a careful placement of compressed data and metadata to achieve these benefits. Further, the space made available via compression is used to boost other metrics - the space can be used to implement stronger error correction codes or energy-efficient data encodings. The best performing MemZip configuration yields a 45% performance improvement and 57% memory energy reduction, compared to an uncompressed non-sub-ranked baseline. Another energy-optimized configuration yields a 29.8% performance improvement and a 79% memory energy reduction, relative to the same baseline

    Architectural Techniques to Enable Reliable and Scalable Memory Systems

    Get PDF
    High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is because we rely on technology scaling to improve memory density, and at small feature sizes, memory cells tend to break easily. Today, memory reliability is seen as the key impediment towards using high-density devices, adopting new technologies, and even building the next Exascale supercomputer. To ensure even a bare-minimum level of reliability, present-day solutions tend to have high performance, power and area overheads. Ideally, we would like memory systems to remain robust, scalable, and implementable while keeping the overheads to a minimum. This dissertation describes how simple cross-layer architectural techniques can provide orders of magnitude higher reliability and enable seamless scalability for memory systems while incurring negligible overheads.Comment: PhD thesis, Georgia Institute of Technology (May 2017

    Dependability Assessment of NAND Flash-memory for Mission-critical Applications

    Get PDF
    It is a matter of fact that NAND flash memory devices are well established in consumer market. However, it is not true that the same architectures adopted in the consumer market are suitable for mission critical applications like space. In fact, USB flash drives, digital cameras, MP3 players are usually adopted to store "less significant" data which are not changing frequently (e.g., MP3s, pictures, etc.). Therefore, in spite of NAND flash's drawbacks, a modest complexity is usually needed in the logic of commercial flash drives. On the other hand, mission critical applications have different reliability requirements from commercial scenarios. Moreover, they are usually playing in a hostile environment (e.g., the space) which contributes to worsen all the issues. We aim at providing practical valuable guidelines, comparisons and tradeoffs among the huge number of dimensions of fault tolerant methodologies for NAND flash applied to critical environments. We hope that such guidelines will be useful for our ongoing research and for all the interested reader

    LOT-ECC: LOcalized and tiered reliability mechanisms for commodity memory systems

    Get PDF
    pre-printMemory system reliability is a serious and growing concern in modern servers. Existing chipkill-level mem- ory protection mechanisms suffer from several draw- backs. They activate a large number of chips on ev- ery memory access - this increases energy consump- tion, and reduces performance due to the reduction in rank-level parallelism. Additionally, they increase ac- cess granularity, resulting in wasted bandwidth in the absence of sufficient access locality. They also restrict systems to use narrow-I/O x4 devices, which are known to be less energy-efficient than the wider x8 DRAM de- vices. In this paper, we present LOT-ECC, a local- ized and multi-tiered protection scheme that attempts to solve these problems. We separate error detection and error correction functionality, and employ simple checksum and parity codes effectively to provide strong fault-tolerance, while simultaneously simplifying imple- mentation. Data and codes are localized to the same DRAM row to improve access efficiency. We use sys- tem firmware to store correction codes in DRAM data memory and modify the memory controller to handle data mapping. We thus build an effective fault-tolerance mechanism that provides strong reliability guarantees, activates as few chips as possible (reducing power con- sumption by up to 44.8% and reducing latency by up to 46.9%), and reduces circuit complexity, all while work- ing with commodity DRAMs and operating systems. Fi- nally, we propose the novel concept of a heterogeneous DIMM that enables the extension of LOT-ECC to x16 and wider DRAM parts

    LOT-ECC: LOcalized and tiered reliability mechanisms for commodity memory systems

    Get PDF
    pre-printMemory system reliability is a serious and growing concern in modern servers. Existing chipkill-level mem- ory protection mechanisms suffer from several draw- backs. They activate a large number of chips on ev- ery memory access - this increases energy consump- tion, and reduces performance due to the reduction in rank-level parallelism. Additionally, they increase ac- cess granularity, resulting in wasted bandwidth in the absence of sufficient access locality. They also restrict systems to use narrow-I/O x4 devices, which are known to be less energy-efficient than the wider x8 DRAM de- vices. In this paper, we present LOT-ECC, a local- ized and multi-tiered protection scheme that attempts to solve these problems. We separate error detection and error correction functionality, and employ simple checksum and parity codes effectively to provide strong fault-tolerance, while simultaneously simplifying imple- mentation. Data and codes are localized to the same DRAM row to improve access efficiency. We use sys- tem firmware to store correction codes in DRAM data memory and modify the memory controller to handle data mapping. We thus build an effective fault-tolerance mechanism that provides strong reliability guarantees, activates as few chips as possible (reducing power con- sumption by up to 44.8% and reducing latency by up to 46.9%), and reduces circuit complexity, all while work- ing with commodity DRAMs and operating systems. Fi- nally, we propose the novel concept of a heterogeneous DIMM that enables the extension of LOT-ECC to x16 and wider DRAM parts

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Dependability Assessment of NAND Flash-memory for Mission-critical Applications

    Get PDF
    It is a matter of fact that NAND flash memory devices are well established in consumer market. However, it is not true that the same architectures adopted in the consumer market are suitable for mission critical applications like space. In fact, USB flash drives, digital cameras, MP3 players are usually adopted to store "less significant" data which are not changing frequently (e.g., MP3s, pictures, etc.). Therefore, in spite of NAND flashโ€™s drawbacks, a modest complexity is usually needed in the logic of commercial flash drives. On the other hand, mission critical applications have different reliability requirements from commercial scenarios. Moreover, they are usually playing in a hostile environment (e.g., the space) which contributes to worsen all the issues. We aim at providing practical valuable guidelines, comparisons and tradeoffs among the huge number of dimensions of fault tolerant methodologies for NAND flash applied to critical environments. We hope that such guidelines will be useful for our ongoing research and for all the interested readers

    ์„ฑ๋Šฅ๊ณผ ์šฉ๋Ÿ‰ ํ–ฅ์ƒ์„ ์œ„ํ•œ ์ ์ธตํ˜• ๋ฉ”๋ชจ๋ฆฌ ๊ตฌ์กฐ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์œตํ•ฉ๊ณผํ•™๊ธฐ์ˆ ๋Œ€ํ•™์› ์œตํ•ฉ๊ณผํ•™๋ถ€(์ง€๋Šฅํ˜•์œตํ•ฉ์‹œ์Šคํ…œ์ „๊ณต), 2019. 2. ์•ˆ์ •ํ˜ธ.The advance of DRAM manufacturing technology slows down, whereas the density and performance needs of DRAM continue to increase. This desire has motivated the industry to explore emerging Non-Volatile Memory (e.g., 3D XPoint) and the high-density DRAM (e.g., Managed DRAM Solution). Since such memory technologies increase the density at the cost of longer latency, lower bandwidth, or both, it is essential to use them with fast memory (e.g., conventional DRAM) to which hot pages are transferred at runtime. Nonetheless, we observe that page transfers to fast memory often block memory channels from servicing memory requests from applications for a long period. This in turn significantly increases the high-percentile response time of latency-sensitive applications. In this thesis, we propose a high-density managed DRAM architecture, dubbed 3D-XPath for applications demanding both low latency and high capacity for memory. 3D-XPath DRAM stacks conventional DRAM dies with high-density DRAM dies explored in this thesis and connects these DRAM dies with 3D-XPath. Especially, 3D-XPath allows unused memory channels to service memory requests from applications when primary channels supposed to handle the memory requests are blocked by page transfers at given moments, considerably increasing the high-percentile response time. This can also improve the throughput of applications frequently copying memory blocks between kernel and user memory spaces. Our evaluation shows that 3D-XPath DRAM decreases high-percentile response time of latency-sensitive applications by โˆผ30% while improving the throughput of an I/O-intensive applications by โˆผ39%, compared with DRAM without 3D-XPath. Recent computer systems are evolving toward the integration of more CPU cores into a single socket, which require higher memory bandwidth and capacity. Increasing the number of channels per socket is a common solution to the bandwidth demand and to better utilize these increased channels, data bus width is reduced and burst length is increased. However, this longer burst length brings increased DRAM access latency. On the memory capacity side, process scaling has been the answer for decades, but cell capacitance now limits how small a cell could be. 3D stacked memory solves this problem by stacking dies on top of other dies. We made a key observation in real multicore machine that multiple memory controllers are always not fully utilized on SPEC CPU 2006 rate benchmark. To bring these idle channels into play, we proposed memory channel sharing architecture to boost peak bandwidth of one memory channel and reduce the burst latency on 3D stacked memory. By channel sharing, the total performance on multi-programmed workloads and multi-threaded workloads improved up to respectively 4.3% and 3.6% and the average read latency reduced up to 8.22% and 10.18%.DRAM ์ œ์กฐ ๊ธฐ์ˆ ์˜ ๋ฐœ์ „์€ ์†๋„๊ฐ€ ๋Š๋ ค์ง€๋Š” ๋ฐ˜๋ฉด DRAM์˜ ๋ฐ€๋„ ๋ฐ ์„ฑ๋Šฅ ์š”๊ตฌ๋Š” ๊ณ„์† ์ฆ๊ฐ€ํ•˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์š”๊ตฌ๋กœ ์ธํ•ด ์ƒˆ๋กœ์šด ๋น„ ํœ˜๋ฐœ์„ฑ ๋ฉ”๋ชจ๋ฆฌ(์˜ˆ: 3D-XPoint) ๋ฐ ๊ณ ๋ฐ€๋„ DRAM(์˜ˆ: Managed asymmetric latency DRAM Solution)์ด ๋“ฑ์žฅํ•˜์˜€๋‹ค. ์ด๋Ÿฌํ•œ ๊ณ ๋ฐ€๋„ ๋ฉ”๋ชจ๋ฆฌ ๊ธฐ์ˆ ์€ ๊ธด ๋ ˆ์ดํ„ด์‹œ, ๋‚ฎ์€ ๋Œ€์—ญํญ ๋˜๋Š” ๋‘ ๊ฐ€์ง€ ๋ชจ๋‘๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๋ฐ€๋„๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ ๋•Œ๋ฌธ์— ์„ฑ๋Šฅ์ด ์ข‹์ง€ ์•Š์•„, ํ•ซ ํŽ˜์ด์ง€๋ฅผ ๊ณ ์† ๋ฉ”๋ชจ๋ฆฌ(์˜ˆ: ์ผ๋ฐ˜ DRAM)๋กœ ์Šค์™‘๋˜๋Š” ์ €์šฉ๋Ÿ‰์˜ ๊ณ ์† ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ๋™์‹œ์— ์‚ฌ์šฉ๋˜๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜์ ์ด๋‹ค. ์ด๋Ÿฌํ•œ ์Šค์™‘ ๊ณผ์ •์—์„œ ๋น ๋ฅธ ๋ฉ”๋ชจ๋ฆฌ๋กœ์˜ ํŽ˜์ด์ง€ ์ „์†ก์ด ์ผ๋ฐ˜์ ์ธ ์‘์šฉํ”„๋กœ๊ทธ๋žจ์˜ ๋ฉ”๋ชจ๋ฆฌ ์š”์ฒญ์„ ์˜ค๋žซ๋™์•ˆ ์ฒ˜๋ฆฌํ•˜์ง€ ๋ชปํ•˜๋„๋ก ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ๋Œ€๊ธฐ ์‹œ๊ฐ„์— ๋ฏผ๊ฐํ•œ ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์˜ ๋ฐฑ๋ถ„์œ„ ์‘๋‹ต ์‹œ๊ฐ„์„ ํฌ๊ฒŒ ์ฆ๊ฐ€์‹œ์ผœ, ์‘๋‹ต ์‹œ๊ฐ„์˜ ํ‘œ์ค€ ํŽธ์ฐจ๋ฅผ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์—์„œ๋Š” ์ € ์ง€์—ฐ์‹œ๊ฐ„ ๋ฐ ๊ณ ์šฉ๋Ÿ‰ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์š”๊ตฌํ•˜๋Š” ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์œ„ํ•ด 3D-XPath, ์ฆ‰ ๊ณ ๋ฐ€๋„ ๊ด€๋ฆฌ DRAM ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ด๋Ÿฌํ•œ 3D-ํ†”์†Œ๋ฅผ ์ง‘์ ํ•œ DRAM์€ ์ €์†์˜ ๊ณ ๋ฐ€๋„ DRAM ๋‹ค์ด๋ฅผ ๊ธฐ์กด์˜ ์ผ๋ฐ˜์ ์ธ DRAM ๋‹ค์ด์™€ ๋™์‹œ์— ํ•œ ์นฉ์— ์ ์ธตํ•˜๊ณ , DRAM ๋‹ค์ด๋ผ๋ฆฌ๋Š” ์ œ์•ˆํ•˜๋Š” 3D-XPath ํ•˜๋“œ์›จ์–ด๋ฅผ ํ†ตํ•ด ์—ฐ๊ฒฐ๋œ๋‹ค. ์ด๋Ÿฌํ•œ 3D-XPath๋Š” ํ•ซ ํŽ˜์ด์ง€ ์Šค์™‘์ด ์ผ์–ด๋‚˜๋Š” ๋™์•ˆ ์‘์šฉํ”„๋กœ๊ทธ๋žจ์˜ ๋ฉ”๋ชจ๋ฆฌ ์š”์ฒญ์„ ์ฐจ๋‹จํ•˜์ง€ ์•Š๊ณ  ์‚ฌ์šฉ๋Ÿ‰์ด ์ ์€ ๋ฉ”๋ชจ๋ฆฌ ์ฑ„๋„๋กœ ํ•ซ ํŽ˜์ด์ง€ ์Šค์™‘์„ ์ฒ˜๋ฆฌ ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜์—ฌ, ๋ฐ์ดํ„ฐ ์ง‘์ค‘ ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์˜ ๋ฐฑ๋ถ„์œ„ ์‘๋‹ต ์‹œ๊ฐ„์„ ๊ฐœ์„ ์‹œํ‚จ๋‹ค. ๋˜ํ•œ ์ œ์•ˆํ•˜๋Š” ํ•˜๋“œ์›จ์–ด ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ, ์ถ”๊ฐ€์ ์œผ๋กœ O/S ์ปค๋„๊ณผ ์œ ์ € ์ŠคํŽ˜์ด์Šค ๊ฐ„์˜ ๋ฉ”๋ชจ๋ฆฌ ๋ธ”๋ก์„ ์ž์ฃผ ๋ณต์‚ฌํ•˜๋Š” ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์˜ ์ฒ˜๋ฆฌ๋Ÿ‰์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ 3D-XPath DRAM์€ 3D-XPath๊ฐ€ ์—†๋Š” DRAM์— ๋น„ํ•ด I/O ์ง‘์•ฝ์ ์ธ ์‘์šฉํ”„๋กœ๊ทธ๋žจ์˜ ์ฒ˜๋ฆฌ๋Ÿ‰์„ ์ตœ๋Œ€ 39 % ํ–ฅ์ƒ์‹œํ‚ค๋ฉด์„œ ๋ ˆ์ดํ„ด์‹œ์— ๋ฏผ๊ฐํ•œ ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์˜ ๋†’์€ ๋ฐฑ๋ถ„์œ„ ์‘๋‹ต ์‹œ๊ฐ„์„ ์ตœ๋Œ€ 30 %๊นŒ์ง€ ๊ฐ์†Œ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. ๋˜ํ•œ ์ตœ๊ทผ์˜ ์ปดํ“จํ„ฐ ์‹œ์Šคํ…œ์€ ๋ณด๋‹ค ๋งŽ์€ ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ๊ณผ ์šฉ๋Ÿ‰์„ ํ•„์š”๋กœํ•˜๋Š” ๋” ๋งŽ์€ CPU ์ฝ”์–ด๋ฅผ ๋‹จ์ผ ์†Œ์ผ“์œผ๋กœ ํ†ตํ•ฉํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ์ง„ํ™”ํ•˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์†Œ์ผ“ ๋‹น ์ฑ„๋„ ์ˆ˜๋ฅผ ๋Š˜๋ฆฌ๋Š” ๊ฒƒ์€ ๋Œ€์—ญํญ ์š”๊ตฌ์— ๋Œ€ํ•œ ์ผ๋ฐ˜์ ์ธ ํ•ด๊ฒฐ์ฑ…์ด๋ฉฐ, ์ตœ์‹ ์˜ DRAM ์ธํ„ฐํŽ˜์ด์Šค์˜ ๋ฐœ์ „ ์–‘์ƒ์€ ์ฆ๊ฐ€ํ•œ ์ฑ„๋„์„ ๋ณด๋‹ค ์ž˜ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ ๋ฒ„์Šค ํญ์ด ๊ฐ์†Œ๋˜๊ณ  ๋ฒ„์ŠคํŠธ ๊ธธ์ด๊ฐ€ ์ฆ๊ฐ€ํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๊ธธ์–ด์ง„ ๋ฒ„์ŠคํŠธ ๊ธธ์ด๋Š” DRAM ์•ก์„ธ์Šค ๋Œ€๊ธฐ ์‹œ๊ฐ„์„ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. ์ถ”๊ฐ€์ ์œผ๋กœ ์ตœ์‹ ์˜ ์‘์šฉํ”„๋กœ๊ทธ๋žจ์€ ๋” ๋งŽ์€ ๋ฉ”๋ชจ๋ฆฌ ์šฉ๋Ÿ‰์„ ์š”๊ตฌํ•˜๋ฉฐ, ๋ฏธ์„ธ ๊ณต์ •์œผ๋กœ ๋ฉ”๋ชจ๋ฆฌ ์šฉ๋Ÿ‰์„ ์ฆ๊ฐ€์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•๋ก ์€ ์ˆ˜์‹ญ ๋…„ ๋™์•ˆ ์‚ฌ์šฉ๋˜์—ˆ์ง€๋งŒ, 20 nm ์ดํ•˜์˜ ๋ฏธ์„ธ๊ณต์ •์—์„œ๋Š” ๋” ์ด์ƒ ๊ณต์ • ๋ฏธ์„ธํ™”๋ฅผ ํ†ตํ•ด ๋ฉ”๋ชจ๋ฆฌ ๋ฐ€๋„๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ๊ฐ€ ์–ด๋ ค์šด ์ƒํ™ฉ์ด๋ฉฐ, ์ ์ธตํ˜• ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์šฉ๋Ÿ‰์„ ์ฆ๊ฐ€์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•œ๋‹ค. ์ด๋Ÿฌํ•œ ์ƒํ™ฉ์—์„œ, ์‹ค์ œ ์ตœ์‹ ์˜ ๋ฉ€ํ‹ฐ์ฝ”์–ด ๋จธ์‹ ์—์„œ SPEC CPU 2006 ์‘์šฉํ”„๋กœ๊ทธ๋žจ์„ ๋ฉ€ํ‹ฐ์ฝ”์–ด์—์„œ ์‹คํ–‰ํ•˜์˜€์„ ๋•Œ, ํ•ญ์ƒ ์‹œ์Šคํ…œ์˜ ๋ชจ๋“  ๋ฉ”๋ชจ๋ฆฌ ์ปจํŠธ๋กค๋Ÿฌ๊ฐ€ ์™„์ „ํžˆ ํ™œ์šฉ๋˜์ง€ ์•Š๋Š”๋‹ค๋Š” ์‚ฌ์‹ค์„ ๊ด€์ฐฐํ–ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์œ ํœด ์ฑ„๋„์„ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด ํ•˜๋‚˜์˜ ๋ฉ”๋ชจ๋ฆฌ ์ฑ„๋„์˜ ํ”ผํฌ ๋Œ€์—ญํญ์„ ๋†’์ด๊ณ  3D ์Šคํƒ ๋ฉ”๋ชจ๋ฆฌ์˜ ๋ฒ„์ŠคํŠธ ๋Œ€๊ธฐ ์‹œ๊ฐ„์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ๋ณธ ํ•™์œ„ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ฉ”๋ชจ๋ฆฌ ์ฑ„๋„ ๊ณต์œ  ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ œ์•ˆํ•˜์˜€์œผ๋ฉฐ, ํ•˜๋“œ์›จ์–ด ๋ธ”๋ก์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ด๋Ÿฌํ•œ ์ฑ„๋„ ๊ณต์œ ๋ฅผ ํ†ตํ•ด ๋ฉ€ํ‹ฐ ํ”„๋กœ๊ทธ๋žจ ๋œ ์‘์šฉํ”„๋กœ๊ทธ๋žจ ๋ฐ ๋‹ค์ค‘ ์Šค๋ ˆ๋“œ ์‘์šฉํ”„๋กœ๊ทธ๋žจ ์„ฑ๋Šฅ์ด ๊ฐ๊ฐ 4.3 % ๋ฐ 3.6 %๋กœ ํ–ฅ์ƒ๋˜์—ˆ์œผ๋ฉฐ ํ‰๊ท  ์ฝ๊ธฐ ๋Œ€๊ธฐ ์‹œ๊ฐ„์€ 8.22 % ๋ฐ 10.18 %๋กœ ๊ฐ์†Œํ•˜์˜€๋‹ค.Contents Abstract i Contents iv List of Figures vi List of Tables viii Introduction 1 1.1 3D-XPath: High-Density Managed DRAM Architecture with Cost-effective Alternative Paths for Memory Transactions 5 1.2 Boosting Bandwidth โ€“ Dynamic Channel Sharing on 3D Stacked Memory 9 1.3 Research contribution 13 1.4 Outline 14 3D-stacked Heterogeneous Memory Architecture with Cost-effective Extra Block Transfer Paths 17 2.1 Background 17 2.1.1 Heterogeneous Main Memory Systems 17 2.1.2 Specialized DRAM 19 2.1.3 3D-stacked Memory 22 2.2 HIGH-DENSITY DRAM ARCHITECTURE 27 2.2.1 Key Design Challenges 29 2.2.2 Plausible High-density DRAM Designs 33 2.3 3D-STACKED DRAM WITH ALTERNATIVE PATHS FOR MEMORY TRANSACTIONS 37 2.3.1 3D-XPath Architecture 41 2.3.2 3D-XPath Management 46 2.4 EXPERIMENTAL METHODOLOGY 52 2.5 EVALUATION 56 2.5.1 OLDI Workloads 56 2.5.2 Non-OLDI Workloads 61 2.5.3 Sensitivity Analysis 66 2.6 RELATED WORK 70 Boosting bandwidth โ€“Dynamic Channel Sharing on 3D Stacked Memory 72 3.1 Background: Memory Operations 72 3.1.1. Memory Controller 72 3.1.2 DRAM column access sequence 73 3.2 Related Work 74 3.3. CHANNEL SHARING ENABLED MEMORY SYSTEM 76 3.3.1 Hardware Requirements 78 3.3.2 Operation Sequence 81 3.4 Analysis 87 3.4.1 Experiment Environment 87 3.4.2 Performance 88 3.4.3 Overhead 90 CONCLUSION 92 REFERENCES 94 ๊ตญ๋ฌธ์ดˆ๋ก 107Docto

    Techniques for Improving Security and Trustworthiness of Integrated Circuits

    Get PDF
    The integrated circuit (IC) development process is becoming increasingly vulnerable to malicious activities because untrusted parties could be involved in this IC development flow. There are four typical problems that impact the security and trustworthiness of ICs used in military, financial, transportation, or other critical systems: (i) Malicious inclusions and alterations, known as hardware Trojans, can be inserted into a design by modifying the design during GDSII development and fabrication. Hardware Trojans in ICs may cause malfunctions, lower the reliability of ICs, leak confidential information to adversaries or even destroy the system under specifically designed conditions. (ii) The number of circuit-related counterfeiting incidents reported by component manufacturers has increased significantly over the past few years with recycled ICs contributing the largest percentage of the total reported counterfeiting incidents. Since these recycled ICs have been used in the field before, the performance and reliability of such ICs has been degraded by aging effects and harsh recycling process. (iii) Reverse engineering (RE) is process of extracting a circuitโ€™s gate-level netlist, and/or inferring its functionality. The RE causes threats to the design because attackers can steal and pirate a design (IP piracy), identify the device technology, or facilitate other hardware attacks. (iv) Traditional tools for uniquely identifying devices are vulnerable to non-invasive or invasive physical attacks. Securing the ID/key is of utmost importance since leakage of even a single device ID/key could be exploited by an adversary to hack other devices or produce pirated devices. In this work, we have developed a series of design and test methodologies to deal with these four challenging issues and thus enhance the security, trustworthiness and reliability of ICs. The techniques proposed in this thesis include: a path delay fingerprinting technique for detection of hardware Trojans, recycled ICs, and other types counterfeit ICs including remarked, overproduced, and cloned ICs with their unique identifiers; a Built-In Self-Authentication (BISA) technique to prevent hardware Trojan insertions by untrusted fabrication facilities; an efficient and secure split manufacturing via Obfuscated Built-In Self-Authentication (OBISA) technique to prevent reverse engineering by untrusted fabrication facilities; and a novel bit selection approach for obtaining the most reliable bits for SRAM-based physical unclonable function (PUF) across environmental conditions and silicon aging effects
    • โ€ฆ
    corecore