29 research outputs found

    Characterization and optimization of network traffic in cortical simulation

    Get PDF
    Considering the great variety of obstacles the Exascale systems have to face in the next future, a deeper attention will be given in this thesis to the interconnect and the power consumption. The data movement challenge involves the whole hierarchical organization of components in HPC systems โ€” i.e. registers, cache, memory, disks. Running scientific applications needs to provide the most effective methods of data transport among the levels of hierarchy. On current petaflop systems, memory access at all the levels is the limiting factor in almost all applications. This drives the requirement for an interconnect achieving adequate rates of data transfer, or throughput, and reducing time delays, or latency, between the levels. Power consumption is identified as the largest hardware research challenge. The annual power cost to operate the system would be above 2.5 B$ per year for an Exascale system using current technology. The research for alternative power-efficient computing device is mandatory for the procurement of the future HPC systems. In this thesis, a preliminary approach will be offered to the critical process of co-design. Co-desing is defined as the simultaneos design of both hardware and software, to implement a desired function. This process both integrates all components of the Exascale initiative and illuminates the trade-offs that must be made within this complex undertaking

    ๊ทผ์‚ฌ ์ปดํ“จํŒ…์„ ์ด์šฉํ•œ ํšŒ๋กœ ๋…ธํ™” ๋ณด์ƒ๊ณผ ์—๋„ˆ์ง€ ํšจ์œจ์ ์ธ ์‹ ๊ฒฝ๋ง ๊ตฌํ˜„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ์ดํ˜์žฌ.Approximate computing reduces the cost (energy and/or latency) of computations by relaxing the correctness (i.e., precision) of computations up to the level, which is dependent on types of applications. Moreover, it can be realized in various hierarchies of computing system design from circuit level to application level. This dissertation presents the methodologies applying approximate computing across such hierarchies; compensating aging-induced delay in logic circuit by dynamic computation approximation (Chapter 1), designing energy-efficient neural network by combining low-power and low-latency approximate neuron models (Chapter 2), and co-designing in-memory gradient descent module with neural processing unit so as to address a memory bottleneck incurred by memory I/O for high-precision data (Chapter 3). The first chapter of this dissertation presents a novel design methodology to turn the timing violation caused by aging into computation approximation error without the reliability guardband or increasing the supply voltage. It can be realized by accurately monitoring the critical path delay at run-time. The proposal is evaluated at two levels: RTL component level and system level. The experimental results at the RTL component level show a significant improvement in terms of (normalized) mean squared error caused by the timing violation and, at the system level, show that the proposed approach successfully transforms the aging-induced timing violation errors into much less harmful computation approximation errors, therefore it recovers image quality up to perceptually acceptable levels. It reduces the dynamic and static power consumption by 21.45% and 10.78%, respectively, with 0.8% area overhead compared to the conventional approach. The second chapter of this dissertation presents an energy-efficient neural network consisting of alternative neuron models; Stochastic-Computing (SC) and Spiking (SP) neuron models. SC has been adopted in various fields to improve the power efficiency of systems by performing arithmetic computations stochastically, which approximates binary computation in conventional computing systems. Moreover, a recent work showed that deep neural network (DNN) can be implemented in the manner of stochastic computing and it greatly reduces power consumption. However, Stochastic DNN (SC-DNN) suffers from problem of high latency as it processes only a bit per cycle. To address such problem, it is proposed to adopt Spiking DNN (SP-DNN) as an input interface for SC-DNN since SP effectively processes more bits per cycle than SC-DNN. Moreover, this chapter resolves the encoding mismatch problem, between two different neuron models, without hardware cost by compensating the encoding mismatch with synapse weight calibration. A resultant hybrid DNN (SPSC-DNN) consists of SP-DNN as bottom layers and SC-DNN as top layers. Exploiting the reduced latency from SP-DNN and low-power consumption from SC-DNN, the proposed SPSC-DNN achieves improved energy-efficiency with lower error-rate compared to SC-DNN and SP-DNN in same network configuration. The third chapter of this dissertation proposes GradPim architecture, which accelerates the parameter updates by in-memory processing which is codesigned with 8-bit floating-point training in Neural Processing Unit (NPU) for deep neural networks. By keeping the high precision processing algorithms in memory, such as the parameter update incorporating high-precision weights in its computation, the GradPim architecture can achieve high computational efficiency using 8-bit floating point in NPU and also gain power efficiency by eliminating massive high-precision data transfers between NPU and off-chip memory. A simple extension of DDR4 SDRAM utilizing bank-group parallelism makes the operation designs in processing-in-memory (PIM) module efficient in terms of hardware cost and performance. The experimental results show that the proposed architecture can improve the performance of the parameter update phase in the training by up to 40% and greatly reduce the memory bandwidth requirement while posing only a minimal amount of overhead to the protocol and the DRAM area.๊ทผ์‚ฌ ์ปดํ“จํŒ…์€ ์—ฐ์‚ฐ์˜ ์ •ํ™•๋„์˜ ์†์‹ค์„ ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋ณ„ ์ ์ ˆํ•œ ์ˆ˜์ค€๊นŒ์ง€ ํ—ˆ์šฉํ•จ์œผ๋กœ์จ ์—ฐ์‚ฐ์— ํ•„์š”ํ•œ ๋น„์šฉ (์—๋„ˆ์ง€๋‚˜ ์ง€์—ฐ์‹œ๊ฐ„)์„ ์ค„์ธ๋‹ค. ๊ฒŒ๋‹ค๊ฐ€, ๊ทผ์‚ฌ ์ปดํ“จํŒ…์€ ์ปดํ“จํŒ… ์‹œ์Šคํ…œ ์„ค๊ณ„์˜ ํšŒ๋กœ ๊ณ„์ธต๋ถ€ํ„ฐ ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ณ„์ธต๊นŒ์ง€ ๋‹ค์–‘ํ•œ ๊ณ„์ธต์— ์ ์šฉ๋  ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ทผ์‚ฌ ์ปดํ“จํŒ… ๋ฐฉ๋ฒ•๋ก ์„ ๋‹ค์–‘ํ•œ ์‹œ์Šคํ…œ ์„ค๊ณ„์˜ ๊ณ„์ธต์— ์ ์šฉํ•˜์—ฌ ์ „๋ ฅ๊ณผ ์—๋„ˆ์ง€ ์ธก๋ฉด์—์„œ ์ด๋“์„ ์–ป์„ ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•๋“ค์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ด๋Š”, ์—ฐ์‚ฐ ๊ทผ์‚ฌํ™” (computation Approximation)๋ฅผ ํ†ตํ•ด ํšŒ๋กœ์˜ ๋…ธํ™”๋กœ ์ธํ•ด ์ฆ๊ฐ€๋œ ์ง€์—ฐ์‹œ๊ฐ„์„ ์ถ”๊ฐ€์ ์ธ ์ „๋ ฅ์†Œ๋ชจ ์—†์ด ๋ณด์ƒํ•˜๋Š” ๋ฐฉ๋ฒ•๊ณผ (์ฑ•ํ„ฐ 1), ๊ทผ์‚ฌ ๋‰ด๋Ÿฐ๋ชจ๋ธ (approximate neuron model)์„ ์ด์šฉํ•ด ์—๋„ˆ์ง€ ํšจ์œจ์ด ๋†’์€ ์‹ ๊ฒฝ๋ง์„ ๊ตฌ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ• (์ฑ•ํ„ฐ 2), ๊ทธ๋ฆฌ๊ณ  ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ์œผ๋กœ ์ธํ•œ ๋ณ‘๋ชฉํ˜„์ƒ ๋ฌธ์ œ๋ฅผ ๋†’์€ ์ •ํ™•๋„ ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•œ ์—ฐ์‚ฐ์„ ๋ฉ”๋ชจ๋ฆฌ ๋‚ด์—์„œ ์ˆ˜ํ–‰ํ•จ์œผ๋กœ์จ ์™„ํ™”์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•์„ (์ฑ•ํ„ฐ3) ์ œ์•ˆํ•˜์˜€๋‹ค. ์ฒซ ๋ฒˆ์งธ ์ฑ•ํ„ฐ๋Š” ํšŒ๋กœ์˜ ๋…ธํ™”๋กœ ์ธํ•œ ์ง€์—ฐ์‹œ๊ฐ„์œ„๋ฐ˜์„ (timing violation) ์„ค๊ณ„๋งˆ์ง„์ด๋‚˜ (reliability guardband) ๊ณต๊ธ‰์ „๋ ฅ์˜ ์ฆ๊ฐ€ ์—†์ด ์—ฐ์‚ฐ์˜ค์ฐจ (computation approximation error)๋ฅผ ํ†ตํ•ด ๋ณด์ƒํ•˜๋Š” ์„ค๊ณ„๋ฐฉ๋ฒ•๋ก  (design methodology)๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ์ฃผ์š”๊ฒฝ๋กœ์˜ (critical path) ์ง€์—ฐ์‹œ๊ฐ„์„ ๋™์ž‘์‹œ๊ฐ„์— ์ •ํ™•ํ•˜๊ฒŒ ์ธก์ •ํ•  ํ•„์š”๊ฐ€ ์žˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก ์€ RTL component์™€ system ๋‹จ๊ณ„์—์„œ ํ‰๊ฐ€๋˜์—ˆ๋‹ค. RTL component ๋‹จ๊ณ„์˜ ์‹คํ—˜๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์ œ์•ˆํ•œ ๋ฐฉ์‹์ด ํ‘œ์ค€ํ™”๋œ ํ‰๊ท ์ œ๊ณฑ์˜ค์ฐจ๋ฅผ (normalized mean squared error) ์ƒ๋‹นํžˆ ์ค„์˜€์Œ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  system ๋‹จ๊ณ„์—์„œ๋Š” ์ด๋ฏธ์ง€์ฒ˜๋ฆฌ ์‹œ์Šคํ…œ์—์„œ ์ด๋ฏธ์ง€์˜ ํ’ˆ์งˆ์ด ์ธ์ง€์ ์œผ๋กœ ์ถฉ๋ถ„ํžˆ ํšŒ๋ณต๋˜๋Š” ๊ฒƒ์„ ๋ณด์ž„์œผ๋กœ์จ ํšŒ๋กœ๋…ธํ™”๋กœ ์ธํ•ด ๋ฐœ์ƒํ•œ ์ง€์—ฐ์‹œ๊ฐ„์œ„๋ฐ˜ ์˜ค์ฐจ๊ฐ€ ์—๋Ÿฌ์˜ ํฌ๊ธฐ๊ฐ€ ์ž‘์€ ์—ฐ์‚ฐ์˜ค์ฐจ๋กœ ๋ณ€๊ฒฝ๋˜๋Š” ๊ฒƒ์„ ํ™•์ธ ํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๊ฒฐ๋ก ์ ์œผ๋กœ, ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•๋ก ์„ ๋”ฐ๋ž์„ ๋•Œ 0.8%์˜ ๊ณต๊ฐ„์„ (area) ๋” ์‚ฌ์šฉํ•˜๋Š” ๋น„์šฉ์„ ์ง€๋ถˆํ•˜๊ณ  21.45%์˜ ๋™์ ์ „๋ ฅ์†Œ๋ชจ์™€ (dynamic power consumption) 10.78%์˜ ์ •์ ์ „๋ ฅ์†Œ๋ชจ์˜ (static power consumption) ๊ฐ์†Œ๋ฅผ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋‘ ๋ฒˆ์งธ ์ฑ•ํ„ฐ๋Š” ๊ทผ์‚ฌ ๋‰ด๋Ÿฐ๋ชจ๋ธ์„ ํ™œ์šฉํ•˜๋Š” ๊ณ -์—๋„ˆ์ง€ํšจ์œจ์˜ ์‹ ๊ฒฝ๋ง์„ (neural network) ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ ์‚ฌ์šฉํ•œ ๋‘ ๊ฐ€์ง€์˜ ๊ทผ์‚ฌ ๋‰ด๋Ÿฐ๋ชจ๋ธ์€ ํ™•๋ฅ ์ปดํ“จํŒ…๊ณผ (stochastic computing) ์ŠคํŒŒ์ดํ‚น๋‰ด๋Ÿฐ (spiking neuron) ์ด๋ก ๋“ค์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ชจ๋ธ๋ง๋˜์—ˆ๋‹ค. ํ™•๋ฅ ์ปดํ“จํŒ…์€ ์‚ฐ์ˆ ์—ฐ์‚ฐ๋“ค์„ ํ™•๋ฅ ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•จ์œผ๋กœ์จ ์ด์ง„์—ฐ์‚ฐ์„ ๋‚ฎ์€ ์ „๋ ฅ์†Œ๋ชจ๋กœ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์ตœ๊ทผ์— ํ™•๋ฅ ์ปดํ“จํŒ… ๋‰ด๋Ÿฐ๋ชจ๋ธ์„ ์ด์šฉํ•˜์—ฌ ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง (deep neural network)๋ฅผ ๊ตฌํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์—ฐ๊ตฌ๊ฐ€ ์ง„ํ–‰๋˜์—ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜, ํ™•๋ฅ ์ปดํ“จํŒ…์„ ๋‰ด๋Ÿฐ๋ชจ๋ธ๋ง์— ํ™œ์šฉํ•  ๊ฒฝ์šฐ ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์ด ๋งค ํด๋ฝ์‚ฌ์ดํด๋งˆ๋‹ค (clock cycle) ํ•˜๋‚˜์˜ ๋น„ํŠธ๋งŒ์„ (bit) ์ฒ˜๋ฆฌํ•˜๋ฏ€๋กœ, ์ง€์—ฐ์‹œ๊ฐ„ ์ธก๋ฉด์—์„œ ๋งค์šฐ ๋‚˜์  ์ˆ˜ ๋ฐ–์— ์—†๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์ŠคํŒŒ์ดํ‚น ๋‰ด๋Ÿฐ๋ชจ๋ธ๋กœ ๊ตฌ์„ฑ๋œ ์ŠคํŒŒ์ดํ‚น ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์„ ํ™•๋ฅ ์ปดํ“จํŒ…์„ ํ™œ์šฉํ•œ ์‹ฌ์ธต์‹ ๊ฒฝ๋ง ๊ตฌ์กฐ์™€ ๊ฒฐํ•ฉํ•˜์˜€๋‹ค. ์ŠคํŒŒ์ดํ‚น ๋‰ด๋Ÿฐ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ ๋งค ํด๋ฝ์‚ฌ์ดํด๋งˆ๋‹ค ์—ฌ๋Ÿฌ ๋น„ํŠธ๋ฅผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์˜ ์ž…๋ ฅ ์ธํ„ฐํŽ˜์ด์Šค๋กœ ์‚ฌ์šฉ๋  ๊ฒฝ์šฐ ์ง€์—ฐ์‹œ๊ฐ„์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค. ํ•˜์ง€๋งŒ, ํ™•๋ฅ ์ปดํ“จํŒ… ๋‰ด๋Ÿฐ๋ชจ๋ธ๊ณผ ์ŠคํŒŒ์ดํ‚น ๋‰ด๋Ÿฐ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ ๋ถ€ํ˜ธํ™” (encoding) ๋ฐฉ์‹์ด ๋‹ค๋ฅธ ๋ฌธ์ œ๊ฐ€ ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํ•ด๋‹น ๋ถ€ํ˜ธํ™” ๋ถˆ์ผ์น˜ ๋ฌธ์ œ๋ฅผ ๋ชจ๋ธ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํ•™์Šตํ•  ๋•Œ ๊ณ ๋ คํ•จ์œผ๋กœ์จ, ํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์˜ ๊ฐ’์ด ๋ถ€ํ˜ธํ™” ๋ถˆ์ผ์น˜๋ฅผ ๊ณ ๋ คํ•˜์—ฌ ์กฐ์ ˆ (calibration) ๋  ์ˆ˜ ์žˆ๋„๋ก ํ•˜์—ฌ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์˜€๋‹ค. ์ด๋Ÿฌํ•œ ๋ถ„์„์˜ ๊ฒฐ๊ณผ๋กœ, ์•ž ์ชฝ์—๋Š” ์ŠคํŒŒ์ดํ‚น ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์„ ๋ฐฐ์น˜ํ•˜๊ณ  ๋’ท ์ชฝ์• ๋Š” ํ™•๋ฅ ์ปดํ“จํŒ… ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์„ ๋ฐฐ์น˜ํ•˜๋Š” ํ˜ผ์„ฑ์‹ ๊ฒฝ๋ง์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ํ˜ผ์„ฑ์‹ ๊ฒฝ๋ง์€ ์ŠคํŒŒ์ดํ‚น ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์„ ํ†ตํ•ด ๋งค ํด๋ฝ์‚ฌ์ดํด๋งˆ๋‹ค ์ฒ˜๋ฆฌ๋˜๋Š” ๋น„ํŠธ ์–‘์˜ ์ฆ๊ฐ€๋กœ ์ธํ•œ ์ง€์—ฐ์‹œ๊ฐ„ ๊ฐ์†Œ ํšจ๊ณผ์™€ ํ™•๋ฅ ์ปดํ“จํŒ… ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์˜ ์ €์ „๋ ฅ ์†Œ๋ชจ ํŠน์„ฑ์„ ๋ชจ๋‘ ํ™œ์šฉํ•จ์œผ๋กœ์จ ๊ฐ ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์„ ๋”ฐ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ๋Œ€๋น„ ์šฐ์ˆ˜ํ•œ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ๋น„์Šทํ•˜๊ฑฐ๋‚˜ ๋” ๋‚˜์€ ์ •ํ™•๋„ ๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋ฉด์„œ ๋‹ฌ์„ฑํ•œ๋‹ค. ์„ธ ๋ฒˆ์งธ ์ฑ•ํ„ฐ๋Š” ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์„ 8๋น„ํŠธ ๋ถ€๋™์†Œ์ˆซ์  ์—ฐ์‚ฐ์œผ๋กœ ํ•™์Šตํ•˜๋Š” ์‹ ๊ฒฝ๋ง์ฒ˜๋ฆฌ์œ ๋‹›์˜ (neural processing unit) ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐฑ์‹ ์„ (parameter update) ๋ฉ”๋ชจ๋ฆฌ-๋‚ด-์—ฐ์‚ฐ์œผ๋กœ (in-memory processing) ๊ฐ€์†ํ•˜๋Š” GradPIM ์•„ํ‚คํ…์ณ๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. GradPIM์€ 8๋น„ํŠธ์˜ ๋‚ฎ์€ ์ •ํ™•๋„ ์—ฐ์‚ฐ์€ ์‹ ๊ฒฝ๋ง์ฒ˜๋ฆฌ์œ ๋‹›์— ๋‚จ๊ธฐ๊ณ , ๋†’์€ ์ •ํ™•๋„๋ฅผ ๊ฐ€์ง€๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•˜๋Š” ์—ฐ์‚ฐ์€ (ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐฑ์‹ ) ๋ฉ”๋ชจ๋ฆฌ ๋‚ด๋ถ€์— ๋‘ ์œผ๋กœ์จ ์‹ ๊ฒฝ๋ง์ฒ˜๋ฆฌ์œ ๋‹›๊ณผ ๋ฉ”๋ชจ๋ฆฌ๊ฐ„์˜ ๋ฐ์ดํ„ฐํ†ต์‹ ์˜ ์–‘์„ ์ค„์—ฌ, ๋†’์€ ์—ฐ์‚ฐํšจ์œจ๊ณผ ์ „๋ ฅํšจ์œจ์„ ๋‹ฌ์„ฑํ•˜์˜€๋‹ค. ๋˜ํ•œ, GradPIM์€ bank-group ์ˆ˜์ค€์˜ ๋ณ‘๋ ฌํ™”๋ฅผ ์ด๋ฃจ์–ด ๋‚ด ๋†’์€ ๋‚ด๋ถ€ ๋Œ€์—ญํญ์„ ํ™œ์šฉํ•จ์œผ๋กœ์จ ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ์„ ํฌ๊ฒŒ ํ™•์žฅ์‹œํ‚ฌ ์ˆ˜ ์žˆ๊ฒŒ ๋˜์—ˆ๋‹ค. ๋˜ํ•œ ์ด๋Ÿฌํ•œ ๋ฉ”๋ชจ๋ฆฌ ๊ตฌ์กฐ์˜ ๋ณ€๊ฒฝ์ด ์ตœ์†Œํ™”๋˜์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ถ”๊ฐ€์ ์ธ ํ•˜๋“œ์›จ์–ด ๋น„์šฉ๋„ ์ตœ์†Œํ™”๋˜์—ˆ๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด GradPIM์ด ์ตœ์†Œํ•œ์˜ DRAM ํ”„๋กœํ† ์ฝœ ๋ณ€ํ™”์™€ DRAM์นฉ ๋‚ด์˜ ๊ณต๊ฐ„์‚ฌ์šฉ์„ ํ†ตํ•ด ์‹ฌ์ธต์‹ ๊ฒฝ๋ง ํ•™์Šต๊ณผ์ • ์ค‘ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐฑ์‹ ์— ํ•„์š”ํ•œ ์‹œ๊ฐ„์„ 40%๋งŒํผ ํ–ฅ์ƒ์‹œ์ผฐ์Œ์„ ๋ณด์˜€๋‹ค.Chapter I: Dynamic Computation Approximation for Aging Compensation 1 1.1 Introduction 1 1.1.1 Chip Reliability 1 1.1.2 Reliability Guardband 2 1.1.3 Approximate Computing in Logic Circuits 2 1.1.4 Computation approximation for Aging Compensation 3 1.1.5 Motivational Case Study 4 1.2 Previous Work 5 1.2.1 Aging-induced Delay 5 1.2.2 Delay-Configurable Circuits 6 1.3 Proposed System 8 1.3.1 Overview of the Proposed System 8 1.3.2 Proposed Adder 9 1.3.3 Proposed Multiplier 11 1.3.4 Proposed Monitoring Circuit 16 1.3.5 Aging Compensation Scheme 19 1.4 Design Methodology 20 1.5 Evaluation 24 1.5.1 Experimental setup 24 1.5.2 RTL component level Adder/Multiplier 27 1.5.3 RTL component level Monitoring circuit 30 1.5.4 System level 31 1.6 Summary 38 Chapter II: Energy-Efficient Neural Network by Combining Approximate Neuron Models 40 2.1 Introduction 40 2.1.1 Deep Neural Network (DNN) 40 2.1.2 Low-power designs for DNN 41 2.1.3 Stochastic-Computing Deep Neural Network 41 2.1.4 Spiking Deep Neural Network 43 2.2 Hybrid of Stochastic and Spiking DNNs 44 2.2.1 Stochastic-Computing vs Spiking Deep Neural Network 44 2.2.2 Combining Spiking Layers and Stochastic Layers 46 2.2.3 Encoding Mismatch 47 2.3 Evaluation 49 2.3.1 Latency and Test Error 49 2.3.2 Energy Efficiency 51 2.4 Summary 54 Chapter III: GradPIM: In-memory Gradient Descent in Mixed-Precision DNN Training 55 3.1 Introduction 55 3.1.1 Neural Processing Unit 55 3.1.2 Mixed-precision Training 56 3.1.3 Mixed-precision Training with In-memory Gradient Descent 57 3.1.4 DNN Parameter Update Algorithms 59 3.1.5 Modern DRAM Architecture 61 3.1.6 Motivation 63 3.2 Previous Work 65 3.2.1 Processing-In-Memory 65 3.2.2 Co-design Neural Processing Unit and Processing-In-Memory 66 3.2.3 Low-precision Computation in NPU 67 3.3 GradPIM 68 3.3.1 GradPIM Architecture 68 3.3.2 GradPIM Operations 69 3.3.3 Timing Considerations 70 3.3.4 Update Phase Procedure 73 3.3.5 Commanding GradPIM 75 3.4 NPU Co-design with GradPIM 76 3.4.1 NPU Architecture 76 3.4.2 Data Placement 79 3.5 Evaluation 82 3.5.1 Evaluation Methodology 82 3.5.2 Experimental Results 83 3.5.3 Sensitivity Analysis 88 3.5.4 Layer Characterizations 90 3.5.5 Distributed Data Parallelism 90 3.6 Summary 92 3.6.1 Discussion 92 Bibliography 113 ์š”์•ฝ 114Docto

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Get PDF
    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each utilized optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing-down the research progress. This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing to assess the design choices for each building block separately. Reported optimizations range from up to 10'000x memory savings to 33x energy reductions, providing chip designers an overview of design choices for implementing efficient low power neural network accelerators

    Special Topics in Information Technology

    Get PDF
    This open access book presents thirteen outstanding doctoral dissertations in Information Technology from the Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy. Information Technology has always been highly interdisciplinary, as many aspects have to be considered in IT systems. The doctoral studies program in IT at Politecnico di Milano emphasizes this interdisciplinary nature, which is becoming more and more important in recent technological advances, in collaborative projects, and in the education of young researchers. Accordingly, the focus of advanced research is on pursuing a rigorous approach to specific research topics starting from a broad background in various areas of Information Technology, especially Computer Science and Engineering, Electronics, Systems and Control, and Telecommunications. Each year, more than 50 PhDs graduate from the program. This book gathers the outcomes of the thirteen best theses defended in 2019-20 and selected for the IT PhD Award. Each of the authors provides a chapter summarizing his/her findings, including an introduction, description of methods, main achievements and future work on the topic. Hence, the book provides a cutting-edge overview of the latest research trends in Information Technology at Politecnico di Milano, presented in an easy-to-read format that will also appeal to non-specialists

    A survey of near-data processing architectures for neural networks

    Get PDF
    Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key bottlenecks in the design of computing systems, the interest in unconventional approaches such as Near-Data Processing (NDP), machine learning, and especially neural network (NN)-based accelerators has grown significantly. Emerging memory technologies, such as ReRAM and 3D-stacked, are promising for efficiently architecting NDP-based accelerators for NN due to their capabilities to work as both high-density/low-energy storage and in/near-memory computation/search engine. In this paper, we present a survey of techniques for designing NDP architectures for NN. By classifying the techniques based on the memory technology employed, we underscore their similarities and differences. Finally, we discuss open challenges and future perspectives that need to be explored in order to improve and extend the adoption of NDP architectures for future computing platforms. This paper will be valuable for computer architects, chip designers, and researchers in the area of machine learning.This work has been supported by the CoCoUnit ERC Advanced Grant of the EUโ€™s Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, and the ICREA Academia program.Peer ReviewedPostprint (published version

    Modern computing: Vision and challenges

    Get PDF
    Over the past six decades, the computing systems field has experienced significant transformations, profoundly impacting society with transformational developments, such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have been continuously evolving and adapting to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, edge computing, and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. As such, to maintain an economical level of performance that meets ever-tighter requirements, one must understand the drivers of new model emergence and expansion, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments, such as serverless computing, quantum computing, and on-device AI on edge devices. Trends emerge when one traces technological trajectory, which includes the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control. This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends, and underscoring their importance in cost-effectively driving technological progress

    Modern computing: vision and challenges

    Get PDF
    Over the past six decades, the computing systems field has experienced significant transformations, profoundly impacting society with transformational developments, such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have been continuously evolving and adapting to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, edge computing, and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. As such, to maintain an economical level of performance that meets ever-tighter requirements, one must understand the drivers of new model emergence and expansion, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments, such as serverless computing, quantum computing, and on-device AI on edge devices. Trends emerge when one traces technological trajectory, which includes the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control. This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends, and underscoring their importance in cost-effectively driving technological progress

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends
    corecore