8 research outputs found

    Novel hybrid framework for image compression for supportive hardware design of boosting compression

    Performing image compression on resource-constrained hardware is a challenging task. Although various approaches to image compression have considered the hardware aspect, problems with memory acceleration across the entire operation still degrade the performance of the hardware device. The proposed approach therefore presents a cost-effective image compression mechanism that offers lossless compression using a unique combination of non-linear filtering, segmentation, and contour detection, followed by optimization. The compression mechanism adopts an analytical approach to achieve significant image compression. Executing the compression mechanism yields faster response time, reduced mean square error, improved signal quality, and a significant compression ratio.
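The abstract above reports mean square error, signal quality, and compression ratio without defining them. As a minimal sketch, assuming the usual textbook definitions (the paper's exact metrics are not given in this listing), they could be computed as follows. The function names and the 8-bit peak value of 255 are illustrative assumptions, not taken from the paper.

```python
import math

def mse(original, reconstructed):
    """Mean square error between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB (assumed 8-bit peak); infinite for identical inputs."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)

def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Ratio of the original size to the compressed size (higher is better)."""
    return uncompressed_bytes / compressed_bytes
```

Under these definitions, a strictly lossless codec reconstructs every pixel exactly, so its MSE is 0 and its PSNR is infinite; only the compression ratio and the run time then distinguish one lossless scheme from another.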

    Circuit Aging Compensation and Energy-Efficient Neural Network Implementation Using Approximate Computing

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. Hyuk-Jae Lee. Approximate computing reduces the cost (energy and/or latency) of computations by relaxing their correctness (i.e., precision) up to a level that depends on the type of application. Moreover, it can be realized at various levels of the computing system design hierarchy, from the circuit level to the application level. This dissertation presents methodologies that apply approximate computing across these levels: compensating aging-induced delay in logic circuits by dynamic computation approximation (Chapter 1), designing an energy-efficient neural network by combining low-power and low-latency approximate neuron models (Chapter 2), and co-designing an in-memory gradient descent module with a neural processing unit to address the memory bottleneck incurred by memory I/O for high-precision data (Chapter 3). The first chapter presents a novel design methodology that turns the timing violation caused by aging into computation approximation error without a reliability guardband or an increase in supply voltage. This is realized by accurately monitoring the critical path delay at run time. The proposal is evaluated at two levels: the RTL component level and the system level. The experimental results at the RTL component level show a significant improvement in the (normalized) mean squared error caused by the timing violation. At the system level, they show that the proposed approach successfully transforms the aging-induced timing violation errors into much less harmful computation approximation errors, thereby recovering image quality up to perceptually acceptable levels. It reduces the dynamic and static power consumption by 21.45% and 10.78%, respectively, with 0.8% area overhead compared to the conventional approach. 
The second chapter presents an energy-efficient neural network built from alternative neuron models: Stochastic-Computing (SC) and Spiking (SP) neuron models. SC has been adopted in various fields to improve the power efficiency of systems by performing arithmetic computations stochastically, approximating the binary computation of conventional computing systems. Moreover, recent work showed that a deep neural network (DNN) can be implemented with stochastic computing and that doing so greatly reduces power consumption. However, a stochastic DNN (SC-DNN) suffers from high latency because it processes only one bit per cycle. To address this problem, the chapter proposes adopting a spiking DNN (SP-DNN) as an input interface for the SC-DNN, since SP effectively processes more bits per cycle than SC-DNN. Moreover, the chapter resolves the encoding mismatch between the two neuron models without hardware cost by compensating for it through synapse weight calibration. The resulting hybrid DNN (SPSC-DNN) consists of SP-DNN bottom layers and SC-DNN top layers. Exploiting the reduced latency of SP-DNN and the low power consumption of SC-DNN, the proposed SPSC-DNN achieves improved energy efficiency with a lower error rate than SC-DNN and SP-DNN in the same network configuration. The third chapter proposes the GradPIM architecture, which accelerates parameter updates through in-memory processing co-designed with 8-bit floating-point training in a Neural Processing Unit (NPU) for deep neural networks. By keeping high-precision processing algorithms in memory, such as the parameter update that incorporates high-precision weights in its computation, the GradPIM architecture can achieve high computational efficiency using 8-bit floating point in the NPU and also gain power efficiency by eliminating massive high-precision data transfers between the NPU and off-chip memory. 
A simple extension of DDR4 SDRAM that utilizes bank-group parallelism makes the operation designs in the processing-in-memory (PIM) module efficient in terms of hardware cost and performance. The experimental results show that the proposed architecture can improve the performance of the parameter update phase in training by up to 40% and greatly reduce the memory bandwidth requirement while adding only minimal overhead to the protocol and the DRAM area.

    Contents:
    Chapter I: Dynamic Computation Approximation for Aging Compensation
    1.1 Introduction: 1.1.1 Chip Reliability; 1.1.2 Reliability Guardband; 1.1.3 Approximate Computing in Logic Circuits; 1.1.4 Computation approximation for Aging Compensation; 1.1.5 Motivational Case Study
    1.2 Previous Work: 1.2.1 Aging-induced Delay; 1.2.2 Delay-Configurable Circuits
    1.3 Proposed System: 1.3.1 Overview of the Proposed System; 1.3.2 Proposed Adder; 1.3.3 Proposed Multiplier; 1.3.4 Proposed Monitoring Circuit; 1.3.5 Aging Compensation Scheme
    1.4 Design Methodology
    1.5 Evaluation: 1.5.1 Experimental setup; 1.5.2 RTL component level Adder/Multiplier; 1.5.3 RTL component level Monitoring circuit; 1.5.4 System level
    1.6 Summary
    Chapter II: Energy-Efficient Neural Network by Combining Approximate Neuron Models
    2.1 Introduction: 2.1.1 Deep Neural Network (DNN); 2.1.2 Low-power designs for DNN; 2.1.3 Stochastic-Computing Deep Neural Network; 2.1.4 Spiking Deep Neural Network
    2.2 Hybrid of Stochastic and Spiking DNNs: 2.2.1 Stochastic-Computing vs Spiking Deep Neural Network; 2.2.2 Combining Spiking Layers and Stochastic Layers; 2.2.3 Encoding Mismatch
    2.3 Evaluation: 2.3.1 Latency and Test Error; 2.3.2 Energy Efficiency
    2.4 Summary
    Chapter III: GradPIM: In-memory Gradient Descent in Mixed-Precision DNN Training
    3.1 Introduction: 3.1.1 Neural Processing Unit; 3.1.2 Mixed-precision Training; 3.1.3 Mixed-precision Training with In-memory Gradient Descent; 3.1.4 DNN Parameter Update Algorithms; 3.1.5 Modern DRAM Architecture; 3.1.6 Motivation
    3.2 Previous Work: 3.2.1 Processing-In-Memory; 3.2.2 Co-design Neural Processing Unit and Processing-In-Memory; 3.2.3 Low-precision Computation in NPU
    3.3 GradPIM: 3.3.1 GradPIM Architecture; 3.3.2 GradPIM Operations; 3.3.3 Timing Considerations; 3.3.4 Update Phase Procedure; 3.3.5 Commanding GradPIM
    3.4 NPU Co-design with GradPIM: 3.4.1 NPU Architecture; 3.4.2 Data Placement
    3.5 Evaluation: 3.5.1 Evaluation Methodology; 3.5.2 Experimental Results; 3.5.3 Sensitivity Analysis; 3.5.4 Layer Characterizations; 3.5.5 Distributed Data Parallelism
    3.6 Summary: 3.6.1 Discussion
    Bibliography
    Abstract (in Korean)
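The second chapter of the dissertation above relies on stochastic computing, where a value is carried by a bitstream and processed one bit per cycle. As a minimal sketch, assuming the common unipolar encoding (a value in [0, 1] is the fraction of 1s in the stream) and the standard AND-gate multiplier, the idea can be illustrated in software as follows. The function names, stream length, and seed are illustrative assumptions; this is not the neuron design used in the dissertation.

```python
import random

def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as a random bitstream whose fraction of 1s is about p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def from_bitstream(bits):
    """Decode a unipolar bitstream back to a value in [0, 1]."""
    return sum(bits) / len(bits)

def sc_multiply(x, y, length=4096, seed=0):
    """Approximate x * y by ANDing two independent unipolar bitstreams."""
    rng = random.Random(seed)
    xs = to_bitstream(x, length, rng)
    ys = to_bitstream(y, length, rng)
    # One AND gate per bit pair: P(a AND b) = P(a) * P(b) for independent streams.
    product_bits = [a & b for a, b in zip(xs, ys)]
    return from_bitstream(product_bits)

estimate = sc_multiply(0.5, 0.3)  # close to 0.15, with random error
```

The sketch also shows where the latency problem mentioned in the abstract comes from: a stream of `length` bits takes `length` cycles when one bit is processed per cycle, and shortening the stream increases the approximation error.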

    FPGA acceleration of DNA sequence alignment: design analysis and optimization

    Existing FPGA accelerators for short read mapping often sacrifice part of the biological information in sequencing data in favor of simpler hardware design, leading to missed or incorrect alignments. In this work, we propose a runtime reconfigurable alignment pipeline that considers all information in sequencing data for biologically accurate acceleration of short read mapping. We focus on accelerating two string matching techniques commonly used in short read mapping: the FM-index and the Smith-Waterman algorithm with the affine-gap model. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows.
    1. We accelerate exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving the access with multiple short reads. The FM-index hardware also considers the complete information in the read data to maximize accuracy.
    2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage, while a Smith-Waterman implementation with the affine-gap model is developed on FPGA for the extension stage. This model can improve the efficiency of indel alignment with accuracy comparable to state-of-the-art software.
    3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up the functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs. We then apply it to optimize one of the proposed FPGA aligners for better alignment performance.
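The extension stage described above uses Smith-Waterman with the affine-gap model. As a software reference sketch only (the thesis implements this on FPGA, and its scoring parameters are not given in this listing), the affine-gap recurrence can be written with three score tables. The function name and the default scores are illustrative assumptions.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return the best local alignment score of strings a and b.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]        # best score of an alignment ending at (i, j)
    E = [[neg_inf] * cols for _ in range(rows)]  # alignment ends with a gap that consumes b[j-1]
    F = [[neg_inf] * cols for _ in range(rows)]  # alignment ends with a gap that consumes a[i-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With the default scores, `smith_waterman_affine("AAAACCCC", "AAAAGGCCCC")` bridges the two inserted bases with a single gap: opening costs 3 and extending costs 1, so the eight matches (16) minus the gap (4) give 12. Charging less for extension than for opening is what makes the affine model favor one long indel over several short ones.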

    Dominant Feature Pooling for Multi Camera Object Detection and Optimization of Retinex Algorithm

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2021. Hyuk-Jae Lee. This paper proposes a novel dominant feature pooling method used in the detection phase of multi-camera object detection CNNs. 
Multi-camera systems can capture images of objects from various perspectives and use more of the objects' important features for detection. Thus, detection accuracy can be improved by pooling the features from multiple cameras. The proposed method constructs a new feature patch by selecting and pooling the dominant features, those that provide more information, from the feature vectors obtained from various viewpoints of an object. The proposed method is based on the YOLOv3 network for a single camera and does not require additional training for multi-camera systems. To show the effectiveness of dominant feature pooling, a novel method of visualizing feature vectors is also proposed. Furthermore, a method of using Retinex algorithms to improve the response of object detection CNNs to low-light environments is proposed. Although improvements can be made by training directly on low-light images, experimental results show that Retinex enhancement is essential because the degree of illumination in practical environments cannot be predicted accurately enough to create new datasets. This work also proposes a method to optimize Retinex algorithms through hardware (HW) design. 
An efficient implementation of the exponentiation operation and the Gaussian filtering, both essential to Retinex algorithms, is proposed in order to build HW that can operate in real time at high resolution.

    Contents:
    Chapter 1: Introduction
    1.1 Research Background
    1.2 Research Content
    1.3 Organization of the Dissertation
    Chapter 2: Background Theory and Related Work
    2.1 Object Detection CNN
    2.2 Multi View CNN
    2.3 Retinex Algorithm: 2.3.1 Retinex Algorithm using Gaussian Filter; 2.3.2 Multiscale Retinex Algorithm; 2.3.3 Efficient Naturalness Restoration
    Chapter 3: Unmanned Sales Stand System
    3.1 Overview of the Unmanned Sales Stand System
    3.2 Product Recognition Using an Object Detection CNN
    3.3 Product Purchase Determination Using Multi-Object Tracking
    3.4 Optimization Methods for Real-Time Operation of the Unmanned Sales Stand: 3.4.1 Camera Selection Algorithm; 3.4.2 Multithreading; 3.4.3 Pruning
    3.5 Performance Evaluation of the Unmanned Sales Stand System: 3.5.1 Object Detection Performance Evaluation; 3.5.2 Overall Results of the Unmanned Sales Stand System
    Chapter 4: Multi-Camera Dominant Feature Pooling
    4.1 Object Detection CNN and Multi-Camera Object Clustering: 4.1.1 Object Detection CNN; 4.1.2 Multi-Camera Object Clustering
    4.2 Dominant Feature Pooling Method: 4.2.1 Dominant Feature Scoring; 4.2.2 Dominant Feature Pooling; 4.2.3 Reuse of the YOLOv3 Detection Layer
    4.3 Analysis of the Proposed Method Through Feature Visualization: 4.3.1 Proposed Feature Visualization Method; 4.3.2 Feature Visualization of the Conventional Single-Camera YOLOv3; 4.3.3 Multi-Camera Feature Visualization of the Proposed Method
    4.4 Dominant Feature Pooling Results and Analysis: 4.4.1 Results on the COCO Dataset; 4.4.2 Results on the Custom Dataset; 4.4.3 Results by Scoring Method; 4.4.3 Execution Time Results of Dominant Feature Pooling
    Chapter 5: Retinex Applied Object Detection and Hardware Acceleration System
    5.1 Prior Work Applying Retinex
    5.2 Retinex Applied Object Detection: 5.2.1 Retinex Applied Object Detection Training; 5.2.2 Retinex Applied Object Detection Results
    5.3 Retinex Optimization for Object Detection: 5.3.1 Analysis of the Retinex Effect by Gaussian Filter Size; 5.3.2 Object Detection Results by Gaussian Filter Size
    5.4 Need for a Retinex Hardware System and Prior Work
    5.5 Implementation Overview of the Proposed Hardware System
    5.6 Implementation Features of the Proposed Hardware System: 5.6.1 Implementation of the Gaussian Filter; 5.6.2 Implementation of Exponentiation; 5.6.3 HDMI/DVI Support and Video Latency Minimization
    5.7 Implementation Results and Analysis of the Proposed Hardware System: 5.7.1 Analysis of Real-Time Operation and Low Latency; 5.7.2 Analysis of the Image Processing Performance of the Proposed System; 5.7.3 FPGA Resource Utilization of the Proposed System; 5.7.4 Comparison of Resource Utilization with Other Systems; 5.7.5 Analysis of the Image Processing Performance of the Proposed System
    Chapter 6: Conclusion
    References
    Abstract
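The abstract above names Gaussian filtering as a core operation of Retinex. As a minimal sketch, assuming single-scale Retinex in its common textbook form (log of the image minus log of its Gaussian-blurred version), the computation could look like the following. The function names, `sigma`, `radius`, the border clamping, and the `eps` offset that avoids log(0) are illustrative assumptions; this is a software illustration, not the hardware design proposed in the thesis.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Convolve one row or column, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * values[j]
        out.append(acc)
    return out

def gaussian_blur(image, sigma=2.0, radius=4):
    """Separable 2-D Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def single_scale_retinex(image, sigma=2.0, radius=4, eps=1.0):
    """Return log(I + eps) - log(blur(I) + eps) for every pixel."""
    blurred = gaussian_blur(image, sigma, radius)
    return [[math.log(p + eps) - math.log(b + eps) for p, b in zip(row, brow)]
            for row, brow in zip(image, blurred)]
```

Splitting the 2-D filter into a row pass and a column pass reduces the work per pixel from (2r+1)^2 to 2(2r+1) multiply-accumulates, which is one common reason separable filters are attractive for real-time designs.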

    A High-Throughput Hardware Accelerator for Lossless Compression of a DDR4 Command Trace

    No full text

    Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021

    The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing

    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany


    Systematic Approaches for Telemedicine and Data Coordination for COVID-19 in Baja California, Mexico

    Conference proceedings info: ICICT 2023: 2023 The 6th International Conference on Information and Computer Technologies, Raleigh, HI, United States, March 24-26, 2023, pages 529-542. We provide a model for the systematic implementation of telemedicine within a large evaluation center for COVID-19 in the area of Baja California, Mexico. Our model is based on human-centric design factors and cross-disciplinary collaborations for scalable, data-driven enablement of smartphone, cellular, and video teleconsultation technologies to link hospitals, clinics, and emergency medical services for point-of-care assessments of COVID testing and for subsequent treatment and quarantine decisions. A multidisciplinary team was rapidly created in cooperation with different institutions, including the Autonomous University of Baja California, the Ministry of Health, the Command, Communication and Computer Control Center of the Ministry of the State of Baja California (C4), Colleges of Medicine, and the College of Psychologists. Our objective is to provide information to the public, to evaluate COVID-19 in real time, and to track regional, municipal, and state-wide data in real time to inform supply chains and resource allocation in anticipation of a surge in COVID-19 cases.
    https://doi.org/10.1007/978-981-99-3236-