4 research outputs found

    Machine Learning and Signal Processing Design for Edge Acoustic Applications

    Get PDF

    Safety and Reliability - Safe Societies in a Changing World

    Get PDF
    The contributions cover a wide range of methodologies and application areas for safety and reliability that contribute to safe societies in a changing world. These methodologies and applications include:
    - foundations of risk and reliability assessment and management
    - mathematical methods in reliability and safety
    - risk assessment
    - risk management
    - system reliability
    - uncertainty analysis
    - digitalization and big data
    - prognostics and system health management
    - occupational safety
    - accident and incident modeling
    - maintenance modeling and applications
    - simulation for safety and reliability analysis
    - dynamic risk and barrier management
    - organizational factors and safety culture
    - human factors and human reliability
    - resilience engineering
    - structural reliability
    - natural hazards
    - security
    - economic analysis in risk management

    Energy-Efficient Hardware Accelerators for Compressed Neural Networks

    No full text
    Doctor์ธ๊ณต์ง€๋Šฅ์€ ์ธ๊ฐ„์˜ ๋‘๋‡Œ ํ™œ๋™์„ ๋ชจ๋ฐฉํ•จ์œผ๋กœ์จ ์ปดํ“จํ„ฐ ํ”„๋กœ๊ทธ๋žจ์ƒ์— ์ธ๊ฐ„์˜ ์ธ์ง€๋Šฅ๋ ฅ์„ ๊ตฌํ˜„ํ•˜๋Š” ์˜์—ญ์ด๋‹ค. ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ์ธ๊ณต์ง€๋Šฅ์˜ ๊ตฌํ˜„ ๋ฐฉ๋ฒ• ์ค‘ ๋”ฅ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ (deep neural network, DNN)๋“ค์€ ์ปดํ“จํ„ฐ ์˜์ƒ์ฒ˜๋ฆฌ, ์Œ์„ฑ์ธ์‹, ๋ฒˆ์—ญ ๋“ฑ์„ ๋น„๋กฏํ•œ ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ์‘์šฉ ๋ถ„์•ผ์—์„œ ์ „๋ก€๊ฐ€ ์—†๋Š” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ์ฃผ๋Š” ๋•๋ถ„์— ๋งŽ์€ ์ธ๊ธฐ๋ฅผ ์–ป๊ณ  ์žˆ๋‹ค. DNN๋“ค์€ ๋ณดํ†ต ์ˆ˜์‹ญ์—์„œ ์ˆ˜๋ฐฑ ๊ฐœ ์ด์ƒ์˜ ์‹ ๊ฒฝ๋ง ๊ณ„์ธต (layer)๋“ค๋กœ ๊ตฌ์„ฑ๋˜๋ฉฐ, ์ด๋Ÿฌํ•œ ๋”ฅ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ์˜ ์—ฐ์‚ฐ์€ ์ƒ๋‹นํžˆ ๋งŽ์€ ์ˆ˜์˜ ๊ณฑ์…ˆ-๋ง์…ˆ (multiply-accumulate, MAC) ๋™์ž‘๋“ค์ด ์ฃผ๋ฅผ ์ด๋ฃฌ๋‹ค. MAC ์—ฐ์‚ฐ์„ ํšจ์œจ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด, ๋ช‡๋ช‡ ์ „์šฉ ํ•˜๋“œ์›จ์–ด ๊ฐ€์†๊ธฐ๋“ค์ด ์†Œ๊ฐœ๋๋‹ค. ๊ธฐ๋ณธ ๊ตฌ์กฐ๋Š” systolic array ๊ธฐ๋ฐ˜์˜ MAC array์ด๋ฉฐ, ์ด๋Ÿฌํ•œ MAC array๋Š” ๋™์‹œ์— ๋งŽ์€ ์ž…๋ ฅ๊ฐ’ (input)๋“ค๊ณผ ๊ฐ€์ค‘์น˜ (weight)๋“ค์„ ์ด์šฉํ•œ ํ–‰๋ ฌ (matrix-matrix) ๊ณฑ์…ˆ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜, ์ตœ๊ทผ DNN ๋ชจ๋ธ๋“ค์€ ๋งค์šฐ ๋†’์€ ๊ณ„์‚ฐ ๋ณต์žก๋„ (25G ๋ฒˆ์˜ ๋™์ž‘๋“ค)์™€ ๋งค์šฐ ํฐ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰ (150M ๋ฐ์ดํ„ฐ๋“ค)์„ ๊ฐ–๊ธฐ ๋•Œ๋ฌธ์—, ํ•˜๋“œ์›จ์–ด์—์„œ ์ด๋ ‡๊ฒŒ ๋ถ€๋‹ด์Šค๋Ÿฌ์šด ๋ชจ๋ธ๋“ค์„ ์—ฐ์‚ฐํ•˜๋Š” ๊ฒƒ์ด ์—ฌ์ „ํžˆ ์–ด๋ ค์šด ๋ฌธ์ œ๋กœ ๋‚จ์•„์žˆ๋‹ค. ๊ทธ๋ฆฌํ•˜์—ฌ ๋งŽ์€ ์—ฐ๊ตฌ์ž๋“ค์ด DNN ๋ชจ๋ธ๋“ค์˜ ๊ณ„์‚ฐ๋Ÿ‰ ๋ฐ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์„ ์ค„์ด๋Š” ์—ฐ๊ตฌ๋ฅผ ์ˆ˜ํ–‰ํ•ด์˜ค๊ณ  ์žˆ๋‹ค. ์ž˜ ์•Œ๋ ค์ง„ ๋ฐฉ๋ฒ•๋“ค ์ค‘์˜ ํ•˜๋‚˜๋Š” DNN์— ์‚ฌ์šฉํ•˜๋Š” ๋ฐ์ดํ„ฐ๋“ค์„ ๋‚ฎ์€ ๋น„ํŠธ ์ •๋ฐ€๋„ (bit precision)๋กœ์จ ํ‘œํ˜„ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ๋‚ฎ์€ ๋น„ํŠธ ์ •๋ฐ€๋„๋ฅผ ์‚ฌ์šฉํ• ์ˆ˜๋ก, ํ•˜๋“œ์›จ์–ด๋กœ MAC ๋™์ž‘์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•œ ์—ฐ์‚ฐ ๋ณต์žก๋„๊ฐ€ 2์ฐจ ์‹์œผ๋กœ (quadratically) ์ค„์–ด๋“ค๊ณ  ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์€ 1์ฐจ ์‹์œผ๋กœ (linearly) ์ค„์–ด๋“ ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๋งŽ์€ MAC ์—ฐ์‚ฐ ํšŸ์ˆ˜์— ๋Œ€ํ•œ ๋ถ€๋‹ด์„ ์™„ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค. ํ•œํŽธ, DNN ์—ฐ์‚ฐ์—์„œ ์ตœ์ ์˜ ๋น„ํŠธ ์ •๋ฐ€๋„๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•œ ์—ฐ๊ตฌ ๊ฒฐ๊ณผ๋“ค์ด ์†Œ๊ฐœ๋๋‹ค. 
    As a result, it is now well established that most networks require different bit precisions for different layers. To handle such diversely quantized neural networks, precision-scalable hardware accelerators have been introduced. However, previously proposed precision-scalable accelerators have limitations: for example, the multiplier utilization decreases, precision scaling is supported only for weights, or the circuitry for precision scaling significantly increases the chip area. Another well-known method for reducing the complexity of DNN models is weight pruning, which deletes relatively unimportant weight connections. Applying weight pruning reduces the number of weight connections by 9x and 13x for the AlexNet and VGG-16 networks, respectively. However, if such sparse neural networks are computed on conventional hardware optimized for dense matrix operations, many multipliers perform meaningless multiplications by zero, so no substantial performance gain is obtained. To process sparse weight matrices efficiently, several encoding methods have been introduced, including compressed-sparse row (CSR), compressed-sparse column (CSC), bit-vector encoding, and run-length encoding (RLE). These methods greatly reduce the number of data required for DNN computation by compressing away the zero values, but since the zeros are generally distributed irregularly across the matrix, the resulting performance improvement is limited. Dedicated hardware accelerators have therefore been designed to efficiently perform inference on such sparse neural networks.
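Of the encodings mentioned above, CSR is the simplest to sketch: only non-zero values are stored, together with their column indices and per-row offsets. The following is a minimal illustrative encoder, not the dissertation's hardware format; the matrix is a made-up example.

```python
import numpy as np

def to_csr(dense):
    """Encode a dense 2-D matrix as CSR: (values, col_indices, row_ptr).

    row_ptr[i]:row_ptr[i+1] delimits the non-zeros of row i, so the
    zeros never occupy storage -- but their positions stay irregular.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(int(v))
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

w = np.array([[0, 2, 0, 0],
              [3, 0, 0, 4],
              [0, 0, 0, 0]])
vals, cols, ptr = to_csr(w)
# vals=[2, 3, 4], cols=[1, 0, 3], ptr=[0, 1, 3, 3]
```

The irregular `cols` stream is exactly why dense MAC arrays gain little from this compression: each fetched value must still be routed to the right multiplier.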
    Such sparse neural network accelerators necessarily require index-matching circuitry that pairs the non-zero inputs and weights sharing the same channel index. However, in existing sparse neural network accelerators, the multiplier utilization drops in the index-matching stage, or data stalls occur frequently on the path between memory and the multipliers, significantly degrading the throughput of the overall system. Based on this background, this dissertation proposes area- and energy-efficient hardware accelerator design methods for the two popular types of compressed neural networks (quantized neural networks and sparse neural networks). First, we propose a bitwise summation method and introduce the BitBlade accelerator architecture designed around it. In addition, a channel-wise aligning method is proposed to fetch input and weight values efficiently from memory. A test chip was fabricated in a commercial 28nm CMOS process, and by balancing the SRAM buffer capacity against the number of arithmetic units, the amount of chip-to-DRAM communication was reduced. Second, we introduce SPRITE, a high-performance sparse neural network hardware accelerator. The key idea of the SPRITE architecture is to keep the matching probability of the index-matching comparators nearly constant, largely independent of the matrix density. In addition, the MAC units heavily used in DNN hardware generally occupy a large portion of the chip while also lying on timing-critical paths, making them one of the factors that determine system performance.
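The index-matching step described above can be sketched in software: given the non-zero entries of an activation vector and a weight vector, only pairs sharing a channel index contribute a multiplication. This is a behavioral illustration with hypothetical names and data, not the comparator hardware itself.

```python
def index_match(inputs, weights):
    """Dot product over non-zero entries stored as (channel_index, value).

    Only channels present in both operands are multiplied -- the job of
    the index-matching front end in a sparsity-aware accelerator.
    """
    w_by_ch = {ch: v for ch, v in weights}
    return sum(iv * w_by_ch[ch] for ch, iv in inputs if ch in w_by_ch)

# hypothetical non-zero activations and weights
acts = [(0, 2), (3, 1), (5, 4)]
wts  = [(3, 5), (4, 7), (5, 2)]
# only channels 3 and 5 match: 1*5 + 4*2 = 13
result = index_match(acts, wts)
```

When the match rate fluctuates with density, multipliers downstream of this step sit idle or stall, which is the inefficiency SPRITE targets.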
    To improve the performance of these MAC units, we propose a feedforward-cutset-free (FCF) pipelining method that reduces the chip area and power consumption of DNN hardware accelerators. It exploits the property that, among the many repeated MAC operations required in DNN processing, only the final MAC result is actually used in the DNN computation. In this way, this dissertation presents hardware design methods that enable high-performance, energy-efficient DNN computation. Recently, the demand for running DNNs on edge/IoT devices such as smartphones, smart watches, and AI speakers has been growing rapidly, and we expect the proposed design methods to contribute to performing such DNN computation efficiently on dedicated hardware.

    In this dissertation, we propose design methodologies to efficiently handle two popular types of compressed neural networks (quantized/sparse neural networks) on custom hardware. While such compressed neural networks reduce both the memory footprint and the computational complexity, computing them on conventional hardware accelerators leads to performance degradation. To overcome this limitation, we first propose BitBlade, a compact precision-scalable neural network hardware accelerator. While previous scalable accelerators require a large chip area to support dynamic precision scaling, our BitBlade architecture achieves the scalability with a smaller area by introducing a bitwise summation scheme. The BitBlade chip was fabricated in a 28nm CMOS technology, and the throughput and the system-level energy efficiency were improved by up to 7.7x and 1.64x, respectively, compared to state-of-the-art precision-scalable accelerators. In addition, we propose SPRITE, a high-performance accelerator for sparse neural networks.
    Previous sparsity-aware accelerators suffer under-utilization or stalls in the multiplication stage due to the irregular indices of non-zero values. In contrast, our SPRITE architecture achieves high utilization of the multiply-accumulate units over a wide range of sparsity thanks to the constant probability of channel index matching. SPRITE improves system performance by up to 6.1x compared to previous sparsity-aware neural processing units, with comparable overall energy efficiency. Before introducing our accelerators for compressed neural networks, we also present a feedforward-cutset-free pipelined multiply-accumulate unit tailored to machine learning hardware accelerators.