
    On-chip memory reduction in CNN hardware design for image super-resolution

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 2. 이혁재.
Unlike a convolutional neural network (CNN) for image classification, a CNN for single image super-resolution (SISR) receives a high-resolution image as input and generates high-resolution feature maps as intermediate results.
Hardware that accelerates a CNN for SISR is mainly used in display devices and has a streaming architecture in which external memory access is impossible. Because on-chip memory capacity is limited, this makes implementation difficult. Previous work reduces on-chip memory either at the cost of degraded performance or by adding a compression module. This thesis proposes two methods that reduce the on-chip memory of CNN hardware for SISR without degrading performance. The CNN hardware is based on the VDSR (very deep neural network for super-resolution) architecture. The first method changes the processing order of the convolution layers from raster-scan order, in which read and write accesses to SRAM occur simultaneously, to a partially-vertical order that separates the timing of read and write accesses. This order lets the CNN hardware use single-port SRAM instead of dual-port SRAM, which halves the on-chip memory area. The second method changes the shape of the filters in VDSR. The size of the on-chip memory is proportional to the height of the convolution filter, but the VDSR filter is already the smallest symmetric filter, so its height cannot be reduced directly. To solve this problem, a method of constructing context-preserving 1D filters and a context-based method of reducing vertical filters are proposed. These methods halve the SRAM size once more.
Two CNN training methods, one for SISR of natural images and one for SISR of text images, are also proposed. They improve SISR performance after the CNN hardware architecture has been fixed. SRGAN (super-resolution generative adversarial network) uses the loss from a discriminator network to make the SISR CNN output realistic-looking natural images. However, SRGAN produces visual defects caused by over-sharpening. This thesis proposes two methods to eliminate these defects. First, a resolution-preserving discriminator network is proposed, whose modified structure prevents the loss of image detail inside the discriminator.
Second, a resolution-preserving content loss is proposed to address the loss of image detail caused by the structure of the VGG19 network that computes the content loss. A text image is not a natural image but a synthetic one, and the color combination of its font and background can vary widely. Existing CNN training methods train on many kinds of images so that the network generalizes, but it is impossible to train a CNN on every color combination. This thesis borrows the de-colorization method used in image compression to restrict the training images to black fonts on white backgrounds. As a result, the CNN performs SISR without visual defects even on font and background color combinations that were not seen during training.

Chapter 1 Introduction
    1.1 Research background
    1.2 Research content
    1.3 Organization of the thesis
Chapter 2 Previous work
    2.1 SISR CNN algorithms
    2.2 SISR hardware with a streaming architecture
    2.3 On-chip memory reduction methods in existing CNN hardware
    2.4 De-colorization
Chapter 3 Changing the computation order to reduce the SRAM area of convolutional neural networks
    3.1 Partially-vertical order convolution
    3.2 Registers for storing the ifmap
    3.3 SRAM configuration for the first and last convolution layers of the CNN
    3.4 Partially-vertical order for multi-channel sharing of fmap SRAM
    3.5 CNN architectures to which the partially-vertical order applies
    3.5 Experimental results
Chapter 4 Filter reconstruction for preserving image context and CNN hardware design
    4.1 Proposed algorithm for SRAM reduction
    4.2 CNN hardware architecture for SISR
    4.3 Experimental results
Chapter 5 Resolution-preserving generative adversarial network architecture for SISR
    5.1 Resolution-preserving discriminator network architecture
    5.2 Resolution-preserving content loss
    5.3 Experimental results
Chapter 6 Text SISR using de-colorization
    6.1 CNN training with text de-colorization
    6.2 Experimental results
Chapter 7 Conclusion
References
Abstract
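
The abstract states that on-chip memory grows with the height of the convolution filter. A rough way to see this is sketched below, assuming a streaming layer keeps `filter_height - 1` rows of every channel in a line buffer. That is a common model, not a detail taken from the thesis. The function names, the 1920-pixel width, the 8-bit values, and the 18-layer stack are all hypothetical.

```python
def line_buffer_bits(width, channels, filter_height, bits_per_value=8):
    """Bits of line buffer for one streaming conv layer (simplified model)."""
    # Assumption: the layer must hold the previous (filter_height - 1) rows
    # of every input channel before it can emit the next output row.
    rows_held = max(filter_height - 1, 0)
    return rows_held * width * channels * bits_per_value


def network_buffer_bits(filter_heights, width, channels, bits_per_value=8):
    """Sum the line-buffer bits over a stack of layers."""
    return sum(
        line_buffer_bits(width, channels, h, bits_per_value)
        for h in filter_heights
    )


# Hypothetical stacks: all 3-row filters vs. alternating 3-row and 1-row filters.
square = [3] * 18
mixed = [3, 1] * 9

ratio = network_buffer_bits(mixed, 1920, 64) / network_buffer_bits(square, 1920, 64)
print(ratio)  # 0.5
```

Under this model, replacing every other 3-row filter with a 1-row (horizontal 1D) filter halves the total line-buffer size, which is the kind of saving the abstract reports. The thesis's actual filter arrangement may differ.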
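
The de-colorization idea for text images can be illustrated with a minimal sketch. It assumes a two-color text image whose font and background colors are known, maps it to black-on-white, upscales it, and maps the colors back. The abstract does not describe the thesis's actual procedure, so the projection formula, the function names, and the nearest-neighbour stand-in for the SISR CNN are all assumptions made for illustration.

```python
import numpy as np


def decolorize(img, font_rgb, bg_rgb):
    """Map an RGB text image to grayscale: 0.0 = black font, 1.0 = white background."""
    font = np.asarray(font_rgb, dtype=np.float64)
    bg = np.asarray(bg_rgb, dtype=np.float64)
    axis = font - bg  # assumes the font and background colors differ
    # Project each pixel onto the background-to-font line (0 = bg, 1 = font).
    t = (img.astype(np.float64) - bg) @ axis / (axis @ axis)
    return 1.0 - np.clip(t, 0.0, 1.0)


def recolorize(gray, font_rgb, bg_rgb):
    """Inverse mapping: blend the original two colors by the gray level."""
    font = np.asarray(font_rgb, dtype=np.float64)
    bg = np.asarray(bg_rgb, dtype=np.float64)
    t = 1.0 - gray
    return bg + t[..., None] * (font - bg)


def upscale_stub(gray, scale=2):
    """Placeholder for the SISR CNN: plain nearest-neighbour repetition."""
    return np.repeat(np.repeat(gray, scale, axis=0), scale, axis=1)


# Hypothetical 2x2 image: red font pixels on a blue background.
font_rgb, bg_rgb = (255, 0, 0), (0, 0, 255)
img = np.array([[font_rgb, bg_rgb], [bg_rgb, font_rgb]], dtype=np.uint8)

gray = decolorize(img, font_rgb, bg_rgb)
restored = recolorize(upscale_stub(gray), font_rgb, bg_rgb)
print(restored.shape)  # (4, 4, 3)
```

The point of such a design is that the network only ever sees black-on-white input, so an arbitrary color pair never has to appear in the training set.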