24 research outputs found

    Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA

    Algorithms based on convolutional neural networks (CNNs) have been successful in solving image recognition problems, showing very large accuracy improvements. In recent years, deconvolution layers have been widely used as key components in state-of-the-art CNNs for end-to-end training and in models that support tasks such as image segmentation and super-resolution. However, deconvolution algorithms are computationally intensive, which limits their applicability to real-time applications. In particular, there has been little research on efficient implementations of deconvolution algorithms on FPGA platforms, which practitioners and researchers widely use to accelerate CNN algorithms because of their high performance and power efficiency. In this work, we propose and develop a deconvolution architecture for efficient FPGA implementation. FPGA-based accelerators are proposed for both deconvolution and CNN algorithms. In addition, memory sharing between the computation modules, along with other optimization techniques, is proposed for the FPGA-based CNN accelerator. A non-linear optimization model based on the performance model is introduced to efficiently explore the design space in order to achieve the optimal processing speed of the system and improve power efficiency. Furthermore, a hardware mapping framework is developed to automatically generate a low-latency hardware design for any given CNN model on the target device. Finally, we implement our designs on a Xilinx Zynq ZC706 board: the deconvolution accelerator achieves a performance of 90.1 GOPS at a 200 MHz working frequency and a performance density of 0.10 GOPS/DSP using 32-bit quantization, which significantly outperforms previous designs on FPGAs. A real-time scene segmentation application on the Cityscapes dataset is used to evaluate our CNN accelerator on the Zynq ZC706 board; the system achieves a performance of 107 GOPS and 0.12 GOPS/DSP using 16-bit quantization and supports up to 17 frames per second for 512x512 image inputs with a power consumption of only 9.6 W.
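The throughput and performance-density figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope reading of those numbers: the function names are illustrative, the densities are rounded in the abstract, and the per-frame figure assumes the quoted throughput is fully sustained at 17 frames per second, which the abstract does not state.

```python
def implied_dsp_slices(gops: float, gops_per_dsp: float) -> float:
    """DSP slices implied by a throughput and a performance density."""
    return gops / gops_per_dsp


def giga_ops_per_frame(gops: float, frames_per_second: float) -> float:
    """Work per frame if the quoted throughput were fully sustained (assumption)."""
    return gops / frames_per_second


if __name__ == "__main__":
    # Figures quoted in the abstract; results are approximate because
    # the published densities are rounded.
    print(round(implied_dsp_slices(90.1, 0.10)))      # deconvolution accelerator
    print(round(implied_dsp_slices(107.0, 0.12)))     # segmentation accelerator
    print(round(giga_ops_per_frame(107.0, 17.0), 1))  # GOP per 512x512 frame
```

Both estimates come out at roughly 900 DSP slices, so the two reported design points appear to use a similar share of the device's DSP resources.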

    FPGA-based systolic deconvolution architecture for upsampling

    A deconvolution accelerator is proposed to upsample an n × n input to a 2n × 2n output by convolving with a k × k kernel. Its architecture avoids the need for insertion and padding of zeros and thus eliminates the redundant computations, achieving high resource efficiency with a reduced number of multipliers and adders. The architecture is systolic and governed by a reference clock, enabling the sequential placement of the module to represent a pipelined decoder framework. The proposed accelerator is implemented on a Xilinx XC7Z020 platform and achieves a performance of 3.641 giga operations per second (GOPS) with a resource efficiency of 0.135 GOPS/DSP for upsampling a 32 × 32 input to a 256 × 256 output using a 3 × 3 kernel at 200 MHz. Furthermore, its high peak signal-to-noise ratio of almost 80 dB illustrates that the upsampled outputs of the bit-truncated accelerator are comparable to IEEE double-precision results.
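The saving from avoiding zero insertion can be illustrated in software. The following is a minimal sketch, not the paper's systolic architecture: it computes a stride-2 transposed convolution two ways in plain Python, once by inserting and padding zeros before an ordinary sliding-window pass and once by scattering each input value directly into the output. The function names are hypothetical, and the sketch leaves the full-size output of side (n - 1) * 2 + k rather than applying whatever cropping convention yields the 2n × 2n output described in the abstract.

```python
def deconv_scatter(x, w, stride=2):
    """Transposed convolution by scattering: each input value is multiplied
    by the whole kernel and accumulated into the output. No zeros involved."""
    n, k = len(x), len(w)
    m = (n - 1) * stride + k
    y = [[0] * m for _ in range(m)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for a in range(k):
                for b in range(k):
                    y[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
                    mults += 1
    return y, mults


def deconv_zero_insert(x, w, stride=2):
    """Same result via zero insertion: dilate the input with zeros, pad by
    k - 1 on every side, then slide the flipped kernel over it."""
    n, k = len(x), len(w)
    pad = k - 1
    size = (n - 1) * stride + 1 + 2 * pad
    z = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            z[pad + i * stride][pad + j * stride] = x[i][j]
    m = size - k + 1
    y = [[0] * m for _ in range(m)]
    mults = 0
    for r in range(m):
        for c in range(m):
            for a in range(k):
                for b in range(k):
                    # Most of these products have a zero operand.
                    y[r][c] += z[r + a][c + b] * w[k - 1 - a][k - 1 - b]
                    mults += 1
    return y, mults


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, -1], [2, 1, 0], [0, 3, 1]]
    y_a, mults_a = deconv_scatter(x, w)
    y_b, mults_b = deconv_zero_insert(x, w)
    print(y_a == y_b, mults_a, mults_b)
```

Both routines produce the same output, but the zero-insertion version performs a multiplication for every kernel position over the padded grid, while the scatter version multiplies only real input values.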

    Simulation Studies of Digital Filters for the Phase-II Upgrade of the Liquid-Argon Calorimeters of the ATLAS Detector at the High-Luminosity LHC

    The Large Hadron Collider and the ATLAS detector are undergoing a comprehensive upgrade split into multiple phases. This effort also affects the liquid-argon calorimeters, whose main readout electronics will be replaced completely during the final phase. The electronics consist of an analog and a digital portion: the former amplifies the signal and shapes it to facilitate sampling, while the latter executes an energy reconstruction algorithm. Both must be improved during the upgrade so that the detector can accurately reconstruct interesting collision events and efficiently suppress uninteresting ones. In this thesis, simulation studies are presented that optimize both the analog and the digital readout of the liquid-argon calorimeters. The simulation is verified using calibration data recorded during Run 2 of the ATLAS detector. The influence of several parameters of the analog shaping stage on the energy resolution is analyzed, and the utility of an increased signal sampling rate of 80 MHz is investigated. Furthermore, a number of linear and non-linear energy reconstruction algorithms are reviewed, and the performance of a selection of them is compared. It is demonstrated that increasing the order of the Optimal Filter, the algorithm currently in use, improves the energy resolution by 2 to 3 % in all detector regions. The Wiener filter with forward correction, a non-linear algorithm, gives an improvement of up to 10 % in some regions but degrades the resolution in others. A link between this behavior and the probability of falsely detected calorimeter hits is shown, and possible solutions are discussed.
    Contents:
    1 Introduction
    2 An Overview of High-Energy Particle Physics: 2.1 The Standard Model of Particle Physics; 2.2 Verification of the Standard Model; 2.3 Beyond the Standard Model
    3 LHC, ATLAS, and the Liquid-Argon Calorimeters: 3.1 The Large Hadron Collider; 3.2 The ATLAS Detector; 3.3 The ATLAS Liquid-Argon Calorimeters
    4 Upgrades to the ATLAS Liquid-Argon Calorimeters: 4.1 Physics Goals; 4.2 Phase-I Upgrade; 4.3 Phase-II Upgrade
    5 Noise Suppression With Digital Filters: 5.1 Terminology; 5.2 Digital Filters; 5.3 Wiener Filter; 5.4 Matched Wiener Filter; 5.5 Matched Wiener Filter Without Bias; 5.6 Timing Reconstruction, Optimal Filtering, and Selection Criteria; 5.7 Forward Correction; 5.8 Sparse Signal Restoration; 5.9 Artificial Neural Networks
    6 Simulation of the ATLAS Liquid-Argon Calorimeter Readout Electronics: 6.1 AREUS; 6.2 Hit Generation and Sampling; 6.3 Pulse Shapes; 6.4 Thermal Noise; 6.5 Quantization; 6.6 Digital Filters; 6.7 Statistical Analysis
    7 Results of the Readout Electronics Simulation Studies: 7.1 Statistical Treatment; 7.2 Simulation Verification Using Run-2 Data; 7.3 Dependence of the Noise on the Shaping Time; 7.4 The Analog Readout Electronics and the ADC; 7.5 The Optimal Filter (OF); 7.6 The Wiener Filter; 7.7 The Wiener Filter with Forward Correction (WFFC); 7.8 Final Comparison and Conclusions
    8 Conclusions and Outlook
    Appendices
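The Optimal Filter named in this abstract is a linear (FIR) estimator: the reconstructed amplitude is a weighted sum of the digitized samples. The sketch below is a simplified illustration under loudly stated assumptions, not the thesis's implementation: noise is taken to be white (uncorrelated, equal variance on every sample), only the amplitude constraint is imposed (no timing constraint), the pulse-shape values are hypothetical, and "order" is read as the number of samples used. Under those assumptions the variance-minimizing coefficients are a_i = g_i / sum(g_j^2).

```python
def optimal_filter_coefficients(pulse):
    """Coefficients a minimising the noise variance subject to
    sum(a_i * g_i) = 1, assuming white noise (simplifying assumption)."""
    norm = sum(g * g for g in pulse)
    return [g / norm for g in pulse]


def reconstruct_amplitude(samples, coeffs):
    """Linear amplitude estimate: A = sum(a_i * s_i)."""
    return sum(a * s for a, s in zip(coeffs, samples))


def noise_variance(coeffs, sigma=1.0):
    """Variance of the estimate for white noise of standard deviation sigma."""
    return sigma ** 2 * sum(a * a for a in coeffs)


if __name__ == "__main__":
    # Hypothetical normalised pulse-shape samples, not taken from the thesis.
    pulse = [0.1, 0.6, 1.0, 0.7, 0.3, 0.05]
    for depth in (4, 6):
        g = pulse[:depth]
        a = optimal_filter_coefficients(g)
        noiseless = [250.0 * gi for gi in g]  # amplitude 250, arbitrary units
        print(depth, reconstruct_amplitude(noiseless, a), noise_variance(a))
```

In this toy model, using more samples lowers the variance of the estimate, which is the qualitative effect the abstract reports for a higher filter order. The size of the gain in a real detector depends on correlated noise and pile-up, which this sketch ignores.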

    On-chip memory reduction in CNN hardware design for image super-resolution

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 2. 이혁재.
    Unlike a convolutional neural network (CNN) for image classification, a CNN for single image super-resolution (SISR) receives a high-resolution image and generates feature maps, which are high-resolution intermediate results. Hardware for accelerating a CNN for SISR is mainly applied to display devices and has a streaming architecture in which external memory access is impossible. This causes implementation difficulties because of the limited capacity of the on-chip memory. This paper proposes two methods for designing CNN hardware for SISR using limited hardware resources. The CNN hardware is based on the very deep neural network for super-resolution (VDSR) architecture. By using a partially-vertical order for the convolution layers, simultaneous read and write accesses to SRAM are prevented. The proposed order lets the CNN use single-port SRAM instead of dual-port SRAM, which reduces the on-chip memory area by half. The second method changes the shape of the filter in VDSR. The size of the on-chip memory is proportional to the height of the convolution filter. However, since the VDSR filter is already the smallest symmetric shape, its height cannot be reduced directly. To solve this problem, a method of constructing a context-preserving 1D filter and a method of decreasing the vertical filter based on the context are proposed. These methods reduce the size of the SRAM by a further half.
    Two CNN training methods, one for SISR of natural images and one for SISR of text images, are also proposed. These methods improve SISR performance after the CNN hardware architecture has been fixed. SRGAN (super-resolution generative adversarial networks) is trained with the help of a discriminator network to generate realistic natural images. However, SRGAN causes visual defects due to over-sharpening. This paper proposes two methods to eliminate these visual defects. First, a resolution-preserving discriminator network structure is proposed, which changes the structure of the discriminator network to prevent the loss of detailed information inside it. Second, a resolution-preserving content loss is proposed to address the loss of detailed image information caused by the structure of the VGG19 network that produces the content loss. A text image is not a natural image but a synthetic one, and the color combination of the font and the background can vary widely. Existing CNN training methods train on many kinds of images to generalize the network, but it is impossible to train a CNN on every color combination. This paper borrows the de-colorization method used in image compression to restrict the training images to black fonts on white backgrounds. As a result, the CNN performs SISR without visual defects even on font and background color combinations that were not present in the training images.
    Contents:
    Chapter 1 Introduction: 1.1 Background of the Research; 1.2 Research Contents; 1.3 Organization of the Thesis
    Chapter 2 Previous Work: 2.1 SISR CNN Algorithms; 2.2 SISR Hardware with a Streaming Architecture; 2.3 On-chip Memory Reduction Methods of Existing CNN Hardware; 2.4 De-colorization
    Chapter 3 Changing the Computation Order to Reduce the SRAM Area of Convolutional Neural Networks: 3.1 Partially-Vertical Order Convolution; 3.2 Registers for Storing the ifmap; 3.3 SRAM Configuration of the First and Last Convolution Layers of the CNN; 3.4 Partially-Vertical Order for Multi-Channel SRAM Sharing of fmaps; 3.5 CNN Structures to Which the Partially-Vertical Order Is Applicable; 3.5 Experimental Results
    Chapter 4 Filter Reconstruction for Preserving Image Context and CNN Hardware Design: 4.1 Proposed Algorithm for SRAM Reduction; 4.2 CNN Hardware Architecture for SISR; 4.3 Experimental Results
    Chapter 5 Resolution-Preserving Generative Adversarial Network Structure for SISR: 5.1 Resolution-Preserving Discriminator Network Structure; 5.2 Resolution-Preserving Content Loss; 5.3 Experimental Results
    Chapter 6 Text SISR with De-colorization: 6.1 CNN Training with Text De-colorization; 6.2 Experimental Results
    Chapter 7 Conclusion
    References
    Abstract
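The abstract's statement that on-chip memory grows with the height of the convolution filter can be made concrete with a line-buffer estimate. This is a rough sketch under an assumed model, not the thesis's design: it uses the common streaming arrangement in which a filter of height h must retain h - 1 previous image rows per channel, and the image width, channel count, and bit width in the example are hypothetical values chosen only for illustration.

```python
def line_buffer_bits(image_width, channels, filter_height, bits_per_value):
    """Bits of on-chip storage for the rows a streaming convolution retains,
    assuming (filter_height - 1) buffered rows per channel (model assumption)."""
    if filter_height < 1:
        raise ValueError("filter_height must be at least 1")
    return (filter_height - 1) * image_width * channels * bits_per_value


if __name__ == "__main__":
    # Hypothetical layer: 1920-pixel rows, 64 channels, 8-bit feature values.
    for height in (3, 2, 1):
        print(height, line_buffer_bits(1920, 64, height, 8))
```

In this model a filter of height 1 needs no row storage at all, which is why reshaping a square filter toward a horizontal 1D filter is attractive when external memory is unavailable.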