137 research outputs found

    Learning Representations for Controllable Image Restoration

    Deep Convolutional Neural Networks have sparked a renaissance in all the sub-fields of computer vision. Tremendous progress has been made in the area of image restoration. The research community has pushed the boundaries of image deblurring, super-resolution, and denoising. However, given a distorted image, most existing methods typically produce a single restored output. The tasks mentioned above are inherently ill-posed, leading to an infinite number of plausible solutions. This thesis focuses on designing image restoration techniques capable of producing multiple restored results and granting users more control over the restoration process. Towards this goal, we demonstrate how one could leverage the power of unsupervised representation learning. Image restoration is vital when applied to distorted images of human faces due to their social significance. Generative Adversarial Networks enable an unprecedented level of generated facial detail combined with a smooth latent space. We leverage the power of GANs towards the goal of learning controllable neural face representations. We demonstrate how to learn an inverse mapping from image space to these latent representations, tune these representations towards a specific task, and finally manipulate latent codes in these spaces. For example, we show how GANs and their inverse mappings enable the restoration and editing of faces in the context of extreme face super-resolution and the generation of novel-view sharp videos from a single motion-blurred image of a face. This thesis also addresses more general blind super-resolution, denoising, and scratch removal problems, where blur kernels and noise levels are unknown. We resort to contrastive representation learning and first learn the latent space of degradations. We demonstrate that the learned representation allows inference of ground-truth degradation parameters and can guide the restoration process. Moreover, it enables control over the amount of deblurring and denoising in the restoration via manipulation of latent degradation features.
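    The abstract above does not specify which contrastive objective is used to learn the latent space of degradations. As a minimal sketch, assuming an InfoNCE-style loss in which two patches that share the same degradation form a positive pair, the idea could look as follows. The function name `info_nce`, the temperature value, and the toy embeddings are all illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.1):
    """InfoNCE loss; queries[i] and keys[i] are assumed to be a positive pair.

    Both inputs have shape (N, D). Every keys[j] with j != i acts as a
    negative for queries[i]. The temperature is a hypothetical default.
    """
    # L2-normalise so the dot product is a cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                  # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the log-probability assigned to each positive pair.
    return float(-np.mean(np.diag(log_prob)))

# Toy usage: embeddings of two patches per degradation (hypothetical data).
rng = np.random.default_rng(0)
patch_a = rng.normal(size=(8, 16))                  # 8 degradations, 16-d codes
patch_b = patch_a + 0.01 * rng.normal(size=(8, 16)) # same degradation, new patch
loss_matched = info_nce(patch_a, patch_b)
loss_mismatched = info_nce(patch_a, np.roll(patch_b, 1, axis=0))
```

    In this toy setup the matched pairing yields a much lower loss than the mismatched one, which is the signal that would push an encoder to cluster patches by degradation rather than by image content.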

    Continuous Facial Motion Deblurring

    We introduce a novel framework for continuous facial motion deblurring that restores the continuous sharp moment latent in a single motion-blurred face image via a moment control factor. Although a motion-blurred image is the accumulated signal of continuous sharp moments during the exposure time, most existing single image deblurring approaches aim to restore a fixed number of frames using multiple networks and training stages. To address this problem, we propose a continuous facial motion deblurring network based on GAN (CFMD-GAN), which is a novel framework for restoring the continuous moment latent in a single motion-blurred face image with a single network and a single training stage. To stabilize the network training, we train the generator to restore continuous moments in the order determined by our facial motion-based reordering process (FMR), which utilizes domain-specific knowledge of the face. Moreover, we propose an auxiliary regressor that helps our generator produce more accurate images by estimating continuous sharp moments. Furthermore, we introduce a control-adaptive (ContAda) block that performs spatially deformable convolution and channel-wise attention as a function of the control factor. Extensive experiments on the 300VW dataset demonstrate that the proposed framework generates a variable number of continuous output frames by varying the moment control factor. Compared with recent single-to-single image deblurring networks trained with the same 300VW training set, the proposed method shows superior performance in restoring the central sharp frame in terms of perceptual metrics, including LPIPS, FID, and ArcFace identity distance. The proposed method also outperforms the existing single-to-video deblurring method in both qualitative and quantitative comparisons.

    Motion deblurring of faces

    Face analysis is a core part of computer vision, in which remarkable progress has been observed in the past decades. Current methods achieve recognition and tracking with invariance to fundamental modes of variation such as illumination, 3D pose, and expressions. Notwithstanding, a much less studied mode of variation is motion blur, which nevertheless presents substantial challenges in face analysis. Recent approaches either make oversimplifying assumptions, e.g. in cases of joint optimization with other tasks, or fail to preserve the highly structured shape/identity information. Therefore, we propose a data-driven method that encourages identity preservation. The proposed model includes two parallel streams (sub-networks): the first deblurs the image, and the second implicitly extracts and projects the identity of both the sharp and the blurred image into similar subspaces. To train our model, we devise a method for creating realistic motion blur by averaging a variable number of frames. The averaged images originate from the 2MF2 dataset of 10 million facial frames, which we introduce for the task. Considering deblurring as an intermediate step, we use the deblurred outputs to conduct thorough experiments on high-level face analysis tasks, i.e. landmark localization and face verification. The experimental evaluation demonstrates the superiority of our method.
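    The blur synthesis step described above, averaging a variable number of consecutive frames, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function name `synthesize_blur`, the choice of the central frame as the sharp target, the candidate window lengths, and the toy clip are all hypothetical.

```python
import numpy as np

def synthesize_blur(frames, start, num_frames):
    """Average `num_frames` consecutive sharp frames to imitate motion blur.

    frames: array of shape (T, H, W, C).
    Returns (blurred, sharp), where `sharp` is the central frame of the
    window (an assumption; the abstract does not name the target frame).
    """
    window = frames[start:start + num_frames].astype(np.float64)
    if window.shape[0] != num_frames:
        raise ValueError("window runs past the end of the clip")
    blurred = window.mean(axis=0)        # temporal average approximates exposure
    sharp = window[num_frames // 2]      # central frame as the training target
    return blurred, sharp

# Toy clip: a bright vertical bar moving one pixel to the right per frame.
clip = np.zeros((7, 4, 10, 1))
for t in range(7):
    clip[t, :, t + 1, 0] = 255.0

# Draw a variable window length so blur strength differs between samples.
rng = np.random.default_rng(0)
n = int(rng.choice([3, 5, 7]))           # hypothetical candidate lengths
blurred, sharp = synthesize_blur(clip, start=0, num_frames=n)
```

    Longer windows smear the bar over more columns at lower intensity, so varying the window length gives training pairs with different blur strengths from the same clip.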

    New Datasets, Models, and Optimization

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2021.8. 손현태.
    Obtaining a high-quality clean image is the ultimate goal of photography. In practice, everyday photographs are often taken in dynamic environments with moving objects as well as shaken cameras. The relative motion between the camera and the objects during the exposure causes motion blur in images and videos, degrading the visual quality. In dynamic environments, the blur strength and the shape of the motion trajectory vary from image to image and from pixel to pixel. This locally varying property makes the removal of motion blur in images and videos severely ill-posed. Rather than designing analytic solutions with physical modeling, machine learning-based approaches can serve as a practical solution for such a highly ill-posed problem. In particular, deep learning has become the standard in recent computer vision literature. This dissertation introduces deep learning-based solutions for image and video deblurring by tackling practical issues in various aspects.
    First, a new way of constructing datasets for the dynamic scene deblurring task is proposed. It is nontrivial to simultaneously obtain a temporally aligned pair of blurry and sharp images. The lack of data prevents both the development of supervised learning techniques and the evaluation of deblurring algorithms. By mimicking the camera imaging pipeline with high-speed videos, realistic blurry images can be synthesized. In contrast to previous blur synthesis methods, the proposed approach can reflect the natural, complex local blur arising from multiple moving objects, varying depth, and occlusion at motion boundaries.
    Second, based on the proposed datasets, a novel neural network architecture for the single-image deblurring task is presented. Adopting the coarse-to-fine approach that is widely used in energy optimization-based methods for image deblurring, a multi-scale neural network architecture is derived. Compared with a single-scale model of similar complexity, the multi-scale model exhibits higher accuracy and faster speed.
    Third, a light-weight recurrent neural network architecture for video deblurring is proposed. To obtain a high-quality video from deblurring, it is important to exploit the intrinsic information in the target frame as well as the temporal relation between neighboring frames. Benefiting from both, the proposed intra-frame iterative scheme applied to the RNNs improves accuracy without increasing the number of model parameters.
    Lastly, a novel loss function is proposed to better optimize the deblurring models. Estimating a dynamic blur for a clean and sharp image without given motion information is another ill-posed problem. While the goal of deblurring is to completely remove motion blur, conventional loss functions fail to train neural networks to fulfill this goal, leaving traces of blur in the deblurred images. The proposed reblurring loss functions are designed to better eliminate motion blur and to produce sharper images. Furthermore, the self-supervised learning process facilitates the adaptation of the deblurring model at test time.
    With the proposed datasets, model architectures, and loss functions, deep learning-based single-image and video deblurring methods are presented. Extensive experimental results demonstrate state-of-the-art performance both quantitatively and qualitatively.

    1 Introduction
    2 Generating Datasets for Dynamic Scene Deblurring
        2.1 Introduction
        2.2 GOPRO dataset
        2.3 REDS dataset
        2.4 Conclusion
    3 Deep Multi-Scale Convolutional Neural Networks for Single Image Deblurring
        3.1 Introduction
            3.1.1 Related Works
            3.1.2 Kernel-Free Learning for Dynamic Scene Deblurring
        3.2 Proposed Method
            3.2.1 Model Architecture
            3.2.2 Training
        3.3 Experiments
            3.3.1 Comparison on GOPRO Dataset
            3.3.2 Comparison on Kohler Dataset
            3.3.3 Comparison on Lai et al. [54] dataset
            3.3.4 Comparison on Real Dynamic Scenes
            3.3.5 Effect of Adversarial Loss
        3.4 Conclusion
    4 Intra-Frame Iterative RNNs for Video Deblurring
        4.1 Introduction
        4.2 Related Works
        4.3 Proposed Method
            4.3.1 Recurrent Video Deblurring Networks
            4.3.2 Intra-Frame Iteration Model
            4.3.3 Regularization by Stochastic Training
        4.4 Experiments
            4.4.1 Datasets
            4.4.2 Implementation details
            4.4.3 Comparisons on GOPRO [72] dataset
            4.4.4 Comparisons on [97] Dataset and Real Videos
        4.5 Conclusion
    5 Learning Loss Functions for Image Deblurring
        5.1 Introduction
        5.2 Related Works
        5.3 Proposed Method
            5.3.1 Clean Images are Hard to Reblur
            5.3.2 Supervision from Reblurring Loss
            5.3.3 Test-time Adaptation by Self-Supervision
        5.4 Experiments
            5.4.1 Effect of Reblurring Loss
            5.4.2 Effect of Sharpness Preservation Loss
            5.4.3 Comparison with Other Perceptual Losses
            5.4.4 Effect of Test-time Adaptation
            5.4.5 Comparison with State-of-The-Art Methods
            5.4.6 Real World Image Deblurring
            5.4.7 Combining Reblurring Loss with Other Perceptual Losses
            5.4.8 Perception vs. Distortion Trade-Off
            5.4.9 Visual Comparison of Loss Function
            5.4.10 Implementation Details
            5.4.11 Determining Reblurring Module Size
        5.5 Conclusion
    6 Conclusion
    Abstract in Korean
    Acknowledgements

    FCL-GAN: A Lightweight and Real-Time Baseline for Unsupervised Blind Image Deblurring

    Blind image deblurring (BID) remains a challenging and significant task. Benefiting from the strong fitting ability of deep learning, supervised BID methods driven by paired data have made great progress. However, paired data are usually synthesized by hand, and realistic blurs are more complex than synthetic ones, which makes the supervised methods inept at modeling realistic blurs and hinders their real-world applications. As such, unsupervised deep BID methods that need no paired data offer certain advantages, but current methods still suffer from some drawbacks, e.g., bulky model size, long inference time, and strict image resolution and domain requirements. In this paper, we propose a lightweight and real-time unsupervised BID baseline, termed Frequency-domain Contrastive Loss Constrained Lightweight CycleGAN (FCL-GAN for short), with attractive properties, i.e., no image domain limitation, no image resolution limitation, 25x lighter than SOTA, and 5x faster than SOTA. To guarantee the lightweight property and performance superiority, two new collaboration units called the lightweight domain conversion unit (LDCU) and the parameter-free frequency-domain contrastive unit (PFCU) are designed. LDCU mainly implements inter-domain conversion in a lightweight manner. PFCU further explores the similarity measure, external difference, and internal connection between the blurred-domain and sharp-domain images in the frequency domain, without involving extra parameters. Extensive experiments on several image datasets demonstrate the effectiveness of our FCL-GAN in terms of performance, model size, and inference time.