An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement
Video enhancement is more challenging than still-image enhancement, mainly due to its high computational cost, larger data volumes, and the difficulty of achieving consistency in the spatio-temporal domain. In practice, these challenges are often compounded by the lack of example pairs, which inhibits the application of supervised learning strategies. To address these challenges, we propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples. In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information. The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high-complexity networks. Our setting enables learning from unpaired videos in a cyclic adversarial manner, where the proposed recurrent units are employed in all architectures. Efficient training is accomplished by introducing a single discriminator that learns the joint distribution of the source and target domains simultaneously. The enhancement results demonstrate clear superiority of the proposed video enhancer over state-of-the-art methods in terms of visual quality, quantitative metrics, and inference speed. Notably, our video enhancer is capable of enhancing FullHD video (1080x1920) at over 35 frames per second.
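To illustrate the core idea of interleaving local and global processing inside a recurrent cell, a minimal sketch follows. The module names, layer sizes, and the squeeze-and-excitation-style channel gating are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class RecurrentEnhanceCell(nn.Module):
    """Hypothetical recurrent cell interleaving a local and a global module."""

    def __init__(self, channels: int = 32):
        super().__init__()
        # Local module: small receptive field, captures spatial detail.
        self.local = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Global module: channel gating driven by a whole-frame summary,
        # injecting global context at negligible cost.
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.out = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, state: torch.Tensor):
        # Concatenate current features with the recurrent state so that
        # spatio-temporal information propagates across frames.
        h = self.local(torch.cat([x, state], dim=1))
        h = h * self.global_gate(h)   # interleave global context
        new_state = self.out(h)
        return new_state, new_state   # output features, next state
```

Keeping the per-frame modules this small is what would let such a cell run in real time; the temporal burden is carried by the propagated state rather than by network depth.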
NTIRE 2023 Quality Assessment of Video Enhancement Challenge
This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2023. The challenge addresses a major problem in the field of video processing, namely video quality assessment (VQA) for enhanced videos. It uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which contains a total of 1211 enhanced videos: 600 videos with color, brightness, and contrast enhancements, 310 deblurred videos, and 301 deshaked videos. The challenge attracted a total of 167 registered participants. During the development phase, 61 participating teams submitted prediction results, for a total of 3168 submissions; during the final testing phase, 37 participating teams made a total of 176 submissions. Finally, 19 participating teams submitted their models and fact sheets detailing the methods they used. Several methods achieved better results than the baseline methods, and the winning methods demonstrated superior prediction performance.
Motion Offset for Blur Modeling
Motion blur caused by relative movement between the camera and the subject is often an undesirable degradation of image quality. In most conventional deblurring methods, a blur kernel is estimated for image deconvolution, and because the problem is ill-posed, predefined priors are introduced to suppress the ill-posedness. However, such predefined priors can only handle specific situations. To achieve better deblurring performance on dynamic scenes, deep-learning-based methods have been proposed that learn a mapping function to restore the sharp image from a blurry one, with the blur modelled implicitly in the feature extraction module. However, blur modelled from a paired dataset does not generalize well to some real-world scenes. In summary, an accurate and dynamic blur model that more closely approximates real-world blur is needed.
By revisiting the principle of camera exposure, we can model blur with the displacements between sharp pixels and the exposed pixel, namely motion offsets. Given specific physical constraints, motion offsets can form different exposure trajectories (e.g., linear, quadratic). Compared to conventional blur kernels, the proposed motion offsets are a more rigorous approximation of real-world blur, since they can constitute a non-linear and non-uniform motion field. Through learning from a dynamic-scene dataset, an accurate and spatially-variant motion-offset field is obtained.
With accurate motion information and a compact blur modeling method, we explore ways of utilizing motion information to facilitate multiple blur-related tasks. By introducing recovered motion offsets, we build a motion-aware and spatially-variant convolution. For extracting a video clip from a blurry image, motion offsets can provide an explicit (non-)linear motion trajectory for interpolation. We also work towards better image deblurring performance in real-world scenarios by improving the generalization ability of the deblurring model.
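As a concrete illustration of blur modeling with per-pixel motion offsets, the sketch below re-blurs a sharp image by averaging samples taken along each pixel's own trajectory. The linear trajectory, the (dx, dy) channel convention, and the sampling count are simplifying assumptions for illustration, not the proposed method itself.

```python
import torch
import torch.nn.functional as F

def blur_from_motion_offsets(sharp: torch.Tensor, offsets: torch.Tensor,
                             steps: int = 9) -> torch.Tensor:
    """sharp: (B, C, H, W); offsets: (B, 2, H, W) per-pixel displacement in
    pixels, channels ordered (dx, dy). Returns a re-blurred image."""
    b, _, h, w = sharp.shape
    # Base sampling grid in the normalized [-1, 1] coordinates grid_sample uses.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=sharp.device),
        torch.linspace(-1, 1, w, device=sharp.device),
        indexing="ij",
    )
    base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
    # Convert pixel-unit offsets to the normalized coordinate range.
    scale = torch.tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1)],
                         device=sharp.device)
    disp = offsets.permute(0, 2, 3, 1) * scale
    acc = torch.zeros_like(sharp)
    for t in torch.linspace(0.0, 1.0, steps):
        # Sample the sharp image at fractional positions along the trajectory;
        # averaging the samples emulates integration over the exposure time.
        acc += F.grid_sample(sharp, base + t * disp,
                             align_corners=True, padding_mode="border")
    return acc / steps
```

Because every pixel carries its own offset, the resulting motion field is non-uniform by construction, which is exactly what a single global blur kernel cannot express.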
Deep learning for accelerated magnetic resonance imaging
Medical imaging has enabled some of the biggest advances in the medical domain in the last century. Whilst X-ray, CT, PET, and ultrasound are forms of imaging that can be useful in particular scenarios, each has disadvantages in cost, image quality, ease of use, or ionising radiation. MRI is a slow imaging protocol, which contributes to its high running cost; however, it is a very versatile protocol that can easily generate images of varying contrast without the use of ionising radiation. If MRI can be made more efficient and smarter, its effective running cost may become more affordable and the modality more accessible. The focus of this thesis is decreasing the acquisition time involved in MRI whilst maintaining the quality of the generated images and thus of the diagnosis. In particular, we focus on data-driven deep learning approaches that aid the image reconstruction process and streamline the diagnostic process. We address three aspects of MR acquisition. Firstly, we investigate the use of motion estimation in the cine reconstruction process: motion allows us to combine an abundance of imaging data in a learnt reconstruction model, speeding up acquisitions by up to 50 times in extreme scenarios. Secondly, we investigate the possibility of using under-acquired MR data to generate smart diagnoses in the form of automated text reports; in particular, we investigate skipping the image reconstruction phase altogether at inference time and instead directly generating radiological text reports for diffusion-weighted brain images, in an effort to streamline the diagnostic process. Finally, we investigate the use of probabilistic modelling for MRI reconstruction without the use of fully-acquired data. We note that acquiring fully-acquired reference images in MRI can be difficult, and that such references may still contain undesired artefacts that degrade the dataset and thus the training process. We therefore investigate the possibility of performing reconstruction without fully-acquired references, and furthermore discuss the possibility of generating higher-quality outputs than the fully-acquired references themselves.
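As one concrete building block of learned MRI reconstruction, the hedged sketch below shows a standard data-consistency step: after a network refines the image, the actually acquired k-space samples are re-imposed on its prediction. The single-coil setup and boolean-mask convention are simplifying assumptions, not the thesis's exact pipeline.

```python
import torch

def data_consistency(image: torch.Tensor, kspace: torch.Tensor,
                     mask: torch.Tensor) -> torch.Tensor:
    """image: complex (H, W) network output; kspace: acquired complex (H, W)
    measurements; mask: bool (H, W), True where a sample was acquired."""
    k_pred = torch.fft.fft2(image)
    # Keep predicted frequencies where nothing was measured; overwrite the
    # sampled locations with the true measurements.
    k_dc = torch.where(mask, kspace, k_pred)
    return torch.fft.ifft2(k_dc)
```

Interleaving such a step with learned refinement is what lets a reconstruction network speed up acquisition without ever contradicting the data that was actually measured.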
Test-Time Adaptive Methodologies for Video Frame Interpolation
Thesis (Ph.D.) -- Graduate School, Seoul National University: Department of Electrical and Computer Engineering, College of Engineering, February 2021. Advisor: Kyoung Mu Lee.
Computationally handling videos has been one of the foremost goals in computer vision. In particular, analyzing the complex dynamics, including motion and occlusion, between two frames is of fundamental importance in understanding the visual contents of a video. Research on video frame interpolation, a problem where the goal is to synthesize high-quality intermediate frames between two input frames, specifically investigates the low-level characteristics within the consecutive frames of a video. The topic has recently been gaining popularity and can be applied to various real-world applications such as generating slow-motion effects, novel view synthesis, or video stabilization. Existing methods for video frame interpolation aim to design complex new architectures to effectively estimate and compensate for the motion between two input frames. However, natural videos contain a wide variety of scenarios, including foreground/background appearance and motion, frame rate, and occlusion. Therefore, even with a huge amount of training data, it is difficult for a single model to generalize well to all possible situations.
This dissertation introduces novel methodologies for test-time adaptation to tackle the problem of video frame interpolation. In particular, I propose to make three different aspects of the deep-learning-based framework adaptive: (1) feature activations, (2) network weights, and (3) architectural structures. Specifically, I first present how adaptively scaling the feature activations of a deep neural network with respect to each input frame using attention models allows for accurate interpolation. Unlike previous approaches that heavily depend on optical flow estimation models, the proposed channel-attention-based model can achieve high-quality frame synthesis without explicit motion estimation. Then, meta-learning is employed for fast adaptation of the parameter values of the frame interpolation models. By learning to adapt to each input video clip, the proposed framework can consistently improve the performance of many existing models with just a single gradient update to their parameters. Lastly, I introduce an input-adaptive dynamic architecture that can assign different inference paths to each local region of the input frames. By deciding the scaling factors of the inputs and the network depth of the early exit in the interpolation model, the dynamic framework can greatly improve computational efficiency while maintaining, and sometimes even surpassing, the performance of the baseline interpolation method.
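To make the single-gradient-update idea concrete, here is a hedged sketch of MAML-style test-time adaptation for frame interpolation: an existing middle frame of the test video supervises its own interpolation from its two neighbours, which is supervision freely available at inference time. The loss choice, learning rate, and function names are illustrative assumptions, not the dissertation's exact procedure.

```python
import torch

def adapt_one_step(model, frames, inner_lr: float = 1e-4):
    """frames: three consecutive frames (f0, f1, f2) from the test video.
    Uses (f0, f2) -> f1 as a self-supervised adaptation task."""
    f0, f1, f2 = frames
    # Interpolate the known middle frame and measure the error against it.
    loss = torch.nn.functional.l1_loss(model(f0, f2), f1)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    # One SGD step, applied in place; a caller would typically restore the
    # meta-learned weights before adapting to the next video.
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= inner_lr * g
    return model
```

The role of meta-learning is to choose initial weights for which this one cheap step yields a consistent gain on the unseen clip, rather than requiring a full fine-tuning run per video.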
The effectiveness of the proposed test-time adaptation methodologies is extensively evaluated on multiple benchmark datasets for video frame interpolation. Thorough ablation studies with various hyperparameter settings and baseline networks also demonstrate the superiority of adaptation to the test-time inputs, which is a new research direction orthogonal to other state-of-the-art frame interpolation approaches.
1 Introduction
1.1 Motivations
1.2 Proposed method
1.3 Contributions
1.4 Organization of dissertation
2 Feature Adaptation based Approach
2.1 Introduction
2.2 Related works
2.2.1 Video frame interpolation
2.2.2 Attention mechanism
2.3 Proposed Method
2.3.1 Overview of network architecture
2.3.2 Main components
2.3.3 Loss
2.4 Understanding our model
2.4.1 Internal feature visualization
2.4.2 Intermediate image reconstruction
2.5 Experiments
2.5.1 Datasets
2.5.2 Implementation details
2.5.3 Comparison to the state-of-the-art
2.5.4 Ablation study
2.6 Summary
3 Meta-Learning based Approach
3.1 Introduction
3.2 Related works
3.3 Proposed method
3.3.1 Video frame interpolation problem set-up
3.3.2 Exploiting extra information at test time
3.3.3 Background on MAML
3.3.4 MetaVFI: Meta-learning for frame interpolation
3.4 Experiments
3.4.1 Settings
3.4.2 Meta-learning algorithm selection
3.4.3 Video frame interpolation results
3.4.4 Ablation studies
3.5 Summary
4 Dynamic Architecture based Approach
4.1 Introduction
4.2 Related works
4.2.1 Video frame interpolation
4.2.2 Adaptive inference
4.3 Proposed Method
4.3.1 Dynamic framework overview
4.3.2 Scale and depth finder (SD-finder)
4.3.3 Dynamic interpolation model
4.3.4 Training
4.4 Experiments
4.4.1 Datasets
4.4.2 Implementation details
4.4.3 Quantitative comparison
4.4.4 Visual comparison
4.4.5 Ablation study
4.5 Summary
5 Conclusion
5.1 Summary of dissertation
5.2 Future works
Bibliography
Abstract (in Korean)