325 research outputs found

    An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement

    Video enhancement is more challenging than still-image enhancement, mainly due to its high computational cost, larger data volumes, and the difficulty of achieving consistency in the spatio-temporal domain. In practice, these challenges are often coupled with a lack of example pairs, which inhibits the application of supervised learning strategies. To address these challenges, we propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples. In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information. The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high-complexity networks. Our setting enables learning from unpaired videos in a cyclic adversarial manner, where the proposed recurrent units are employed in all architectures. Efficient training is accomplished by introducing a single discriminator that learns the joint distribution of the source and target domains simultaneously. The enhancement results demonstrate clear superiority of the proposed video enhancer over state-of-the-art methods in terms of visual quality, quantitative metrics, and inference speed. Notably, our video enhancer can enhance Full HD video (1080x1920) at over 35 frames per second.
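    The abstract gives no implementation details, but the idea of a recurrent cell that interleaves a local and a global module can be sketched roughly as below. This is a minimal PyTorch illustration under assumed module choices (3x3 convolutions for the local path, pooled channel gating for the global path), not the authors' architecture:

```python
import torch
import torch.nn as nn

class LocalGlobalCell(nn.Module):
    """Toy recurrent cell interleaving a local (convolutional) module with a
    global (pooled-context) module; a sketch, not the paper's architecture."""

    def __init__(self, channels):
        super().__init__()
        # local module: small receptive field over the frame + hidden state
        self.local = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        # global module: image-wide context from global average pooling
        self.global_fc = nn.Linear(channels, channels)
        self.out = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x, h):
        # fuse current-frame features with the propagated hidden state
        local = torch.relu(self.local(torch.cat([x, h], dim=1)))
        # a global descriptor modulates the local features channel-wise
        g = local.mean(dim=(2, 3))                 # (B, C) pooled context
        scale = torch.sigmoid(self.global_fc(g))   # (B, C) channel gates
        return torch.relu(self.out(local * scale[:, :, None, None]))
```

    Unrolled over a frame sequence, the returned activation serves as the hidden state h for the next frame, so spatio-temporal information propagates without explicit motion estimation.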

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. The challenge addresses a major problem in video processing: video quality assessment (VQA) for enhanced videos. It uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which contains a total of 1211 enhanced videos: 600 videos with color, brightness, and contrast enhancements, 310 deblurred videos, and 301 deshaked videos. The challenge attracted 167 registered participants. During the development phase, 61 participating teams submitted prediction results, for a total of 3168 submissions; during the final testing phase, 37 teams made 176 submissions. Finally, 19 participating teams submitted their models and fact sheets detailing the methods they used. Several methods achieved better results than the baseline methods, and the winning methods demonstrated superior prediction performance.

    Motion Offset for Blur Modeling

    Motion blur caused by relative movement between the camera and the subject is often an undesirable degradation of image quality. In most conventional deblurring methods, a blur kernel is estimated for image deconvolution. Because the problem is ill-posed, predefined priors are introduced to suppress the ill-posedness; however, such priors can only handle specific situations. To achieve better deblurring performance on dynamic scenes, deep-learning-based methods have been proposed to learn a mapping function that restores the sharp image from a blurry one, with the blur modelled implicitly in the feature extraction module. However, blur modelled from paired datasets does not generalize well to some real-world scenes. In summary, an accurate and dynamic blur model that more closely approximates real-world blur is needed. By revisiting the principle of camera exposure, we model blur with the displacements between sharp pixels and the exposed pixel, namely motion offsets. Given specific physical constraints, motion offsets can form different exposure trajectories (e.g., linear or quadratic). Compared to a conventional blur kernel, the proposed motion offsets are a more rigorous approximation of real-world blur, since they can constitute a non-linear and non-uniform motion field. By learning from a dynamic-scene dataset, an accurate and spatially-variant motion offset field is obtained. With accurate motion information and a compact blur model, we explore ways of utilizing motion information to facilitate multiple blur-related tasks. By introducing the recovered motion offsets, we build a motion-aware, spatially-variant convolution. For extracting a video clip from a blurry image, motion offsets provide an explicit (non-)linear motion trajectory for interpolation. We also work towards better image deblurring performance in real-world scenarios by improving the generalization ability of the deblurring model.
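    To make the exposure model concrete: a blurry pixel can be read as the average of sharp-image samples displaced by per-pixel motion offsets, B(x) = mean_i S(x + o_i(x)). Below is a minimal NumPy sketch of this forward model, with a hypothetical uniform linear trajectory standing in for the learned spatially-variant offset field:

```python
import numpy as np

def blur_from_offsets(sharp, offsets):
    """Render a blurry image as the average of sharp samples taken at
    per-pixel motion offsets: B(x) = mean_i S(x + o_i(x)).
    sharp:   (H, W) grayscale image
    offsets: (N, H, W, 2) per-pixel (dy, dx) offsets over N exposure steps
    """
    H, W = sharp.shape
    ys, xs = np.mgrid[0:H, 0:W]
    acc = np.zeros_like(sharp, dtype=np.float64)
    for o in offsets:  # one offset field per exposure step
        sy = np.clip(np.round(ys + o[..., 0]).astype(int), 0, H - 1)
        sx = np.clip(np.round(xs + o[..., 1]).astype(int), 0, W - 1)
        acc += sharp[sy, sx]
    return acc / len(offsets)

# hypothetical uniform linear trajectory: 5 steps of horizontal motion
H, W, N = 64, 64, 5
sharp = np.random.rand(H, W)
lin = np.stack([np.zeros((N, H, W)),                   # dy = 0
                np.linspace(0, 4, N)[:, None, None]    # dx grows linearly
                * np.ones((N, H, W))], axis=-1)
blurry = blur_from_offsets(sharp, lin)
```

    A quadratic trajectory would simply replace the linearly spaced dx values with quadratically spaced ones; a spatially-variant field would let the offsets differ per pixel.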

    Deep learning for accelerated magnetic resonance imaging

    Medical imaging has aided the biggest advances in the medical domain in the last century. Whilst X-ray, CT, PET, and ultrasound are forms of imaging that can be useful in particular scenarios, each has disadvantages in cost, image quality, ease of use, or ionising radiation. MRI is a slow imaging protocol, which contributes to its high running cost; however, it is a very versatile protocol that can easily generate images of varying contrast without ionising radiation. If MRI can be made more efficient and intelligent, its effective cost may become more affordable and its use more accessible. The focus of this thesis is decreasing the acquisition time of MRI whilst maintaining the quality of the generated images, and thus the diagnosis. In particular, we focus on data-driven deep learning approaches that aid the image reconstruction process and streamline the diagnostic process. We address three aspects of MR acquisition. Firstly, we investigate the use of motion estimation in the cine reconstruction process; motion allows us to combine an abundance of imaging data in a learnt reconstruction model, speeding up acquisitions by up to 50 times in extreme scenarios. Secondly, we investigate the possibility of using under-acquired MR data to generate smart diagnoses in the form of automated text reports. In particular, we investigate skipping the image reconstruction phase altogether at inference time and instead directly generating radiological text reports for diffusion-weighted brain images, in an effort to streamline the diagnostic process. Finally, we investigate the use of probabilistic modelling for MRI reconstruction without fully-acquired data. We note that fully-acquired reference images can be difficult to obtain in MRI and may still contain undesired artefacts that degrade the dataset and thus the training process. We therefore investigate performing reconstruction without fully-acquired references, and discuss the possibility of generating higher-quality outputs than the fully-acquired references themselves.
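    For context, accelerated MRI acquires only a subset of k-space, and reconstruction must undo the resulting aliasing. Below is a minimal NumPy sketch of the standard retrospective-undersampling forward model; the 4x Cartesian mask with a fully sampled centre is an illustrative assumption, not the sampling pattern used in the thesis:

```python
import numpy as np

def undersample(image, mask):
    """Simulate accelerated MRI: keep only the k-space samples selected
    by `mask` and return the aliased zero-filled reconstruction."""
    kspace = np.fft.fftshift(np.fft.fft2(image))  # full k-space
    kspace_us = kspace * mask                     # drop unsampled lines
    return np.fft.ifft2(np.fft.ifftshift(kspace_us))

# illustrative 4x Cartesian mask: every 4th phase-encode line,
# plus a fully sampled low-frequency band at the centre
H, W = 256, 256
mask = np.zeros((H, W))
mask[::4, :] = 1                    # regular undersampling
mask[H//2 - 16:H//2 + 16, :] = 1    # autocalibration region
zero_filled = np.abs(undersample(np.random.rand(H, W), mask))
```

    A learnt reconstruction model then maps the aliased zero-filled image (plus the mask) back to an artefact-free image, which is what removes the need to acquire the skipped lines.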

    Test-Time Adaptive Methods for Video Frame Interpolation

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: Kyoung Mu Lee.
    Computationally handling videos has been one of the foremost goals in computer vision. In particular, analyzing the complex dynamics, including motion and occlusion, between two frames is of fundamental importance in understanding the visual contents of a video. Research on video frame interpolation, where the goal is to synthesize high-quality intermediate frames between two input frames, specifically investigates the low-level characteristics of consecutive video frames. The topic has recently been gaining popularity and can be applied to various real-world applications such as generating slow-motion effects, novel view synthesis, and video stabilization. Existing methods for video frame interpolation design complex new architectures to effectively estimate and compensate for the motion between two input frames. However, natural videos contain a wide variety of scenarios, including different foreground/background appearance and motion, frame rates, and occlusion. Therefore, even with a huge amount of training data, it is difficult for a single model to generalize well to all possible situations. This dissertation introduces novel test-time adaptation methodologies for video frame interpolation. In particular, I propose to make three different aspects of a deep-learning-based framework adaptive: (1) feature activations, (2) network weights, and (3) architectural structure. Specifically, I first show how adaptively scaling the feature activations of a deep neural network with respect to each input frame, using attention models, allows for accurate interpolation. Unlike previous approaches that depend heavily on optical flow estimation models, the proposed channel-attention-based model can achieve high-quality frame synthesis without explicit motion estimation. Then, meta-learning is employed for fast adaptation of the parameter values of frame interpolation models. By learning to adapt to each input video clip, the proposed framework can consistently improve the performance of many existing models with just a single gradient update to their parameters. Lastly, I introduce an input-adaptive dynamic architecture that can assign different inference paths to each local region of the input frames. By deciding the scaling factors of the inputs and the network depth of the early exit in the interpolation model, the dynamic framework can greatly improve computational efficiency while maintaining, and sometimes even surpassing, the performance of the baseline interpolation method. The effectiveness of the proposed test-time adaptation methodologies is extensively evaluated on multiple benchmark datasets for video frame interpolation. Thorough ablation studies with various hyperparameter settings and baseline networks also demonstrate the superiority of adapting to test-time inputs, a new research direction orthogonal to other state-of-the-art frame interpolation approaches.
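    The meta-learning component can be pictured as a MAML-style inner loop: at test time the model takes a single gradient step on a self-supervised loss built from frames the input clip already contains, then interpolates the query frame with the adapted weights. A hedged PyTorch sketch follows; the model signature model(a, b) -> midpoint frame, the L1 inner loss, and the step size are assumptions, not the dissertation's exact formulation:

```python
import copy

import torch
import torch.nn.functional as F

def adapt_and_interpolate(model, f0, f1, f2, f3, lr=1e-5):
    """One-step test-time adaptation in the spirit of MAML.
    f0..f3 are four consecutive frames; the target is the unseen frame
    between f1 and f2. Assumes model(a, b) returns their midpoint frame."""
    adapted = copy.deepcopy(model)  # keep the meta-trained weights intact
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    # self-supervised inner loss: frames the clip already contains act as
    # ground truth for interpolation between their wider-spaced neighbours
    loss = F.l1_loss(adapted(f0, f2), f1) + F.l1_loss(adapted(f1, f3), f2)
    opt.zero_grad()
    loss.backward()
    opt.step()  # the single gradient update described in the abstract
    with torch.no_grad():
        return adapted(f1, f2)  # query prediction with adapted weights
```

    Meta-training then optimizes the initial weights so that this one inner step yields the largest possible improvement on each clip.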
μ΄λŠ” λΉ„λ””μ˜€ ν”„λ ˆμž„ 보간법에 λŒ€ν•œ μ΅œμ‹  결과듀에 μΆ”κ°€μ μœΌλ‘œ 적용될 수 μžˆλŠ” μƒˆλ‘œμš΄ 연ꡬ λ°©λ²•μœΌλ‘œ, μΆ”ν›„ λ‹€λ°©λ©΄μœΌλ‘œμ˜ ν™•μž₯성이 κΈ°λŒ€λœλ‹€.1 Introduction 1 1.1 Motivations 1 1.2 Proposed method 3 1.3 Contributions 5 1.4 Organization of dissertation 6 2 Feature Adaptation based Approach 7 2.1 Introduction 7 2.2 Related works 10 2.2.1 Video frame interpolation 10 2.2.2 Attention mechanism 12 2.3 Proposed Method 12 2.3.1 Overview of network architecture 13 2.3.2 Main components 14 2.3.3 Loss 16 2.4 Understanding our model 17 2.4.1 Internal feature visualization 18 2.4.2 Intermediate image reconstruction 21 2.5 Experiments 23 2.5.1 Datasets 23 2.5.2 Implementation details 25 2.5.3 Comparison to the state-of-the-art 26 2.5.4 Ablation study 36 2.6 Summary 38 3 Meta-Learning based Approach 39 3.1 Introduction 39 3.2 Related works 42 3.3 Proposed method 44 3.3.1 Video frame interpolation problem set-up 44 3.3.2 Exploiting extra information at test time 45 3.3.3 Background on MAML 48 3.3.4 MetaVFI: Meta-learning for frame interpolation 49 3.4 Experiments 54 3.4.1 Settings 54 3.4.2 Meta-learning algorithm selection 56 3.4.3 Video frame interpolation results 58 3.4.4 Ablation studies 66 3.5 Summary 69 4 Dynamic Architecture based Approach 71 4.1 Introduction 71 4.2 Related works 75 4.2.1 Video frame interpolation 75 4.2.2 Adaptive inference 76 4.3 Proposed Method 77 4.3.1 Dynamic framework overview 77 4.3.2 Scale and depth finder (SD-finder) 80 4.3.3 Dynamic interpolation model 82 4.3.4 Training 83 4.4 Experiments 85 4.4.1 Datasets 85 4.4.2 Implementation details 86 4.4.3 Quantitative comparison 87 4.4.4 Visual comparison 93 4.4.5 Ablation study 97 4.5 Summary 100 5 Conclusion 103 5.1 Summary of dissertation 103 5.2 Future works 104 Bibliography 107 ꡭ문초둝 120Docto