An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement
Video enhancement is more challenging than still-image enhancement, mainly due to its high computational cost, larger data volumes, and the difficulty of achieving consistency in the spatio-temporal domain. In practice, these challenges are often compounded by the lack of example pairs, which inhibits the application of supervised learning strategies. To address these challenges, we propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples. In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information. The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high-complexity networks. Our setting enables learning from unpaired videos in a cyclic adversarial manner, where the proposed recurrent units are employed in all architectures. Efficient training is accomplished by introducing a single discriminator that learns the joint distribution of the source and target domains simultaneously. The enhancement results demonstrate clear superiority of the proposed video enhancer over state-of-the-art methods in terms of visual quality, quantitative metrics, and inference speed. Notably, our video enhancer is capable of enhancing FullHD video (1080x1920) at over 35 frames per second.
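To illustrate the core idea of interleaving local and global processing inside a recurrent cell, a minimal sketch follows. The module names, layer sizes, and the squeeze-and-excitation-style channel gating are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class RecurrentEnhanceCell(nn.Module):
    """Hypothetical recurrent cell interleaving a local and a global module."""

    def __init__(self, channels: int = 32):
        super().__init__()
        # Local module: small receptive field, captures spatial detail.
        self.local = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Global module: channel gating driven by a whole-frame summary,
        # injecting global context at negligible cost.
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.out = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, state: torch.Tensor):
        # Concatenate current features with the recurrent state so that
        # spatio-temporal information propagates across frames.
        h = self.local(torch.cat([x, state], dim=1))
        h = h * self.global_gate(h)   # interleave global context
        new_state = self.out(h)
        return new_state, new_state   # output features, next state
```

Keeping the per-frame modules this small is what would let such a cell run in real time; the temporal burden is carried by the propagated state rather than by network depth.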
NTIRE 2023 Quality Assessment of Video Enhancement Challenge
This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2023. The challenge addresses a major problem in the field of video processing, namely video quality assessment (VQA) for enhanced videos. It uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which contains a total of 1211 enhanced videos: 600 videos with color, brightness, and contrast enhancements, 310 deblurred videos, and 301 deshaked videos. The challenge attracted a total of 167 registered participants. During the development phase, 61 participating teams submitted prediction results, for a total of 3168 submissions; during the final testing phase, 37 participating teams made a total of 176 submissions. Finally, 19 participating teams submitted their models and fact sheets detailing the methods they used. Several methods achieved better results than the baseline methods, and the winning methods demonstrated superior prediction performance.
Motion Offset for Blur Modeling
Motion blur caused by relative movement between the camera and the subject is often an undesirable degradation of image quality. In most conventional deblurring methods, a blur kernel is estimated for image deconvolution, and because the problem is ill-posed, predefined priors are introduced to suppress the ill-posedness. However, such predefined priors can only handle specific situations. To achieve better deblurring performance on dynamic scenes, deep-learning-based methods have been proposed that learn a mapping function to restore the sharp image from a blurry one, with the blur modelled implicitly in the feature extraction module. However, blur modelled from a paired dataset does not generalize well to some real-world scenes. In summary, an accurate and dynamic blur model that more closely approximates real-world blur is needed.
By revisiting the principle of camera exposure, we can model blur with the displacements between sharp pixels and the exposed pixel, namely motion offsets. Given specific physical constraints, motion offsets can form different exposure trajectories (e.g., linear, quadratic). Compared to conventional blur kernels, the proposed motion offsets are a more rigorous approximation of real-world blur, since they can constitute a non-linear and non-uniform motion field. Through learning from a dynamic-scene dataset, an accurate and spatially-variant motion-offset field is obtained.
With accurate motion information and a compact blur modeling method, we explore ways of utilizing motion information to facilitate multiple blur-related tasks. By introducing recovered motion offsets, we build a motion-aware and spatially-variant convolution. For extracting a video clip from a blurry image, motion offsets can provide an explicit (non-)linear motion trajectory for interpolation. We also work towards better image deblurring performance in real-world scenarios by improving the generalization ability of the deblurring model.
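As a concrete illustration of blur modeling with per-pixel motion offsets, the sketch below re-blurs a sharp image by averaging samples taken along each pixel's own trajectory. The linear trajectory, the (dx, dy) channel convention, and the sampling count are simplifying assumptions for illustration, not the proposed method itself.

```python
import torch
import torch.nn.functional as F

def blur_from_motion_offsets(sharp: torch.Tensor, offsets: torch.Tensor,
                             steps: int = 9) -> torch.Tensor:
    """sharp: (B, C, H, W); offsets: (B, 2, H, W) per-pixel displacement in
    pixels, channels ordered (dx, dy). Returns a re-blurred image."""
    b, _, h, w = sharp.shape
    # Base sampling grid in the normalized [-1, 1] coordinates grid_sample uses.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=sharp.device),
        torch.linspace(-1, 1, w, device=sharp.device),
        indexing="ij",
    )
    base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
    # Convert pixel-unit offsets to the normalized coordinate range.
    scale = torch.tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1)],
                         device=sharp.device)
    disp = offsets.permute(0, 2, 3, 1) * scale
    acc = torch.zeros_like(sharp)
    for t in torch.linspace(0.0, 1.0, steps):
        # Sample the sharp image at fractional positions along the trajectory;
        # averaging the samples emulates integration over the exposure time.
        acc += F.grid_sample(sharp, base + t * disp,
                             align_corners=True, padding_mode="border")
    return acc / steps
```

Because every pixel carries its own offset, the resulting motion field is non-uniform by construction, which is exactly what a single global blur kernel cannot express.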
Deep learning for accelerated magnetic resonance imaging
Medical imaging has enabled some of the biggest advances in the medical domain in the last century. Whilst X-ray, CT, PET, and ultrasound are forms of imaging that can be useful in particular scenarios, each has disadvantages in cost, image quality, ease of use, or ionising radiation. MRI is a slow imaging protocol, which contributes to its high running cost; however, it is a very versatile protocol that can easily generate images of varying contrast without the use of ionising radiation. If MRI can be made more efficient and smarter, its effective running cost may become more affordable and the modality more accessible. The focus of this thesis is decreasing the acquisition time involved in MRI whilst maintaining the quality of the generated images and thus of the diagnosis. In particular, we focus on data-driven deep learning approaches that aid the image reconstruction process and streamline the diagnostic process. We address three aspects of MR acquisition. Firstly, we investigate the use of motion estimation in the cine reconstruction process: motion allows us to combine an abundance of imaging data in a learnt reconstruction model, speeding up acquisitions by up to 50 times in extreme scenarios. Secondly, we investigate the possibility of using under-acquired MR data to generate smart diagnoses in the form of automated text reports; in particular, we investigate skipping the image reconstruction phase altogether at inference time and instead directly generating radiological text reports for diffusion-weighted brain images, in an effort to streamline the diagnostic process. Finally, we investigate the use of probabilistic modelling for MRI reconstruction without the use of fully-acquired data. We note that acquiring fully-acquired reference images in MRI can be difficult, and that such references may still contain undesired artefacts that degrade the dataset and thus the training process. We therefore investigate the possibility of performing reconstruction without fully-acquired references, and furthermore discuss the possibility of generating higher-quality outputs than the fully-acquired references themselves.
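As one concrete building block of learned MRI reconstruction, the hedged sketch below shows a standard data-consistency step: after a network refines the image, the actually acquired k-space samples are re-imposed on its prediction. The single-coil setup and boolean-mask convention are simplifying assumptions, not the thesis's exact pipeline.

```python
import torch

def data_consistency(image: torch.Tensor, kspace: torch.Tensor,
                     mask: torch.Tensor) -> torch.Tensor:
    """image: complex (H, W) network output; kspace: acquired complex (H, W)
    measurements; mask: bool (H, W), True where a sample was acquired."""
    k_pred = torch.fft.fft2(image)
    # Keep predicted frequencies where nothing was measured; overwrite the
    # sampled locations with the true measurements.
    k_dc = torch.where(mask, kspace, k_pred)
    return torch.fft.ifft2(k_dc)
```

Interleaving such a step with learned refinement is what lets a reconstruction network speed up acquisition without ever contradicting the data that was actually measured.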
Test-Time Adaptive Methodologies for Video Frame Interpolation
Thesis (Ph.D.) -- Graduate School, Seoul National University: Department of Electrical and Computer Engineering, College of Engineering, February 2021. Advisor: Kyoung Mu Lee.
Computationally handling videos has been one of the foremost goals in computer vision. In particular, analyzing the complex dynamics, including motion and occlusion, between two frames is of fundamental importance in understanding the visual contents of a video. Research on video frame interpolation, a problem where the goal is to synthesize high-quality intermediate frames between two input frames, specifically investigates the low-level characteristics within the consecutive frames of a video. The topic has recently been gaining popularity and can be applied to various real-world applications such as generating slow-motion effects, novel view synthesis, or video stabilization. Existing methods for video frame interpolation aim to design complex new architectures to effectively estimate and compensate for the motion between two input frames. However, natural videos contain a wide variety of scenarios, including foreground/background appearance and motion, frame rate, and occlusion. Therefore, even with a huge amount of training data, it is difficult for a single model to generalize well to all possible situations.
This dissertation introduces novel methodologies for test-time adaptation to tackle the problem of video frame interpolation. In particular, I propose to make three different aspects of the deep-learning-based framework adaptive: (1) feature activations, (2) network weights, and (3) architectural structures. Specifically, I first present how adaptively scaling the feature activations of a deep neural network with respect to each input frame using attention models allows for accurate interpolation. Unlike previous approaches that heavily depend on optical flow estimation models, the proposed channel-attention-based model can achieve high-quality frame synthesis without explicit motion estimation. Then, meta-learning is employed for fast adaptation of the parameter values of the frame interpolation models. By learning to adapt to each input video clip, the proposed framework can consistently improve the performance of many existing models with just a single gradient update to their parameters. Lastly, I introduce an input-adaptive dynamic architecture that can assign different inference paths to each local region of the input frames. By deciding the scaling factors of the inputs and the network depth of the early exit in the interpolation model, the dynamic framework can greatly improve computational efficiency while maintaining, and sometimes even surpassing, the performance of the baseline interpolation method.
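To make the single-gradient-update idea concrete, here is a hedged sketch of MAML-style test-time adaptation for frame interpolation: an existing middle frame of the test video supervises its own interpolation from its two neighbours, which is supervision freely available at inference time. The loss choice, learning rate, and function names are illustrative assumptions, not the dissertation's exact procedure.

```python
import torch

def adapt_one_step(model, frames, inner_lr: float = 1e-4):
    """frames: three consecutive frames (f0, f1, f2) from the test video.
    Uses (f0, f2) -> f1 as a self-supervised adaptation task."""
    f0, f1, f2 = frames
    # Interpolate the known middle frame and measure the error against it.
    loss = torch.nn.functional.l1_loss(model(f0, f2), f1)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    # One SGD step, applied in place; a caller would typically restore the
    # meta-learned weights before adapting to the next video.
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= inner_lr * g
    return model
```

The role of meta-learning is to choose initial weights for which this one cheap step yields a consistent gain on the unseen clip, rather than requiring a full fine-tuning run per video.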
The effectiveness of the proposed test-time adaptation methodologies is extensively evaluated on multiple benchmark datasets for video frame interpolation. Thorough ablation studies with various hyperparameter settings and baseline networks also demonstrate the superiority of adaptation to the test-time inputs, which is a new research direction orthogonal to other state-of-the-art frame interpolation approaches.
1 Introduction
1.1 Motivations
1.2 Proposed method
1.3 Contributions
1.4 Organization of dissertation
2 Feature Adaptation based Approach
2.1 Introduction
2.2 Related works
2.2.1 Video frame interpolation
2.2.2 Attention mechanism
2.3 Proposed Method
2.3.1 Overview of network architecture
2.3.2 Main components
2.3.3 Loss
2.4 Understanding our model
2.4.1 Internal feature visualization
2.4.2 Intermediate image reconstruction
2.5 Experiments
2.5.1 Datasets
2.5.2 Implementation details
2.5.3 Comparison to the state-of-the-art
2.5.4 Ablation study
2.6 Summary
3 Meta-Learning based Approach
3.1 Introduction
3.2 Related works
3.3 Proposed method
3.3.1 Video frame interpolation problem set-up
3.3.2 Exploiting extra information at test time
3.3.3 Background on MAML
3.3.4 MetaVFI: Meta-learning for frame interpolation
3.4 Experiments
3.4.1 Settings
3.4.2 Meta-learning algorithm selection
3.4.3 Video frame interpolation results
3.4.4 Ablation studies
3.5 Summary
4 Dynamic Architecture based Approach
4.1 Introduction
4.2 Related works
4.2.1 Video frame interpolation
4.2.2 Adaptive inference
4.3 Proposed Method
4.3.1 Dynamic framework overview
4.3.2 Scale and depth finder (SD-finder)
4.3.3 Dynamic interpolation model
4.3.4 Training
4.4 Experiments
4.4.1 Datasets
4.4.2 Implementation details
4.4.3 Quantitative comparison
4.4.4 Visual comparison
4.4.5 Ablation study
4.5 Summary
5 Conclusion
5.1 Summary of dissertation
5.2 Future works
Bibliography
Abstract (in Korean)