Physics-Informed Computer Vision: A Review and Perspectives
Incorporating physical information into machine learning frameworks is
opening and transforming many application domains. Here the learning process is
augmented through the induction of fundamental knowledge and governing physical
laws. In this work we explore the utility of such frameworks for computer vision
tasks in interpreting and understanding visual data. We present a systematic
literature review of formulations and approaches to computer vision tasks guided by
physical laws. We begin by decomposing the popular computer vision pipeline
into a taxonomy of stages and investigate approaches to incorporate governing
physical equations in each stage. Existing approaches in each task are analyzed
with regard to which governing physical processes are modeled, how they are
formulated, and how they are incorporated, i.e., by modifying data (observation
bias), networks (inductive bias), or losses (learning bias). The taxonomy offers a
unified view of the application of the physics-informed capability,
highlighting where physics-informed learning has been conducted and where the
gaps and opportunities are. Finally, we highlight open problems and challenges
to inform future research. While still in its early days, the study of
physics-informed computer vision promises better computer vision models
with improved physical plausibility, accuracy, data efficiency,
and generalization in increasingly realistic applications.
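
As a rough illustration of the "learning bias" route described in this abstract (modifying the loss), the sketch below adds a physics residual to an ordinary data-fitting term. The free-fall equation y'' = -g, the function name `physics_informed_loss`, and the `weight` parameter are assumptions chosen for illustration; they are not taken from the reviewed paper.

```python
import numpy as np

def physics_informed_loss(pred, obs, dt, g=9.81, weight=1.0):
    """Data term plus a penalty for violating the assumed dynamics y'' = -g."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Ordinary supervised term: mean squared error against observations.
    data_term = np.mean((pred - obs) ** 2)
    # Central finite difference approximates the second time derivative.
    accel = (pred[2:] - 2.0 * pred[1:-1] + pred[:-2]) / dt ** 2
    # Residual of the governing equation y'' + g = 0, averaged over time steps.
    physics_term = np.mean((accel + g) ** 2)
    return data_term + weight * physics_term
```

A trajectory that fits the observations but ignores gravity is still penalized through the second term, which is the sense in which the loss carries the physical prior.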
A Survey of Deep Face Restoration: Denoise, Super-Resolution, Deblur, Artifact Removal
Face Restoration (FR) aims to restore High-Quality (HQ) faces from
Low-Quality (LQ) input images, which is a domain-specific image restoration
problem in the low-level computer vision area. Early face restoration
methods mainly use statistical priors and degradation models, which struggle
to meet the requirements of real-world applications. In recent
years, face restoration has witnessed great progress after stepping into the
deep learning era. However, few works have studied deep learning-based
face restoration methods systematically. Thus, this paper comprehensively
surveys recent advances in deep learning techniques for face restoration.
Specifically, we first summarize different problem formulations and analyze the
characteristics of face images. Second, we discuss the challenges of face
restoration. Concerning these challenges, we present a comprehensive review of
existing FR methods, including prior-based methods and deep learning-based
methods. Then, we explore techniques developed for FR, covering
network architectures, loss functions, and benchmark datasets. We also conduct
a systematic benchmark evaluation on representative methods. Finally, we
discuss future directions, including network designs, metrics, benchmark
datasets, applications, etc. We also provide an open-source repository for all
the discussed methods, which is available at
https://github.com/TaoWangzj/Awesome-Face-Restoration.
Comment: 21 pages, 19 figures
Online Streaming Video Super-Resolution with Convolutional Look-Up Table
Online video streaming is fundamentally limited in transmission
bandwidth and computational capacity, and super-resolution is a promising
solution. However, applying existing video super-resolution methods
to online streaming is non-trivial. Existing video codecs and streaming
protocols (e.g., WebRTC) dynamically change the video quality both spatially and
temporally, which leads to diverse and dynamic degradations. Furthermore,
online streaming has strict latency requirements, which make most existing
methods less applicable. As a result, this paper focuses on the rarely
explored problem setting of online streaming video super-resolution. To
facilitate the research on this problem, a new benchmark dataset named
LDV-WebRTC is constructed based on a real-world online streaming system.
Leveraging the new benchmark dataset, we propose a novel method specifically
for online video streaming, which contains a convolution and Look-Up Table
(LUT) hybrid model to achieve a better performance-latency trade-off. To tackle
the changing degradations, we propose a mixture-of-expert-LUT module, where a
set of LUTs specialized in different degradations is built and adaptively
combined. Experiments show our method achieves 720P video SR at around
100 FPS, while significantly outperforming existing LUT-based methods and
offering competitive performance compared to efficient CNN-based methods.
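
The idea of adaptively combining expert LUTs can be sketched in a heavily simplified form. The example below assumes per-pixel, 256-entry tables and externally supplied mixing weights; the paper's actual LUT indexing, upscaling, and weight prediction are not described in the abstract, so `mix_expert_luts` and `apply_lut` are hypothetical names used only for illustration.

```python
import numpy as np

def mix_expert_luts(luts, weights):
    """Blend K expert LUTs (shape K x 256) into one LUT with normalized weights."""
    luts = np.asarray(luts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # make the weights sum to 1
    # Weighted sum over the expert axis gives a single 256-entry table.
    return np.tensordot(weights, luts, axes=1)

def apply_lut(image, lut):
    """Replace each 8-bit pixel by its LUT entry via integer indexing."""
    return lut[np.asarray(image, dtype=np.uint8)]
```

Because the experts are blended into one table before lookup, the per-pixel cost stays a single table read regardless of the number of experts, which is one plausible reason a LUT design suits tight latency budgets.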
Adaptive Density Estimation for Generative Models
Unsupervised learning of generative models has seen tremendous progress over
recent years, in particular due to generative adversarial networks (GANs),
variational autoencoders, and flow-based models. GANs have dramatically
improved sample quality, but suffer from two drawbacks: (i) they mode-drop,
i.e., do not cover the full support of the training data, and (ii) they do not
allow for likelihood evaluations on held-out data. In contrast,
likelihood-based training encourages models to cover the full support of the
training data, but yields poorer samples. These mutual shortcomings can in
principle be addressed by training generative latent variable models in a
hybrid adversarial-likelihood manner. However, we show that commonly made
parametric assumptions create a conflict between them, making successful hybrid
models non-trivial. As a solution, we propose to use deep invertible
transformations in the latent variable decoder. This approach allows for
likelihood computations in image space, is more efficient than fully invertible
models, and can take full advantage of adversarial training. We show that our
model significantly improves over existing hybrid models, offering GAN-like
samples, IS and FID scores that are competitive with fully adversarial models,
and improved likelihood scores.
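
Invertible transformations permit exact likelihood evaluation through the change-of-variables formula, log p(x) = log p(z) + log |det dz/dx|. The sketch below shows this for the simplest case, an elementwise affine map with a standard normal base density. It is a generic illustration, not the paper's decoder; the function name and parameters are assumptions.

```python
import numpy as np

def affine_flow_log_likelihood(x, scale, shift):
    """log p(x) for x = scale * z + shift with z ~ N(0, I), elementwise."""
    x = np.asarray(x, dtype=float)
    scale = np.broadcast_to(np.asarray(scale, dtype=float), x.shape)
    shift = np.broadcast_to(np.asarray(shift, dtype=float), x.shape)
    # Invert the transform to recover the latent code.
    z = (x - shift) / scale
    # Log-density of a standard normal, summed over dimensions.
    log_base = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi))
    # Log-determinant of the inverse Jacobian (diagonal with entries 1/scale).
    log_det_inverse = -np.sum(np.log(np.abs(scale)))
    return log_base + log_det_inverse
```

Deep invertible models stack many such maps with learned, input-dependent parameters, and the log-determinant terms add up layer by layer.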