A Comprehensive Review of Deep Learning-based Single Image Super-resolution
Image super-resolution (SR) is one of the vital image processing methods that
improve the resolution of an image in the field of computer vision. In the last
two decades, significant progress has been made in the field of
super-resolution, especially by utilizing deep learning methods. This survey
reviews recent progress in single-image super-resolution from the perspective
of deep learning, while also covering the early classical methods used for
image super-resolution. The survey
classifies the image SR methods into four categories, i.e., classical methods,
supervised learning-based methods, unsupervised learning-based methods, and
domain-specific SR methods. We also introduce the problem of SR to provide
intuition about image quality metrics, available reference datasets, and SR
challenges. Deep learning-based approaches of SR are evaluated using a
reference dataset. Some of the reviewed state-of-the-art image SR methods
include the enhanced deep SR network (EDSR), cycle-in-cycle GAN (CinCGAN),
multiscale residual network (MSRN), meta residual dense network (Meta-RDN),
recurrent back-projection network (RBPN), second-order attention network (SAN),
SR feedback network (SRFBN) and the wavelet-based residual attention network
(WRAN). Finally, the survey concludes with future directions and trends in SR
and open problems for researchers to address.
Comment: 56 Pages, 11 Figures, 5 Tables
Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution
Color-guided depth super-resolution (DSR) is an encouraging paradigm that
enhances a low-resolution (LR) depth map guided by an extra high-resolution
(HR) RGB image from the same scene. Existing methods usually use interpolation
to upscale the depth maps before feeding them into the network and transfer the
high-frequency information extracted from HR RGB images to guide the
reconstruction of depth maps. However, because of the cross-modality gap, the
extracted high-frequency information usually contains textures that are not
present in depth maps, and the noise is further amplified by interpolation due
to the resolution gap between the RGB and depth images. To
tackle these challenges, we propose a novel Symmetric Uncertainty-aware Feature
Transmission (SUFT) for color-guided DSR. (1) For the resolution gap, SUFT
builds an iterative up-and-down sampling pipeline, which makes depth features
and RGB features spatially consistent while suppressing noise amplification and
blurring by replacing common interpolated pre-upsampling. (2) For the
cross-modality gap, we propose a novel Symmetric Uncertainty scheme to remove
parts of RGB information harmful to the recovery of HR depth maps. Extensive
experiments on benchmark datasets and challenging real-world settings suggest
that our method achieves superior performance compared to state-of-the-art
methods. Our code and models are available at
https://github.com/ShiWuxuan/SUFT.
Comment: 10 pages, 9 figures, accepted by the 30th ACM International
Conference on Multimedia (ACM MM 22)
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays an important role in video surveillance. Many
algorithms have been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. First, we introduce the background of pedestrian attribute
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and the corresponding challenges. Second, we introduce existing
benchmarks, including popular datasets and evaluation criteria. Third, we
analyse the concepts of multi-task learning and multi-label learning, and
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures that
have been widely applied in the deep learning community. Fourth, we analyse
popular solutions for this task, such as attribute-group and part-based
approaches, etc. Fifth, we show some applications that take pedestrian
attributes into consideration and achieve better performance. Finally, we
summarize this paper and give several possible research directions for
pedestrian attribute recognition. The project page of this paper can be found
at the following website:
https://sites.google.com/view/ahu-pedestrianattributes/.
Comment: Check our project page for a high-resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey
Stereo matching is one of the longest-standing problems in computer vision
with close to 40 years of studies and research. Throughout the years the
paradigm has shifted from local, pixel-level decisions to various forms of
discrete and continuous optimization to data-driven, learning-based methods.
Recently, the rise of machine learning and the rapid proliferation of deep
learning enhanced stereo matching with new exciting trends and applications
unthinkable until a few years ago. Interestingly, the relationship between
these two worlds is two-way. While machine, and especially deep, learning
advanced the state-of-the-art in stereo matching, stereo itself enabled new
ground-breaking methodologies such as self-supervised monocular depth
estimation based on deep networks. In this paper, we review recent research in
the field of learning-based depth estimation from single and binocular images
highlighting the synergies, the successes achieved so far and the open
challenges the community is going to face in the immediate future.
Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial:
"Learning-based depth estimation from stereo and monocular images: successes,
limitations and future challenges"
(https://sites.google.com/view/cvpr-2019-depth-from-image/home)
Generation of realistic skydome images
We aim to generate realistic images of the sky with clouds using generative adversarial networks (GANs). We explore two GAN architectures, ProGAN and StyleGAN, and find that StyleGAN produces significantly better results. We also propose a novel architecture, SuperGAN, which aims to generate images at very high resolutions that cannot be handled efficiently by state-of-the-art architectures.
Department of Software and Computer Science Education, Faculty of Mathematics and Physics
Medical Image Segmentation Review: The success of U-Net
Automatic medical image segmentation is a crucial topic in the medical domain
and, in turn, a critical component of the computer-aided diagnosis
paradigm. U-Net is the most widespread image segmentation architecture due to
its flexibility, optimized modular design, and success in all medical image
modalities. Over the years, the U-Net model has attracted tremendous attention from
academic and industrial researchers. Several extensions of this network have
been proposed to address the scale and complexity created by medical tasks.
Addressing the deficiency of the naive U-Net model is the foremost step for
vendors to utilize the proper U-Net variant model for their business. Having a
compendium of different variants in one place makes it easier for builders to
identify the relevant research. It also helps ML researchers understand the
challenges that biological tasks pose to the model. To
address this, we discuss the practical aspects of the U-Net model and suggest a
taxonomy to categorize each network variant. Moreover, to measure the
performance of these strategies in a clinical application, we propose fair
evaluations of some unique and famous designs on well-known datasets. We
provide a comprehensive implementation library with trained models for future
research. In addition, for ease of future studies, we created an online list of
U-Net papers with their possible official implementation. All information is
gathered in the https://github.com/NITR098/Awesome-U-Net repository.
Comment: Submitted to the IEEE Transactions on Pattern Analysis and Machine
Intelligence Journal