Search CORE

5,681 research outputs found

A Comprehensive Review of Deep Learning-based Single Image Super-resolution

Author: Bashir Syed Muhammad Arsalan
Khan Mahrukh
Niu Yilong
Wang Yi
Publication venue: 'PeerJ'
Publication date: 01/07/2021
Field of study

Image super-resolution (SR) is one of the vital image processing methods that improve the resolution of an image in the field of computer vision. In the last two decades, significant progress has been made in the field of super-resolution, especially by utilizing deep learning methods. This survey is an effort to provide a detailed survey of recent progress in single-image super-resolution in the perspective of deep learning while also informing about the initial classical methods used for image super-resolution. The survey classifies the image SR methods into four categories, i.e., classical methods, supervised learning-based methods, unsupervised learning-based methods, and domain-specific SR methods. We also introduce the problem of SR to provide intuition about image quality metrics, available reference datasets, and SR challenges. Deep learning-based approaches of SR are evaluated using a reference dataset. Some of the reviewed state-of-the-art image SR methods include the enhanced deep SR network (EDSR), cycle-in-cycle GAN (CinCGAN), multiscale residual network (MSRN), meta residual dense network (Meta-RDN), recurrent back-projection network (RBPN), second-order attention network (SAN), SR feedback network (SRFBN) and the wavelet-based residual attention network (WRAN). Finally, this survey is concluded with future directions and trends in SR and open problems in SR to be addressed by the researchers.Comment: 56 Pages, 11 Figures, 5 Table

arXiv.org e-Print Archive

Directory of Open Access Journals

Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution

Author: Du Bo
Shi Wuxuan
Ye Mang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2023
Field of study

Color-guided depth super-resolution (DSR) is an encouraging paradigm that enhances a low-resolution (LR) depth map guided by an extra high-resolution (HR) RGB image from the same scene. Existing methods usually use interpolation to upscale the depth maps before feeding them into the network and transfer the high-frequency information extracted from HR RGB images to guide the reconstruction of depth maps. However, the extracted high-frequency information usually contains textures that are not present in depth maps in the existence of the cross-modality gap, and the noises would be further aggravated by interpolation due to the resolution gap between the RGB and depth images. To tackle these challenges, we propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR. (1) For the resolution gap, SUFT builds an iterative up-and-down sampling pipeline, which makes depth features and RGB features spatially consistent while suppressing noise amplification and blurring by replacing common interpolated pre-upsampling. (2) For the cross-modality gap, we propose a novel Symmetric Uncertainty scheme to remove parts of RGB information harmful to the recovery of HR depth maps. Extensive experiments on benchmark datasets and challenging real-world settings suggest that our method achieves superior performance compared to state-of-the-art methods. Our code and models are available at https://github.com/ShiWuxuan/SUFT.Comment: 10 pages, 9 figures, accepted by the 30th ACM International Conference on Multimedia (ACM MM 22

arXiv.org e-Print Archive

Pedestrian Attribute Recognition: A Survey

Author: Luo Bin
Tang Jin
Wang Xiao
Yang Rui
Zheng Shaofei
Publication venue
Publication date: 22/01/2019
Field of study

Recognizing pedestrian attributes is an important task in computer vision community due to it plays an important role in video surveillance. Many algorithms has been proposed to handle this task. The goal of this paper is to review existing works using traditional methods or based on deep learning networks. Firstly, we introduce the background of pedestrian attributes recognition (PAR, for short), including the fundamental concepts of pedestrian attributes and corresponding challenges. Secondly, we introduce existing benchmarks, including popular datasets and evaluation criterion. Thirdly, we analyse the concept of multi-task learning and multi-label learning, and also explain the relations between these two learning algorithms and pedestrian attribute recognition. We also review some popular network architectures which have widely applied in the deep learning community. Fourthly, we analyse popular solutions for this task, such as attributes group, part-based, \emph{etc}. Fifthly, we shown some applications which takes pedestrian attributes into consideration and achieve better performance. Finally, we summarized this paper and give several possible research directions for pedestrian attributes recognition. The project page of this paper can be found from the following website: \url{https://sites.google.com/view/ahu-pedestrianattributes/}.Comment: Check our project page for High Resolution version of this survey: https://sites.google.com/view/ahu-pedestrianattributes

arXiv.org e-Print Archive

On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey

Author: Batsos Konstantinos
Mattoccia Stefano
Mordohai Philippos
Poggi Matteo
Tosi Fabio
Publication venue
Publication date: 31/03/2021
Field of study

Stereo matching is one of the longest-standing problems in computer vision with close to 40 years of studies and research. Throughout the years the paradigm has shifted from local, pixel-level decision to various forms of discrete and continuous optimization to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning enhanced stereo matching with new exciting trends and applications unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine, and especially deep, learning advanced the state-of-the-art in stereo matching, stereo itself enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images highlighting the synergies, the successes achieved so far and the open challenges the community is going to face in the immediate future.Comment: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial: "Learning-based depth estimation from stereo and monocular images: successes, limitations and future challenges" (https://sites.google.com/view/cvpr-2019-depth-from-image/home

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Generation of realistic skydome images

Author: Špaček Jan
Publication venue: Univerzita Karlova, Matematicko-fyzikální fakulta
Publication date: 01/01/2020
Field of study

Generation of realistic skydome images We aim to generate realistic images of the sky with clouds using generative adversarial networks (GANs). We explore two GAN architectures, ProGAN and StyleGAN, and find that StyleGAN produces significantly better results. We also propose a novel architecture SuperGAN which aims to generate images at very high resolutions, which cannot be efficiently handled using state-of-art architectures. 1Generování realistických snímků obloh Naším cílem je generovat realistické obrázky oblohy s oblačností pomocí generativních kompetitivních sítí (GAN). Zkoumáme dvě architektury GANů, ProGAN a StyleGAN, a zjišťujeme, že StyleGAN dosahuje významně lepších výsledků. Pro generování obrázků ve velmi vysokém rozlišení, které nemůže být efektivně zpracováno soudobými architekturami GANů, navrhujeme novou architekturu SuperGAN. 1Department of Software and Computer Science EducationKatedra softwaru a výuky informatikyMatematicko-fyzikální fakultaFaculty of Mathematics and Physic

CU Digital Repository

Medical Image Segmentation Review: The success of U-Net

Author: Adeli Ehsan
Aghdam Ehsan Khodapanah
Avval Atlas Haddadi
Azad Reza
Bozorgpour Afshin
Cohen Joseph Paul
Jia Yiwei
Karimijafarbigloo Sanaz
Merhof Dorit
Rauland Amelie
Publication venue
Publication date: 27/11/2022
Field of study

Automatic medical image segmentation is a crucial topic in the medical domain and successively a critical counterpart in the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success in all medical image modalities. Over the years, the U-Net model achieved tremendous attention from academic and industrial researchers. Several extensions of this network have been proposed to address the scale and complexity created by medical tasks. Addressing the deficiency of the naive U-Net model is the foremost step for vendors to utilize the proper U-Net variant model for their business. Having a compendium of different variants in one place makes it easier for builders to identify the relevant research. Also, for ML researchers it will help them understand the challenges of the biological tasks that challenge the model. To address this, we discuss the practical aspects of the U-Net model and suggest a taxonomy to categorize each network variant. Moreover, to measure the performance of these strategies in a clinical application, we propose fair evaluations of some unique and famous designs on well-known datasets. We provide a comprehensive implementation library with trained models for future research. In addition, for ease of future studies, we created an online list of U-Net papers with their possible official implementation. All information is gathered in https://github.com/NITR098/Awesome-U-Net repository.Comment: Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence Journa

arXiv.org e-Print Archive