11 research outputs found
A Novel Domain Transfer-Based Approach for Unsupervised Thermal Image Super-Resolution
This paper presents a domain transfer strategy to tackle the limitations of low-resolution thermal sensors and generate higher-resolution images of reasonable quality. The proposed technique employs a CycleGAN architecture and uses a ResNet as an encoder in the generator, along with an attention module and a novel loss function. The network is trained on a multi-resolution thermal image dataset acquired with three different thermal sensors. Results show better benchmark performance on the 2nd CVPR-PBVS-2021 thermal image super-resolution challenge than state-of-the-art methods. The code for this work is available online.
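The CycleGAN backbone mentioned above trains without paired low/high-resolution thermal images; the constraint that makes this possible is cycle consistency. A minimal numpy sketch of that loss follows; the generators `g_lr_to_hr` and `g_hr_to_lr` are illustrative placeholders, not the paper's actual networks:

```python
import numpy as np

def cycle_consistency_loss(x_lr, x_hr, g_lr_to_hr, g_hr_to_lr, lam=10.0):
    """L1 cycle loss: each image should survive a round trip between domains."""
    # forward cycle: LR -> HR -> LR should reconstruct the LR input
    loss_fwd = np.abs(g_hr_to_lr(g_lr_to_hr(x_lr)) - x_lr).mean()
    # backward cycle: HR -> LR -> HR should reconstruct the HR input
    loss_bwd = np.abs(g_lr_to_hr(g_hr_to_lr(x_hr)) - x_hr).mean()
    return lam * (loss_fwd + loss_bwd)

# toy check with identity "generators": a perfect round trip costs nothing
identity = lambda v: v
x = np.random.rand(1, 64, 64)
assert cycle_consistency_loss(x, x, identity, identity) == 0.0
```

In the unpaired setting this term is added to the usual adversarial losses, anchoring the translation so that content is preserved across domains.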
LadleNet: Translating Thermal Infrared Images to Visible Light Images Using A Scalable Two-stage U-Net
The translation of thermal infrared (TIR) images to visible light (VI) images presents a challenging task with potential applications spanning various domains such as TIR-VI image registration and fusion. Leveraging supplementary information derived from TIR image conversions can significantly enhance model performance and generalization across these applications. However, prevailing issues within this field include suboptimal image fidelity and limited model scalability. In this paper, we introduce an algorithm, LadleNet, based on the U-Net architecture. LadleNet employs a two-stage U-Net concatenation structure, augmented with skip connections and refined feature aggregation techniques, resulting in a substantial enhancement in model performance. Comprising 'Handle' and 'Bowl' modules, LadleNet's Handle module facilitates the construction of an abstract semantic space, while the Bowl module decodes this semantic space to yield mapped VI images. The Handle module is extensible in that its network architecture can be substituted with semantic segmentation networks, establishing more abstract semantic spaces to bolster model performance. Consequently, we propose LadleNet+, which replaces LadleNet's Handle module with the pre-trained DeepLabv3+ network, endowing the model with enhanced semantic space construction capabilities. The proposed method is evaluated and tested on the KAIST dataset, accompanied by quantitative and qualitative analyses. Compared to existing methodologies, our approach achieves state-of-the-art performance in terms of image clarity and perceptual quality. The source code will be made available at https://github.com/Ach-1914/LadleNet/tree/main/
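The Handle/Bowl decomposition described above amounts to composing two stages through a shared semantic space, with a skip connection carrying detail past the bottleneck. A schematic sketch of that structure; the `handle` and `bowl` functions are hypothetical stand-ins, not LadleNet's actual U-Nets:

```python
import numpy as np

def handle(img):
    """Stage 1 (stand-in): encode a TIR image into a semantic map.
    LadleNet uses a U-Net here; LadleNet+ swaps in pre-trained DeepLabv3+."""
    return img.mean(axis=0, keepdims=True)  # toy one-channel 'semantic' summary

def bowl(semantic, skip):
    """Stage 2 (stand-in): decode the semantic space into a VI image,
    with a skip connection carrying fine detail from the input."""
    return np.concatenate([semantic, skip], axis=0)

def ladlenet_like(tir):
    sem = handle(tir)       # abstract semantic space
    return bowl(sem, tir)   # decoded, skip-connected output

tir = np.random.rand(3, 8, 8)
out = ladlenet_like(tir)
assert out.shape == (4, 8, 8)  # semantic channel + skipped input channels
```

The extensibility claim follows directly from this shape: any network with the same input/output contract can replace `handle` without touching `bowl`.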
Converting Optical Videos to Infrared Videos Using Attention GAN and Its Impact on Target Detection and Classification Performance
To apply powerful deep-learning-based algorithms for object detection and classification in infrared videos, more training data are necessary in order to build high-performance models. However, in many surveillance applications, one may have far more optical videos than infrared videos. This lack of IR video datasets can be mitigated if optical-to-infrared video conversion is possible. In this paper, we present a new approach for converting optical videos to infrared videos using deep learning. The basic idea is to focus on target areas using an attention generative adversarial network (attention GAN), which preserves the fidelity of target areas. The approach does not require paired images. The performance of the proposed attention GAN has been demonstrated using objective and subjective evaluations. Most importantly, the impact of attention GAN has been demonstrated through improved target detection and classification performance on real infrared videos.
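Attention GANs commonly preserve target-area fidelity through a compositing step in which a learned mask decides where the generator's output is trusted. A minimal numpy sketch of that step; the mask and generated image here are placeholders, not the paper's networks:

```python
import numpy as np

def attention_composite(optical, generated, mask):
    """Blend generator output with the input using an attention mask in [0, 1].
    High mask values (attended target areas) take the translated appearance;
    low values fall back to the original pixels."""
    return mask * generated + (1.0 - mask) * optical

optical = np.zeros((4, 4))            # toy input frame
generated = np.ones((4, 4))           # toy generator output
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                  # attend to a central target region
out = attention_composite(optical, generated, mask)
assert out[2, 2] == 1.0 and out[0, 0] == 0.0
```

Because the mask is differentiable, the generator can learn to concentrate its capacity on the regions that matter for downstream detection and classification.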
Cyclic Style Generative Adversarial Network for Near Infrared and Visible Light Face Recognition
Face recognition in the visible light (VIS) spectrum has been widely utilized in many practical applications. With the development of deep learning methods, recognition accuracy and speed have reached an excellent level, so face recognition can be applied in various circumstances. However, in some extreme situations, face recognition still cannot guarantee performance. One of the most significant cases is poor illumination: without adequate light sources, images cannot reveal the true identities of the people detected. To address this problem, the near infrared (NIR) spectrum offers an alternative in which face images can be captured clearly. Many studies have been conducted in recent years, and current near infrared and visible light (NIR-VIS) face recognition methods have achieved great performance.
In this thesis, I review current NIR-VIS face recognition methods and public NIR-VIS face datasets. I first list the public NIR-VIS face datasets used in most research. For each dataset, I present its characteristics, including the number of subjects, the collection environment, the resolution of images, and whether the images are paired. I also summarize the evaluation protocols for each dataset, which aids further performance analysis. Then, I classify current NIR-VIS face recognition methods into three categories: image synthesis-based methods, subspace learning-based methods, and invariant feature-based methods. The contribution of each method is concisely explained. Additionally, I compare current NIR-VIS face recognition methods and offer my own assessment of their advantages and disadvantages.
To improve on the shortcomings of current methods, this thesis proposes a new model, the Cyclic Style Generative Adversarial Network (CS-GAN), which combines an image synthesis-based method with a subspace learning-based method. The proposed CS-GAN improves the visualization results of image synthesis between the NIR and VIS domains as well as recognition accuracy. CS-GAN is based on the StyleGAN3 network proposed in 2021. In the proposed model, two generators from pre-trained StyleGAN3 generate images in the NIR domain and the VIS domain, respectively. Each generator consists of a mapping network and a synthesis network: the mapping network disentangles the latent code to reduce correlation between features, and the synthesis network synthesizes face images through progressive-growing training. The generators have different final layers, a to-RGB layer for the VIS domain and a to-grayscale layer for the NIR domain. The generators are embedded in a cyclic structure, in which latent codes are sent into the synthesis network of the other generator to produce recreated images, and the recreated images are compared with real images in the same domain to ensure domain consistency. In addition, I apply the proposed cyclic subspace learning, which is composed of two parts. The first part introduces the proposed latent loss, which provides better control over the learning of the latent subspace. The latent codes influence both the details and the locations of features as they are continuously fed into the synthesis network, and this control over the latent subspace strengthens feature consistency between synthesized images. The second part improves the style-transfer process by controlling high-level features with a perceptual loss in each domain. In the perceptual loss, a pre-trained VGG-16 network extracts high-level features, which can be regarded as the style of the images.
Therefore, the style loss can control the style of images in both domains as well as ensure style consistency between synthesized and real images. The visualization results show that the proposed CS-GAN model can synthesize better VIS images that are detailed, correctly colorized, and have clear edges. More importantly, the experimental results show that the Rank-1 accuracy on the CASIA NIR-VIS 2.0 database reaches 99.60%, improving on state-of-the-art methods by 0.2%.
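Style losses of the kind described above are commonly computed by comparing Gram matrices of deep feature maps, since channel correlations capture "style" while discarding spatial layout. A simplified numpy version, with a random array standing in for VGG-16 activations:

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) feature map, e.g. activations of a VGG-16 layer."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)  # normalized channel-by-channel correlations

def style_loss(feat_a, feat_b):
    """Squared Frobenius distance between the Gram matrices of two feature maps."""
    return float(((gram_matrix(feat_a) - gram_matrix(feat_b)) ** 2).sum())

feat = np.random.rand(8, 16, 16)
assert style_loss(feat, feat) == 0.0        # identical styles cost nothing
assert style_loss(feat, feat * 2.0) > 0.0   # rescaled features change the Gram
```

In a full pipeline this distance would be summed over several VGG-16 layers and over both the NIR and VIS domains; the single-layer form here only illustrates the mechanism.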
Unsupervised Hyperspectral and Multispectral Images Fusion Based on the Cycle Consistency
Hyperspectral images (HSI), whose abundant spectral information reflects material properties, usually have low spatial resolution due to hardware limits. Meanwhile, multispectral images (MSI), e.g., RGB images, have high spatial resolution but deficient spectral signatures. Hyperspectral and multispectral image fusion can be a cost-effective and efficient way of acquiring images with both high spatial resolution and high spectral resolution. Many conventional HSI and MSI fusion algorithms rely on known spatial degradation parameters, i.e., the point spread function (PSF), known spectral degradation parameters, i.e., the spectral response function (SRF), or both. Another class of deep learning-based models relies on ground-truth high spatial resolution HSI and needs large amounts of paired training images when working in a supervised manner. Both kinds of models are limited in practical fusion scenarios. In this paper, we propose an unsupervised HSI and MSI fusion model based on cycle consistency, called CycFusion. CycFusion learns the domain transformation between low spatial resolution HSI (LrHSI) and high spatial resolution MSI (HrMSI), with the desired high spatial resolution HSI (HrHSI) treated as intermediate feature maps in the transformation networks. CycFusion can be trained with objective functions of marginal matching in single transforms and cycle consistency in double transforms. Moreover, the estimated PSF and SRF are embedded in the model as pre-training weights, which further enhances the practicality of our proposed model. Experiments conducted on several datasets show that our proposed model outperforms all compared unsupervised fusion methods. The code for this paper will be available at https://github.com/shuaikaishi/CycFusion for reproducibility.
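The PSF and SRF referred to above are two linear degradation operators: the PSF blurs and downsamples spatially (producing the LrHSI), while the SRF mixes many narrow spectral bands into a few broad ones (producing the HrMSI). A toy numpy sketch of both; the box-average PSF and flat SRF are illustrative choices, not the estimated operators CycFusion uses:

```python
import numpy as np

def spatial_degrade(hrhsi, scale=2):
    """LrHSI = blur + downsample HrHSI per band (toy box PSF of size scale x scale)."""
    c, h, w = hrhsi.shape
    blocks = hrhsi.reshape(c, h // scale, scale, w // scale, scale)
    return blocks.mean(axis=(2, 4))  # box blur folded into the downsampling

def spectral_degrade(hrhsi, srf):
    """HrMSI = SRF @ HrHSI: mix narrow bands into a few broad ones per pixel."""
    c, h, w = hrhsi.shape
    return (srf @ hrhsi.reshape(c, -1)).reshape(srf.shape[0], h, w)

hrhsi = np.random.rand(16, 8, 8)        # 16-band high-resolution hyperspectral cube
srf = np.full((3, 16), 1.0 / 16.0)      # toy flat spectral response, rows sum to 1
lrhsi = spatial_degrade(hrhsi)          # low spatial resolution, full spectrum
hrmsi = spectral_degrade(hrhsi, srf)    # high spatial resolution, 3 bands
assert lrhsi.shape == (16, 4, 4) and hrmsi.shape == (3, 8, 8)
```

An unsupervised fusion model can exploit exactly this structure: re-degrading its estimated HrHSI through the two operators and matching the observed LrHSI and HrMSI, without any ground-truth HrHSI.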
An Overview on the Generation and Detection of Synthetic and Manipulated Satellite Images
Due to the reduction of technological costs and the increase in satellite launches, satellite images are becoming more popular and easier to obtain. Besides serving benevolent purposes, satellite data can also be used for malicious reasons such as misinformation. As a matter of fact, satellite images can be easily manipulated with general image editing tools. Moreover, with the surge of Deep Neural Networks (DNNs) that can generate realistic synthetic imagery belonging to various domains, additional threats related to the diffusion of synthetically generated satellite images are emerging. In this paper, we review the State of the Art (SOTA) on the generation and manipulation of satellite images. In particular, we focus on both the generation of synthetic satellite imagery from scratch and the semantic manipulation of satellite images by means of image-transfer technologies, including the transformation of images obtained from one type of sensor to another. We also describe forensic detection techniques that have been researched so far to classify and detect synthetic image forgeries. While we focus mostly on forensic techniques explicitly tailored to the detection of AI-generated synthetic contents, we also review some methods designed for general splicing detection, which can in principle also be used to spot AI-manipulated images.
Comment: 25 pages, 17 figures, 5 tables, APSIPA 202
Neural Radiance Fields: Past, Present, and Future
Aspects such as modeling and interpreting 3D environments and surroundings have enticed humans to advance their research in 3D Computer Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, and Computer Vision, and the prospect of high-resolution, low-storage Augmented Reality and Virtual Reality-based 3D models has gained traction from researchers, with more than 1000 preprints related to NeRFs published. This paper serves as a bridge for people starting to study these fields by building from the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. This survey provides the history of rendering, Implicit Learning, and NeRFs; the progression of research on NeRFs; and the potential applications and implications of NeRFs in today's world. In doing so, this survey categorizes all NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications.
Comment: 413 pages, 9 figures, 277 citations