Learning to Dehaze from Realistic Scene with A Fast Physics-based Dehazing Network
Dehazing has long been a popular computer vision topic. A real-time dehazing
method with reliable performance is highly desired for many applications such
as autonomous driving. While recent learning-based methods require datasets
containing pairs of hazy images and clean ground truth references, it is
generally impossible to capture accurate ground truth in real scenes. Many
existing works work around this difficulty by generating hazy images, rendering
the haze from depth on common RGBD datasets using the haze imaging model.
However, there is still a gap between the synthetic datasets and real hazy
images, as large datasets with high-quality depth are mostly indoor and depth
maps for outdoor scenes are imprecise. In this paper, we complement the existing
datasets with a new, large, and diverse dehazing dataset containing real
outdoor scenes from High-Definition (HD) 3D movies. We select a large number of
high-quality frames of real outdoor scenes and render haze on them using depth
from stereo. Our dataset is more realistic than existing ones and we
demonstrate that using this dataset greatly improves the dehazing performance
on real scenes. In addition to the dataset, we also propose a light and
reliable dehazing network inspired by the physics model. Our approach
outperforms other methods by a large margin and becomes the new
state-of-the-art method. Moreover, the light-weight design of the network
enables our method to run at a real-time speed, which is much faster than other
baseline methods.
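
The abstract above refers to rendering haze from depth with the haze imaging model. A minimal sketch of that idea follows, assuming the commonly used atmospheric scattering form I(x) = J(x) t(x) + A (1 - t(x)) with t(x) = exp(-beta d(x)). The function names, the default values for `beta` and `airlight`, and the `t_min` clamp are illustrative assumptions, not taken from the paper, and the sketch is not the authors' network or rendering pipeline.

```python
import math


def transmission(depth, beta):
    """Transmission t = exp(-beta * depth); beta is an assumed scattering coefficient."""
    return math.exp(-beta * depth)


def render_haze(clean, depth, beta=0.05, airlight=0.9):
    """Render a hazy image from a clean image and a depth map.

    clean: 2D list of intensities in [0, 1]
    depth: 2D list of scene depths, same shape as clean
    """
    hazy = []
    for clean_row, depth_row in zip(clean, depth):
        row = []
        for j, d in zip(clean_row, depth_row):
            t = transmission(d, beta)
            # I = J * t + A * (1 - t)
            row.append(j * t + airlight * (1.0 - t))
        hazy.append(row)
    return hazy


def recover_clean(hazy, depth, beta=0.05, airlight=0.9, t_min=0.1):
    """Invert the same model when depth, beta and airlight are known.

    t_min is an assumed clamp that avoids dividing by a near-zero transmission.
    """
    restored = []
    for hazy_row, depth_row in zip(hazy, depth):
        row = []
        for i, d in zip(hazy_row, depth_row):
            t = max(transmission(d, beta), t_min)
            # J = (I - A) / t + A
            row.append((i - airlight) / t + airlight)
        restored.append(row)
    return restored


if __name__ == "__main__":
    clean = [[0.2, 0.8], [0.5, 0.1]]
    depth = [[0.0, 10.0], [20.0, 5.0]]  # hypothetical depths
    hazy = render_haze(clean, depth)
    print(hazy)
    print(recover_clean(hazy, depth))
```

In practice depth, scattering coefficient and airlight are unknown for a real hazy image, which is why learning-based methods need training pairs such as those the dataset provides.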
Hierarchy Flow For High-Fidelity Image-to-Image Translation
Image-to-image (I2I) translation comprises a wide spectrum of tasks. Here we
divide this problem into three levels: strong-fidelity translation,
normal-fidelity translation, and weak-fidelity translation, indicating the
extent to which the content of the original image is preserved. Although
existing methods achieve good performance in weak-fidelity translation, they
fail to fully preserve the content in both strong- and normal-fidelity tasks,
e.g. sim2real, style transfer and low-level vision. In this work, we propose
Hierarchy Flow, a novel flow-based model to achieve better content preservation
during translation. Specifically, 1) we first unveil the drawbacks of standard
flow-based models when applied to I2I translation. 2) Next, we propose a new
design, namely hierarchical coupling for reversible feature transformation and
multi-scale modeling, to constitute Hierarchy Flow. 3) Finally, we present a
dedicated aligned-style loss for a better trade-off between content
preservation and stylization during translation. Extensive experiments on a
wide range of I2I translation benchmarks demonstrate that our approach achieves
state-of-the-art performance, with convincing advantages in both strong- and
normal-fidelity tasks. Code and models will be available at
https://github.com/WeichenFan/HierarchyFlow.
Comment: arXiv admin note: text overlap with arXiv:2207.0190
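
The abstract above relies on reversible feature transformation. As a rough illustration of what reversibility means in flow-based models, here is a minimal sketch of a generic additive coupling step. It is not the hierarchical coupling proposed in the paper, and `shift` is a hypothetical stand-in for a learned sub-network.

```python
def shift(values):
    """Hypothetical stand-in for a learned network that predicts a shift."""
    return [0.5 * v + 1.0 for v in values]


def coupling_forward(x):
    """Keep the first half unchanged and shift the second half based on the first."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even number of features")
    half = len(x) // 2
    a, b = x[:half], x[half:]
    return a + [bi + si for bi, si in zip(b, shift(a))]


def coupling_inverse(y):
    """Undo coupling_forward exactly; possible because the first half is unchanged."""
    if len(y) % 2 != 0:
        raise ValueError("this sketch assumes an even number of features")
    half = len(y) // 2
    a, yb = y[:half], y[half:]
    return a + [ybi - si for ybi, si in zip(yb, shift(a))]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = coupling_forward(x)
    print(y)
    print(coupling_inverse(y))
```

Because the inverse recovers the input exactly, no content is discarded by the transformation, which is the property that makes flow-style models attractive when the original image content must be preserved.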
Adaptive Deep Learning Detection Model for Multi-Foggy Images
Fog has different features and effects in every environment. Detecting whether fog is present in an image is challenging, and identifying the type of fog is highly informative for image defogging. Foggy scenes can be categorized in different ways, for example by fog density level or by fog type. Machine learning techniques have contributed significantly to the detection of foggy scenes. However, most existing detection models are based on traditional machine learning, and only a few studies have adopted deep learning. Furthermore, most existing machine learning detection models are based on fog density-level scenes, and to the best of our knowledge, no detection model based on multi-fog type scenes has been presented yet. Therefore, the main goal of our study is to propose an adaptive deep learning model for detecting multiple fog types in images. Moreover, due to the lack of a publicly available dataset for inhomogeneous, homogeneous, dark, and sky foggy scenes, a dataset for multi-fog scenes is presented in this study (https://github.com/Karrar-H-Abdulkareem/Multi-Fog-Dataset). Experiments were conducted in three stages. First, in the data collection phase, eight resources were used to obtain the multi-fog scene dataset. Second, a classification experiment was conducted based on the ResNet-50 deep learning model to obtain detection results. Third, in the evaluation phase, the performance of the ResNet-50 detection model was compared against three different models. Experimental results show that the proposed model delivers stable classification performance across different foggy images, with a score of 96% for each of Classification Accuracy Rate (CAR), Recall, Precision, and F1-Score, which has theoretical and practical significance. Our proposed model is suitable as a pre-processing step and might be considered in different real-time applications.
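
The abstract above reports accuracy, recall, precision and F1-score for a multi-class fog classifier. A minimal sketch of how such scores can be computed from predicted labels follows. Macro averaging over classes is an assumption here, since the abstract does not state the averaging scheme, and the labels in the usage example are made up for illustration.

```python
def classification_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Made-up labels using the fog types named in the abstract.
    y_true = ["dark", "dark", "sky", "sky", "homogeneous", "inhomogeneous"]
    y_pred = ["dark", "sky", "sky", "sky", "homogeneous", "inhomogeneous"]
    print(classification_scores(y_true, y_pred))
```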
Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing
Multi-stage architectures, which usually decompose a challenging task into
multiple more tractable sub-tasks and progressively estimate latent haze-free
images, have proven effective in image dehazing. Despite the remarkable
progress, existing methods still suffer from the following shortcomings: (1)
limited exploration of frequency domain information; (2) insufficient
information interaction; (3) severe feature redundancy. To remedy these issues,
we propose a novel Mutual Information-driven Triple interaction Network
(MITNet) based on spatial-frequency dual domain information and two-stage
architecture. To be specific, the first stage, named amplitude-guided haze
removal, aims to recover the amplitude spectrum of the hazy images for haze
removal. The second stage, named phase-guided structure refined, is devoted to
learning the transformation and refinement of the phase spectrum. To facilitate
the information exchange between two stages, an Adaptive Triple Interaction
Module (ATIM) is developed to simultaneously aggregate cross-domain,
cross-scale, and cross-stage features, where the fused features are further
used to generate content-adaptive dynamic filters that are applied to
enhance global context representation. In addition, we impose the mutual
information minimization constraint on paired scale encoder and decoder
features from both stages. Such an operation can effectively reduce information
redundancy and enhance cross-stage feature complementarity. Extensive
experiments on multiple public datasets demonstrate that our MITNet achieves
superior performance with lower model complexity. The code and models are
available at https://github.com/it-hao/MITNet.
Comment: Accepted in ACM MM 202
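
The abstract above treats the amplitude spectrum and the phase spectrum separately. A minimal sketch of that decomposition on a 1D toy signal follows, using a plain discrete Fourier transform from the standard library. It only illustrates the split and recombination of amplitude and phase. It is not MITNet, which operates on images with learned modules, and all function names are illustrative.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def idft(spectrum):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)).real / n
        for k in range(n)
    ]


def split_spectrum(spectrum):
    """Split complex coefficients into amplitude and phase."""
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase


def combine_spectrum(amplitude, phase):
    """Rebuild complex coefficients from amplitude and phase."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


if __name__ == "__main__":
    signal = [0.2, 0.5, 0.9, 0.4]  # toy intensities
    amplitude, phase = split_spectrum(dft(signal))
    print(amplitude)
    print(phase)
    print(idft(combine_spectrum(amplitude, phase)))
```

Since amplitude and phase together determine the signal, a two-stage design can modify one component while leaving the other for a later stage.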