Accurate and lightweight dehazing via multi-receptive-field non-local network and novel contrastive regularization
Recently, deep learning-based methods have dominated the image dehazing domain.
Although very competitive dehazing performance has been achieved with
sophisticated models, effective solutions for extracting useful features are
still under-explored. In addition, the non-local network, which has made
breakthroughs in many vision tasks, has not yet been appropriately applied to
image dehazing. Thus, a multi-receptive-field non-local network (MRFNLN) consisting
of the multi-stream feature attention block (MSFAB) and cross non-local block
(CNLB) is presented in this paper. We start with extracting richer features for
dehazing. Specifically, we design a multi-stream feature extraction (MSFE)
sub-block, which contains three parallel convolutions with different receptive
fields for extracting multi-scale features. Following MSFE, we employ an
attention sub-block to make the model
adaptively focus on important channels/regions. The MSFE and attention
sub-blocks constitute our MSFAB. Then, we design a cross non-local block
(CNLB), which can capture long-range dependencies beyond the query. Instead of
sharing the same input source as the query branch, the key and value branches
are enhanced by fusing more preceding features. CNLB is computation-friendly, leveraging a
spatial pyramid down-sampling (SPDS) strategy to reduce the computation and
memory consumption without sacrificing the performance. Last but not least, a
novel detail-focused contrastive regularization (DFCR) is presented by
emphasizing the low-level details and ignoring the high-level semantic
information in the representation space. Comprehensive experimental results
demonstrate that the proposed MRFNLN model outperforms recent state-of-the-art
dehazing methods with fewer than 1.5 million parameters.
Comment: submitted to IEEE TCYB for possible publication
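The cost saving behind CNLB can be sketched concretely: if the key/value tokens are built from spatially pooled copies of the feature map, the attention matrix shrinks from (HW)×(HW) to (HW)×(N_pooled). The numpy sketch below is an illustrative stand-in under that assumption, not the authors' implementation; all names, pool sizes, and the single-head formulation are assumptions.

```python
# Illustrative non-local block with spatial-pyramid-pooled key/value,
# in the spirit of the paper's SPDS strategy (not the official code).
import numpy as np

def avg_pool2d(x, k):
    """Average-pool an (H, W, C) feature map with kernel/stride k."""
    h, w, c = x.shape
    h2, w2 = h // k, w // k
    return x[:h2 * k, :w2 * k].reshape(h2, k, w2, k, c).mean(axis=(1, 3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def downsampled_non_local(x, pool_sizes=(2, 4)):
    """Queries stay at full resolution; key/value come from a pooled
    pyramid, so the attention matrix is (HW) x (N_pooled)."""
    h, w, c = x.shape
    q = x.reshape(h * w, c)                      # one query per pixel
    kv = np.concatenate(                         # pooled key/value tokens
        [avg_pool2d(x, k).reshape(-1, c) for k in pool_sizes], axis=0)
    attn = softmax(q @ kv.T / np.sqrt(c))        # (HW, N_pooled)
    out = attn @ kv                              # aggregate long-range context
    return x + out.reshape(h, w, c)              # residual connection
```

For an 8x8 map with pools of 2 and 4, the attention matrix is 64x20 instead of 64x64; the gap widens quadratically with resolution.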
Prompt-based test-time real image dehazing: a novel pipeline
Existing methods attempt to improve models' generalization ability on
real-world hazy images by exploring well-designed training schemes (e.g.,
CycleGAN, prior loss). However, most of them need very complicated training
procedures to achieve satisfactory results. In this work, we present a novel
testing pipeline called Prompt-based Test-Time Dehazing (PTTD) that helps
generate visually pleasing results for real-captured hazy images during the
inference phase. We experimentally find that given a dehazing model trained on
synthetic data, by fine-tuning the statistics (i.e., mean and standard
deviation) of encoding features, PTTD is able to narrow the domain gap,
boosting the performance of real image dehazing. Accordingly, we first apply a
prompt generation module (PGM) to generate a visual prompt, which is the source
of appropriate statistical perturbations for the mean and standard deviation.
We then embed the feature adaptation module (FAM) into existing dehazing
models to adjust the original statistics under the guidance of the generated
prompt. Note that PTTD is model-agnostic and can be equipped with various
state-of-the-art dehazing models trained on synthetic hazy-clean pairs.
Extensive experimental results demonstrate that our PTTD is flexible and
achieves superior performance against state-of-the-art dehazing methods in
real-world scenarios. The source code of our PTTD will be made available at
https://github.com/cecret3350/PTTD-Dehazing.
Comment: update github link (https://github.com/cecret3350/PTTD-Dehazing)
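The core transform the abstract describes, replacing the per-channel mean and standard deviation of encoder features with prompt-derived statistics, can be sketched as an AdaIN-style renormalization. How PTTD's prompt generation module actually produces the target statistics is not shown here; the function name and shapes are illustrative assumptions.

```python
# Minimal sketch of test-time feature-statistic adaptation (AdaIN-style).
# target_mean/target_std stand in for prompt-derived statistics; this is
# an assumption-based illustration, not the PTTD implementation.
import numpy as np

def adapt_statistics(feat, target_mean, target_std, eps=1e-5):
    """feat: (C, H, W) encoder features; target_mean/std: (C,) statistics."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normed = (feat - mu) / (sigma + eps)              # whiten per channel
    return normed * target_std[:, None, None] + target_mean[:, None, None]
```

The appeal of this form is that it touches only first- and second-order feature statistics, so a synthetically trained decoder can be reused unchanged.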
Cascaded Deep Networks with Multiple Receptive Fields for Infrared Image Super-Resolution
Infrared images have a wide range of military and civilian applications, including night vision, surveillance, and robotics. However, high-resolution infrared detectors are difficult to fabricate and expensive to manufacture. In this work, we present a cascaded architecture of deep neural networks with multiple receptive fields to increase the spatial resolution of infrared images by a large scale factor (×8). Instead of reconstructing a high-resolution image from its low-resolution version using a single complex deep network, the key idea of our approach is to set up a mid-point (scale ×2) between scale ×1 and ×8 such that the lost information can be divided into two components. The lost information within each component contains similar patterns and thus can be more accurately recovered even by a simpler deep network. In our proposed cascaded architecture, two consecutive deep networks with different receptive fields are jointly trained through a multi-scale loss function. The first network, with a large receptive field, is applied to recover large-scale structure information, while the second uses a relatively smaller receptive field to reconstruct small-scale image details. Our proposed method is systematically evaluated using realistic infrared images. Compared with state-of-the-art super-resolution methods, our proposed cascaded approach achieves improved reconstruction accuracy using significantly fewer parameters.
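The joint training through a multi-scale loss reduces to supervising the ×2 mid-point output and the final ×8 output against ground truth at their respective scales and summing the two terms. The weighting and the use of an L2 penalty below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a multi-scale loss for a x2 -> x8 cascade: each stage is
# supervised at its own scale. alpha and the L2 penalty are assumptions.
import numpy as np

def multi_scale_loss(pred_x2, gt_x2, pred_x8, gt_x8, alpha=0.5):
    """Mid-point (x2) loss plus final (x8) loss, jointly training both nets."""
    l_mid = np.mean((pred_x2 - gt_x2) ** 2)     # supervises large-scale structure
    l_final = np.mean((pred_x8 - gt_x8) ** 2)   # supervises fine details
    return alpha * l_mid + l_final
```

Because the gradient of `l_mid` reaches only the first network while `l_final` reaches both, the mid-point acts as an anchor that keeps the early stage from drifting.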
Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo
In this paper, we build a two-stage Convolutional Neural Network (CNN)
architecture to construct inter- and intra-frame representations based on an
arbitrary number of images captured under different light directions,
performing accurate normal estimation of non-Lambertian objects. We
experimentally investigate numerous network design alternatives for identifying
the optimal scheme to deploy inter-frame and intra-frame feature extraction
modules for the photometric stereo problem. Moreover, we propose to utilize the
easily obtained object mask for eliminating adverse interference from invalid
background regions in intra-frame spatial convolutions, thus effectively
improving the accuracy of normal estimation for surfaces made of dark materials
or with cast shadows. Experimental results demonstrate that the proposed masked
two-stage photometric stereo CNN model (MT-PS-CNN) performs favorably against
state-of-the-art photometric stereo techniques in terms of both accuracy and
efficiency. In addition, the proposed method is capable of predicting accurate
and rich surface normal details for non-Lambertian objects of complex geometry
and performs stably given inputs captured in both sparse and dense lighting
distributions.
Comment: 9 pages, 8 figures
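The idea of using the object mask to keep background values out of intra-frame spatial convolutions can be illustrated with a normalized (mask-aware) filter: only valid pixels contribute, and the response is renormalized by the number of valid neighbors. A box filter stands in for a learned kernel here; this is a sketch of the mechanism, not the MT-PS-CNN layer itself.

```python
# Mask-aware spatial filtering sketch: averages only over valid (object)
# pixels so background never leaks into features. Box kernel is a stand-in.
import numpy as np

def masked_box_filter(img, mask, k=3):
    """img: (H, W) values; mask: (H, W), 1 = object, 0 = background."""
    pad = k // 2
    ip = np.pad(img * mask, pad)                   # zero-out invalid pixels
    mp = np.pad(mask.astype(float), pad)
    h, w = img.shape
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(k):                            # accumulate k x k window
        for dx in range(k):
            num += ip[dy:dy + h, dx:dx + w]
            den += mp[dy:dy + h, dx:dx + w]
    out = np.where(den > 0, num / np.maximum(den, 1e-8), 0.0)
    return out * mask                              # keep background at zero
```

Renormalizing by the valid-pixel count is what prevents dark backgrounds from dragging responses toward zero near object boundaries.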
Real-Time Work Zone Traffic Management via Unmanned Air Vehicles
Highway work zones are prone to traffic accidents when congestion and queues develop. Vehicle queues can expand at a rate of 1 mile every 2 minutes. Back-of-queue, rear-end crashes are the most common work zone crash, endangering the safety of motorists, passengers, and construction workers. The dynamic nature of queuing in the proximity of highway work zones necessitates traffic management solutions that can monitor and intervene in real time. Fortunately, recent progress in sensor technology, embedded systems, and wireless communication, coupled with lower costs, is now enabling the development of real-time, automated, “intelligent” traffic management systems that address this problem. The goal of this project was to perform preliminary research and proof-of-concept development work on the use of UAS in real-time traffic monitoring of highway construction zones in order to create real-time alerts for motorists, construction workers, and first responders. The main tasks of the proposed system were to collect traffic data via the UAV camera, analyze the data to show that a UAV-based highway construction zone monitoring system would be capable of detecting congestion and back-of-queue information, and alert motorists of stopped traffic conditions, delay times, and alternate route options. Experiments were conducted using UAS to monitor traffic and collect traffic videos for processing. Prototype software was created to analyze this data. The software was successful in detecting vehicle speeds from zero mph to highway speeds. A review of available mobile traffic apps was conducted for future integration with advanced iterations of the UAV and software system created by this research.
This project has shown that UAS monitoring of highway construction zones, with real-time alerts to motorists, construction crews, and first responders, is possible in the near term, and that future research is needed to further develop and implement the innovative UAS traffic monitoring system created by this research.
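The vehicle-speed detection described above reduces to a simple conversion: per-frame pixel displacement of a tracked vehicle, the camera's ground sampling distance (meters per pixel), and the frame rate together give ground speed. The sketch below shows only that conversion; the tracking step and any specific GSD or frame-rate values are assumptions, not details from the project software.

```python
# Back-of-envelope sketch: vehicle ground speed from UAV video, given a
# tracker's per-frame pixel displacement. GSD and fps are assumed inputs.
def speed_mph(pixel_displacement, gsd_m_per_px, fps):
    """Ground speed of a tracked vehicle in miles per hour."""
    meters_per_second = pixel_displacement * gsd_m_per_px * fps
    return meters_per_second * 2.23694           # m/s -> mph
```

For example, a displacement of 10 px/frame at 0.1 m/px and 30 fps corresponds to 30 m/s, roughly 67 mph.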
Deep Neural Network for Fast and Accurate Single Image Super-Resolution via Channel-Attention-based Fusion of Orientation-aware Features
Recently, Convolutional Neural Networks (CNNs) have been successfully adopted
to solve the ill-posed single image super-resolution (SISR) problem. A commonly
used strategy to boost the performance of CNN-based SISR models is deploying
very deep networks, which inevitably incurs many obvious drawbacks (e.g., a
large number of network parameters, heavy computational loads, and difficult
model training). In this paper, we aim to build more accurate and faster SISR
models via developing better-performing feature extraction and fusion
techniques. First, we propose a novel Orientation-Aware feature extraction
and fusion Module (OAM), which contains a mixture of 1D and 2D convolutional
kernels (i.e., 5 x 1, 1 x 5, and 3 x 3) for extracting orientation-aware
features. Second, we adopt the channel attention mechanism as an effective
technique to adaptively fuse features extracted in different directions and in
hierarchically stacked convolutional stages. Based on these two important
improvements, we present a compact but powerful CNN-based model for
high-quality SISR via Channel Attention-based fusion of Orientation-Aware
features (SISR-CA-OA). Extensive experimental results verify the superiority of
the proposed SISR-CA-OA model, performing favorably against the
state-of-the-art SISR models in terms of both restoration accuracy and
computational efficiency. The source code will be made publicly available.
Comment: 12 pages, 11 figures
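The channel-attention fusion step can be sketched as a squeeze-and-excitation style gate over the concatenated orientation branches: global average pooling "squeezes" each channel, a small MLP produces per-channel weights, and the concatenated features are re-scaled. The random weight matrices below are stand-ins for learned parameters; this is an assumption-based illustration, not the SISR-CA-OA module.

```python
# SE-style channel-attention fusion of orientation-aware branches (sketch).
# Random matrices stand in for learned excitation weights.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention_fuse(branches, rng=np.random.default_rng(0)):
    """branches: list of (C, H, W) maps, e.g. from 5x1, 1x5, 3x3 kernels."""
    x = np.concatenate(branches, axis=0)          # (3C, H, W)
    c = x.shape[0]
    squeeze = x.mean(axis=(1, 2))                 # global average pool -> (3C,)
    w1 = rng.normal(scale=0.1, size=(c // 2, c))  # excitation MLP (stand-in)
    w2 = rng.normal(scale=0.1, size=(c, c // 2))
    gate = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))
    return x * gate[:, None, None]                # per-channel re-weighting
```

Since the gate lies in (0, 1) per channel, fusion amounts to softly selecting which orientations dominate at each stage rather than hard-picking one branch.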
Time Irreversibility from Time Series for Analyzing Oil-in-Water Flow Transition
We first experimentally collect conductance fluctuation signals of oil-in-water two-phase flow in a vertical pipe. We then detect the flow pattern asymmetry character from the collected signals using the multidimensional time irreversibility and the multiscale time irreversibility index. Moreover, we propose a novel criterion, AMSI (the average of multiscale time irreversibility), to quantitatively investigate oil-in-water two-phase flow pattern dynamics. The results show that AMSI is sensitive to the flow pattern evolution and can be used to predict the flow pattern transition and bubble coalescence.
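An average multiscale time-irreversibility index of this kind can be sketched as follows: coarse-grain the series at each scale, measure the asymmetry between rising and falling increments, and average over scales. The specific asymmetry statistic below (an up/down probability difference) is an illustrative stand-in, not necessarily the paper's exact index.

```python
# Hedged sketch of an average multiscale time-irreversibility index (AMSI).
# The increment-asymmetry statistic is an assumed, simple stand-in.
import numpy as np

def coarse_grain(x, scale):
    """Average non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def irreversibility(x):
    """Asymmetry between up-going and down-going increments."""
    d = np.diff(x)
    d = d[d != 0]
    if len(d) == 0:
        return 0.0
    return abs(np.mean(d > 0) - np.mean(d < 0))

def amsi(x, max_scale=5):
    """Average the irreversibility index over coarse-graining scales."""
    return np.mean([irreversibility(coarse_grain(x, s))
                    for s in range(1, max_scale + 1)])
```

A time-symmetric signal (e.g. a sinusoid) scores near zero, while a monotonic trend scores 1 at every scale, which is the kind of contrast a flow-transition indicator needs.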
Reasoning in Different Directions: Triplet Learning for Scene Graph Generation
Scene graph generation aims to detect objects and their relations in images, providing structured representations for scene understanding. Currently, mainstream approaches first detect the objects and then solve a classification task to determine the relation between each object pair, ignoring the other combinations of the subject-predicate-object triplet. In this work, we propose a triplet learning paradigm for scene graph generation, in which, given any two entities of the triplet, we learn to predict the third. A multi-task learning scheme is adopted to equip a scene graph generation model with the triplet learning task, in which the prediction heads for the subject, object, and predicate share the same backbone and are jointly trained. The proposed method does not require any additional annotation and is easy to embed in existing networks. It helps scene graph generation models gain more generalizability and can thus be applied to both biased and unbiased methods. Moreover, we introduce a new Graph Structure-Aware Transformer (GSAT) model that incorporates the structural information of the scene graph via a modified self-attention mechanism. Extensive experiments show that the proposed triplet learning consistently improves the performance of several state-of-the-art models on the Visual Genome dataset.
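The multi-task scheme described above can be sketched as a shared backbone feature feeding three classification heads, one each for subject, predicate, and object, with their losses summed for joint training. In practice each head's input would encode the other two triplet entities; here a single shared feature and random linear heads are illustrative stand-ins.

```python
# Toy sketch of joint triplet training: one shared feature, three heads
# (subject / predicate / object), summed cross-entropy losses.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label] + 1e-12)

def triplet_loss(shared_feat, labels, heads):
    """labels: (subj_id, pred_id, obj_id); heads: three weight matrices
    projecting the shared feature to each head's class logits."""
    return sum(cross_entropy(W @ shared_feat, y)
               for W, y in zip(heads, labels))
```

Because the three heads share the backbone, gradients from all three prediction directions shape the same representation, which is where the claimed generalizability gain comes from.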