
    Accurate and lightweight dehazing via multi-receptive-field non-local network and novel contrastive regularization

    Recently, deep learning-based methods have dominated the image dehazing domain. Although very competitive dehazing performance has been achieved with sophisticated models, effective solutions for extracting useful features are still under-explored. In addition, the non-local network, which has achieved breakthroughs in many vision tasks, has not been appropriately applied to image dehazing. Thus, a multi-receptive-field non-local network (MRFNLN) consisting of the multi-stream feature attention block (MSFAB) and cross non-local block (CNLB) is presented in this paper. We start with extracting richer features for dehazing. Specifically, we design a multi-stream feature extraction (MSFE) sub-block, which contains three parallel convolutions with different receptive fields (i.e., 1×1, 3×3, and 5×5) for extracting multi-scale features. Following MSFE, we employ an attention sub-block to make the model adaptively focus on important channels/regions. The MSFE and attention sub-blocks constitute our MSFAB. Then, we design the CNLB, which can capture long-range dependencies beyond the query. Instead of sharing the query branch's input source, the key and value branches are enhanced by fusing more preceding features. CNLB is computation-friendly by leveraging a spatial pyramid down-sampling (SPDS) strategy to reduce the computation and memory consumption without sacrificing performance. Last but not least, a novel detail-focused contrastive regularization (DFCR) is presented by emphasizing the low-level details and ignoring the high-level semantic information in the representation space. Comprehensive experimental results demonstrate that the proposed MRFNLN model outperforms recent state-of-the-art dehazing methods with fewer than 1.5 million parameters. Comment: submitted to IEEE TCYB for possible publication
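    The multi-stream idea can be illustrated with a toy NumPy sketch: three parallel branches with 1×1, 3×3, and 5×5 receptive fields applied to the same input, stacked along a channel axis. This is a minimal single-channel stand-in for the MSFE sub-block, not the paper's implementation; the fixed averaging kernels and the `conv2d_same`/`msfe` names are assumptions for illustration.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2D convolution with zero ('same') padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def msfe(x):
    """Multi-stream feature extraction: parallel 1x1, 3x3, 5x5 branches
    concatenated along a new channel axis (illustrative fixed kernels;
    the real sub-block uses learned multi-channel convolutions)."""
    k1 = np.ones((1, 1))          # 1x1 branch (identity here)
    k3 = np.ones((3, 3)) / 9.0    # 3x3 branch
    k5 = np.ones((5, 5)) / 25.0   # 5x5 branch
    return np.stack([conv2d_same(x, k) for k in (k1, k3, k5)])

feat = msfe(np.random.rand(16, 16))
print(feat.shape)  # (3, 16, 16): one channel per receptive field
```

    In the paper these multi-scale features would then pass through the attention sub-block so the model can weight channels and regions adaptively.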

    Prompt-based test-time real image dehazing: a novel pipeline

    Existing methods attempt to improve models' generalization ability on real-world hazy images by exploring well-designed training schemes (e.g., CycleGAN, prior loss). However, most of them need very complicated training procedures to achieve satisfactory results. In this work, we present a novel testing pipeline called Prompt-based Test-Time Dehazing (PTTD) to help generate visually pleasing results for real-captured hazy images during the inference phase. We experimentally find that, given a dehazing model trained on synthetic data, fine-tuning the statistics (i.e., mean and standard deviation) of encoding features allows PTTD to narrow the domain gap, boosting the performance of real image dehazing. Accordingly, we first apply a prompt generation module (PGM) to generate a visual prompt, which is the source of appropriate statistical perturbations for mean and standard deviation. Then, we incorporate a feature adaptation module (FAM) into existing dehazing models to adjust the original statistics with the guidance of the generated prompt. Note that PTTD is model-agnostic and can be equipped with various state-of-the-art dehazing models trained on synthetic hazy-clean pairs. Extensive experimental results demonstrate that our PTTD is flexible while achieving superior performance compared with state-of-the-art dehazing methods in real-world scenarios. The source code of our PTTD will be made available at https://github.com/cecret3350/PTTD-Dehazing. Comment: update github link (https://github.com/cecret3350/PTTD-Dehazing)
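    The core operation — shifting per-channel feature statistics toward prompt-provided targets — can be sketched with an AdaIN-style renormalization. This is a minimal stand-in for the paper's FAM, assuming the adaptation is a whiten-and-recolor of each channel; the function and variable names are hypothetical.

```python
import numpy as np

def adapt_statistics(feat, prompt_mean, prompt_std, eps=1e-6):
    """Shift a feature map's per-channel mean/std toward prompt-provided
    statistics (AdaIN-style; a toy stand-in for the paper's FAM)."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normalized = (feat - mu) / (sigma + eps)   # whiten each channel
    return normalized * prompt_std + prompt_mean  # recolor with prompt stats

feat = np.random.rand(4, 8, 8)              # C x H x W encoder features
target_mu = np.zeros((4, 1, 1))             # statistics a PGM might supply
target_sigma = np.ones((4, 1, 1)) * 2.0
out = adapt_statistics(feat, target_mu, target_sigma)
print(out.mean(axis=(1, 2)))  # approximately 0 for every channel
```

    Because the operation touches only feature statistics, it can in principle be dropped into any pretrained encoder, which matches the model-agnostic claim.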

    Cascaded Deep Networks with Multiple Receptive Fields for Infrared Image Super-Resolution

    Infrared images have a wide range of military and civilian applications, including night vision, surveillance, and robotics. However, high-resolution infrared detectors are difficult to fabricate and expensive to manufacture. In this work, we present a cascaded architecture of deep neural networks with multiple receptive fields to increase the spatial resolution of infrared images by a large scale factor (×8). Instead of reconstructing a high-resolution image from its low-resolution version using a single complex deep network, the key idea of our approach is to set up a mid-point (scale ×2) between scale ×1 and ×8 such that the lost information can be divided into two components. Lost information within each component contains similar patterns and thus can be more accurately recovered, even using a simpler deep network. In our proposed cascaded architecture, two consecutive deep networks with different receptive fields are jointly trained through a multi-scale loss function. The first network, with a large receptive field, is applied to recover large-scale structure information, while the second one uses a relatively smaller receptive field to reconstruct small-scale image details. Our proposed method is systematically evaluated using realistic infrared images. Compared with state-of-the-art Super-Resolution methods, our proposed cascaded approach achieves improved reconstruction accuracy with significantly fewer parameters
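    The cascade structure itself is easy to sketch: a ×2 stage recovers coarse structure, then a second stage takes the result the rest of the way to ×8. In the sketch below both stages are placeholder nearest-neighbour upsamplers rather than trained networks, so it only demonstrates the two-stage factorization of the ×8 scale factor; all names are illustrative.

```python
import numpy as np

def upscale_nn(img, factor):
    """Nearest-neighbour upsampling as a stand-in for a learned SR network."""
    return np.kron(img, np.ones((factor, factor)))

def cascaded_sr(lr):
    """Two-stage x8 super-resolution: a x2 mid-point, then a x4 refinement.
    In the paper each stage is a deep network trained jointly through a
    multi-scale loss; here both are placeholders to show the data flow."""
    mid = upscale_nn(lr, 2)   # stage 1: large receptive field, coarse structure
    hr = upscale_nn(mid, 4)   # stage 2: smaller receptive field, fine details
    return hr

lr = np.random.rand(8, 8)
print(cascaded_sr(lr).shape)  # (64, 64): x8 overall
```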

    Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo

    In this paper, we build a two-stage Convolutional Neural Network (CNN) architecture to construct inter- and intra-frame representations based on an arbitrary number of images captured under different light directions, performing accurate normal estimation of non-Lambertian objects. We experimentally investigate numerous network design alternatives to identify the optimal scheme for deploying inter-frame and intra-frame feature extraction modules for the photometric stereo problem. Moreover, we propose to utilize the easily obtained object mask to eliminate adverse interference from invalid background regions in intra-frame spatial convolutions, thus effectively improving the accuracy of normal estimation for surfaces made of dark materials or with cast shadows. Experimental results demonstrate that the proposed masked two-stage photometric stereo CNN model (MT-PS-CNN) performs favorably against state-of-the-art photometric stereo techniques in terms of both accuracy and efficiency. In addition, the proposed method is capable of predicting accurate and rich surface normal details for non-Lambertian objects of complex geometry and performs stably given inputs captured under both sparse and dense lighting distributions. Comment: 9 pages, 8 figures
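    The mask-guided idea — spatial filtering that never mixes in background pixels — can be illustrated with a masked averaging filter. This is a toy analogue of the paper's masked intra-frame convolution, not its actual operator; the function name and window size are assumptions.

```python
import numpy as np

def masked_mean_filter(x, mask, size=3):
    """Spatial average that ignores background pixels (mask == 0), a toy
    analogue of mask-guided intra-frame convolution: each output pixel is
    renormalized by the number of valid (object) pixels in its window."""
    h, w = x.shape
    p = size // 2
    xp = np.pad(x * mask, p)               # zero out background before filtering
    mp = np.pad(mask.astype(float), p)
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            weight = mp[i:i + size, j:j + size].sum()
            if weight > 0:                 # renormalize by valid-pixel count
                out[i, j] = xp[i:i + size, j:j + size].sum() / weight
    return out

x = np.ones((6, 6))
mask = np.zeros((6, 6))
mask[2:, 2:] = 1                           # object occupies the lower-right block
print(masked_mean_filter(x, mask)[3, 3])   # 1.0: background never leaks in
```

    Without the mask-aware renormalization, windows near the object boundary would average in background zeros and darken the estimate, which is exactly the interference the paper's masking is meant to suppress.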

    Real-Time Work Zone Traffic Management via Unmanned Air Vehicles

    Highway work zones are prone to traffic accidents when congestion and queues develop. Vehicle queues expand at a rate of 1 mile every 2 minutes. Back-of-queue, rear-end crashes are the most common work zone crash, endangering the safety of motorists, passengers, and construction workers. The dynamic nature of queuing in the proximity of highway work zones necessitates traffic management solutions that can monitor and intervene in real time. Fortunately, recent progress in sensor technology, embedded systems, and wireless communication, coupled with lower costs, is now enabling the development of real-time, automated, “intelligent” traffic management systems that address this problem. The goal of this project was to perform preliminary research and proof-of-concept development work for the use of UAS in real-time traffic monitoring of highway construction zones in order to create real-time alerts for motorists, construction workers, and first responders. The main tasks of the proposed system were to collect traffic data via the UAV camera, analyze that data to detect congestion and back-of-queue conditions, and alert motorists of stopped traffic, delay times, and alternate route options. Experiments were conducted using UAS to monitor traffic and collect traffic videos for processing. Prototype software was created to analyze this data. The software was successful in detecting vehicle speed from zero mph to highway speeds. A review of available mobile traffic apps was conducted for future integration with advanced iterations of the UAV and software system created by this research.
    This project has shown that UAS monitoring of highway construction zones and real-time alerts to motorists, construction crews, and first responders are possible in the near term, and future research is needed to further develop and implement the innovative UAS traffic monitoring system developed by this research
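    The cited queue-growth rate of 1 mile every 2 minutes makes the urgency concrete: a worked example, assuming the rate stays constant (a real deployment would measure it from UAV video rather than use a fixed constant).

```python
# Back-of-queue growth at the cited rate of 1 mile every 2 minutes.
def queue_length_miles(minutes, rate_miles_per_min=0.5):
    """Estimated queue length after `minutes` of congestion. The default
    rate comes from the text; treating it as constant is a simplification."""
    return minutes * rate_miles_per_min

print(queue_length_miles(10))  # 5.0 miles of queued vehicles after 10 minutes
```

    At that rate, a motorist arriving just 10 minutes after congestion begins meets a 5-mile queue, which is why real-time back-of-queue alerts matter.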

    Deep Neural Network for Fast and Accurate Single Image Super-Resolution via Channel-Attention-based Fusion of Orientation-aware Features

    Recently, Convolutional Neural Networks (CNNs) have been successfully adopted to solve the ill-posed single image super-resolution (SISR) problem. A commonly used strategy to boost the performance of CNN-based SISR models is deploying very deep networks, which inevitably incurs many obvious drawbacks (e.g., a large number of network parameters, heavy computational loads, and difficult model training). In this paper, we aim to build more accurate and faster SISR models by developing better-performing feature extraction and fusion techniques. Firstly, we propose a novel Orientation-Aware feature extraction and fusion Module (OAM), which contains a mixture of 1D and 2D convolutional kernels (i.e., 5×1, 1×5, and 3×3) for extracting orientation-aware features. Secondly, we adopt the channel attention mechanism as an effective technique to adaptively fuse features extracted in different directions and in hierarchically stacked convolutional stages. Based on these two important improvements, we present a compact but powerful CNN-based model for high-quality SISR via Channel Attention-based fusion of Orientation-Aware features (SISR-CA-OA). Extensive experimental results verify the superiority of the proposed SISR-CA-OA model, performing favorably against the state-of-the-art SISR models in terms of both restoration accuracy and computational efficiency. The source codes will be made publicly available. Comment: 12 pages, 11 figures
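    The channel-attention fusion step can be sketched in squeeze-and-excitation style: global-average-pool each directional branch, pass the pooled vector through a small gate, and rescale each branch before merging. This is a generic sketch of channel attention, not the paper's exact module; the weights `w1`/`w2` stand in for learned parameters.

```python
import numpy as np

def channel_attention_fuse(feats, w1, w2):
    """Squeeze-and-excitation style fusion: pool each channel globally,
    compute per-channel gates with a tiny 2-layer network, then rescale
    and sum the branches. w1/w2 are hypothetical learned weights."""
    pooled = feats.mean(axis=(1, 2))                # squeeze: (C,)
    hidden = np.maximum(0.0, w1 @ pooled)           # excitation layer 1 (ReLU)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # excitation layer 2 (sigmoid)
    return (feats * gates[:, None, None]).sum(axis=0)

C = 3                                  # e.g. the 5x1, 1x5, and 3x3 branches
feats = np.random.rand(C, 8, 8)        # one feature map per orientation
w1 = np.random.rand(2, C)
w2 = np.random.rand(C, 2)
fused = channel_attention_fuse(feats, w1, w2)
print(fused.shape)  # (8, 8): a single adaptively fused map
```

    The sigmoid gates lie in (0, 1), so the fusion is a learned soft weighting of the orientation branches rather than a fixed sum.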

    Time Irreversibility from Time Series for Analyzing Oil-in-Water Flow Transition

    We first experimentally collect conductance fluctuation signals of oil-in-water two-phase flow in a vertical pipe. We then characterize the asymmetry of the flow patterns from the collected signals using multidimensional and multiscale time irreversibility indices. Moreover, we propose a novel criterion, AMSI (average of multiscale time irreversibility), to quantitatively investigate the oil-in-water two-phase flow pattern dynamics. The results show that AMSI is sensitive to flow pattern evolution and can be used to predict the flow pattern transition and bubble coalescence
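    The averaging-over-scales construction can be sketched directly: coarse-grain the series at scales 1..S, compute a time irreversibility index at each scale, and average. The simple up/down increment asymmetry used below is a stand-in for the paper's irreversibility measure, and the scale count is an arbitrary choice.

```python
import numpy as np

def irreversibility_index(x):
    """Asymmetry between upward and downward increments: 0 when the
    increments balance, 1 for a monotone series (a simple stand-in for
    the paper's time irreversibility measure)."""
    d = np.diff(x)
    d = d[d != 0]
    if d.size == 0:
        return 0.0
    p_up = np.mean(d > 0)
    return abs(2.0 * p_up - 1.0)

def amsi(x, max_scale=4):
    """Average of multiscale time irreversibility: coarse-grain the series
    at scales 1..max_scale, compute the index at each, and average."""
    indices = []
    for s in range(1, max_scale + 1):
        n = len(x) // s
        coarse = x[:n * s].reshape(n, s).mean(axis=1)  # coarse-graining
        indices.append(irreversibility_index(coarse))
    return float(np.mean(indices))

print(amsi(np.arange(64, dtype=float)))  # 1.0: a monotone series is fully irreversible
```

    A time-reversible signal scores near 0 at every scale, so a rising AMSI flags the growing temporal asymmetry associated with flow pattern transition.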

    Reasoning in Different Directions: Triplet Learning for Scene Graph Generation

    Scene graph generation aims to detect objects and their relations in images, providing structured representations for scene understanding. Currently, mainstream approaches first detect the objects and then solve a classification task to determine the relation between each object pair, ignoring the other combinations of the subject-predicate-object triplet. In this work, we propose a triplet learning paradigm for scene graph generation in which, given any two entities of the triplet, we learn to predict the third. A multi-task learning scheme is adopted to equip a scene graph generation model with the triplet learning task, in which the prediction heads for the subject, object, and predicate share the same backbone and are jointly trained. The proposed method does not require any additional annotation and is easy to embed in existing networks. It improves the generalizability of scene graph generation models and can thus be applied to both biased and unbiased methods. Moreover, we introduce a new Graph Structure-Aware Transformer (GSAT) model that incorporates the structural information of the scene graph via a modified self-attention mechanism. Extensive experiments show that the proposed triplet learning consistently improves the performance of several state-of-the-art models on the Visual Genome dataset
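    The shared-backbone, three-head structure can be sketched abstractly: one representation of a candidate triplet feeds three classifiers, one each for subject, predicate, and object, so that any missing element can be predicted from the other two. All dimensions, weights, and names below are hypothetical; in the paper the representation comes from a full scene graph generation backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared backbone feature for one subject-predicate-object candidate
# (hypothetical dimensionality; stands in for a learned representation).
D, N_SUBJ, N_PRED, N_OBJ = 16, 10, 8, 10
backbone_feat = rng.standard_normal(D)

# Three task-specific heads over the SAME backbone feature: predict the
# subject from (predicate, object), and likewise for the other two tasks.
W_subj = rng.standard_normal((N_SUBJ, D))
W_pred = rng.standard_normal((N_PRED, D))
W_obj = rng.standard_normal((N_OBJ, D))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

subj_probs = softmax(W_subj @ backbone_feat)
pred_probs = softmax(W_pred @ backbone_feat)
obj_probs = softmax(W_obj @ backbone_feat)
print(subj_probs.shape, pred_probs.shape, obj_probs.shape)  # (10,) (8,) (10,)
```

    Because the three heads share the backbone, the auxiliary subject and object prediction tasks regularize the representation without requiring any extra annotation, matching the multi-task claim in the abstract.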