    Deep Learning for Image Restoration and Robotic Vision

    Traditional model-based approaches require the formulation of a mathematical model, and such models often have limited performance. The quality of an image may degrade for a variety of reasons: the scene content may be affected by weather conditions such as haze, rain, and snow, or noise may be introduced during image processing and transmission (e.g., artifacts generated during compression). The goal of image restoration is to restore the image to a desirable quality, both subjectively and objectively. Agricultural robotics is also gaining interest, since most agricultural work is lengthy and repetitive, and computer vision is crucial to robots, especially autonomous ones. However, it is challenging to derive a precise mathematical model for the aforementioned problems. Compared with the traditional approach, the learning-based approach has an edge since it does not require an explicit model of the problem. Moreover, learning-based approaches now deliver best-in-class performance on most vision problems, such as image dehazing, super-resolution, and image recognition. In this dissertation, we address the problems of image restoration and robotic vision with deep learning. The two problems are closely related from a network architecture perspective: it is essential to select an appropriate network for each problem. Specifically, we solve the problems of single image dehazing, High Efficiency Video Coding (HEVC) loop filtering and super-resolution, and computer vision for an autonomous robot. Our technical contributions are threefold. First, we reformulate haze as a signal-dependent noise, which allows us to remove it by learning a structural residual. Based on this reformulation, we solve dehazing with a recursive deep residual network and a generative adversarial network, which emphasize objective and perceptual quality, respectively. Second, we replace traditional filters in HEVC with a Convolutional Neural Network (CNN) filter. We show that our CNN filter achieves a 7% BD-rate saving compared with traditional filters such as the bilateral and deblocking filters. We also incorporate a multi-scale CNN super-resolution module into HEVC; such a post-processing module improves visual quality under extremely low bandwidth. Third, we apply transfer learning to support the vision and autonomous decision making of a precision pollination robot. Good experimental results are reported with real-world data.
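
    The residual reformulation above follows from the atmospheric scattering model I = J*t + A*(1 - t): subtracting the clean image J gives R = I - J = (A - J)(1 - t), a corruption that depends on the signal itself. The sketch below illustrates this view; the depth map, beta, and airlight values are illustrative assumptions, not the dissertation's actual parameters.

        import numpy as np

        def add_haze(clean, depth, beta=1.0, airlight=0.9):
            """Atmospheric scattering model: I = J*t + A*(1 - t)."""
            t = np.exp(-beta * depth)[..., None]      # transmission map
            return clean * t + airlight * (1.0 - t)

        def haze_residual(clean, hazy):
            """Haze as signal-dependent noise: R = I - J = (A - J)*(1 - t).

            Because R depends on the clean signal J, a network predicting R
            learns a structured, image-dependent residual rather than the
            whole scene -- the 'structural residual' of the abstract.
            """
            return hazy - clean

        # Example: a flat gray image whose depth grows across columns.
        clean = np.full((4, 4, 3), 0.5)
        depth = np.tile(np.linspace(0.1, 2.0, 4), (4, 1))
        print(haze_residual(clean, add_haze(clean, depth)).round(3))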

    Image processing and synthesis: From hand-crafted to data-driven modeling

    This work investigates image and video restoration problems using effective optimization algorithms. First, we study the problem of single image dehazing with artifact suppression for compressed or noisy images and videos. Our method is based on the linear haze model and minimizes the gradient residual between the input and output images, which successfully suppresses new artifacts that are not obvious in the input images. Second, we propose a new method for image inpainting using deep neural networks. Given a set of training data, deep generative models can generate high-quality natural images following the same distribution. We search for the nearest neighbor in the latent space of the deep generative model using a weighted context loss and a prior loss; the resulting latent code is then decoded into a clean, uncorrupted version of the input. Third, we study the problem of recovering high-quality images from very noisy raw data captured in low-light conditions with short exposures. We build deep neural networks to learn the camera processing pipeline specifically for low-light raw data with an extremely low signal-to-noise ratio (SNR). To train the networks, we capture a new dataset of more than five thousand images with short-exposure and long-exposure pairs. Promising results are obtained compared with the traditional image processing pipeline. Finally, we propose a new method for extreme low-light video processing. The raw video frames are pre-processed using spatio-temporal denoising, and a neural network is trained to remove the error in the pre-processed data, learning to perform the image processing pipeline while encouraging temporal smoothness of the output. Both quantitative and qualitative results demonstrate that the proposed method significantly outperforms existing methods and paves the way for future research in this area.
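
    The inpainting stage amounts to an optimization over a pretrained generator's latent space. Below is a minimal, hedged sketch in PyTorch; the generator G, the loss weights, and the optimizer settings are assumptions for illustration rather than the work's exact configuration.

        import torch

        def inpaint_latent_search(G, corrupted, mask, weight,
                                  z_dim=128, steps=500, lam=0.1, lr=0.05):
            """Search the generator's latent space for the corrupted input.

            corrupted: observed image with holes; mask: 1 = known pixel;
            weight: per-pixel importance (e.g., larger near hole borders).
            """
            z = torch.randn(1, z_dim, requires_grad=True)
            opt = torch.optim.Adam([z], lr=lr)
            for _ in range(steps):
                opt.zero_grad()
                gen = G(z)
                # Weighted context loss: match the known pixels only.
                context = (weight * mask * (gen - corrupted)).abs().mean()
                # Prior loss: keep z near the generator's latent prior.
                prior = z.pow(2).mean()
                (context + lam * prior).backward()
                opt.step()
            # Decode the found code; known pixels are usually pasted back in.
            return G(z).detach()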

    Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing

    Multi-stage architectures have exhibited efficacy in image dehazing: they decompose a challenging task into multiple more tractable sub-tasks and progressively estimate the latent haze-free image. Despite remarkable progress, existing methods still suffer from the following shortcomings: (1) limited exploration of frequency-domain information; (2) insufficient information interaction; (3) severe feature redundancy. To remedy these issues, we propose a novel Mutual Information-driven Triple interaction Network (MITNet) based on spatial-frequency dual-domain information and a two-stage architecture. Specifically, the first stage, amplitude-guided haze removal, recovers the amplitude spectrum of the hazy image to remove the haze, while the second stage, phase-guided structure refinement, learns the transformation and refinement of the phase spectrum. To facilitate information exchange between the two stages, an Adaptive Triple Interaction Module (ATIM) is developed to simultaneously aggregate cross-domain, cross-scale, and cross-stage features; the fused features are further used to generate content-adaptive dynamic filters that enhance the global context representation. In addition, we impose a mutual information minimization constraint on paired scale encoder and decoder features from both stages, which effectively reduces information redundancy and enhances cross-stage feature complementarity. Extensive experiments on multiple public datasets show that MITNet achieves superior performance with lower model complexity. The code and models are available at https://github.com/it-hao/MITNet.
    Comment: Accepted in ACM MM 202
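
    The amplitude/phase split driving the two stages can be reproduced with a standard FFT. The following sketch (not MITNet's released code) shows how an image decomposes into the amplitude spectrum edited in stage one and the phase spectrum refined in stage two, and that the decomposition is lossless when recombined.

        import torch

        def split_spectrum(x):
            """(B, C, H, W) image -> amplitude and phase spectra."""
            spec = torch.fft.fft2(x, norm="ortho")
            return spec.abs(), spec.angle()

        def merge_spectrum(amplitude, phase):
            """Rebuild an image from (possibly refined) amplitude and phase."""
            spec = torch.polar(amplitude, phase)   # amplitude * exp(i * phase)
            return torch.fft.ifft2(spec, norm="ortho").real

        x = torch.rand(1, 3, 64, 64)
        amp, pha = split_spectrum(x)
        print(torch.allclose(merge_spectrum(amp, pha), x, atol=1e-5))  # True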

    Learning to Dehaze from Realistic Scene with A Fast Physics-based Dehazing Network

    Dehazing has long been a popular computer vision topic. A real-time dehazing method with reliable performance is highly desirable for many applications such as autonomous driving. While recent learning-based methods require datasets containing pairs of hazy images and clean ground-truth references, it is generally impossible to capture accurate ground truth in real scenes. Many existing works circumvent this difficulty by generating hazy images from common RGB-D datasets, rendering the haze from depth using the haze imaging model. However, there is still a gap between such synthetic datasets and real hazy images, as large datasets with high-quality depth are mostly indoor and depth maps for outdoor scenes are imprecise. In this paper, we complement the existing datasets with a new, large, and diverse dehazing dataset containing real outdoor scenes from High-Definition (HD) 3D movies. We select a large number of high-quality frames of real outdoor scenes and render haze on them using depth from stereo. Our dataset is more realistic than existing ones, and we demonstrate that using it greatly improves dehazing performance on real scenes. In addition to the dataset, we also propose a light and reliable dehazing network inspired by the physics model. Our approach outperforms other methods by a large margin and becomes the new state-of-the-art method. Moreover, the lightweight design of the network enables our method to run at real-time speed, much faster than other baseline methods.
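
    The haze rendering pipeline described above can be summarized in a few lines: depth is recovered from stereo disparity and composited with the haze imaging model. The focal length, baseline, scattering coefficient, and airlight below are placeholder assumptions, not the paper's calibration.

        import numpy as np

        def depth_from_disparity(disp, focal=1000.0, baseline=0.1, eps=1e-6):
            """Pinhole stereo geometry: depth = focal * baseline / disparity."""
            return focal * baseline / np.maximum(disp, eps)

        def render_haze(frame, disp, beta=0.05, airlight=0.8):
            """Composite haze onto a clean frame with I = J*t + A*(1 - t)."""
            depth = depth_from_disparity(disp)
            t = np.exp(-beta * depth)[..., None]      # transmission from depth
            return frame * t + airlight * (1.0 - t)

        # Example: a random frame with disparity shrinking toward the horizon.
        frame = np.random.rand(270, 480, 3)
        disp = np.tile(np.linspace(50.0, 1.0, 270)[:, None], (1, 480))
        hazy = render_haze(frame, disp)               # distant rows look whiter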