114 research outputs found
Rich Feature Distillation with Feature Affinity Module for Efficient Image Dehazing
Single-image haze removal is a long-standing hurdle for computer vision
applications. Several works have focused on transferring advances from
image classification, detection, and segmentation to the niche of image
dehazing, primarily focusing on contrastive learning and knowledge
distillation. However, these approaches prove computationally expensive,
raising concern regarding their applicability to on-the-edge use-cases. This
work introduces a simple, lightweight, and efficient framework for single-image
haze removal, exploiting rich "dark-knowledge" information from a lightweight
pre-trained super-resolution model via the notion of heterogeneous knowledge
distillation. We designed a feature affinity module to maximize the flow of
rich feature semantics from the super-resolution teacher to the student
dehazing network. In order to evaluate the efficacy of our proposed framework,
its performance as a plug-and-play setup to a baseline model is examined. Our
experiments are carried out on the RESIDE-Standard dataset to demonstrate the
robustness of our framework to the synthetic and real-world domains. The
extensive qualitative and quantitative results provided establish the
effectiveness of the framework, achieving gains of up to 15% (PSNR) while
reducing the model size by 20 times. Comment: Preprint version. Accepted at Opti
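The abstract above does not define its feature affinity module, so the following is only a minimal sketch of one common affinity-based distillation loss, not the authors' implementation. It assumes that the affinity is the pairwise cosine similarity between spatial positions of a feature map and that the loss is the mean absolute difference between the student and teacher affinity matrices. The function names `affinity` and `affinity_distillation_loss` are hypothetical.

```python
import math


def affinity(features):
    """Return the N x N cosine-similarity matrix of N per-position feature vectors.

    Assumption: a feature map is flattened to a list of N vectors, one per
    spatial position, each holding that position's channel values.
    """
    normed = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f)) or 1.0  # guard against zero vectors
        normed.append([x / norm for x in f])
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in normed] for fi in normed]


def affinity_distillation_loss(student, teacher):
    """Mean absolute difference between student and teacher affinity matrices.

    Both inputs must cover the same N spatial positions, but their channel
    counts may differ because only the N x N affinities are compared.
    """
    a_s, a_t = affinity(student), affinity(teacher)
    n = len(a_s)
    total = sum(abs(a_s[i][j] - a_t[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)
```

Because only the position-to-position similarities are compared, a teacher and a student with different channel widths can be matched without a projection layer, which is one reason this style of loss is attractive when the teacher solves a different task than the student.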
DEEP LEARNING FOR IMAGE RESTORATION AND ROBOTIC VISION
Traditional model-based approaches require the formulation of a mathematical model, and such models often have limited performance. The quality of an image may degrade for a variety of reasons: the scene may be affected by weather conditions such as haze, rain, and snow, or noise may be introduced during image processing or transmission (e.g., artifacts generated during compression). The goal of image restoration is to restore the image to a desirable quality, both subjectively and objectively. Agricultural robotics is gaining interest because most agricultural work is lengthy and repetitive. Computer vision is crucial to robots, especially autonomous ones. However, it is challenging to formulate a precise mathematical model for these problems. Compared with the traditional approach, the learning-based approach has an edge because it does not require an explicit model of the problem. Moreover, learning-based approaches now deliver best-in-class performance on most vision problems, such as image dehazing, super-resolution, and image recognition.
In this dissertation, we address the problems of image restoration and robotic vision with deep learning. These two problems are closely related from a network architecture perspective: it is essential to select an appropriate network for each problem. Specifically, we address single-image dehazing, High Efficiency Video Coding (HEVC) loop filtering and super-resolution, and computer vision for an autonomous robot. Our technical contributions are threefold. First, we propose to reformulate haze as signal-dependent noise, which allows us to uncover it by learning a structural residual. Based on this reformulation, we solve dehazing with a recursive deep residual network and a generative adversarial network, which emphasize objective and perceptual quality, respectively. Second, we replace the traditional filters in HEVC with a Convolutional Neural Network (CNN) filter. We show that our CNN filter achieves a 7% BD-rate saving compared with traditional filters such as the bilateral and deblocking filters. We also propose to incorporate a multi-scale CNN super-resolution module into HEVC; this post-processing module can improve visual quality under extremely low bandwidth. Third, a transfer learning technique is implemented to support the vision and autonomous decision making of a precision pollination robot. Good experimental results are reported with real-world data.
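The abstract does not spell out how haze becomes signal-dependent noise. One plausible reading, sketched below, assumes the standard atmospheric scattering model I = J*t + A*(1 - t), where J is the clean pixel, t the transmission, and A the atmospheric light; none of these symbols appear in the abstract. Rearranging gives I = J + (A - J)*(1 - t), so the hazy pixel is the clean pixel plus a residual that depends on J itself. The function names are hypothetical.

```python
def hazy_pixel(j, t, a):
    """Atmospheric scattering model (assumed here, not stated in the abstract)."""
    return j * t + a * (1.0 - t)


def haze_residual(j, t, a):
    """Residual r such that hazy = clean + r; it depends on the clean signal j."""
    return (a - j) * (1.0 - t)


def dehaze_with_residual(hazy, residual):
    """Recover the clean pixel by subtracting a (for example, learned) residual."""
    return hazy - residual
```

Under this reading, a network that predicts the residual from the hazy input can recover the clean image by subtraction, without estimating t and A separately.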
Road Detection and Recognition from Monocular Images Using Neural Networks
Road recognition is an important component of autonomous navigation systems, which help robots and autonomous vehicles navigate on the ground. Road detection is also useful in related sub-tasks, such as finding valid paths the robot or vehicle can follow, supporting driverless vehicles, avoiding collisions with obstacles, and detecting objects on the road. The goal of this thesis is to examine existing road detection and recognition techniques and to propose an alternative solution for the road classification and detection tasks. Our contribution consists of several parts. First, we released a dataset of approximately 5,300 unlabeled road images. Second, we summarized the existing road image datasets. Third, we proposed a convolutional neural network based on LeNet-5 for classifying road images from various environments. Finally, we presented an FCN-8-based model for pixel-wise image recognition.
De-smokeGCN: Generative Cooperative Networks for Joint Surgical Smoke Detection and Removal
Surgical smoke removal algorithms can improve the quality of intra-operative imaging and reduce hazards in image-guided surgery, making them a highly desirable post-processing step for many clinical applications. These algorithms also enable effective computer vision tasks for future robotic surgery. In this paper, we present a new unsupervised learning framework for high-quality pixel-wise smoke detection and removal. One of the well-recognized grand challenges in using convolutional neural networks (CNNs) for medical image processing is obtaining intra-operative medical imaging datasets for network training and validation, since such datasets are scarce and of limited quality. Our novel training framework does not require ground-truth image pairs. Instead, it learns purely from computer-generated simulation images. This approach opens up new avenues and bridges a substantial gap between conventional non-learning-based methods and those requiring prior knowledge gained from extensive training datasets. Inspired by the Generative Adversarial Network (GAN), we have developed a novel generative-collaborative learning scheme that decomposes the de-smoke process into two separate tasks: smoke detection and smoke removal. The detection network is used as prior knowledge and also as a loss function to maximize its support for training the smoke removal network. Quantitative and qualitative studies show that the proposed training framework outperforms state-of-the-art de-smoking approaches, including the latest GAN frameworks (such as PIX2PIX). Although the network is trained on synthetic images, experimental results have demonstrated its effectiveness in detecting and removing surgical smoke on both simulated and real-world laparoscopic images.
MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing
In recent years, Transformer networks have begun to replace pure
convolutional neural networks (CNNs) in the field of computer vision due to
their global receptive field and adaptability to input. However, the quadratic
computational complexity of softmax-attention limits their wide application to
the image dehazing task, especially for high-resolution images. To address this
issue, we propose a new Transformer variant, which applies the Taylor expansion
to approximate the softmax-attention and achieves linear computational
complexity. A multi-scale attention refinement module is proposed as a
complement to correct the error of the Taylor expansion. Furthermore, we
introduce a multi-branch architecture with multi-scale patch embedding to the
proposed Transformer, which embeds features by overlapping deformable
convolution of different scales. The design of multi-scale patch embedding is
based on three key ideas: 1) various sizes of the receptive field; 2)
multi-level semantic information; 3) flexible shapes of the receptive field.
Our model, named Multi-branch Transformer expanded by Taylor formula
(MB-TaylorFormer), can embed coarse to fine features more flexibly at the patch
embedding stage and capture long-distance pixel interactions with limited
computational cost. Experimental results on several dehazing benchmarks show
that MB-TaylorFormer achieves state-of-the-art (SOTA) performance with a light
computational burden. The source code and pre-trained models are available at
https://github.com/FVL2020/ICCV-2023-MB-TaylorFormer. Comment: ICCV 202
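To illustrate how a Taylor expansion can make attention linear in the number of tokens, the sketch below assumes a first-order expansion, exp(q·k) ≈ 1 + q·k, with L2-normalized queries and keys so that every weight stays non-negative. The abstract states neither the expansion order nor the normalization, so both are assumptions, and this is not the released MB-TaylorFormer code. All function names are hypothetical. The linear version reorders the sums so the key-value product is computed once instead of once per query.

```python
import math


def l2_normalize(rows):
    """Scale each row vector to unit length (zero rows are left unchanged)."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r)) or 1.0
        out.append([x / norm for x in r])
    return out


def taylor_attention_quadratic(Q, K, V):
    """Reference O(N^2) form: weights w_ij = 1 + q_i . k_j, normalized per query."""
    dv = len(V[0])
    out = []
    for q in Q:
        w = [1.0 + sum(a * b for a, b in zip(q, k)) for k in K]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z for c in range(dv)])
    return out


def taylor_attention_linear(Q, K, V):
    """Same result in O(N): precompute sum(V), sum(K), and K^T V once."""
    n, d, dv = len(K), len(K[0]), len(V[0])
    sum_v = [sum(v[c] for v in V) for c in range(dv)]
    sum_k = [sum(k[a] for k in K) for a in range(d)]
    # kv[a][c] = sum over tokens j of K[j][a] * V[j][c], a d x dv matrix
    kv = [[sum(k[a] * v[c] for k, v in zip(K, V)) for c in range(dv)] for a in range(d)]
    out = []
    for q in Q:
        z = n + sum(qa * ka for qa, ka in zip(q, sum_k))
        out.append([(sum_v[c] + sum(q[a] * kv[a][c] for a in range(d))) / z
                    for c in range(dv)])
    return out
```

Both functions return the same values; the linear form simply never materializes the N x N weight matrix. The first-order expansion is only an approximation of softmax-attention, which is consistent with the abstract's mention of a separate refinement module that corrects the expansion error.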
- …