378 research outputs found

    An underwater image enhancement by reducing speckle noise using modified anisotropic diffusion filter

    Get PDF
    Underwater images are usually suffering from the issues of quality degradation, such as low contrast due to blurring details, color deviations, non-uniform lighting, and noise. Since last few decades, many researches are undergoing for restoration and enhancement for degraded underwater images. In this paper, we proposed a novel algorithm using modified anisotropic diffusion filter with dynamic color balancing strategy. This proposed algorithm performs based on an employing effective noise reduction as well as edge preserving technique with dynamic color correction to make uniform lighting and minimize the speckle noise. Furthermore, reanalyze the contributions and limitations of existing underwater image restoration and enhancement methods. Finally, in this research provided the detailed objective evaluations and compared with the various underwater scenarios for above said challenges also made subjective studies, which shows that our proposed method will improve the quality of the image and significantly enhanced the image

    Computer vision applied to underwater robotics

    Get PDF

    Road Detection and Recognition from Monocular Images Using Neural Networks

    Get PDF
    Teede eristamine on oluline osa iseseisvatest navigatsioonisüsteemidest, mis aitavad robotitel ja autonoomsetel sõidukitel maapinnal liikuda. See on kasutusel erinevates seotud alamülesannetes, näiteks võimalike valiidsete liikumisteede leidmisel, takistusega kokkupõrke vältimisel ja teel asuvate objektide avastamisel.Selle töö eesmärk on uurida eksisteerivaid teede tuvastamise ja eristamise võtteid ning pakkuda välja alternatiivne lahendus selle teostamiseks.Töö jaoks loodi 5300-pildine andmestik ilma lisainfota teepiltidest. Lisaks tehti kokkuvõte juba eksisteerivatest teepiltide andmestikest. Töös pakume erinevates keskkondades asuvate teede piltide klassifitseerimiseks välja LeNet-5’l põhineva tehisnärvivõrgu. Samuti esitleme FCN-8’l põhinevat mudelit pikslipõhiseks pildituvastuseks.Road recognition is one of the important aspects in Autonomous Navigation Systems. These systems help to navigate the autonomous vehicle and robot on the ground. Further, road detection is useful in related sub-tasks such as finding valid road path where the robot/vehicle can go, for supportive driverless vehicles, preventing the collision with the obstacle, object detection on the road, and others.The goal of this thesis is to examine existing road detection and recognition techniques and propose an alternative solution for road classification and detection task.Our contribution consists of several parts. Firstly, we released the road images dataset with approximately 5,300 unlabeled road images. Secondly, we summarized the information about the existing road images datasets. Thirdly, we proposed the convolutional LeNet-5-based neural network for the road image classification for various environments. Finally, our FCN-8-based model for pixel-wise image recognition has been presented

    초점 스택에서 3D 깊이 재구성 및 깊이 개선

    Get PDF
    학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2021. 2. 신영길.Three-dimensional (3D) depth recovery from two-dimensional images is a fundamental and challenging objective in computer vision, and is one of the most important prerequisites for many applications such as 3D measurement, robot location and navigation, self-driving, and so on. Depth-from-focus (DFF) is one of the important methods to reconstruct a 3D depth in the use of focus information. Reconstructing a 3D depth from texture-less regions is a typical issue associated with the conventional DFF. Further more, it is difficult for the conventional DFF reconstruction techniques to preserve depth edges and fine details while maintaining spatial consistency. In this dissertation, we address these problems and propose an DFF depth recovery framework which is robust over texture-less regions, and can reconstruct a depth image with clear edges and fine details. The depth recovery framework proposed in this dissertation is composed of two processes: depth reconstruction and depth refinement. To recovery an accurate 3D depth, We first formulate the depth reconstruction as a maximum a posterior (MAP) estimation problem with the inclusion of matting Laplacian prior. The nonlocal principle is adopted during the construction stage of the matting Laplacian matrix to preserve depth edges and fine details. Additionally, a depth variance based confidence measure with the combination of the reliability measure of focus measure is proposed to maintain the spatial smoothness, such that the smooth depth regions in initial depth could have high confidence value and the reconstructed depth could be more derived from the initial depth. As the nonlocal principle breaks the spatial consistency, the reconstructed depth image is spatially inconsistent. Meanwhile, it suffers from texture-copy artifacts. To smooth the noise and suppress the texture-copy artifacts introduced in the reconstructed depth image, we propose a closed-form edge-preserving depth refinement algorithm that formulates the depth refinement as a MAP estimation problem using Markov random fields (MRFs). With the incorporation of pre-estimated depth edges and mutual structure information into our energy function and the specially designed smoothness weight, the proposed refinement method can effectively suppress noise and texture-copy artifacts while preserving depth edges. Additionally, with the construction of undirected weighted graph representing the energy function, a closed-form solution is obtained by using the Laplacian matrix corresponding to the graph. The proposed framework presents a novel method of 3D depth recovery from a focal stack. The proposed algorithm shows the superiority in depth recovery over texture-less regions owing to the effective variance based confidence level computation and the matting Laplacian prior. Additionally, this proposed reconstruction method can obtain a depth image with clear edges and fine details due to the adoption of nonlocal principle in the construct]ion of matting Laplacian matrix. The proposed closed-form depth refinement approach shows that the ability in noise removal while preserving object structure with the usage of common edges. Additionally, it is able to effectively suppress texture-copy artifacts by utilizing mutual structure information. The proposed depth refinement provides a general idea for edge-preserving image smoothing, especially for depth related refinement such as stereo vision. Both quantitative and qualitative experimental results show the supremacy of the proposed method in terms of robustness in texture-less regions, accuracy, and ability to preserve object structure while maintaining spatial smoothness.Chapter 1 Introduction 1 1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Chapter 2 Related Works 9 2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2 Principle of depth-from-focus . . . . . . . . . . . . . . . . . . . . 9 2.2.1 Focus measure operators . . . . . . . . . . . . . . . . . . . 12 2.3 Depth-from-focus reconstruction . . . . . . . . . . . . . . . . . . 14 2.4 Edge-preserving image denoising . . . . . . . . . . . . . . . . . . 23 Chapter 3 Depth-from-Focus Reconstruction using Nonlocal Matting Laplacian Prior 38 3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.2 Image matting and matting Laplacian . . . . . . . . . . . . . . . 40 3.3 Depth-from-focus . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.4 Depth reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.4.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . 47 3.4.2 Likelihood model . . . . . . . . . . . . . . . . . . . . . . . 48 3.4.3 Nonlocal matting Laplacian prior model . . . . . . . . . . 50 3.5 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.5.2 Data configuration . . . . . . . . . . . . . . . . . . . . . . 55 3.5.3 Reconstruction results . . . . . . . . . . . . . . . . . . . . 56 3.5.4 Comparison between reconstruction using local and nonlocal matting Laplacian . . . . . . . . . . . . . . . . . . . 56 3.5.5 Spatial consistency analysis . . . . . . . . . . . . . . . . . 59 3.5.6 Parameter setting and analysis . . . . . . . . . . . . . . . 59 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Chapter 4 Closed-form MRF-based Depth Refinement 63 4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.3 Closed-form solution . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.4 Edge preservation . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.5 Texture-copy artifacts suppression . . . . . . . . . . . . . . . . . 73 4.6 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 Chapter 5 Evaluation 82 5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 5.2 Evaluation metrics . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.3 Evaluation on synthetic datasets . . . . . . . . . . . . . . . . . . 84 5.4 Evaluation on real scene datasets . . . . . . . . . . . . . . . . . . 89 5.5 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 5.6 Computational performances . . . . . . . . . . . . . . . . . . . . 93 Chapter 6 Conclusion 96 Bibliography 99Docto

    Switching GAN-based Image Filters to Improve Perception for Autonomous Driving

    Get PDF
    Autonomous driving holds the potential to increase human productivity, reduce accidents caused by human errors, allow better utilization of roads, reduce traffic accidents and congestion, free up parking space and provide many other advantages. Perception of Autonomous Vehicles (AV) refers to the use of sensors to perceive the world, e.g. using cameras to detect and classify objects. Traffic scene understanding is a key research problem in perception in autonomous driving, and semantic segmentation is a useful method to address this problem. Adverse weather conditions are a reality that AV must contend with. Conditions like rain, snow, haze, etc. can drastically reduce visibility and thus affect computer vision models. Models for perception for AVs are currently designed for and tested on predominantly ideal weather conditions under good illumination. The most complete solution may be to have the segmentation networks be trained on all possible adverse conditions. Thus a dataset to train a segmentation network to make it robust to rain would need to have adequate data that cover these conditions well. Moreover, labeling is an expensive task. It is particularly expensive for semantic segmentation, as each object in a scene needs to be identified and each pixel annotated in the right class. Thus, the adverse weather is a challenging problem for perception models in AVs. This thesis explores the use of Generative Adversarial Networks (GAN) in order to improve semantic segmentation. We design a framework and a methodology to evaluate the proposed approach. The framework consists of an Adversity Detector, and a series of denoising filters. The Adversity Detector is an image classifier that takes as input clear weather or adverse weather scenes, and attempts to predict whether the given image contains rain, or puddles, or other conditions that can adversely affect semantic segmentation. The filters are denoising generative adversarial networks that are trained to remove the adverse conditions from images in order to translate the image to a domain the segmentation network has been trained on, i.e. clear weather images. We use the prediction from the Adversity Detector to choose which GAN filter to use. The methodology we devise for evaluating our approach uses the trained filters to output sets of images that we can then run segmentation tasks on. This, we argue, is a better metric for evaluating the GANs than similarity measures such as SSIM. We also use synthetic data so we can perform systematic evaluation of our technique. We train two kinds of GANs, one that uses paired data (CycleGAN), and one that does not (Pix2Pix). We have concluded that GAN architectures that use unpaired data are not sufficiently good models for denoising. We train the denoising filters using the other architecture and we found them easy to train, and they show good results. While these filters do not show better performance than when we train our segmentation network with adverse weather data, we refer back to the point that training the segmentation network requires labelled data which is expensive to collect and annotate, particularly for adverse weather and lighting conditions. We implement our proposed framework and report a 17\% increase in performance in segmentation over the baseline results obtained when we do not use our framework

    BoundaryCAM: A Boundary-based Refinement Framework for Weakly Supervised Semantic Segmentation of Medical Images

    Full text link
    Weakly Supervised Semantic Segmentation (WSSS) with only image-level supervision is a promising approach to deal with the need for Segmentation networks, especially for generating a large number of pixel-wise masks in a given dataset. However, most state-of-the-art image-level WSSS techniques lack an understanding of the geometric features embedded in the images since the network cannot derive any object boundary information from just image-level labels. We define a boundary here as the line separating an object and its background, or two different objects. To address this drawback, we propose our novel BoundaryCAM framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques in order to achieve fine-grained higher-accuracy segmentation masks. To achieve this, we investigate a state-of-the-art unsupervised semantic segmentation network that can be used to construct a boundary map, which enables BoundaryCAM to predict object locations with sharper boundaries. By applying our method to WSSS predictions, we were able to achieve up to 10% improvements even to the benefit of the current state-of-the-art WSSS methods for medical imaging. The framework is open-source and accessible online at https://github.com/bharathprabakaran/BoundaryCAM

    A Robust Object Detection System for Driverless Vehicles through Sensor Fusion and Artificial Intelligence Techniques

    Get PDF
    Since the early 1990s, various research domains have been concerned with the concept of autonomous driving, leading to the widespread implementation of numerous advanced driver assistance features. However, fully automated vehicles have not yet been introduced to the market. The process of autonomous driving can be outlined through the following stages: environment perception, ego-vehicle localization, trajectory estimation, path planning, and vehicle control. Environment perception is partially based on computer vision algorithms that can detect and track surrounding objects. The process of objects detection performed by autonomous vehicles is considered challenging for several reasons, such as the presence of multiple dynamic objects in the same scene, interaction between objects, real-time speed requirements, and the presence of diverse weather conditions (e.g., rain, snow, fog, etc.). Although many studies have been conducted on objects detection performed by autonomous vehicles, it remains a challenging task, and improving the performance of object detection in diverse driving scenes is an ongoing field. This thesis aims to develop novel methods for the detection and 3D localization of surrounding dynamic objects in driving scenes in different rainy weather conditions. In this thesis, firstly, owing to the frequent occurrence of rain and its negative effect on the performance of objects detection operation, a real-time lightweight deraining network is proposed; it works on single real-time images separately. Rain streaks and the accumulation of rain streaks introduce distinct visual degradation effects to captured images. The proposed deraining network effectively removes both rain streaks and accumulated rain streaks from images. It makes use of the progressive operation of two main stages: rain streaks removal and rain streaks accumulation removal. The rain streaks removal stage is based on a Residual Network (ResNet) to maintain real-time performance and avoid adding to the computational complexity. Furthermore, the application of recursive computations involves the sharing of network parameters. Meanwhile, distant rain streaks accumulate and induce a distortion similar to fogging. Thus, it could be mitigated in a way similar to defogging. This stage relies on a transmission-guided lightweight network (TGL-Net). The proposed deraining network was evaluated on five datasets having synthetic rain of different properties and two other datasets with real rainy scenes. Secondly, an emphasis has been put on proposing a novel sensory system that achieves realtime multiple dynamic objects detection in driving scenes. The proposed sensory system utilizes a monocular camera and a 2D Light Detection and Ranging (LiDAR) sensor in a complementary fusion approach. YOLOv3- a baseline real-time object detection algorithm has been used to detect and classify objects in images captured by the camera; detected objects are surrounded by bounding boxes to localize them within the frames. Since objects present in a driving scene are dynamic and usually occluding each other, an algorithm has been developed to differentiate objects whose bounding boxes are overlapping. Moreover, the locations of bounding boxes within frames (in pixels) are converted into real-world angular coordinates. A 2D LiDAR was used to obtain depth measurements while maintaining low computational requirements in order to save resources for other autonomous driving related operations. A novel technique has been developed and tested for processing and mapping 2D LiDAR measurements with corresponding bounding boxes. The detection accuracy of the proposed system was manually evaluated in different real-time scenarios. Finally, the effectiveness of the proposed deraining network was validated in terms of its impact on objects detection in the context of de-rained images. Results of the proposed deraining network were compared to existing baseline deraining networks and have shown that the running time of the proposed network is 2.23× faster than the average running time of baseline deraining networks while achieving 1.2× improvement when tested on different synthetic datasets. Moreover, tests on the LiDAR measurements showed an average error of ±0.04m in real driving scenes. Also, both deraining and objects detection are jointly tested, and it was demonstrated that performing deraining ahead of objects detection caused 1.45× enhancement in the object detection precision

    Image processing and synthesis: From hand-crafted to data-driven modeling

    Get PDF
    This work investigates image and video restoration problems using effective optimization algorithms. First, we study the problem of single image dehazing to suppress artifacts in compressed or noisy images and videos. Our method is based on the linear haze model and minimizes the gradient residual between the input and output images. This successfully suppresses any new artifacts that are not obvious in the input images. Second, we propose a new method for image inpainting using deep neural networks. Given a set of training data, deep generate models can generate high-quality natural images following the same distribution. We search the nearest neighbor in the latent space of the deep generate models using a weighted context loss and prior loss. This code is then converted to the clean and uncorrupted image of the input. Third, we study the problem of recovering high-quality images from very noisy raw data captured in low-light conditions with short exposures. We build deep neural networks to learn the camera processing pipeline specifically for low-light raw data with an extremely low signal-to-noise ratio (SNR). To train the networks, we capture a new dataset of more than five thousand images with short-exposed and long-exposed pairs. Promising results are obtained compared with the traditional image processing pipeline. Finally, we propose a new method for extreme-low light video processing. The raw video frames are pre-processed using spatial-temporal denoising. A neural network is trained to move the error in the pre-processed data, learning to perform the image processing pipeline and encourage temporal smoothness of the output. Both quantitative and qualitative results demonstrate the proposed method significantly outperform the existing methods. It also paves the way for future research on this area

    Neural Radiance Fields: Past, Present, and Future

    Full text link
    The various aspects like modeling and interpreting 3D environments and surroundings have enticed humans to progress their research in 3D Computer Vision, Computer Graphics, and Machine Learning. An attempt made by Mildenhall et al in their paper about NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, Computer Vision, and the possible scope of High-Resolution Low Storage Augmented Reality and Virtual Reality-based 3D models have gained traction from res with more than 1000 preprints related to NeRFs published. This paper serves as a bridge for people starting to study these fields by building on the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. This survey provides the history of rendering, Implicit Learning, and NeRFs, the progression of research on NeRFs, and the potential applications and implications of NeRFs in today's world. In doing so, this survey categorizes all the NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications.Comment: 413 pages, 9 figures, 277 citation
    corecore