1,048 research outputs found

    Dynamic scene understanding using deep neural networks

    Recurrent Scene Parsing with Perspective Understanding in the Loop

    Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth), so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from the previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate that this approach achieves competitive semantic segmentation performance with a model that is substantially more compact. We carry out extensive analysis of this architecture, including variants that operate on monocular RGB but use depth as side-information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth yields state-of-the-art results for quantitative monocular depth estimation.
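To make the idea of depth-dependent pooling concrete, here is a minimal NumPy sketch. It is not the authors' module: the hard selection by fixed depth thresholds, the function names `box_filter` and `depth_gated_pool`, and the default kernel sizes and thresholds are all assumptions chosen for illustration. It only shows the stated principle that nearby pixels get a large pooling window and distant pixels a small one.

```python
import numpy as np

def box_filter(x, k):
    """Average-pool a 2D map with a k x k window (k odd), stride 1, edge padding."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    # Sum the k*k shifted copies of the padded map, then normalise.
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + h, dx:dx + w]
    return out / (k * k)

def depth_gated_pool(feat, depth, kernel_sizes=(5, 3, 1), depth_edges=(2.0, 5.0)):
    """Pick, per pixel, the pooled value whose window size matches the depth bin.

    kernel_sizes are ordered from nearest bin to farthest bin (illustrative values),
    so near pixels use the largest window and far pixels the smallest.
    """
    bins = np.digitize(depth, depth_edges)  # 0 = nearest, len(depth_edges) = farthest
    pooled = np.stack([box_filter(feat, k) for k in kernel_sizes])  # (K, H, W)
    return np.take_along_axis(pooled, bins[None, ...], axis=0)[0]
```

With a single bright pixel in the feature map, the same location is smoothed heavily when the depth map marks it as near and left untouched when it is marked as far. A learned, soft gate would replace the hard `np.digitize` step in a trainable network.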

    Semantic Cross-View Matching

    Matching cross-view images is challenging because the appearance and viewpoints are significantly different. While low-level features based on gradient orientations or filter responses can vary drastically with such changes in viewpoint, the semantic information in images is largely invariant in this respect. Consequently, semantically labeled regions can be used for performing cross-view matching. In this paper, we explore this idea and propose an automatic method for detecting and representing the semantic information of an RGB image with the goal of performing cross-view matching with a (non-RGB) geographic information system (GIS). A segmented image forms the input to our system, with segments assigned to semantic concepts such as traffic signs, lakes, roads, foliage, etc. We design a descriptor to robustly capture both the presence of semantic concepts and the spatial layout of those segments. Pairwise distances between the descriptors extracted from the GIS map and the query image are then used to generate a shortlist of the most promising locations with similar semantic concepts in a consistent spatial layout. An experimental evaluation with challenging query images and a large urban area shows promising results.
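The descriptor-and-shortlist pipeline described above can be sketched as follows. The abstract does not specify the descriptor, so this example substitutes a simple stand-in: per-cell class histograms on a coarse grid, compared with an L1 distance. The names `layout_descriptor` and `shortlist`, the grid size, and the distance are all hypothetical and should not be read as the paper's design.

```python
import numpy as np

def layout_descriptor(labels, num_classes, grid=(2, 2)):
    """Concatenate normalised class histograms over a coarse grid of cells.

    labels: 2D integer array of semantic class ids in [0, num_classes).
    The grid keeps a rough notion of where each class appears.
    """
    h, w = labels.shape
    gy, gx = grid
    cells = []
    for i in range(gy):
        for j in range(gx):
            cell = labels[i * h // gy:(i + 1) * h // gy,
                          j * w // gx:(j + 1) * w // gx]
            hist = np.bincount(cell.ravel(), minlength=num_classes).astype(float)
            cells.append(hist / max(cell.size, 1))
    return np.concatenate(cells)

def shortlist(query_desc, candidate_descs, k=3):
    """Return indices and L1 distances of the k candidates closest to the query."""
    dists = np.abs(candidate_descs - query_desc).sum(axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]
```

Because the histograms are computed per cell, a mirrored label map has the same overall class frequencies but a different descriptor, which is the property the abstract attributes to capturing spatial layout as well as presence.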

    Explainable Artificial Intelligence for Image Segmentation and for Estimation of Optical Aberrations

    State-of-the-art machine learning methods such as convolutional neural networks (CNNs) are frequently employed in computer vision. Despite their high performance on unseen data, CNNs are often criticized for lacking transparency, that is, providing very limited if any information about the internal decision-making process. In some applications, especially in healthcare, such transparency of algorithms is crucial for end users, as trust in diagnosis and prognosis is important not only for the satisfaction and potential adherence of patients, but also for their health. Explainable artificial intelligence (XAI) aims to open up this “black box,” often perceived as a cryptic and inconceivable algorithm, to increase understanding of the machines’ reasoning. XAI is an emerging field, and techniques for making machine learning explainable are becoming increasingly available. XAI for computer vision mainly focuses on image classification, whereas interpretability in other tasks remains challenging. Here, I examine explainability in computer vision beyond image classification, namely in semantic segmentation and 3D multitarget image regression. This thesis consists of five chapters. In Chapter 1 (Introduction), the background of artificial intelligence (AI), XAI, computer vision, and optics is presented, and the definitions of the terminology for XAI are proposed. Chapter 2 is focused on explaining the predictions of U-Net, a CNN commonly used for semantic image segmentation, and variations of this architecture. To this end, I propose the gradient-weighted class activation mapping for segmentation (Seg-Grad-CAM) method based on the well-known Grad-CAM method for explainable image classification. In Chapter 3, I present the application of deep learning to estimation of optical aberrations in microscopy biodata by identifying the present Zernike aberration modes and their amplitudes. A CNN-based approach PhaseNet can accurately estimate monochromatic aberrations in images of point light sources. I extend this method to objects of complex shapes. In Chapter 4, an approach for explainable 3D multitarget image regression is reported. First, I visualize how the model differentiates the aberration modes using the local interpretable model-agnostic explanations (LIME) method adapted for 3D image classification. Then I “explain,” using LIME modified for multitarget 3D image regression (Image-Reg-LIME), the outputs of the regression model for estimation of the amplitudes. In Chapter 5, the results are discussed in a broader context. The contribution of this thesis is the development of explainability methods for semantic segmentation and 3D multitarget image regression of optical aberrations. The research opens the door for further enhancement of AI’s transparency.

    Table of contents:
    Front matter: List of Figures; List of Tables
    1 Introduction
    1.1 Essential Definitions (Artificial intelligence; Explainable; Proposed definitions)
    1.2 Explainable Artificial Intelligence (Aims and applications; Methods)
    1.3 Computer Vision (Applications; Image classification; Image regression; Image segmentation)
    1.4 Optics (Aberrations; Zernike polynomials)
    1.5 Thesis Overview (Motivation; Dissertation outline)
    2 Explainable Image Segmentation
    2.1 Abstract
    2.2 Related Work
    2.3 Methods (CAM; Grad-CAM; U-Net; Seg-Grad-CAM)
    2.4 Data (Circles; TextureMNIST; Cityscapes)
    2.5 Results (Circles; TextureMNIST; Cityscapes)
    2.6 Applications
    2.7 Conclusions
    3 Estimation of Aberrations
    3.1 Abstract
    3.2 Related Work
    3.3 Methods (PhaseNet; PhaseNet data generator; Retrieval of noise parameters; Data generator with phantoms; Restoration via deconvolution; Convolution with the “zero” synthetic PSF)
    3.4 Data (Astrocytes (synthetic data); Fluorescent beads; Drosophila embryo (live sample); Neurons (fixed sample))
    3.5 Results (Astrocytes; Conclusions on the results for astrocytes; Fluorescent beads; Conclusions on the results for fluorescent beads; Drosophila embryo; Conclusions on the results for Drosophila embryo; Neurons)
    3.6 Conclusions
    4 Explainable Multitarget Image Regression
    4.1 Abstract
    4.2 Related Work
    4.3 Methods (LIME; Superpixel algorithms; LIME for 3D image classification; Image-Reg-LIME: LIME for 3D image regression)
    4.4 Results: Classification of Aberrations (Transforming the regression task into classification; Data augmentation; Parameter search; Clustering of 3D images; Explanations of classification; Conclusions on the results for classification)
    4.5 Results: Explainable Regression of Aberrations (Explanations with a reference value; Validation of explanations)
    4.6 Conclusions
    5 Conclusions and Outlook
    References
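The Grad-CAM weighting that Seg-Grad-CAM builds on can be sketched in a few lines of NumPy. This is a framework-free illustration, not the thesis code: it assumes the feature maps and the gradients of the chosen score with respect to them have already been obtained from some network, and the function name `grad_cam_heatmap` is hypothetical. For classification the score is a single class logit; for segmentation, as I understand the Seg-Grad-CAM formulation, it is the class logit summed over a chosen set of pixels.

```python
import numpy as np

def grad_cam_heatmap(activations, gradients):
    """Combine feature maps into a class-specific heatmap, Grad-CAM style.

    activations: (K, H, W) feature maps A^k from a convolutional layer.
    gradients:   (K, H, W) gradients of the chosen score with respect to A^k.
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    # alpha_k: global-average-pool the gradients of each feature map.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over the K feature maps gives an (H, W) map.
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only locations that push the score up.
    return np.maximum(cam, 0.0)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the image; that step is omitted here.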

    DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels

    In the context of scene understanding, a variety of methods exist to estimate different information channels from mono or stereo images, including disparity, depth, and normals. Although several advances have been reported in recent years for these tasks, the estimated information is often imprecise, particularly near depth discontinuities or creases. Studies have shown, however, that precisely such depth edges carry critical cues for the perception of shape and play important roles in tasks like depth-based segmentation or foreground selection. Unfortunately, the currently extracted channels often carry conflicting signals, making it difficult for subsequent applications to use them effectively. In this paper, we focus on the problem of obtaining high-precision depth edges (i.e., depth contours and creases) by jointly analyzing such unreliable information channels. We propose DepthCut, a data-driven fusion of the channels using a convolutional neural network trained on a large dataset with known depth. The resulting depth edges can be used for segmentation, decomposing a scene into depth layers with relatively flat depth, or improving the accuracy of the depth estimate near depth edges by constraining its gradients to agree with these edges. Quantitatively, we compare against 15 variants of baselines and demonstrate that our depth edges result in improved segmentation performance and an improved depth estimate near depth edges compared to data-agnostic channel fusion. Qualitatively, we demonstrate that the depth edges result in superior segmentation and depth orderings.

    Bidirectional multi-scale attention networks for semantic segmentation of oblique UAV imagery

    Semantic segmentation for aerial platforms has been one of the fundamental scene understanding tasks for Earth observation. Most semantic segmentation research has focused on scenes captured in nadir view, in which objects have relatively smaller scale variation compared with scenes captured in oblique view. The huge scale variation of objects in oblique images limits the performance of deep neural networks (DNNs) that process images in a single-scale fashion. To tackle the scale variation issue, we propose novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction. The experiments are conducted on the UAVid2020 dataset and show the effectiveness of our method. Our model achieved a state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score of 70.80%.
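For readers unfamiliar with the metric quoted above, here is a minimal NumPy sketch of mean intersection over union computed from a confusion matrix. The function name `mean_iou` is an assumption, and so is the handling of absent classes (they are skipped); benchmark protocols such as UAVid's may differ in details like ignored labels or per-dataset accumulation.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: integer arrays of the same shape with class ids in [0, num_classes).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    valid = union > 0  # skip classes absent from both pred and target
    return float((inter[valid] / union[valid]).mean())
```

For example, with predictions `[0, 0, 1, 1]` and ground truth `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.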

    WTA/TLA: A UAV-captured dataset for semantic segmentation of energy infrastructure

    Automated inspection of energy infrastructure with Unmanned Aerial Vehicles (UAVs) is becoming increasingly important, exhibiting significant advantages over manual inspection, including improved scalability, cost/time effectiveness, and risk reduction. Although recent technological advancements have enabled the collection of an abundance of vision data from UAVs’ sensors, significant efforts are still required from experts to manually interpret the collected data and assess the condition of energy infrastructure. Thus, semantic understanding of vision data collected from UAVs during inspection is a critical prerequisite for performing autonomous robotic tasks. However, the lack of labeled data introduces challenges and limitations in evaluating the performance of semantic prediction algorithms. To this end, we release two novel semantic datasets (WTA and TLA) of aerial images captured from power transmission networks and wind turbine farms, collected during real inspection scenarios with UAVs. We also propose modifications to existing state-of-the-art semantic segmentation CNNs to achieve an improved trade-off between accuracy and computational complexity. Qualitative and quantitative experiments demonstrate both the challenging properties of the provided dataset and the effectiveness of the proposed networks in this domain. The dataset is available at: https://github.com/gzamps/wta_tla_dataset