Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate this approach achieves competitive semantic segmentation
performance with a model which is substantially more compact. We carry out
extensive analysis of this architecture including variants that operate on
monocular RGB but use depth as side-information during training, unsupervised
gating as a generic attentional mechanism, and multi-resolution gating. We find
that gated pooling for joint semantic segmentation and depth yields
state-of-the-art results for quantitative monocular depth estimation.
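The gating idea above can be sketched as a per-pixel choice of pooling window driven by depth: distant pixels (large depth) keep small windows to preserve detail, nearby pixels get large ones. The window sizes and depth thresholds below are illustrative placeholders, not the paper's learned configuration:

```python
import numpy as np

def depth_gated_pool(feat, depth, sizes=(1, 3, 5), thresholds=(10.0, 30.0)):
    """Pool a single-channel feature map with a window size selected
    per pixel from the depth map (illustrative thresholds)."""
    h, w = depth.shape
    out = np.empty_like(feat)
    for i in range(h):
        for j in range(w):
            # far objects (large depth) -> small receptive field
            if depth[i, j] > thresholds[1]:
                k = sizes[0]
            elif depth[i, j] > thresholds[0]:
                k = sizes[1]
            else:
                k = sizes[2]
            r = k // 2
            win = feat[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = win.mean()
    return out
```

With a uniformly large depth the pooling degenerates to the identity, while small depths blur the map with the largest window; the paper instead applies this selection inside a CNN over learned feature maps.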
Semantic Cross-View Matching
Matching cross-view images is challenging because the appearance and
viewpoints are significantly different. While low-level features based on
gradient orientations or filter responses can drastically vary with such
changes in viewpoint, the semantic information of an image remains invariant
in this respect. Consequently, semantically labeled regions can
be used for performing cross-view matching. In this paper, we therefore explore
this idea and propose an automatic method for detecting and representing the
semantic information of an RGB image with the goal of performing cross-view
matching with a (non-RGB) geographic information system (GIS). A segmented
image forms the input to our system with segments assigned to semantic concepts
such as traffic signs, lakes, roads, foliage, etc. We design a descriptor to
robustly capture both the presence of semantic concepts and the spatial layout
of those segments. Pairwise distances between the descriptors extracted from
the GIS map and the query image are then used to generate a shortlist of the
most promising locations with similar semantic concepts in a consistent spatial
layout. An experimental evaluation with challenging query images and a large
urban area shows promising results.
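A minimal sketch of such a pipeline, assuming integer label maps and a simple grid-histogram encoding of concept presence and layout (the paper's actual descriptor design differs): each grid cell contributes a normalized label histogram, and candidate GIS locations are ranked by descriptor distance to form the shortlist.

```python
import numpy as np

def semantic_descriptor(label_map, n_classes, grid=2):
    """Concatenate per-cell label histograms: captures both which semantic
    concepts are present and their rough spatial layout."""
    h, w = label_map.shape
    desc = []
    for gi in range(grid):
        for gj in range(grid):
            cell = label_map[gi * h // grid:(gi + 1) * h // grid,
                             gj * w // grid:(gj + 1) * w // grid]
            hist = np.bincount(cell.ravel(), minlength=n_classes).astype(float)
            desc.append(hist / max(hist.sum(), 1))
    return np.concatenate(desc)

def shortlist(query_desc, gis_descs, k=3):
    """Rank candidate GIS locations by L2 distance to the query descriptor."""
    d = np.linalg.norm(gis_descs - query_desc, axis=1)
    return np.argsort(d)[:k]
```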
Explainable Artificial Intelligence for Image Segmentation and for Estimation of Optical Aberrations
State-of-the-art machine learning methods such as convolutional neural networks (CNNs) are frequently employed in computer vision. Despite their high performance on unseen data, CNNs are often criticized for lacking transparency: they provide very limited, if any, information about their internal decision-making process. In some applications, especially in healthcare, such transparency of algorithms is crucial for end users, as trust in diagnosis and prognosis is important not only for the satisfaction and potential adherence of patients, but also for their health. Explainable artificial intelligence (XAI) aims to open up this “black box,” often perceived as a cryptic and impenetrable algorithm, to increase understanding of the machines’ reasoning. XAI is an emerging field, and techniques for making machine learning explainable are becoming increasingly available. XAI for computer vision mainly focuses on image classification, whereas interpretability in other tasks remains challenging. Here, I examine explainability in computer vision beyond image classification, namely in semantic segmentation and 3D multitarget image regression.
This thesis consists of five chapters.
In Chapter 1 (Introduction), the background of artificial intelligence (AI), XAI, computer vision, and optics is presented, and the definitions of the terminology for XAI are proposed.
Chapter 2 is focused on explaining the predictions of U-Net, a CNN commonly used for semantic image segmentation, and variations of this architecture. To this end, I propose the gradient-weighted class activation mapping for segmentation (Seg-Grad-CAM) method based on the well-known Grad-CAM method for explainable image classification.
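The core Seg-Grad-CAM step described above can be sketched schematically, assuming the convolutional feature maps and the gradients of the class score summed over a pixel region of interest have already been extracted from the network (the thesis computes these via backpropagation through U-Net):

```python
import numpy as np

def seg_grad_cam(activations, gradients):
    """Given feature maps A_k of shape (C, H, W) and the gradients of the
    region-summed class score w.r.t. A_k, weight each map by its globally
    averaged gradient and keep the positive part (ReLU), as in Grad-CAM."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k: GAP of grads
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k
    return np.maximum(cam, 0)                         # ReLU
```

Restricting the summed score to a pixel region (rather than a single classification logit) is what adapts Grad-CAM from classification to segmentation.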
In Chapter 3, I present the application of deep learning to the estimation of optical aberrations in microscopy biodata by identifying the present Zernike aberration modes and their amplitudes. A CNN-based approach, PhaseNet, can accurately estimate monochromatic aberrations in images of point light sources. I extend this method to objects of complex shapes.
In Chapter 4, an approach for explainable 3D multitarget image regression is reported. First, I visualize how the model differentiates the aberration modes using the local interpretable model-agnostic explanations (LIME) method adapted for 3D image classification. Then, using LIME modified for multitarget 3D image regression (Image-Reg-LIME), I explain the outputs of the regression model for estimation of the amplitudes.
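A minimal LIME-style sketch for a scalar regression output: segments are randomly switched off against the image mean, the model is queried on each perturbation, and a least-squares linear surrogate is fitted whose coefficients score each segment's influence. The actual Image-Reg-LIME procedure differs in its segmentation, sampling, and weighting choices:

```python
import numpy as np

def lime_weights(image, segments, predict, n_samples=200, seed=0):
    """Attribute a scalar model output to image segments: perturb segments,
    query the model, and fit a linear surrogate by least squares."""
    rng = np.random.default_rng(seed)
    n_seg = segments.max() + 1
    masks = rng.integers(0, 2, size=(n_samples, n_seg))  # 1 = segment kept
    baseline = image.mean()
    ys = []
    for m in masks:
        pert = image.copy()
        for s in range(n_seg):
            if m[s] == 0:                  # switch segment off
                pert[segments == s] = baseline
        ys.append(predict(pert))
    coef, *_ = np.linalg.lstsq(masks.astype(float), np.array(ys), rcond=None)
    return coef                            # one influence score per segment
```

For a multitarget regressor, the same fit would be repeated per output (one set of coefficients per predicted Zernike amplitude).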
In Chapter 5, the results are discussed in a broader context.
The contribution of this thesis is the development of explainability methods for semantic segmentation and 3D multitarget image regression of optical aberrations. The research opens the door for further enhancement of AI’s transparency.
DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels
In the context of scene understanding, a variety of methods exists to
estimate different information channels from mono or stereo images, including
disparity, depth, and normals. Although several advances have been reported in
recent years for these tasks, the estimated information is often imprecise,
particularly near depth discontinuities or creases. However, studies have shown
that precisely such depth edges carry critical cues for the perception of
shape, and play important roles in tasks like depth-based segmentation or
foreground selection. Unfortunately, the currently extracted channels often
carry conflicting signals, making it difficult for subsequent applications to
effectively use them. In this paper, we focus on the problem of obtaining
high-precision depth edges (i.e., depth contours and creases) by jointly
analyzing such unreliable information channels. We propose DepthCut, a
data-driven fusion of the channels using a convolutional neural network trained
on a large dataset with known depth. The resulting depth edges can be used for
segmentation, decomposing a scene into depth layers with relatively flat depth,
or improving the accuracy of the depth estimate near depth edges by
constraining its gradients to agree with these edges. Quantitatively, we
compare against 15 variants of baselines and demonstrate that our depth edges
result in an improved segmentation performance and an improved depth estimate
near depth edges compared to data-agnostic channel fusion. Qualitatively, we
demonstrate that the depth edges result in superior segmentation and depth
orderings.
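A toy stand-in for the fusion idea: derive edge evidence independently from each unreliable channel and combine the evidence maps with per-channel weights. In DepthCut these weights come from a CNN trained on data with known depth; here they are fixed constants for illustration.

```python
import numpy as np

def grad_mag(ch):
    """Per-channel edge evidence: finite-difference gradient magnitude."""
    gy, gx = np.gradient(ch)
    return np.hypot(gx, gy)

def fuse_depth_edges(channels, weights):
    """Combine unreliable edge channels (e.g. disparity, normals, color)
    with fixed weights as a stand-in for data-driven CNN fusion."""
    maps = [w * grad_mag(c) for c, w in zip(channels, weights)]
    return np.clip(sum(maps), 0.0, 1.0)
```

A data-agnostic baseline would weight all channels equally; the point of the learned fusion is to down-weight whichever channel is unreliable near a given edge.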
Bidirectional multi-scale attention networks for semantic segmentation of oblique UAV imagery
Semantic segmentation for aerial platforms is one of the fundamental
scene understanding tasks for earth observation. Most semantic
segmentation research has focused on scenes captured in nadir view, in which
objects show relatively small scale variation compared with scenes captured
in oblique view. The large scale variation of objects in oblique images limits
the performance of deep neural networks (DNNs) that process images in a
single-scale fashion. To tackle the scale variation issue, in this paper, we
propose novel bidirectional multi-scale attention networks, which fuse
features from multiple scales bidirectionally for more adaptive and effective
feature extraction. The experiments are conducted on the UAVid2020 dataset and
have shown the effectiveness of our method. Our model achieved the
state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score
of 70.80%.
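The attention-weighted merge at the heart of such networks can be sketched as a per-pixel softmax over the scale axis; the paper's bidirectional pathway, which passes features both up and down the scale pyramid before this merge, is omitted in this sketch:

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(feats, logits):
    """Fuse per-scale feature maps of shape (S, H, W) using per-pixel
    attention weights obtained by a softmax over the scale axis."""
    w = softmax(logits, axis=0)      # attention weights sum to 1 per pixel
    return (w * feats).sum(axis=0)   # weighted combination across scales
```

Each pixel can thus attend to whichever scale best matches the local object size, which is what makes the fusion adaptive under the large scale variation of oblique imagery.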
WTA/TLA: A UAV-captured dataset for semantic segmentation of energy infrastructure
Automated inspection of energy infrastructure with Unmanned Aerial Vehicles (UAVs) is becoming increasingly important, exhibiting significant advantages over manual inspection, including improved scalability, cost/time effectiveness, and risk reduction. Although recent technological advancements have enabled the collection of an abundance of vision data from UAVs’ sensors, significant effort is still required from experts to manually interpret the collected data and assess the condition of energy infrastructure. Thus, semantic understanding of vision data collected from UAVs during inspection is a critical prerequisite for performing autonomous robotic tasks. However, the lack of labeled data introduces challenges and limitations in evaluating the performance of semantic prediction algorithms. To this end, we release two novel semantic datasets (WTA and TLA) of aerial images captured from power transmission networks and wind turbine farms, collected during real inspection scenarios with UAVs. We also propose modifications to existing state-of-the-art semantic segmentation CNNs to achieve an improved trade-off between accuracy and computational complexity. Qualitative and quantitative experiments demonstrate both the challenging properties of the provided dataset and the effectiveness of the proposed networks in this domain. The dataset is available at: https://github.com/gzamps/wta_tla_dataset