A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image-level class labels
alone is not sufficient; the labels must also be localised at the original
image pixel resolution. Boosted by the extraordinary ability of convolutional
neural networks (CNNs) to create semantic, high-level and hierarchical image
features, a large number of deep learning-based 2D semantic segmentation
approaches have been proposed within the last decade. In this survey, we mainly
focus on the recent scientific developments in semantic segmentation,
specifically on deep learning-based methods using 2D images. We start with an
analysis of the public image sets and leaderboards for 2D semantic
segmentation, together with an overview of the techniques employed in
performance evaluation. In examining the evolution of the field, we
chronologically categorise the approaches into three main periods, namely the
pre- and early deep learning era, the fully convolutional era, and the post-FCN
era. We technically analyse the solutions put forward for the fundamental
problems of the field, such as fine-grained localisation and scale invariance.
Before drawing our conclusions, we present a table of methods from all
mentioned eras, with a brief summary of each approach that explains its
contribution to the field. We conclude the survey by discussing the current
challenges of the field and to what extent they have been solved.
Comment: Updated with new studies
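The abstract mentions an overview of performance-evaluation techniques without naming them. Mean intersection-over-union (mIoU) is one widely used metric for semantic segmentation, and it shows concretely why the task is scored pixel by pixel. A minimal NumPy sketch, assuming integer label maps of equal shape; the function names are illustrative and not taken from the survey.

```python
import numpy as np


def confusion_matrix(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """Count (target, predicted) label pairs over every pixel."""
    # Each pixel is one sample; encode the pair as target * C + pred and bincount.
    idx = target.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)  # rows: target, cols: pred


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union over classes present in pred or target."""
    cm = confusion_matrix(pred, target, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0  # skip classes that appear in neither map
    return float(np.mean(intersection[valid] / union[valid]))


target = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [2, 2, 1]])
pred = np.array([[0, 0, 1],
                 [0, 0, 1],
                 [2, 1, 1]])
print(round(mean_iou(pred, target, num_classes=3), 4))  # 0.6167
```

Here the per-class IoUs are 3/4, 3/5 and 1/2; averaging per class rather than per pixel keeps small classes from being swamped by large ones.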
Analysis of AI-Based Single-View 3D Reconstruction Methods for an Industrial Application
Machine learning (ML) is a key technology in smart manufacturing as it provides insights into complex processes without requiring deep domain expertise. This work deals with deep learning algorithms that determine a 3D reconstruction from a single 2D grayscale image. 3D reconstruction can be used for quality control because the height values contain relevant information that is not visible in 2D data. Instead of 3D scans, depth maps estimated from a 2D input image can be used, with the advantages of a simple setup and a short recording time. Determining a 3D reconstruction from a single input image is a difficult task for which many algorithms and methods have been proposed in the past decades. Here, three deep learning methods, namely stacked autoencoders (SAEs), generative adversarial networks (GANs) and U-Nets, are investigated, evaluated and compared for 3D reconstruction from a 2D grayscale image of laser-welded components. Several GAN variants are tested, with the conclusion that Wasserstein GANs (WGANs) are the most robust among them. To the best of our knowledge, the present paper is the first to consider the U-Net, which achieves outstanding results in semantic segmentation, in the context of 3D reconstruction tasks. Unlike the U-Net, which uses standard convolutions, the stacked dilated U-Net (SDU-Net) applies stacked dilated convolutions. Of all the 3D reconstruction approaches considered in this work, the SDU-Net performs best, not only in terms of evaluation metrics but also in terms of computation time. Owing to its comparatively small number of trainable parameters and the suitability of the architecture for strong data augmentation, a robust model can be trained with only a small amount of training data.
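The abstract contrasts the standard convolutions of the U-Net with the stacked dilated convolutions of the SDU-Net. One property that motivates dilation is that stacking dilated layers enlarges the receptive field without adding parameters. A 1D sketch, assuming stride 1 and no padding; it illustrates dilated convolution in general and is not the SDU-Net architecture, whose dilation rates the abstract does not give (the rates below are hypothetical).

```python
import numpy as np


def dilated_conv1d(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """'Valid' 1D dilated convolution (cross-correlation, as in CNN libraries)."""
    k = len(kernel)
    span = dilation * (k - 1) + 1  # effective extent of the dilated kernel
    out_len = len(x) - span + 1
    # Taps are spaced `dilation` samples apart; the kernel still has k weights.
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field of stride-1 conv layers applied one after another."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)  # each layer widens the field by its span - 1
    return rf


# Three stacked 3-tap layers (9 weights in total either way).
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
print(dilated_conv1d(np.arange(10), np.array([1, 1, 1]), dilation=2).tolist())
# [6, 9, 12, 15, 18, 21]
```

With the same weight count, doubling the dilation at each layer roughly doubles the context each output sees, which is one reason dilated stacks can stay small in parameters.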
Task Decomposition and Synchronization for Semantic Biomedical Image Segmentation
Semantic segmentation is essential to biomedical image analysis.
Many recent works focus mainly on integrating the Fully Convolutional Network
(FCN) architecture with sophisticated convolution implementations and deep
supervision. In this paper, we propose to decompose the single segmentation
task into three sub-tasks: (1) pixel-wise image segmentation, (2) prediction
of the class labels of the objects within the image, and (3) classification
of the scene to which the image belongs. While these three sub-tasks are
trained to optimize their individual loss functions at different perceptual
levels, we let them interact through a task-task context ensemble. Moreover,
we propose a novel sync-regularization that penalizes the deviation between
the outputs of the pixel-wise segmentation and the class prediction tasks.
These regularizations help the FCN utilize context information comprehensively
and attain accurate semantic segmentation, even though the number of training
images may be limited in many biomedical applications. We have successfully
applied our framework to three diverse 2D/3D medical image datasets: the
Robotic Scene Segmentation Challenge 18 (ROBOT18), the Brain Tumor
Segmentation Challenge 18 (BRATS18), and the Retinal Fundus Glaucoma
Challenge (REFUGE18), achieving top-tier performance in all three challenges.
Comment: IEEE Transactions on Medical Imaging
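The abstract does not define the sync-regularization term. One plausible reading, offered only as an assumption: reduce the segmentation output to an image-level class-presence vector (here by a max over pixels) and penalize its distance (here squared error) from the output of the class-prediction head. Both the reduction and the distance are illustrative choices, and the function names are hypothetical; this is not the paper's formulation.

```python
import numpy as np


def presence_from_segmentation(seg_probs: np.ndarray) -> np.ndarray:
    """Image-level class presence implied by a pixel-wise probability map.

    seg_probs has shape (num_classes, H, W). The max over pixels gives, per
    class, the strongest evidence that the class occurs anywhere in the image.
    """
    return seg_probs.reshape(seg_probs.shape[0], -1).max(axis=1)


def sync_penalty(seg_probs: np.ndarray, class_probs: np.ndarray) -> float:
    """Hypothetical penalty: mean squared gap between the two presence estimates."""
    implied = presence_from_segmentation(seg_probs)
    return float(np.mean((implied - class_probs) ** 2))


seg = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # class 0 probability at each pixel
    [[0.1, 0.9], [0.8, 0.2]],  # class 1 probability at each pixel
])
cls = np.array([0.9, 0.1])  # class head claims class 1 is (almost) absent
print(round(sync_penalty(seg, cls), 4))  # 0.32
```

The segmentation branch confidently paints class 1 while the class head rejects it, so the penalty is nonzero; when the two heads agree it drops to zero.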
ELASTIC: Improving CNNs with Dynamic Scaling Policies
Scale variation has been a challenge in computer vision for traditional and
modern approaches alike. Most solutions to scale issues share a similar theme:
a set of intuitive, manually designed policies that are generic and fixed
(e.g. SIFT or feature pyramids). We argue that the scaling policy should be
learned from data. In this paper, we introduce ELASTIC, a simple, efficient and
yet very effective approach to learning a dynamic scale policy from data. We
formulate the scaling policy as a non-linear function inside the network's
structure that (a) is learned from data, (b) is instance-specific, (c) does not
add extra computation, and (d) can be applied to any network architecture. We
applied ELASTIC to several state-of-the-art network architectures and showed
consistent improvements without extra (and sometimes with even lower)
computation on ImageNet classification, MSCOCO multi-label classification, and
PASCAL VOC semantic segmentation. Our results show major improvements for images with scale
challenges. Our code is available here: https://github.com/allenai/elastic
Comment: CVPR 2019 oral
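The abstract describes the scaling policy only at a high level. One way such a policy can avoid extra computation is to split a block into parallel paths at different resolutions and merge them, letting later learned layers decide how much to rely on each scale for a given input. The sketch below illustrates that general split-and-merge idea under stated assumptions (an even channel split, 2x2 average pooling, nearest-neighbour upsampling, and a hypothetical shape-preserving `transform` standing in for learned layers); it is not the released ELASTIC code.

```python
from typing import Callable

import numpy as np


def downsample2x(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling on a (C, H, W) array with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def two_scale_block(x: np.ndarray,
                    transform: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Run half the channels at full resolution and half at half resolution.

    `transform` is a hypothetical stand-in for the block's learned layers and
    must preserve the array shape. The low-resolution path touches only a
    quarter of the spatial positions, which is how a split like this can keep
    the total cost from growing.
    """
    c = x.shape[0]
    high, low = x[: c // 2], x[c // 2:]
    high_out = transform(high)
    low_out = upsample2x(transform(downsample2x(low)))
    # Merge both scales; subsequent learned layers can weight them per input.
    return np.concatenate([high_out, low_out], axis=0)


x = np.arange(32, dtype=float).reshape(2, 4, 4)
out = two_scale_block(x, lambda t: t)  # identity transform for illustration
print(out.shape)  # (2, 4, 4)
```

Because the low-resolution path sees a coarser view of the same input, units fed by the merged output can respond to larger structures without a wider kernel.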