892 research outputs found

    A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images

    Semantic segmentation is the pixel-wise labelling of an image. Because the problem is defined at the pixel level, determining image-level class labels alone is not sufficient; the labels must also be localised at the original image pixel resolution. Boosted by the extraordinary ability of convolutional neural networks (CNNs) to create semantic, high-level and hierarchical image features, a large number of deep learning-based 2D semantic segmentation approaches have been proposed within the last decade. In this survey, we mainly focus on the recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images. We started with an analysis of the public image sets and leaderboards for 2D semantic segmentation, with an overview of the techniques employed in performance evaluation. In examining the evolution of the field, we chronologically categorised the approaches into three main periods, namely the pre- and early deep learning era, the fully convolutional era, and the post-FCN era. We technically analysed the solutions put forward in terms of solving the fundamental problems of the field, such as fine-grained localisation and scale invariance. Before drawing our conclusions, we present a table of methods from all mentioned eras, with a brief summary of each approach that explains its contribution to the field. We conclude the survey by discussing the current challenges of the field and to what extent they have been solved. Comment: Updated with new studies
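As a rough illustration of what pixel-wise labelling means for performance evaluation, the sketch below computes per-class intersection-over-union (IoU) and its mean for two small label maps. Mean IoU is a widely used segmentation metric, but the abstract does not name the metrics the survey covers, so its use here is an assumption; the function name `mean_iou` and the toy label maps are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes present in pred or target.

    pred and target are 2D lists of integer class labels with the same shape.
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred = p == c
                in_target = t == c
                inter += int(in_pred and in_target)
                union += int(in_pred or in_target)
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps (illustrative values only): one pixel is mislabelled.
target = [[0, 0, 1],
          [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]

print(mean_iou(pred, target, num_classes=2))
```

In this toy case a single wrong pixel lowers the IoU of both classes (2/3 for class 0 and 3/4 for class 1), which is why a correct image-level label is not enough for a segmentation task.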

    Analysis of AI-Based Single-View 3D Reconstruction Methods for an Industrial Application

    Machine learning (ML) is a key technology in smart manufacturing as it provides insights into complex processes without requiring deep domain expertise. This work deals with deep learning algorithms to determine a 3D reconstruction from a single 2D grayscale image. 3D reconstruction can be used for quality control because the height values contain relevant information that is not visible in 2D data. Instead of 3D scans, estimated depth maps based on a 2D input image can be used, with the advantage of a simple setup and a short recording time. Determining a 3D reconstruction from a single input image is a difficult task for which many algorithms and methods have been proposed in the past decades. In this work, three deep learning methods, namely stacked autoencoders (SAEs), generative adversarial networks (GANs) and U-Nets, are investigated, evaluated and compared for 3D reconstruction from a 2D grayscale image of laser-welded components. Different variants of GANs are tested, with the conclusion that Wasserstein GANs (WGANs) are the most robust approach among them. To the best of our knowledge, the present paper considers for the first time the U-Net, which achieves outstanding results in semantic segmentation, in the context of 3D reconstruction tasks. Unlike the U-Net, which uses standard convolutions, the stacked dilated U-Net (SDU-Net) applies stacked dilated convolutions. Of all the 3D reconstruction approaches considered in this work, the SDU-Net shows the best performance, not only in terms of evaluation metrics but also in terms of computation time. Due to the comparably small number of trainable parameters and the suitability of the architecture for strong data augmentation, a robust model can be generated with only a small amount of training data.
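To make the difference between standard and stacked dilated convolutions concrete, here is a minimal 1D sketch in plain Python. It is not the SDU-Net implementation: the function names, the 3-tap kernel, and the dilation rates (1, 2, 4) are assumptions chosen only for illustration, and the abstract does not state which rates the paper uses.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with gaps of `dilation` between kernel taps."""
    span = (len(kernel) - 1) * dilation  # distance from the first to the last tap
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field (in input samples) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Stack three 3-tap layers with growing dilation (hypothetical rates).
out = list(range(16))
for d in (1, 2, 4):
    out = dilated_conv1d(out, [1, 1, 1], dilation=d)

print(len(out))                         # samples left after 'valid' filtering
print(receptive_field(3, (1, 1, 1)))    # three standard layers
print(receptive_field(3, (1, 2, 4)))    # three dilated layers, same weight count
```

With the same number of weights, the dilated stack sees 15 input samples per output instead of 7, which is the general reason dilation is attractive when a small parameter count matters.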

    Task Decomposition and Synchronization for Semantic Biomedical Image Segmentation

    Semantic segmentation is essential to biomedical image analysis. Many recent works mainly focus on integrating the Fully Convolutional Network (FCN) architecture with sophisticated convolution implementations and deep supervision. In this paper, we propose to decompose the single segmentation task into three subsequent sub-tasks: (1) pixel-wise image segmentation, (2) prediction of the class labels of the objects within the image, and (3) classification of the scene to which the image belongs. While these three sub-tasks are trained to optimize their individual loss functions at different perceptual levels, we propose to let them interact through a task-task context ensemble. Moreover, we propose a novel sync-regularization to penalize the deviation between the outputs of the pixel-wise segmentation and the class prediction tasks. These regularizations help the FCN utilize context information comprehensively and attain accurate semantic segmentation, even though the number of training images may be limited in many biomedical applications. We have successfully applied our framework to three diverse 2D/3D medical image datasets, including Robotic Scene Segmentation Challenge 18 (ROBOT18), Brain Tumor Segmentation Challenge 18 (BRATS18), and Retinal Fundus Glaucoma Challenge (REFUGE18). We have achieved top-tier performance in all three challenges. Comment: IEEE Transactions on Medical Imaging
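One way to picture a penalty on the deviation between segmentation and class-prediction outputs is sketched below. This is not the paper's sync-regularization, whose exact form the abstract does not give: the max-over-pixels aggregation, the squared-error penalty, and the names `class_presence_from_segmentation` and `sync_penalty` are all assumptions made for illustration.

```python
def class_presence_from_segmentation(seg_probs):
    """Image-level score per class: the maximum pixel probability for that class.

    seg_probs[c][i] is the predicted probability of class c at pixel i.
    """
    return [max(pixel_probs) for pixel_probs in seg_probs]


def sync_penalty(seg_probs, class_probs):
    """Mean squared deviation between segmentation-implied and predicted class presence."""
    implied = class_presence_from_segmentation(seg_probs)
    return sum((a - b) ** 2 for a, b in zip(implied, class_probs)) / len(class_probs)


# Toy outputs for 3 classes over 4 pixels (illustrative values only).
seg_probs = [
    [0.9, 0.8, 0.1, 0.2],  # class 0
    [0.1, 0.2, 0.9, 0.8],  # class 1
    [0.0, 0.0, 0.0, 0.0],  # class 2, absent from the segmentation
]

agree = sync_penalty(seg_probs, [0.9, 0.9, 0.0])     # class head matches the segmentation
disagree = sync_penalty(seg_probs, [0.9, 0.9, 1.0])  # class head claims class 2 is present

print(agree, disagree)
```

The penalty is zero when the two heads tell the same story and grows when the class head asserts an object that the segmentation never labels, which is the kind of inconsistency such a regularizer is meant to discourage.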

    ELASTIC: Improving CNNs with Dynamic Scaling Policies

    Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive, manually designed policies that are generic and fixed (e.g., SIFT or feature pyramids). We argue that the scaling policy should be learned from data. In this paper, we introduce ELASTIC, a simple, efficient, and yet very effective approach to learning a dynamic scale policy from data. We formulate the scaling policy as a non-linear function inside the network's structure that (a) is learned from data, (b) is instance-specific, (c) does not add extra computation, and (d) can be applied to any network architecture. We applied ELASTIC to several state-of-the-art network architectures and showed consistent improvement without extra (and sometimes even with lower) computation on ImageNet classification, MSCOCO multi-label classification, and PASCAL VOC semantic segmentation. Our results show major improvement for images with scale challenges. Our code is available at https://github.com/allenai/elastic. Comment: CVPR 2019 oral, code available https://github.com/allenai/elasti