424 research outputs found

    Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes

    During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving. However, training CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models' performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scenery semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach. Comment: This is the extended version of the ICCV 2017 paper "Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes" with additional GTA experimen
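
The regularization idea in this abstract, pushing target-domain predictions toward an inferred global label distribution, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of cross-entropy as the penalty, and the toy numbers are all assumptions made for illustration.

```python
import math

def global_label_distribution(pixel_probs):
    """Average per-pixel class probabilities into one image-level distribution."""
    n = len(pixel_probs)
    num_classes = len(pixel_probs[0])
    return [sum(p[c] for p in pixel_probs) / n for c in range(num_classes)]

def distribution_loss(inferred, predicted, eps=1e-12):
    """Cross-entropy between an inferred target distribution and the predicted one.

    Assumption: cross-entropy is used as the penalty; other divergences would also fit.
    """
    return -sum(q * math.log(p + eps) for q, p in zip(inferred, predicted))

# Toy image with four pixels and two classes (hypothetical softmax outputs).
pixel_probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.2, 0.8]]
predicted = global_label_distribution(pixel_probs)

# Hypothetical label distribution inferred for this target image in the "easy" task.
inferred = [0.5, 0.5]
loss = distribution_loss(inferred, predicted)
```

In a training loop, a term like `loss` would be added to the supervised segmentation loss on the source domain, so that the network is penalized when its target-domain predictions drift away from the inferred class proportions.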

    CRF Learning with CNN Features for Image Segmentation

    Conditional Random Fields (CRFs) have been widely applied in image segmentation. While most studies rely on hand-crafted features, we here propose to exploit a pre-trained large convolutional neural network (CNN) to generate deep features for CRF learning. The deep CNN is trained on the ImageNet dataset and transferred to image segmentation here for constructing potentials of superpixels. Then the CRF parameters are learnt using a structured support vector machine (SSVM). To fully exploit context information in inference, we construct spatially related co-occurrence pairwise potentials and incorporate them into the energy function. This prefers labelling of object pairs that frequently co-occur in a certain spatial layout and at the same time avoids implausible labellings during the inference. Extensive experiments on binary and multi-class segmentation benchmarks demonstrate the promise of the proposed method. We thus provide new baselines for the segmentation performance on the Weizmann horse, Graz-02, MSRC-21, Stanford Background and PASCAL VOC 2011 datasets.

    Learning Semantic Segmentation with Query Points Supervision on Aerial Images

    Semantic segmentation is crucial in remote sensing, where high-resolution satellite images are segmented into meaningful regions. Recent advancements in deep learning have significantly improved satellite image segmentation. However, most of these methods are trained in fully supervised settings that require high-quality pixel-level annotations, which are expensive and time-consuming to obtain. In this work, we present a weakly supervised learning algorithm to train semantic segmentation algorithms that rely only on query point annotations instead of full mask labels. Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation. Specifically, we generate superpixels and extend the query point labels into those superpixels that group similar meaningful semantics. Then, we train semantic segmentation models, supervised with images partially labeled with the superpixel pseudo-labels. We benchmark our weakly supervised training approach on an aerial image dataset and different semantic segmentation architectures, showing that we can reach competitive performance compared to fully supervised training while reducing the annotation effort. Comment: Paper presented at the LXCV workshop at ICCV 202
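
The step of extending query point labels into superpixels can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the superpixel map is taken as given (any superpixel algorithm could produce it), `IGNORE = 255` is a hypothetical marker for unlabeled pixels, and the majority vote for conflicting points is an added design choice.

```python
from collections import Counter, defaultdict

IGNORE = 255  # hypothetical value marking pixels that receive no pseudo-label

def propagate_point_labels(superpixels, points):
    """Spread sparse point labels over the superpixels that contain them.

    superpixels: 2D list of ints, one superpixel id per pixel.
    points: iterable of (row, col, class_id) query-point annotations.
    Returns a 2D list of class ids; pixels in unannotated superpixels get IGNORE.
    """
    votes = defaultdict(Counter)
    for row, col, class_id in points:
        votes[superpixels[row][col]][class_id] += 1
    # A majority vote resolves superpixels hit by points of different classes.
    sp_label = {sp: counts.most_common(1)[0][0] for sp, counts in votes.items()}
    return [[sp_label.get(sp, IGNORE) for sp in row] for row in superpixels]

# Toy 3x4 image split into four superpixels, with three annotated points.
superpixels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 3],
]
points = [(0, 0, 5), (1, 3, 7), (2, 1, 5)]
pseudo = propagate_point_labels(superpixels, points)
```

A segmentation model would then be trained on `pseudo` with the loss masked wherever the value is `IGNORE`, which is what "partially labeled" amounts to in practice.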

    Using Stacked Sparse Auto-Encoder and Superpixel CRF for Long-Term Visual Scene Understanding of UGVs

    Multiple images have been widely used for scene understanding and navigation of unmanned ground vehicles in long-term operations. However, as the amount of visual data in multiple images is huge, the cumulative error in many cases becomes untenable. This paper proposes a novel method that can efficiently extract features from a large dataset of multiple images. Membership K-means clustering is then applied to the high-dimensional features, and the large dataset is divided into N subdatasets to train N conditional random field (CRF) models based on superpixels. A Softmax subdataset selector is used to decide which one of the N CRF models is chosen as the prediction model for labeling images. Furthermore, experiments are conducted to evaluate the feasibility and performance of the proposed approach.
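
The routing step, choosing one of N models for an incoming image, might look like the sketch below. The abstract does not say how the Softmax selector is built, so this is only a stand-in: scoring each subdataset by negative squared distance to a cluster centroid is an assumption, and `select_model` and the centroid values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_model(feature, centroids):
    """Return (index, probabilities) for the subdataset model to use.

    Assumption: each subdataset is scored by the negative squared distance
    between the image feature and that subdataset's cluster centroid.
    """
    scores = [
        -sum((f - c) ** 2 for f, c in zip(feature, centroid))
        for centroid in centroids
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Three hypothetical cluster centroids in a 2-D feature space.
centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
index, probs = select_model([4.0, 6.0], centroids)
```

The returned `index` would pick which of the N trained CRF models labels the image; a learned softmax classifier could replace the distance-based scores without changing the calling code.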

    Ensemble Unsupervised Semantic Segmentation For Foreground-Background Separation On Satellite Image

    Recently, computer vision has been advanced significantly by deep learning techniques, with supervised deep learning outperforming other methods in tasks such as image segmentation. However, a large amount of annotated/labeled data is needed for training supervised deep learning models, while such large annotated datasets are typically unavailable in practice, for example in satellite imagery analytics. To address this challenge, a novel ensemble unsupervised semantic segmentation method was proposed for image segmentation on satellite images. Specifically, an unsupervised semantic segmentation model was employed to implement foreground-background separation and was then placed within an ensemble model to further increase the prediction accuracy. Experimental results demonstrated that the proposed method outperformed baseline models such as k-means on a satellite image benchmark, the XView2 dataset. The proposed approach provides a promising solution to semantic segmentation in images that will benefit many mission-critical applications such as disaster relief using satellite imagery analytics. Index Terms - Convolutional neural networks (CNNs); deep learning; ensemble model; image segmentation; overhead imagery; unsupervised learnin

    Understanding High Resolution Aerial Imagery Using Computer Vision Techniques

    Computer vision can make important contributions to the analysis of remote sensing satellite or aerial imagery. However, the resolution of early satellite imagery was not sufficient to provide useful spatial features. The situation is changing with the advent of very-high-spatial-resolution (VHR) imaging sensors. This change makes it possible to use computer vision techniques to perform analysis of man-made structures. Meanwhile, the development of multi-view imaging techniques allows the generation of accurate point clouds as ancillary knowledge. This dissertation aims at developing computer vision and machine learning algorithms for high resolution aerial imagery analysis in the context of application problems including debris detection, building detection and roof condition assessment. High resolution aerial imagery and point clouds were provided by Pictometry International for this study. Debris detection after natural disasters such as tornadoes, hurricanes or tsunamis, is needed for effective debris removal and allocation of limited resources. Significant advances in aerial image acquisition have greatly enabled the possibilities for rapid and automated detection of debris. In this dissertation, a robust debris detection algorithm is proposed. Large scale aerial images are partitioned into homogeneous regions by interactive segmentation. Debris areas are identified based on extracted texture features. Robust building detection is another important part of high resolution aerial imagery understanding. This dissertation develops a 3D scene classification algorithm for building detection using point clouds derived from multi-view imagery. Point clouds are divided into point clusters using Euclidean clustering. Individual point clusters are identified based on extracted spectral and 3D structural features. The inspection of roof condition is an important step in damage claim processing in the insurance industry. 
    Automated roof condition assessment from remotely sensed images is proposed in this dissertation. Initially, texture classification and a bag-of-words model were applied to assess the roof condition using features derived from the whole rooftop. However, considering the complexity of residential rooftops, a more sophisticated method is proposed to divide the task into two stages: 1) roof segmentation, followed by 2) classification of segmented roof regions. Deep learning techniques are investigated for both segmentation and classification. A deep learned feature is proposed and applied in a region merging segmentation algorithm. A fine-tuned deep network is adopted for roof segment classification and found to achieve higher accuracy than traditional methods using hand-crafted features. Contributions of this study include the development of algorithms for debris detection using 2D images and building detection using 3D point clouds. For roof condition assessment, the solutions to this problem are explored in two directions: features derived from the whole rooftop and features extracted from each roof segment. Through our research, roof segmentation followed by segment classification was found to be the more promising method, and the processing workflow was developed and tested. Deep learning techniques are also investigated for both roof segmentation and segment classification. More unsupervised feature extraction techniques using deep learning can be explored in future work.