591 research outputs found

    Coarse-to-Fine Annotation Enrichment for Semantic Segmentation Learning

    Rich, high-quality annotated data is critical for learning semantic segmentation, yet acquiring dense, pixel-wise ground truth is both labor- and time-consuming. Coarse annotations (e.g., scribbles, coarse polygons) offer an economical alternative, but training on them alone rarely yields satisfactory performance. To generate high-quality annotated data at a low time cost for accurate segmentation, this paper proposes a novel annotation enrichment strategy that expands the existing coarse annotations of training data to a finer scale. Extensive experiments on the Cityscapes and PASCAL VOC 2012 benchmarks show that neural networks trained with the enriched annotations from our framework yield a significant improvement over those trained with the original coarse labels, and are highly competitive with networks trained on human-annotated dense annotations. The proposed method also outperforms other state-of-the-art weakly-supervised segmentation methods. Comment: CIKM 2018, International Conference on Information and Knowledge Management
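    The abstract does not spell out the enrichment procedure, so the following Python sketch only illustrates one plausible reading of "expanding coarse annotations to a finer scale": propagating sparse scribble labels across SLIC superpixels by majority vote. The function name, the ignore index, and the superpixel count are all hypothetical, not the paper's method.

        import numpy as np
        from skimage.segmentation import slic

        def enrich_annotations(image, coarse_labels, n_segments=800, ignore=255):
            # image: (H, W, 3) float array in [0, 1];
            # coarse_labels: (H, W) int array, `ignore` marks unlabeled pixels.
            # Group pixels into perceptually homogeneous superpixels.
            segments = slic(image, n_segments=n_segments, compactness=10)
            enriched = np.full_like(coarse_labels, ignore)
            for seg_id in np.unique(segments):
                mask = segments == seg_id
                votes = coarse_labels[mask]
                votes = votes[votes != ignore]
                if votes.size:
                    # Assign the majority coarse label to the whole superpixel,
                    # expanding scribbles to region boundaries.
                    enriched[mask] = np.bincount(votes).argmax()
            return enriched

    The enriched map can then replace the coarse labels when training a segmentation network, at no extra annotation cost.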

    Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding

    Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance-based classification. In this work, we are interested in understanding the roles of these different tasks in improved scene understanding, in particular semantic segmentation, object detection, and scene recognition. Towards this goal, we "plug in" human subjects for each of the various components in a state-of-the-art conditional random field model. Comparisons among various hybrid human-machine CRFs indicate how much "head room" there is to improve scene understanding by focusing research efforts on individual tasks.
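    A holistic CRF of this kind typically minimizes an energy that sums task-specific potentials; the exact potentials of the paper's model are not given here, so the following is only a schematic form:

        E(\mathbf{y}) = \sum_{i} \psi_{\mathrm{seg}}(y_i)
                      + \sum_{(i,j) \in \mathcal{E}} \psi_{\mathrm{pair}}(y_i, y_j)
                      + \psi_{\mathrm{det}}(\mathbf{y}, d)
                      + \psi_{\mathrm{scene}}(\mathbf{y}, s)

    Plugging in a human subject for a component then amounts to substituting the corresponding potential (e.g., \psi_{\mathrm{scene}}) with human responses while the remaining terms stay machine-computed, which isolates that component's contribution to the joint model.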

    Semantic-Aware Image Analysis

    Extracting and utilizing high-level semantic information from images is one of the important goals of computer vision. The ultimate objective of image analysis is to understand each pixel of an image in terms of high-level semantics, e.g. the objects, the stuff, and their spatial, functional, and semantic relations. In recent years, thanks to large labeled datasets and deep learning, great progress has been made on image analysis problems such as image classification, object detection, and object pose estimation. In this work, we explore several aspects of semantic-aware image analysis. First, we explore semantic segmentation of man-made scenes using fully connected conditional random fields, which can model long-range connections within the image and exploit contextual information about scene structure. Second, we introduce a semantic smoothing method that exploits semantic information to achieve structure-preserving image smoothing; semantic segmentation has achieved significant progress recently and is widely used in many computer vision tasks, and we observe that high-level semantic labeling naturally provides a meaningful structural prior for smoothing. Third, we present a deep object co-segmentation approach for segmenting common objects of the same class within a pair of images. To address this task, we propose a CNN-based Siamese encoder-decoder architecture: the encoder extracts high-level semantic features of the foreground objects, a mutual correlation layer detects the common objects, and finally the decoder generates the output foreground mask for each image. Finally, we propose an approach to localize common objects from novel object categories in a set of images. We solve this problem with a new common component activation map, in which class-specific activation maps are treated as components used to discover the common components in the image set. Our experiments show that this approach generalizes to novel object categories.
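    As a rough sketch of how a mutual correlation layer between the two Siamese encoder branches might look, the snippet below compares every spatial position of one feature map with all positions of the other; the function name, the (B, C, H, W) shapes, and the 1/sqrt(C) scaling are assumptions, not the paper's exact architecture.

        import torch

        def mutual_correlation(feat_a, feat_b):
            # feat_a, feat_b: (B, C, H, W) encoder features of the two images.
            b, c, h, w = feat_a.shape
            fa = feat_a.flatten(2)                               # (B, C, H*W)
            fb = feat_b.flatten(2)
            # Dot-product similarity between all position pairs across images.
            corr = torch.bmm(fa.transpose(1, 2), fb) / c ** 0.5  # (B, HW_a, HW_b)
            # Each image's map: per spatial position, a vector of similarities
            # to every position of the other image, fed to that image's decoder.
            corr_a = corr.transpose(1, 2).reshape(b, h * w, h, w)
            corr_b = corr.reshape(b, h * w, h, w)
            return corr_a, corr_b

    High responses in these correlation volumes indicate features present in both images, which is what lets the decoders mask only the common objects.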

    Semantic Object Parsing with Graph LSTM

    By taking semantic object parsing as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which generalizes LSTM from sequential or multi-dimensional data to general graph-structured data. In particular, instead of evenly and rigidly dividing an image into pixels or patches as in existing multi-dimensional LSTM structures (e.g., Row, Grid, and Diagonal LSTMs), we take each arbitrary-shaped superpixel as a semantically consistent node and adaptively construct an undirected graph for each image, where the spatial relations of the superpixels naturally serve as edges. Built on such an adaptive graph topology, the Graph LSTM is more naturally aligned with the visual patterns in the image (e.g., object boundaries or appearance similarities) and provides a more economical information propagation route. Furthermore, for each optimization step of the Graph LSTM, we propose a confidence-driven scheme that updates the hidden and memory states of the nodes progressively until all nodes are updated. In addition, for each node, the forget gates are adaptively learned to capture different degrees of semantic correlation with neighboring nodes. Comprehensive evaluations on four diverse semantic object parsing datasets demonstrate the significant superiority of our Graph LSTM over other state-of-the-art solutions. Comment: 18 pages
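    A minimal sketch of the confidence-driven update pass is given below, assuming precomputed superpixel features, an adjacency list, and per-node confidence scores. It collapses the paper's per-neighbor adaptive forget gates into a single gate over the mean neighbor state, so it is a simplification rather than the published cell.

        import torch
        import torch.nn as nn

        class GraphLSTMCell(nn.Module):
            def __init__(self, in_dim, hid_dim):
                super().__init__()
                # Gates see the node input, its own hidden state, and the
                # mean hidden state of its graph neighbors.
                self.gates = nn.Linear(in_dim + 2 * hid_dim, 4 * hid_dim)

            def forward(self, x, h, c, neighbors, confidence):
                # x: (N, in_dim) superpixel features; h, c: (N, hid_dim) states;
                # neighbors: list of neighbor-index lists; confidence: (N,) scores.
                h, c = h.clone(), c.clone()
                # Visit nodes from most to least confident so that reliable
                # superpixels propagate information to uncertain ones first.
                for i in confidence.argsort(descending=True).tolist():
                    nbr = neighbors[i]
                    h_nbr = h[nbr].mean(dim=0) if nbr else torch.zeros_like(h[i])
                    z = self.gates(torch.cat([x[i], h[i], h_nbr]))
                    gi, gf, go, gu = z.chunk(4)
                    c[i] = torch.sigmoid(gf) * c[i] + torch.sigmoid(gi) * torch.tanh(gu)
                    h[i] = torch.sigmoid(go) * torch.tanh(c[i])
                return h, c

    Because later nodes read the already-updated hidden states of earlier ones, information flows along the image's own superpixel topology instead of a fixed row or grid order.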