591 research outputs found

    Coarse-to-Fine Annotation Enrichment for Semantic Segmentation Learning

    Rich, high-quality annotated data is critical for learning semantic segmentation, yet acquiring dense, pixel-wise ground truth is both labor- and time-consuming. Coarse annotations (e.g., scribbles, coarse polygons) offer an economical alternative, but training on them alone rarely yields satisfactory performance. To generate high-quality annotated data at a low time cost for accurate segmentation, this paper proposes a novel annotation enrichment strategy that expands the existing coarse annotations of training data to a finer scale. Extensive experiments on the Cityscapes and PASCAL VOC 2012 benchmarks show that neural networks trained with the enriched annotations from our framework yield a significant improvement over those trained with the original coarse labels, and are highly competitive with networks trained on human-annotated dense annotations. The proposed method also outperforms other state-of-the-art weakly-supervised segmentation methods. Comment: CIKM 2018, International Conference on Information and Knowledge Management
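    The abstract does not spell out the enrichment procedure, so the following Python sketch only illustrates one plausible reading of "expanding coarse annotations to a finer scale": propagating sparse scribble labels across SLIC superpixels by majority vote. The function name, the ignore index, and the superpixel count are all hypothetical, not the paper's method.

        import numpy as np
        from skimage.segmentation import slic

        def enrich_annotations(image, coarse_labels, n_segments=800, ignore=255):
            # image: (H, W, 3) float array in [0, 1];
            # coarse_labels: (H, W) int array, `ignore` marks unlabeled pixels.
            # Group pixels into perceptually homogeneous superpixels.
            segments = slic(image, n_segments=n_segments, compactness=10)
            enriched = np.full_like(coarse_labels, ignore)
            for seg_id in np.unique(segments):
                mask = segments == seg_id
                votes = coarse_labels[mask]
                votes = votes[votes != ignore]
                if votes.size:
                    # Assign the majority coarse label to the whole superpixel,
                    # expanding scribbles to region boundaries.
                    enriched[mask] = np.bincount(votes).argmax()
            return enriched

    The enriched map can then replace the coarse labels when training a segmentation network, at no extra annotation cost.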

    Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding

    Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance-based classification. In this work, we are interested in understanding the roles of these different tasks in improved scene understanding, in particular semantic segmentation, object detection, and scene recognition. Towards this goal, we "plug in" human subjects for each of the various components in a state-of-the-art conditional random field model. Comparisons among various hybrid human-machine CRFs indicate how much "head room" there is to improve scene understanding by focusing research efforts on individual tasks.
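    A holistic CRF of this kind typically minimizes an energy that sums task-specific potentials; the exact potentials of the paper's model are not given here, so the following is only a schematic form:

        E(\mathbf{y}) = \sum_{i} \psi_{\mathrm{seg}}(y_i)
                      + \sum_{(i,j) \in \mathcal{E}} \psi_{\mathrm{pair}}(y_i, y_j)
                      + \psi_{\mathrm{det}}(\mathbf{y}, d)
                      + \psi_{\mathrm{scene}}(\mathbf{y}, s)

    Plugging in a human subject for a component then amounts to substituting the corresponding potential (e.g., \psi_{\mathrm{scene}}) with human responses while the remaining terms stay machine-computed, which isolates that component's contribution to the joint model.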

    Semantic-Aware Image Analysis

    Extracting and utilizing high-level semantic information from images is one of the important goals of computer vision. The ultimate objective of image analysis is to understand each pixel of an image in terms of high-level semantics, e.g. the objects, the stuff, and their spatial, functional, and semantic relations. In recent years, thanks to large labeled datasets and deep learning, great progress has been made on image analysis problems such as image classification, object detection, and object pose estimation. In this work, we explore several aspects of semantic-aware image analysis. First, we explore semantic segmentation of man-made scenes using fully connected conditional random fields, which can model long-range connections within the image and exploit contextual information about scene structure. Second, we introduce a semantic smoothing method that exploits semantic information to achieve structure-preserving image smoothing; semantic segmentation has achieved significant progress recently and is widely used in many computer vision tasks, and we observe that high-level semantic labeling naturally provides a meaningful structural prior for smoothing. Third, we present a deep object co-segmentation approach for segmenting common objects of the same class within a pair of images. To address this task, we propose a CNN-based Siamese encoder-decoder architecture: the encoder extracts high-level semantic features of the foreground objects, a mutual correlation layer detects the common objects, and finally the decoder generates the output foreground mask for each image. Finally, we propose an approach to localize common objects from novel object categories in a set of images. We solve this problem with a new common component activation map, in which class-specific activation maps are treated as components used to discover the common components in the image set. Our experiments show that this approach generalizes to novel object categories.
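    As a rough sketch of how a mutual correlation layer between the two Siamese encoder branches might look, the snippet below compares every spatial position of one feature map with all positions of the other; the function name, the (B, C, H, W) shapes, and the 1/sqrt(C) scaling are assumptions, not the paper's exact architecture.

        import torch

        def mutual_correlation(feat_a, feat_b):
            # feat_a, feat_b: (B, C, H, W) encoder features of the two images.
            b, c, h, w = feat_a.shape
            fa = feat_a.flatten(2)                               # (B, C, H*W)
            fb = feat_b.flatten(2)
            # Dot-product similarity between all position pairs across images.
            corr = torch.bmm(fa.transpose(1, 2), fb) / c ** 0.5  # (B, HW_a, HW_b)
            # Each image's map: per spatial position, a vector of similarities
            # to every position of the other image, fed to that image's decoder.
            corr_a = corr.transpose(1, 2).reshape(b, h * w, h, w)
            corr_b = corr.reshape(b, h * w, h, w)
            return corr_a, corr_b

    High responses in these correlation volumes indicate features present in both images, which is what lets the decoders mask only the common objects.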

    Semantic Object Parsing with Graph LSTM

    By taking semantic object parsing as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which generalizes LSTM from sequential or multi-dimensional data to general graph-structured data. In particular, instead of evenly and rigidly dividing an image into pixels or patches as in existing multi-dimensional LSTM structures (e.g., Row, Grid, and Diagonal LSTMs), we take each arbitrary-shaped superpixel as a semantically consistent node and adaptively construct an undirected graph for each image, where the spatial relations of the superpixels naturally serve as edges. Built on such an adaptive graph topology, the Graph LSTM is more naturally aligned with the visual patterns in the image (e.g., object boundaries or appearance similarities) and provides a more economical information propagation route. Furthermore, for each optimization step of the Graph LSTM, we propose a confidence-driven scheme that updates the hidden and memory states of the nodes progressively until all nodes are updated. In addition, for each node, the forget gates are adaptively learned to capture different degrees of semantic correlation with neighboring nodes. Comprehensive evaluations on four diverse semantic object parsing datasets demonstrate the significant superiority of our Graph LSTM over other state-of-the-art solutions. Comment: 18 pages
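    A minimal sketch of the confidence-driven update pass is given below, assuming precomputed superpixel features, an adjacency list, and per-node confidence scores. It collapses the paper's per-neighbor adaptive forget gates into a single gate over the mean neighbor state, so it is a simplification rather than the published cell.

        import torch
        import torch.nn as nn

        class GraphLSTMCell(nn.Module):
            def __init__(self, in_dim, hid_dim):
                super().__init__()
                # Gates see the node input, its own hidden state, and the
                # mean hidden state of its graph neighbors.
                self.gates = nn.Linear(in_dim + 2 * hid_dim, 4 * hid_dim)

            def forward(self, x, h, c, neighbors, confidence):
                # x: (N, in_dim) superpixel features; h, c: (N, hid_dim) states;
                # neighbors: list of neighbor-index lists; confidence: (N,) scores.
                h, c = h.clone(), c.clone()
                # Visit nodes from most to least confident so that reliable
                # superpixels propagate information to uncertain ones first.
                for i in confidence.argsort(descending=True).tolist():
                    nbr = neighbors[i]
                    h_nbr = h[nbr].mean(dim=0) if nbr else torch.zeros_like(h[i])
                    z = self.gates(torch.cat([x[i], h[i], h_nbr]))
                    gi, gf, go, gu = z.chunk(4)
                    c[i] = torch.sigmoid(gf) * c[i] + torch.sigmoid(gi) * torch.tanh(gu)
                    h[i] = torch.sigmoid(go) * torch.tanh(c[i])
                return h, c

    Because later nodes read the already-updated hidden states of earlier ones, information flows along the image's own superpixel topology instead of a fixed row or grid order.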