Semantic segmentation priors for object discovery
Reliable object discovery in realistic indoor scenes is a necessity for many computer vision and service robotics applications. Semantic segmentation methods have made huge advances on such scenes in recent years, and they can provide useful prior information for object discovery by removing false positives and by delineating object boundaries. We propose a novel method that combines bottom-up object discovery with semantic priors to produce generic object candidates in RGB-D images. We use a deep learning method for semantic segmentation to classify colour and depth superpixels into meaningful categories. Separately for each category, we use saliency to estimate the location and scale of objects, and superpixels to find their precise boundaries. Finally, object candidates of all categories are combined and ranked. We evaluate our approach on the NYU Depth V2 dataset and show that we outperform other state-of-the-art object discovery methods in terms of recall.
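The per-category candidate step described above can be sketched roughly as follows. All names, inputs, and the saliency threshold are hypothetical illustrations; the actual pipeline uses a learned semantic segmentation of colour/depth superpixels and a dedicated saliency estimator rather than these toy inputs:

```python
import numpy as np

def candidate_objects(superpixel_labels, categories, saliency, min_saliency=0.5):
    """Hypothetical sketch of per-category candidate generation.

    superpixel_labels: (H, W) int map of superpixel ids
    categories: dict mapping superpixel id -> semantic category
    saliency: (H, W) float map in [0, 1]
    Returns a list of (category, mask, score) candidates ranked by score.
    """
    candidates = []
    for cat in set(categories.values()):
        # union of superpixels assigned to this category (precise boundaries)
        ids = [sp for sp, c in categories.items() if c == cat]
        mask = np.isin(superpixel_labels, ids)
        if not mask.any():
            continue
        # saliency inside the mask stands in for objectness/location evidence
        score = float(saliency[mask].mean())
        if score >= min_saliency:
            candidates.append((cat, mask, score))
    # combine candidates of all categories and rank them, as in the abstract
    return sorted(candidates, key=lambda c: -c[2])
```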
Generalized Category Discovery in Semantic Segmentation
This paper explores a novel setting called Generalized Category Discovery in
Semantic Segmentation (GCDSS), aiming to segment unlabeled images given prior
knowledge from a labeled set of base classes. The unlabeled images contain
pixels of base classes or novel classes. Unlike Novel Category Discovery in
Semantic Segmentation (NCDSS), GCDSS does not require each unlabeled image to
contain at least one novel class. In addition, we broaden the segmentation
scope beyond foreground objects to the entire image. Existing NCDSS methods
rely on these priors, which makes them difficult to apply in real-world
settings. We
propose a straightforward yet effective framework that reinterprets the GCDSS
challenge as a task of mask classification. Additionally, we construct a
baseline method and introduce the Neighborhood Relations-Guided Mask Clustering
Algorithm (NeRG-MaskCA) for mask categorization to address the fragmentation in
semantic representation. A benchmark dataset, Cityscapes-GCD, derived from the
Cityscapes dataset, is established to evaluate the GCDSS framework. Our method
demonstrates the feasibility of the GCDSS problem and the potential for
discovering and segmenting novel object classes in unlabeled images. We employ
the generated pseudo-labels from our approach as ground truth to supervise the
training of other models, enabling them to segment novel classes. This paves
the way for further research in generalized category
discovery, broadening the horizons of semantic segmentation and its
applications. For details, please visit https://github.com/JethroPeng/GCDS
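As a rough illustration of mask categorization by feature similarity, the greedy clustering below is a simplified stand-in, not the authors' NeRG-MaskCA (which additionally exploits neighborhood relations); the threshold and embedding inputs are assumptions:

```python
import numpy as np

def cluster_masks(embeddings, sim_threshold=0.8):
    """Greedy cosine-similarity clustering of per-mask feature vectors.

    A mask joins the closest existing cluster if similar enough; otherwise it
    seeds a new cluster, which may correspond to a novel class.
    embeddings: (N, D) array of per-mask features.
    Returns an (N,) array of cluster labels.
    """
    # L2-normalise so dot products are cosine similarities
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = -np.ones(len(emb), dtype=int)
    centers = []
    for i, e in enumerate(emb):
        sims = [float(e @ c) for c in centers]
        if sims and max(sims) >= sim_threshold:
            labels[i] = int(np.argmax(sims))   # join the closest cluster
        else:
            labels[i] = len(centers)           # start a new (possibly novel) class
            centers.append(e)
    return labels
```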
From Image-level to Pixel-level Labeling with Convolutional Networks
We are interested in inferring object segmentation by leveraging only object
class information, and by considering only minimal priors on the object
segmentation task. This problem could be viewed as a kind of weakly supervised
segmentation task, and naturally fits the Multiple Instance Learning (MIL)
framework: every training image is known to have (or not) at least one pixel
corresponding to the image class label, and the segmentation task can be
rewritten as inferring the pixels belonging to the class of the object (given
one image, and its object class). We propose a Convolutional Neural
Network-based model, which is constrained during training to put more weight on
pixels which are important for classifying the image. We show that at test
time, the model has learned to discriminate the right pixels well enough, such
that it performs very well on an existing segmentation benchmark after adding
only a few smoothing priors. Our system is trained on a subset of the ImageNet
dataset, and the segmentation experiments are performed on the challenging
Pascal VOC dataset (with no fine-tuning of the model on Pascal VOC). Our model
beats the state-of-the-art results on the weakly supervised object segmentation
task by a large margin. We also compare the performance of our model with
state-of-the-art fully supervised segmentation approaches. Comment: CVPR201
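The MIL-style image-level constraint can be made concrete with a smooth-max pooling of per-pixel class scores, which concentrates gradient on the pixels most responsible for the image label. Log-Sum-Exp pooling is one common aggregator for this; the paper's exact aggregation may differ, so treat this as a sketch:

```python
import numpy as np

def lse_pool(pixel_scores, r=5.0):
    """Log-Sum-Exp pooling: a smooth maximum over per-pixel class scores.

    r interpolates between mean pooling (r -> 0) and max pooling (r -> inf),
    so training the image-level classifier through this pool pushes weight
    onto the pixels most important for the class decision.
    pixel_scores: (H, W) per-pixel scores for one class.
    """
    s = np.asarray(pixel_scores, dtype=float)
    m = s.max()  # subtract the max for numerical stability
    return m + np.log(np.exp(r * (s - m)).mean()) / r
```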
Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation
We propose an approach to discover class-specific pixels for the
weakly-supervised semantic segmentation task. We show that properly combining
saliency and attention maps allows us to obtain reliable cues capable of
significantly boosting the performance. First, we propose a simple yet powerful
hierarchical approach to discover the class-agnostic salient regions, obtained
using a salient object detector, which otherwise would be ignored. Second, we
use fully convolutional attention maps to reliably localize the class-specific
regions in a given image. We combine these two cues to discover class-specific
pixels which are then used as an approximate ground truth for training a CNN.
While solving the weakly supervised semantic segmentation task, we ensure that
the image-level classification task is also solved in order to enforce the CNN
to assign at least one pixel to each object present in the image.
Experimentally, on the PASCAL VOC12 val and test sets, we obtain mIoU scores of
60.8% and 61.9%, gains of 5.1% and 5.2% over the published state-of-the-art
results. The code is made publicly available.
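A minimal sketch of the cue combination this abstract describes: fusing a class-agnostic saliency map with per-class attention maps into class-specific pseudo-labels. The element-wise product and the threshold `tau` are assumptions for illustration, not the paper's exact rule:

```python
import numpy as np

def class_specific_cues(saliency, attention_maps, tau=0.5):
    """Fuse saliency and attention into a pseudo-label map.

    saliency: (H, W) class-agnostic saliency in [0, 1]
    attention_maps: dict mapping class id (> 0) -> (H, W) attention in [0, 1]
    Returns an (H, W) int map; 0 means background/ignore.
    """
    h, w = saliency.shape
    pseudo = np.zeros((h, w), dtype=int)
    best = np.zeros((h, w))
    for cls, att in attention_maps.items():
        cue = saliency * att                 # a pixel must fire on both cues
        sel = (cue > tau) & (cue > best)     # keep the strongest class per pixel
        pseudo[sel] = cls
        best = np.maximum(best, cue)
    return pseudo
```

Pseudo-labels produced this way can then serve as approximate ground truth for training a segmentation CNN, as the abstract outlines.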