Annotating Object Instances with a Polygon-RNN
We propose an approach for semi-automatic annotation of object instances.
While most current methods treat object segmentation as a pixel-labeling
problem, we here cast it as a polygon prediction task, mimicking how most
current datasets have been annotated. In particular, our approach takes as
input an image crop and sequentially produces vertices of the polygon outlining
the object. This allows a human annotator to intervene at any time and correct
a vertex if needed, producing a segmentation as accurate as the annotator
desires. We show that our approach speeds up the annotation process by a
factor of 4.7 across all classes in Cityscapes, while achieving 78.4% agreement
in IoU with original ground-truth, matching the typical agreement between human
annotators. For cars, our speed-up factor is 7.3 for an agreement of 82.2%. We
further show the generalization capabilities of our approach to unseen datasets.
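To make the sequential, human-correctable prediction concrete, here is a minimal sketch of such an annotation loop. The `model_step` and `review` callables are hypothetical placeholders standing in for the polygon decoder and the annotator's interface; they are assumptions for illustration, not the authors' implementation.

```python
# Sketch of a human-in-the-loop polygon annotation loop: the model proposes
# the next vertex, the annotator may correct it, and the loop stops when the
# model signals that the polygon should close.
from typing import Callable, List, Optional, Tuple

Vertex = Tuple[int, int]

def annotate_instance(
    model_step: Callable[[List[Vertex]], Optional[Vertex]],
    review: Callable[[Vertex], Vertex],
    max_vertices: int = 100,
) -> List[Vertex]:
    """model_step: given the vertices so far, returns the next predicted vertex,
                   or None when the polygon is complete.
       review:     the annotator's correction; returns the (possibly fixed) vertex."""
    polygon: List[Vertex] = []
    for _ in range(max_vertices):
        proposal = model_step(polygon)
        if proposal is None:              # model predicts end-of-polygon
            break
        polygon.append(review(proposal))  # annotator may override the proposal
    return polygon
```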
Distance to Center of Mass Encoding for Instance Segmentation
Instance segmentation can be considered an extension of the object detection
problem in which bounding boxes are replaced by object contours. Strictly
speaking, the problem requires identifying the instance and class of each
pixel, independently of the means used to obtain them. The advantage of
instance segmentation over standard object detection lies in the precise
delineation of objects, which improves object localization. Additionally, object
contours allow the evaluation of partial occlusion with basic image processing
algorithms. This work approaches the instance segmentation problem as an
annotation problem and presents a novel technique to encode and decode ground
truth annotations. We propose a mathematical representation of instances that
any deep semantic segmentation model can learn and generalize. Each individual
instance is represented by a center of mass and a field of vectors pointing to
it. This encoding technique is termed Distance to Center of Mass Encoding (DCME).
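A minimal sketch of the encoding step described above, assuming the ground truth is given as an integer instance-id map: every pixel of an instance stores the 2-D offset from its own location to the instance's center of mass. The function name and array layout are illustrative choices, not the paper's reference implementation.

```python
import numpy as np

def encode_dcme(instance_map: np.ndarray) -> np.ndarray:
    """instance_map: (H, W) int array, 0 = background, >0 = instance id.
    Returns an (H, W, 2) float array of (dy, dx) offsets from each pixel to
    its instance's center of mass; background pixels stay at zero."""
    h, w = instance_map.shape
    offsets = np.zeros((h, w, 2), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    for inst_id in np.unique(instance_map):
        if inst_id == 0:
            continue
        mask = instance_map == inst_id
        cy, cx = ys[mask].mean(), xs[mask].mean()   # center of mass
        offsets[mask, 0] = cy - ys[mask]            # vertical offset to center
        offsets[mask, 1] = cx - xs[mask]            # horizontal offset to center
    return offsets
```

Decoding would invert this mapping: pixels whose stored vectors point to (approximately) the same location are grouped into one instance.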
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
In this work, we tackle the problem of instance segmentation, the task of
simultaneously solving object detection and semantic segmentation. Towards this
goal, we present a model, called MaskLab, which produces three outputs: box
detection, semantic segmentation, and direction prediction. Building on top of
the Faster-RCNN object detector, the predicted boxes provide accurate
localization of object instances. Within each region of interest, MaskLab
performs foreground/background segmentation by combining semantic and direction
prediction. Semantic segmentation assists the model in distinguishing between
objects of different semantic classes including background, while the direction
prediction, estimating each pixel's direction towards its corresponding center,
allows separating instances of the same semantic class. Moreover, we explore
the effect of incorporating recent successful methods from both segmentation
and detection (i.e. atrous convolution and hypercolumn). Our proposed model is
evaluated on the COCO instance segmentation benchmark and shows performance
comparable to other state-of-the-art models.
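As a rough illustration of how the semantic and direction cues might be combined inside a single region of interest, the sketch below adds the cropped class channel of the semantic prediction to a per-pixel score derived from the direction prediction. This is a simplified approximation of the idea, not MaskLab's actual mask head.

```python
from typing import Tuple
import torch

def roi_foreground_logits(
    semantic_logits: torch.Tensor,    # (C, H, W) per-class semantic scores
    direction_logits: torch.Tensor,   # (K, H, W) scores over K quantized directions
    box: Tuple[int, int, int, int],   # (y0, x0, y1, x1) detected box
    cls: int,                         # class predicted for this box
) -> torch.Tensor:
    y0, x0, y1, x1 = box
    # Semantic cue: how strongly each pixel in the box belongs to the class.
    sem = semantic_logits[cls, y0:y1, x0:x1]
    # Direction cue: pixels of this instance should point toward its center;
    # here the best direction score serves as a per-pixel "instance-ness"
    # proxy (a simplification of the paper's direction-assembling step).
    dirs = direction_logits[:, y0:y1, x0:x1].max(dim=0).values
    return sem + dirs                 # higher = more likely foreground
```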
Hybridnet for depth estimation and semantic segmentation
Semantic segmentation and depth estimation are two important tasks in the
area of image processing. Traditionally, these two tasks are addressed in an
independent manner. However, for applications where both geometric and
semantic information are required, such as robotics or autonomous navigation,
depth estimation or semantic segmentation alone is not sufficient. In this
paper, depth estimation and semantic segmentation are addressed together from a
single input image through a hybrid convolutional network. Different from
state-of-the-art methods, where features for both tasks are extracted by a
single feature extraction network, the proposed HybridNet improves feature
extraction by separating the features relevant to one task from those relevant
to both. Experimental results demonstrate that HybridNet is comparable with
state-of-the-art methods, as well as with the single-task methods that
HybridNet is based on.
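The feature-separation idea can be sketched as a backbone with one branch of shared features and one task-specific branch per task, concatenated before each task head. Layer sizes and names below are assumptions for illustration only, not the HybridNet configuration.

```python
import torch
import torch.nn as nn

def _block(ci: int, co: int) -> nn.Module:
    return nn.Sequential(nn.Conv2d(ci, co, 3, padding=1), nn.ReLU())

class SharedSpecificBackbone(nn.Module):
    def __init__(self, in_ch: int = 3, feat: int = 32, n_classes: int = 19):
        super().__init__()
        self.shared = _block(in_ch, feat)          # features useful to both tasks
        self.seg_specific = _block(in_ch, feat)    # features useful only to segmentation
        self.depth_specific = _block(in_ch, feat)  # features useful only to depth
        self.seg_head = nn.Conv2d(2 * feat, n_classes, 1)
        self.depth_head = nn.Conv2d(2 * feat, 1, 1)

    def forward(self, x):
        s = self.shared(x)
        seg = self.seg_head(torch.cat([s, self.seg_specific(x)], dim=1))
        depth = self.depth_head(torch.cat([s, self.depth_specific(x)], dim=1))
        return seg, depth

# Example: seg_logits, depth_map = SharedSpecificBackbone()(torch.randn(1, 3, 64, 64))
```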
Depth estimation and semantic segmentation from a single RGB image using a hybrid convolutional neural network
Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly these two tasks are addressed independently, but recently the idea of merging them into a single framework has been studied, under the assumption that integrating two highly correlated tasks may benefit each other and improve estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed from a single RGB input image with a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separate to achieve a mutual improvement. Likewise, our approaches are evaluated under two different scenarios designed to compare our results against single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that our methodology outperforms state-of-the-art single-task approaches, while obtaining competitive results compared with other multi-task methods.
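A natural way to train such a unified network is to minimize a weighted sum of a per-pixel classification loss for segmentation and a regression loss for depth. The sketch below uses cross-entropy and L1 with an ignore label and a validity mask; these particular choices and weights are assumptions for illustration, not necessarily the losses used in the paper.

```python
import torch
import torch.nn.functional as F

def joint_loss(seg_logits, seg_target, depth_pred, depth_target, w_depth: float = 1.0):
    """seg_logits: (N, C, H, W); seg_target: (N, H, W) long labels;
    depth_pred, depth_target: (N, 1, H, W) depth maps."""
    # Per-pixel classification loss for the segmentation head.
    seg_loss = F.cross_entropy(seg_logits, seg_target, ignore_index=255)
    # L1 regression loss for the depth head, ignoring pixels without ground truth.
    valid = depth_target > 0
    depth_loss = F.l1_loss(depth_pred[valid], depth_target[valid])
    return seg_loss + w_depth * depth_loss
```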