Impact of Ground Truth Annotation Quality on Performance of Semantic Image Segmentation of Traffic Conditions
Preparing high-quality datasets for urban scene understanding is a
labor-intensive task, especially for datasets designed for autonomous
driving applications. Using the coarse ground truth (GT)
annotations of these datasets without detriment to the accuracy of semantic
image segmentation (measured by the mean intersection over union, mIoU) could simplify
and speed up dataset preparation and model fine-tuning before practical
application. Here, the results of a comparative analysis of semantic
segmentation accuracy obtained with the PSPNet deep learning architecture are
presented for finely and coarsely annotated images from the Cityscapes dataset. Two
scenarios were investigated: scenario 1 - the fine GT images for training and
prediction, and scenario 2 - the fine GT images for training and the coarse GT
images for prediction. The obtained results demonstrated that for the most
important classes the mean accuracy values of semantic image segmentation for
coarse GT annotations are higher than for the fine GT ones, while the reverse
holds for the standard deviation values. This means that for some applications some
unimportant classes can be excluded and the model can be tuned further for some
classes and specific regions on the coarse GT dataset without any loss of
accuracy. Moreover, this opens up the prospect of using deep neural
networks to prepare such coarse GT datasets.
Comment: 10 pages, 6 figures, 2 tables, The Second International Conference on
Computer Science, Engineering and Education Applications (ICCSEEA2019) 26-27
January 2019, Kiev, Ukraine
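The mIoU metric named in the abstract above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two equal-length label sequences.

    Classes that appear in neither sequence are skipped (an assumption;
    benchmark toolkits may handle absent classes differently).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Hypothetical 4-pixel example with two classes present.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Comparing fine and coarse annotations as the abstract describes would amount to computing this score against each GT variant.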
Lifting GIS Maps into Strong Geometric Context for Scene Understanding
Contextual information can have a substantial impact on the performance of
visual tasks such as semantic segmentation, object detection, and geometric
estimation. Data stored in Geographic Information Systems (GIS) offers a rich
source of contextual information that has been largely untapped by computer
vision. We propose to leverage such information for scene understanding by
combining GIS resources with large sets of unorganized photographs using
Structure from Motion (SfM) techniques. We present a pipeline to quickly
generate strong 3D geometric priors from 2D GIS data using SfM models aligned
with minimal user input. Given an image resectioned against this model, we
generate robust predictions of depth, surface normals, and semantic labels. We
show that the predicted geometry is substantially more
accurate than that of other single-image depth estimation methods. We then demonstrate the
utility of these contextual constraints for re-scoring pedestrian detections,
and use these GIS contextual features alongside object detection score maps to
improve a CRF-based semantic segmentation framework, boosting accuracy over
baseline models.
Robust Motion Segmentation from Pairwise Matches
In this paper we address a classification problem that has not been
considered before, namely motion segmentation given pairwise matches only. Our
contribution to this unexplored task is a novel formulation of motion
segmentation as a two-step process. First, motion segmentation is performed on
image pairs independently. Second, we combine the independent pairwise
segmentation results in a robust way into the final globally consistent
segmentation. Our approach is inspired by the success of averaging methods. We
demonstrate in simulated as well as in real experiments that our method is very
effective in reducing the errors in the pairwise motion segmentation and can
cope with a large number of mismatches.
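The second step described above, merging independent pairwise results into one consistent labeling, can be sketched with a simple per-point majority vote. This is a stand-in technique, not the method of the paper: it assumes that motion labels are already aligned across image pairs (in general they are only defined up to a permutation), and the function name `vote_labels` and the dictionary layout are hypothetical.

```python
from collections import Counter


def vote_labels(pairwise_labels):
    """Combine per-pair labelings by majority vote.

    pairwise_labels: list of dicts, one per image pair, mapping a point
    identifier to the motion label assigned in that pair. Labels are
    assumed to be comparable across pairs (an illustrative simplification).
    """
    votes = {}
    for labels in pairwise_labels:
        for point, label in labels.items():
            votes.setdefault(point, Counter())[label] += 1
    # Keep the most frequent label for each point.
    return {point: counts.most_common(1)[0][0] for point, counts in votes.items()}


# Hypothetical results from three image pairs; each pair contains one outlier vote.
pairs = [
    {"a": 0, "b": 1},
    {"a": 0, "b": 0},
    {"a": 1, "b": 1},
]
combined = vote_labels(pairs)
```

Here point "a" ends up with label 0 and point "b" with label 1, since each isolated wrong label is outvoted. This shows only why redundancy across pairs can reduce pairwise errors.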