Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
Semantic annotations are vital for training models for object recognition,
semantic segmentation or scene understanding. Unfortunately, pixelwise
annotation of images at very large scale is labor-intensive, and little
labeled data is available, particularly at instance level and for street
scenes. In this paper, we propose to tackle this problem by lifting the
semantic instance labeling task from 2D into 3D. Given reconstructions from
stereo or laser data, we annotate static 3D scene elements with rough bounding
primitives and develop a model which transfers this information into the image
domain. We leverage our method to obtain 2D labels for a novel suburban video
dataset which we have collected, resulting in 400k semantic and instance image
annotations. A comparison of our method to state-of-the-art label transfer
baselines reveals that 3D information enables more efficient annotation while
at the same time resulting in improved accuracy and time-coherent labels.
Comment: 10 pages in Conference on Computer Vision and Pattern Recognition
(CVPR), 201
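The abstract above describes transferring 3D annotations into the image domain. The paper develops its own model for this step; the sketch below is not that model. It only illustrates the underlying geometric idea with a plain pinhole projection and a nearest-depth test. The function name `transfer_labels` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for illustration.

```python
def transfer_labels(points, labels, fx, fy, cx, cy, width, height):
    """Project labeled 3D points (camera coordinates) into the image.

    Hypothetical helper: returns a dict mapping (u, v) pixels to the label
    of the closest 3D point that projects onto that pixel.
    """
    nearest_depth = {}
    pixel_labels = {}
    for (x, y, z), label in zip(points, labels):
        if z <= 0:
            continue  # point is behind the camera
        # Pinhole projection to integer pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # projects outside the image
        # Keep only the nearest point per pixel (simple occlusion handling).
        if (u, v) not in nearest_depth or z < nearest_depth[(u, v)]:
            nearest_depth[(u, v)] = z
            pixel_labels[(u, v)] = label
    return pixel_labels
```

A sparse projection like this leaves most pixels unlabeled and ignores reconstruction noise, which is one reason a dedicated transfer model is needed in practice.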
Playing for Data: Ground Truth from Computer Games
Recent progress in computer vision has been driven by high-capacity models
trained on large datasets. Unfortunately, creating large datasets with
pixel-level labels has been extremely costly due to the amount of human effort
required. In this paper, we present an approach to rapidly creating
pixel-accurate semantic label maps for images extracted from modern computer
games. Although the source code and the internal operation of commercial games
are inaccessible, we show that associations between image patches can be
reconstructed from the communication between the game and the graphics
hardware. This enables rapid propagation of semantic labels within and across
images synthesized by the game, with no access to the source code or the
content. We validate the presented approach by producing dense pixel-level
semantic annotations for 25 thousand images synthesized by a photorealistic
open-world computer game. Experiments on semantic segmentation datasets show
that using the acquired data to supplement real-world images significantly
increases accuracy and that the acquired data enables reducing the amount of
hand-labeled real-world data: models trained with game data and just 1/3 of the
CamVid training set outperform models trained on the complete CamVid training
set.
Comment: Accepted to the 14th European Conference on Computer Vision (ECCV 2016)
Scan4Façade: Automated As-Is Façade Modeling of Historic High-Rise Buildings Using Drones and AI
This paper presents an automated as-is façade modeling method for existing and historic high-rise buildings, named Scan4Façade. First, a camera drone flying a spiral path captures images of the building exterior, and photogrammetry is used to perform three-dimensional (3D) reconstruction and create mesh models of the scanned building façades. High-resolution façade orthoimages are then generated from the mesh models and segmented pixelwise by an artificial intelligence (AI) model named U-net. A combined data augmentation strategy, including random flipping, rotation, resizing, perspective transformation, and color adjustment, is proposed for model training with a limited number of labels. As a result, the U-net achieves an average pixel accuracy of 0.9696 and a mean intersection over union of 0.9063 in testing. Then, the developed twoStagesClustering algorithm, with a two-round shape clustering and a two-round coordinates clustering, is used to precisely extract façade elements’ dimensions and coordinates from the façade orthoimages and pixelwise labels. In testing with the Michigan Central Station (office tower), a historic high-rise building, the developed algorithm achieves an accuracy of 99.77% in window extraction. In addition, the extracted façade geometric information and element types are transformed into AutoCAD command and script files to create CAD drawings without manual interaction. Experimental results also show that the proposed Scan4Façade method can provide clear and accurate information to assist BIM feature creation in Revit. Future research recommendations are also stated in this paper.
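The two segmentation metrics quoted above can be made concrete with a small sketch. It assumes that pixel accuracy means the overall fraction of correctly labeled pixels and that mean intersection over union is averaged over the classes present in the data; the paper may define either one differently. All function names are illustrative.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels by (true class, predicted class) over two 2D label maps."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class matches the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of per-class TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For a 2x2 image with truth `[[0, 0], [1, 1]]` and prediction `[[0, 1], [1, 1]]`, three of four pixels are correct, so the pixel accuracy is 0.75, while the per-class IoUs are 1/2 and 2/3.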
Rich probabilistic models for semantic labeling
The goal of this monograph is to explore the methods and applications of semantic labeling. Our contributions to this rapidly developing topic are particular aspects of modeling and inference in probabilistic models and their applications in the interdisciplinary fields of computer vision, medical image processing, and remote sensing.
A Cross-Season Correspondence Dataset for Robust Semantic Segmentation
In this paper, we present a method to utilize 2D-2D point matches between
images taken during different image conditions to train a convolutional neural
network for semantic segmentation. Enforcing label consistency across the
matches makes the final segmentation algorithm robust to seasonal changes. We
describe how these 2D-2D matches can be generated with little human interaction
by geometrically matching points from 3D models built from images. Two
cross-season correspondence datasets are created providing 2D-2D matches across
seasonal changes as well as from day to night. The datasets are made publicly
available to facilitate further research. We show that adding the
correspondences as extra supervision during training improves the segmentation
performance of the convolutional neural network, making it more robust to
seasonal changes and weather conditions.
Comment: In Proc. CVPR 201
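Enforcing label consistency across 2D-2D matches, as the abstract above describes, can be sketched as an extra loss term. The form below is an assumption, not necessarily the loss used in the paper: the predicted class at a pixel in image A is treated as the target for its matched pixel in image B, and the cross-entropy is averaged over all matches. The name `correspondence_loss` and the data layout are hypothetical.

```python
import math


def correspondence_loss(probs_a, probs_b, matches, eps=1e-12):
    """Average cross-entropy between matched pixels of two images.

    probs_a[y][x] and probs_b[y][x] are lists of class probabilities.
    matches is a list of ((xa, ya), (xb, yb)) pixel correspondences.
    """
    if not matches:
        return 0.0
    total = 0.0
    for (xa, ya), (xb, yb) in matches:
        pa = probs_a[ya][xa]
        pb = probs_b[yb][xb]
        # Hard target: the most likely class at the pixel in image A.
        target = max(range(len(pa)), key=pa.__getitem__)
        # Penalize image B for assigning low probability to that class.
        total += -math.log(pb[target] + eps)
    return total / len(matches)
```

In training, a term like this would be added to the usual supervised segmentation loss, so that unlabeled image pairs still contribute a training signal through their matches.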