Learned versus Hand-Designed Feature Representations for 3d Agglomeration
For image recognition and labeling tasks, recent results suggest that machine
learning methods that rely on manually specified feature representations may be
outperformed by methods that automatically derive feature representations based
on the data. Yet for problems that involve analysis of 3d objects, such as mesh
segmentation, shape retrieval, or neuron fragment agglomeration, there remains
a strong reliance on hand-designed feature descriptors. In this paper, we
evaluate a large set of hand-designed 3d feature descriptors alongside features
learned from the raw data using both end-to-end and unsupervised learning
techniques, in the context of agglomeration of 3d neuron fragments. By
combining unsupervised learning techniques with a novel dynamic pooling scheme,
we show how pure learning-based methods are for the first time competitive with
hand-designed 3d shape descriptors. We investigate data augmentation strategies
for dramatically increasing the size of the training set, and show how
combining both learned and hand-designed features leads to the highest
accuracy.
3D Shape Segmentation with Projective Convolutional Networks
This paper introduces a deep architecture for segmenting 3D objects into
their labeled semantic parts. Our architecture combines image-based Fully
Convolutional Networks (FCNs) and surface-based Conditional Random Fields
(CRFs) to yield coherent segmentations of 3D shapes. The image-based FCNs are
used for efficient view-based reasoning about 3D object parts. Through a
special projection layer, FCN outputs are effectively aggregated across
multiple views and scales, then are projected onto the 3D object surfaces.
Finally, a surface-based CRF combines the projected outputs with geometric
consistency cues to yield coherent segmentations. The whole architecture
(multi-view FCNs and CRF) is trained end-to-end. Our approach significantly
outperforms the existing state-of-the-art methods in the currently largest
segmentation benchmark (ShapeNet). Finally, we demonstrate promising
segmentation results on noisy 3D shapes acquired from consumer-grade depth
cameras.
Comment: This is an updated version of our CVPR 2017 paper. We incorporated
new experiments that demonstrate ShapePFCN performance under the case of
consistent *upright* orientation and an additional input channel in our
rendered images for encoding height from the ground plane (upright axis
coordinate values). Performance is improved in this setting.
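The abstract above describes aggregating FCN outputs across views and projecting them onto the 3D surface. A minimal sketch of that idea follows, assuming each rendered view comes with a per-pixel map of mesh face indices and that aggregation is a per-label maximum over all pixels that see a face. The function name `aggregate_views`, the array layouts, and the choice of max pooling are illustrative assumptions, not the paper's actual projection layer.

```python
import numpy as np

def aggregate_views(view_scores, face_ids, num_faces):
    """Pool per-pixel label scores from several rendered views onto mesh faces.

    view_scores: (V, H, W, C) label scores per view (hypothetical layout).
    face_ids:    (V, H, W) index of the face visible at each pixel, -1 for background.
    Returns a (num_faces, C) array; faces seen in no view keep -inf.
    """
    num_labels = view_scores.shape[-1]
    scores = view_scores.reshape(-1, num_labels)
    ids = face_ids.reshape(-1)

    # Ignore background pixels, which do not map to any face.
    visible = ids >= 0

    # Assumption: aggregate by taking the per-label maximum over all pixels
    # (from any view) that land on the same face.
    face_scores = np.full((num_faces, num_labels), -np.inf)
    np.maximum.at(face_scores, ids[visible], scores[visible])
    return face_scores
```

In this sketch a face that no view observes keeps a score of negative infinity, so a downstream step would need to handle it, for example by borrowing evidence from neighboring faces.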
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes
Deep neural network (DNN) architectures have been shown to outperform
traditional pipelines for object segmentation and pose estimation using RGBD
data, but the performance of these DNN pipelines is directly tied to how
representative the training data is of the true data. Hence a key requirement
for employing these methods in practice is to have a large set of labeled data
for your specific robotic manipulation task, a requirement that is not
generally satisfied by existing datasets. In this paper we develop a pipeline
to rapidly generate high quality RGBD data with pixelwise labels and object
poses. We use an RGBD camera to collect video of a scene from multiple
viewpoints and leverage existing reconstruction techniques to produce a 3D
dense reconstruction. We label the 3D reconstruction using a human assisted
ICP-fitting of object meshes. By reprojecting the results of labeling the 3D
scene we can produce labels for each RGBD image of the scene. This pipeline
enabled us to collect over 1,000,000 labeled object instances in just a few
days. We use this dataset to answer questions related to how much training data
is required, and of what quality the data must be, to achieve high performance
from a DNN architecture.
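The reprojection step described above (turning one labeled 3D reconstruction into labels for every RGBD frame) can be sketched with a pinhole camera model. This is a simplified point-splatting illustration, not the LabelFusion implementation: the function name `reproject_labels`, the intrinsics matrix `K`, the 4x4 world-to-camera pose `T_cam_world`, and the convention that label 0 means background are all assumptions made for the example.

```python
import numpy as np

def reproject_labels(points_world, labels, K, T_cam_world, height, width):
    """Project labeled 3D points into one camera view to get a per-pixel label image."""
    # Transform points from world to camera coordinates (homogeneous 4x4 pose).
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    in_front = pts_cam[:, 2] > 0
    pts_cam, lab = pts_cam[in_front], labels[in_front]

    # Pinhole projection to integer pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, lab = u[inside], v[inside], pts_cam[inside, 2], lab[inside]

    # Depth test: sort near-to-far and keep the first (nearest) point per pixel.
    order = np.argsort(z, kind="stable")
    flat = (v * width + u)[order]
    _, first = np.unique(flat, return_index=True)

    label_img = np.zeros(height * width, dtype=np.int32)  # 0 = background (assumed)
    label_img[flat[first]] = lab[order][first]
    return label_img.reshape(height, width)
```

Calling this once per camera pose in the recorded video would yield one label image per frame, which is why a single labeling pass over the 3D scene can produce a large number of labeled images. A real pipeline would more likely render the fitted object meshes than splat individual points.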
Playing for Data: Ground Truth from Computer Games
Recent progress in computer vision has been driven by high-capacity models
trained on large datasets. Unfortunately, creating large datasets with
pixel-level labels has been extremely costly due to the amount of human effort
required. In this paper, we present an approach to rapidly creating
pixel-accurate semantic label maps for images extracted from modern computer
games. Although the source code and the internal operation of commercial games
are inaccessible, we show that associations between image patches can be
reconstructed from the communication between the game and the graphics
hardware. This enables rapid propagation of semantic labels within and across
images synthesized by the game, with no access to the source code or the
content. We validate the presented approach by producing dense pixel-level
semantic annotations for 25 thousand images synthesized by a photorealistic
open-world computer game. Experiments on semantic segmentation datasets show
that using the acquired data to supplement real-world images significantly
increases accuracy and that the acquired data enables reducing the amount of
hand-labeled real-world data: models trained with game data and just 1/3 of the
CamVid training set outperform models trained on the complete CamVid training
set.
Comment: Accepted to the 14th European Conference on Computer Vision (ECCV
2016)