DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding
While deep neural networks have led to human-level performance on computer
vision tasks, they have yet to demonstrate similar gains for holistic scene
understanding. In particular, 3D context has been shown to be an extremely
important cue for scene understanding, yet very little research has been done
on integrating context information with deep models. This paper presents an
approach to embed 3D context into the topology of a neural network trained to
perform holistic scene understanding. Given a depth image depicting a 3D scene,
our network aligns the observed scene with a predefined 3D scene template, and
then reasons about the existence and location of each object within the scene
template. In doing so, our model recognizes multiple objects in a single
forward pass of a 3D convolutional neural network, capturing both global scene
and local object information simultaneously. To create training data for this
3D network, we generate partly hallucinated depth images, rendered by
replacing real objects with CAD models of the same object category drawn from
a repository. Extensive experiments demonstrate the effectiveness of our
algorithm compared to state-of-the-art methods. Source code and data are
available at http://deepcontext.cs.princeton.edu.
Comment: Accepted by ICCV201
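The abstract says the network takes a depth image as input but processes it with a 3D convolutional network, so the depth map has to become a volume first. The abstract does not say how DeepContext encodes that volume. The following is a minimal sketch that assumes a binary occupancy grid, a pinhole camera with intrinsics fx, fy, cx, cy, and a grid expressed in camera coordinates rather than aligned to the scene template. The function name `depth_to_occupancy` and all of its parameters are hypothetical.

```python
import math
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]


def depth_to_occupancy(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    origin: Tuple[float, float, float],
    voxel_size: float,
    dims: Tuple[int, int, int],
) -> Set[Voxel]:
    """Back-project valid depth pixels (pinhole model) and return occupied voxel indices.

    Hypothetical helper: depth[v][u] is the metric depth of pixel (u, v); values that
    are non-finite or <= 0 are treated as missing. The grid is axis-aligned in the
    camera frame, anchored at `origin`, with `dims` cells along each axis.
    """
    occupied: Set[Voxel] = set()
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if not math.isfinite(z) or z <= 0.0:
                continue  # skip missing or invalid depth readings
            # Pinhole back-projection from pixel (u, v) at depth z to camera coordinates.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Integer index of the voxel that contains (x, y, z).
            i, j, k = (math.floor((p - o) / voxel_size) for p, o in zip((x, y, z), origin))
            if 0 <= i < dims[0] and 0 <= j < dims[1] and 0 <= k < dims[2]:
                occupied.add((i, j, k))
    return occupied


# Usage: a 2x2 depth image with one missing pixel.
cells = depth_to_occupancy(
    [[1.0, 1.0], [1.0, 0.0]], 1.0, 1.0, 0.5, 0.5, (-1.0, -1.0, 0.0), 1.0, (2, 2, 2)
)
print(sorted(cells))  # [(0, 0, 1), (0, 1, 1), (1, 0, 1)]
```

A dense 0/1 tensor with ones at these indices would be the simplest input for a 3D CNN. Richer encodings such as truncated signed distance values also carry free-space information, at the cost of more preprocessing.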
Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that I incorporated into an end-to-end Visual Question Answering (VQA) system and evaluated on two VQA datasets for compositional language and elementary visual reasoning.
SEGCloud: Semantic Segmentation of 3D Point Clouds
3D semantic scene labeling is fundamental to agents operating in the real
world. In particular, labeling raw 3D point sets from sensors provides
fine-grained semantics. Recent works leverage the capabilities of Neural
Networks (NNs), but are limited to coarse voxel predictions and do not
explicitly enforce global consistency. We present SEGCloud, an end-to-end
framework to obtain 3D point-level segmentation that combines the advantages of
NNs, trilinear interpolation (TI), and fully connected Conditional Random Fields
(FC-CRF). Coarse voxel predictions from a 3D Fully Convolutional NN are
transferred back to the raw 3D points via trilinear interpolation. Then the
FC-CRF enforces global consistency and provides fine-grained semantics on the
points. We implement the latter as a differentiable Recurrent NN to allow joint
optimization. We evaluate the framework on two indoor and two outdoor 3D
datasets (NYU V2, S3DIS, KITTI, Semantic3D.net), and show performance
comparable or superior to the state of the art on all datasets.
Comment: Accepted as a spotlight at the International Conference on 3D Vision
(3DV 2017)
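The abstract names two post-processing steps. Trilinear interpolation transfers coarse voxel scores back to the raw points, and a fully connected CRF then refines the per-point labels. The following is a minimal pure-Python sketch of both steps under several assumptions:

- Voxel scores are stored sparsely in a dict keyed by integer voxel index.
- Voxel centres sit at (i + 0.5) times the voxel size.
- The CRF uses a single spatial Gaussian kernel with a Potts compatibility. SEGCloud's actual kernels and learned parameters are not given in the abstract.
- Inference is the naive O(N²) parallel mean-field update, not a fast filtering-based or recurrent-network implementation.

The names `trilinear_scores`, `softmax`, and `mean_field` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def trilinear_scores(
    grid: Dict[Voxel, List[float]],
    point: Sequence[float],
    voxel_size: float,
    num_classes: int,
) -> List[float]:
    """Blend per-class voxel scores at `point` from the 8 surrounding voxel centres.

    Assumes voxel (i, j, k) is centred at ((i + 0.5) * s, (j + 0.5) * s, (k + 0.5) * s).
    Missing (empty) voxels contribute zero; weights are not renormalised.
    """
    g = [c / voxel_size - 0.5 for c in point]  # position in voxel-centre units
    base = [math.floor(c) for c in g]
    frac = [c - b for c, b in zip(g, base)]
    out = [0.0] * num_classes
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((frac[0] if dx else 1.0 - frac[0])
                     * (frac[1] if dy else 1.0 - frac[1])
                     * (frac[2] if dz else 1.0 - frac[2]))
                scores = grid.get((base[0] + dx, base[1] + dy, base[2] + dz))
                if scores is not None and w > 0.0:
                    for c in range(num_classes):
                        out[c] += w * scores[c]
    return out


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mean_field(
    logits: List[List[float]],
    positions: List[Sequence[float]],
    theta: float,
    weight: float,
    iters: int = 5,
) -> List[List[float]]:
    """Naive O(N^2) mean-field inference for a fully connected CRF with a Potts model.

    Assumed pairwise kernel: weight * exp(-|p_i - p_j|^2 / (2 * theta^2)), spatial only.
    With Potts compatibility the update reduces to Q_i(c) proportional to
    exp(logit_i(c) + weight * sum_{j != i} k_ij * Q_j(c)).
    """
    n, num_labels = len(logits), len(logits[0])
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                kernel[i][j] = math.exp(-d2 / (2.0 * theta ** 2))
    q = [softmax(row) for row in logits]  # initialise from the unary scores
    for _ in range(iters):
        # Parallel update: the whole new list is built from the previous q.
        q = [
            softmax([
                logits[i][c] + weight * sum(kernel[i][j] * q[j][c] for j in range(n))
                for c in range(num_labels)
            ])
            for i in range(n)
        ]
    return q
```

Each loop iteration is one mean-field step. Unrolling a fixed number of such steps as network layers is, roughly, what lets the refinement be trained jointly with the 3D network, which is the role the abstract assigns to the recurrent implementation. The quadratic kernel loop is only practical for small point sets.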