CSGNet: Neural Shape Parser for Constructive Solid Geometry
We present a neural architecture that takes as input a 2D or 3D shape and
outputs a program that generates the shape. The instructions in our program are
based on constructive solid geometry principles, i.e., a set of boolean
operations on shape primitives defined recursively. Bottom-up techniques for
this shape parsing task rely on primitive detection and are inherently slow
since the search space over possible primitive combinations is large. In
contrast, our model uses a recurrent neural network that parses the input shape
in a top-down manner, which is significantly faster and yields a compact and
easy-to-interpret sequence of modeling instructions. Our model is also more
effective as a shape detector compared to existing state-of-the-art detection
techniques. We finally demonstrate that our network can be trained on novel
datasets without ground-truth program annotations through policy gradient
techniques.
Comment: Accepted at CVPR-201
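To make the idea of a CSG program concrete, here is a minimal sketch of how a sequence of boolean operations on shape primitives could be executed on a small 2D grid. It is not the CSGNet implementation: the postfix instruction layout, the grid size `GRID`, and the helpers `circle`, `square`, and `execute` are all assumptions introduced for illustration.

```python
from typing import List, Set, Tuple, Union

Pixel = Tuple[int, int]
Shape = Set[Pixel]

GRID = 16  # hypothetical canvas size, chosen only for this sketch


def circle(cx: int, cy: int, r: int) -> Shape:
    """Filled disc primitive rasterized on the GRID x GRID canvas."""
    return {(x, y) for x in range(GRID) for y in range(GRID)
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r}


def square(cx: int, cy: int, half: int) -> Shape:
    """Filled axis-aligned square primitive with side 2 * half + 1."""
    return {(x, y) for x in range(GRID) for y in range(GRID)
            if abs(x - cx) <= half and abs(y - cy) <= half}


# Boolean operations of the hypothetical instruction set.
OPS = {
    "union": lambda a, b: a | b,
    "intersect": lambda a, b: a & b,
    "subtract": lambda a, b: a - b,
}


def execute(program: List[Union[Shape, str]]) -> Shape:
    """Run a postfix program: primitives are pushed, operators pop two shapes."""
    stack: List[Shape] = []
    for instr in program:
        if isinstance(instr, str):
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[instr](left, right))
        else:
            stack.append(instr)
    if len(stack) != 1:
        raise ValueError("malformed program: stack should hold one shape")
    return stack[0]


# A square with a circular hole: square(8, 8, 4) minus circle(8, 8, 2).
program = [square(8, 8, 4), circle(8, 8, 2), "subtract"]
shape = execute(program)
```

A shape parser in the sense of the abstract would go the other way: given `shape`, it would predict a sequence like `program`.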
Pick and Place Without Geometric Object Models
We propose a novel formulation of robotic pick and place as a deep
reinforcement learning (RL) problem. Whereas most deep RL approaches to robotic
manipulation frame the problem in terms of low level states and actions, we
propose a more abstract formulation. In this formulation, actions are target
reach poses for the hand and states are a history of such reaches. We show this
approach can solve a challenging class of pick-place and regrasping problems
where the exact geometry of the objects to be handled is unknown. The only
information our method requires is: 1) the sensor perception available to the
robot at test time; 2) prior knowledge of the general class of objects for
which the system was trained. We evaluate our method using objects belonging to
two different categories, mugs and bottles, both in simulation and on real
hardware. Results show a major improvement relative to a shape primitives
baseline
High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference
We propose a data-driven method for recovering missing parts of 3D shapes.
Our method is based on a new deep learning architecture consisting of two
sub-networks: a global structure inference network and a local geometry
refinement network. The global structure inference network incorporates a long
short-term memorized context fusion module (LSTM-CF) that infers the global
structure of the shape based on multi-view depth information provided as part
of the input. It also includes a 3D fully convolutional (3DFCN) module that
further enriches the global structure representation according to volumetric
information in the input. Under the guidance of the global structure network,
the local geometry refinement network takes as input local 3D patches around
missing regions, and progressively produces a high-resolution, complete surface
through a volumetric encoder-decoder architecture. Our method jointly trains
the global structure inference and local geometry refinement networks in an
end-to-end manner. We perform qualitative and quantitative evaluations on six
object categories, demonstrating that our method outperforms existing
state-of-the-art work on shape completion.
Comment: 8 pages paper, 11 pages supplementary material, ICCV spotlight pape
GVP: Generative Volumetric Primitives
Advances in 3D-aware generative models have pushed the boundary of image
synthesis with explicit camera control. To achieve high-resolution image
synthesis, several attempts have been made to design efficient generators, such
as hybrid architectures with both 3D and 2D components. However, such a design
compromises multiview consistency, and the design of a pure 3D generator with
high resolution is still an open problem. In this work, we present Generative
Volumetric Primitives (GVP), the first pure 3D generative model that can sample
and render 512-resolution images in real time. GVP jointly models a number of
volumetric primitives and their spatial information, both of which can be
efficiently generated via a 2D convolutional network. The mixture of these
primitives naturally captures the sparsity and correspondence in the 3D volume.
The training of such a generator with a high degree of freedom is made possible
through a knowledge distillation technique. Experiments on several datasets
demonstrate superior efficiency and 3D consistency of GVP over the
state-of-the-art.
Comment: https://vcai.mpi-inf.mpg.de/projects/GVP/index.htm
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making these hypotheses and
constraints explicit makes the framework particularly useful for selecting a
method for a given application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table