Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors
The impressive performance of deep convolutional neural networks in
single-view 3D reconstruction suggests that these models perform non-trivial
reasoning about the 3D structure of the output space. However, recent work has
challenged this belief, showing that complex encoder-decoder architectures
perform similarly to nearest-neighbor baselines or simple linear decoder models
that exploit large amounts of per-category data in standard benchmarks. On the
other hand, settings where 3D shape must be inferred for new categories with few
examples are more natural and require models that generalize across shapes. In
this work we demonstrate experimentally that naive baselines do not apply when
the goal is to learn to reconstruct novel objects using very few examples, and
that in a few-shot learning setting, the network must learn concepts
that can be applied to new categories, avoiding rote memorization. To address
deficiencies in existing approaches to this problem, we propose three
approaches that efficiently integrate a class prior into a 3D reconstruction
model, allowing it to account for intra-class variability and imposing an
implicit compositional structure that the model should learn. Experiments on the
popular ShapeNet database demonstrate that our method significantly outperforms
existing baselines on this task in the few-shot setting.
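The nearest-neighbor baseline mentioned in this abstract can be illustrated with a short sketch. It is not code from the paper. It assumes that each image is already represented by a fixed-length feature vector (the hypothetical `Feature` type), that each shape is a flattened binary voxel grid (the hypothetical `Voxels` type), and that retrieval uses Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representations (assumptions, not from the paper):
# an image is a precomputed feature vector, a 3D shape is a flattened
# binary voxel occupancy grid.
Feature = Sequence[float]
Voxels = List[int]


def euclidean(a: Feature, b: Feature) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_reconstruct(
    query: Feature, train: List[Tuple[Feature, Voxels]]
) -> Voxels:
    """Return a copy of the voxel grid of the training example whose image
    feature is closest to the query. No new shape is generated; the
    prediction is always retrieved from memory."""
    if not train:
        raise ValueError("training set is empty")
    _, best_shape = min(train, key=lambda pair: euclidean(query, pair[0]))
    return list(best_shape)


if __name__ == "__main__":
    # Toy memory of two "training" examples with 2x2x2 voxel grids.
    memory = [
        ([0.0, 0.0], [1, 1, 1, 1, 0, 0, 0, 0]),
        ([1.0, 1.0], [0, 0, 0, 0, 1, 1, 1, 1]),
    ]
    print(nearest_neighbor_reconstruct([0.9, 0.8], memory))
```

Because every prediction is a stored training shape, a baseline of this kind can only do well when many examples of the test category are available, which is what the few-shot setting takes away.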
Visual Question Answering: A Survey of Methods and Datasets
Visual Question Answering (VQA) is a challenging task that has received
increasing attention from both the computer vision and the natural language
processing communities. Given an image and a question in natural language, it
requires reasoning over visual elements of the image and general knowledge to
infer the correct answer. In the first part of this survey, we examine the
state of the art by comparing modern approaches to the problem. We classify
methods by the mechanism they use to connect the visual and textual modalities. In
particular, we examine the common approach of combining convolutional and
recurrent neural networks to map images and questions to a common feature
space. We also discuss memory-augmented and modular architectures that
interface with structured knowledge bases. In the second part of this survey,
we review the datasets available for training and evaluating VQA systems. The
various datasets contain questions at different levels of complexity, which
require different capabilities and types of reasoning. We examine in depth the
question/answer pairs from the Visual Genome project, and evaluate the
relevance of the structured annotations of images with scene graphs for VQA.
Finally, we discuss promising future directions for the field, in particular
the connection to structured knowledge bases and the use of natural language
processing models. Comment: 25 pages
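The common CNN + RNN approach described in this survey abstract, which maps images and questions into a shared feature space, can be illustrated with a toy sketch. It is not taken from the survey. It assumes that the image and question features have already been produced by a CNN and an RNN, uses hypothetical weight matrices `W_img`, `W_q` and `W_ans`, treats answering as classification over a fixed answer list, and fuses the two modalities with an elementwise product, which is only one of several fusion choices.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a matrix, given as a list of rows, by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def project(W: Matrix, x: Sequence[float]) -> Vector:
    """Map one modality into the shared space: linear layer followed by tanh."""
    return [math.tanh(v) for v in matvec(W, x)]


def answer_scores(img_feat: Sequence[float], q_feat: Sequence[float],
                  W_img: Matrix, W_q: Matrix, W_ans: Matrix) -> Vector:
    """Fuse the two embeddings elementwise and score every candidate answer."""
    img_emb = project(W_img, img_feat)  # assumed CNN feature -> shared space
    q_emb = project(W_q, q_feat)        # assumed RNN feature -> shared space
    joint = [a * b for a, b in zip(img_emb, q_emb)]
    return matvec(W_ans, joint)         # one score per candidate answer


def predict(img_feat, q_feat, W_img, W_q, W_ans, answers: List[str]) -> str:
    """Return the highest-scoring answer from a fixed list of candidates."""
    scores = answer_scores(img_feat, q_feat, W_img, W_q, W_ans)
    best = max(range(len(scores)), key=scores.__getitem__)
    return answers[best]


if __name__ == "__main__":
    # Hypothetical toy weights: identity projections, two candidate answers.
    I2 = [[1.0, 0.0], [0.0, 1.0]]
    print(predict([1.0, 0.0], [1.0, 0.0], I2, I2, I2, ["yes", "no"]))  # yes
```

In a trained system the weights would be learned jointly with the CNN and RNN; here they are fixed by hand only to show how the shared space lets a single classifier reason over both modalities.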