Text to 3D Scene Generation with Rich Lexical Grounding
The ability to map descriptions of scenes to 3D geometric representations has
many applications in areas such as art, education, and robotics. However, prior
work on the text to 3D scene generation task has used manually specified object
categories and language that identifies them. We introduce a dataset of 3D
scenes annotated with natural language descriptions and learn from this data
how to ground textual descriptions to physical objects. Our method successfully
grounds a variety of lexical terms to concrete referents, and we show
quantitatively that our method improves 3D scene generation over previous work
using purely rule-based methods. We evaluate the fidelity and plausibility of
3D scenes generated with our grounding approach through human judgments. To
ease evaluation on this task, we also introduce an automated metric that
strongly correlates with human judgments.
Comment: 10 pages, 7 figures, 3 tables. To appear in ACL-IJCNLP 201
Complementing Brightness Constancy with Deep Networks for Optical Flow Prediction
State-of-the-art methods for optical flow estimation rely on deep learning,
which requires complex sequential training schemes to reach optimal performance
on real-world data. In this work, we introduce the COMBO deep network that
explicitly exploits the brightness constancy (BC) model used in traditional
methods. Since BC is an approximate physical model violated in several
situations, we propose to train a physically-constrained network complemented
with a data-driven network. We introduce a unique and meaningful flow
decomposition between the physical prior and the data-driven complement,
including an uncertainty quantification of the BC model. We derive a joint
training scheme for learning the different components of the decomposition
ensuring optimal cooperation, in both a supervised and a semi-supervised
context. Experiments show that COMBO can improve performance over
state-of-the-art supervised networks, e.g. RAFT, reaching state-of-the-art
results on several benchmarks. We highlight how COMBO can leverage the BC model
and adapt to its limitations. Finally, we show that our semi-supervised method
can significantly simplify the training procedure.
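
The brightness constancy (BC) assumption mentioned in this abstract says that a pixel keeps its intensity as it moves between frames, so frame1(x + w(x)) should equal frame0(x) for the true flow w. The sketch below is only an illustration of that assumption and of one way it can be violated; it is not the COMBO network or its decomposition. The function name `brightness_constancy_residual`, the nearest-neighbour sampling, and the synthetic frames are all assumptions made for the example.

```python
import numpy as np

def brightness_constancy_residual(frame0, frame1, flow):
    """Return frame1(x + flow(x)) - frame0(x) with nearest-neighbour sampling.

    flow[..., 0] is the horizontal displacement, flow[..., 1] the vertical one.
    Coordinates falling outside the image are clamped to the border.
    """
    h, w = frame0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x1 = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    y1 = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame1[y1, x1] - frame0

# Synthetic pair: the content of frame0 moves 2 pixels to the right.
rng = np.random.default_rng(0)
frame0 = rng.random((8, 8))
frame1 = np.roll(frame0, shift=2, axis=1)

# Ground-truth flow for that motion.
flow = np.zeros((8, 8, 2))
flow[..., 0] = 2.0

# BC holds away from the right border, where pixels leave the image.
residual = brightness_constancy_residual(frame0, frame1, flow)
interior = residual[:, :6]

# A global illumination change breaks BC even with the correct flow.
residual_bright = brightness_constancy_residual(frame0, 1.5 * frame1, flow)
interior_bright = residual_bright[:, :6]

print("max |residual|, constant lighting:", np.abs(interior).max())
print("max |residual|, brighter frame1:  ", np.abs(interior_bright).max())
```

With the correct flow the residual vanishes in the image interior, while the brightened second frame leaves a non-zero residual, which is the kind of situation where a purely BC-based estimate needs a complement.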