106,235 research outputs found
Text to 3D Scene Generation with Rich Lexical Grounding
The ability to map descriptions of scenes to 3D geometric representations has
many applications in areas such as art, education, and robotics. However, prior
work on the text to 3D scene generation task has used manually specified object
categories and language that identifies them. We introduce a dataset of 3D
scenes annotated with natural language descriptions and learn from this data
how to ground textual descriptions to physical objects. Our method successfully
grounds a variety of lexical terms to concrete referents, and we show
quantitatively that our method improves 3D scene generation over previous work
using purely rule-based methods. We evaluate the fidelity and plausibility of
3D scenes generated with our grounding approach through human judgments. To
ease evaluation on this task, we also introduce an automated metric that
strongly correlates with human judgments.Comment: 10 pages, 7 figures, 3 tables. To appear in ACL-IJCNLP 201
Unified Pragmatic Models for Generating and Following Instructions
We show that explicit pragmatic inference aids in correctly generating and
following natural language instructions for complex, sequential tasks. Our
pragmatics-enabled models reason about why speakers produce certain
instructions, and about how listeners will react upon hearing them. Like
previous pragmatic models, we use learned base listener and speaker models to
build a pragmatic speaker that uses the base listener to simulate the
interpretation of candidate descriptions, and a pragmatic listener that reasons
counterfactually about alternative descriptions. We extend these models to
tasks with sequential structure. Evaluation of language generation and
interpretation shows that pragmatic inference improves state-of-the-art
listener models (at correctly interpreting human instructions) and speaker
models (at producing instructions correctly interpreted by humans) in diverse
settings.Comment: NAACL 2018, camera-ready versio
Show and Tell: A Neural Image Caption Generator
Automatically describing the content of an image is a fundamental problem in
artificial intelligence that connects computer vision and natural language
processing. In this paper, we present a generative model based on a deep
recurrent architecture that combines recent advances in computer vision and
machine translation and that can be used to generate natural sentences
describing an image. The model is trained to maximize the likelihood of the
target description sentence given the training image. Experiments on several
datasets show the accuracy of the model and the fluency of the language it
learns solely from image descriptions. Our model is often quite accurate, which
we verify both qualitatively and quantitatively. For instance, while the
current state-of-the-art BLEU-1 score (the higher the better) on the Pascal
dataset is 25, our approach yields 59, to be compared to human performance
around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66,
and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we
achieve a BLEU-4 of 27.7, which is the current state-of-the-art
Machine Learning and Irresponsible Inference: Morally Assessing the Training Data for Image Recognition Systems
Just as humans can draw conclusions responsibly or irresponsibly, so too can computers. Machine learning systems that have been trained on data sets that include irresponsible judgments are likely to yield irresponsible predictions as outputs. In this paper I focus on a particular kind of inference a computer system might make: identification of the intentions with which a person acted on the basis of photographic evidence. Such inferences are liable to be morally objectionable, because of a way in which they are presumptuous. After elaborating this moral concern, I explore the possibility that carefully procuring the training data for image recognition systems could ensure that the systems avoid the problem. The lesson of this paper extends beyond just the particular case of image recognition systems and the challenge of responsibly identifying a person’s intentions. Reflection on this particular case demonstrates the importance (as well as the difficulty) of evaluating machine learning systems and their training data from the standpoint of moral considerations that are not encompassed by ordinary assessments of predictive accuracy
- …