Search CORE

197,177 research outputs found

Text to 3D Scene Generation with Rich Lexical Grounding

Author: Chang Angel
Manning Christopher D.
Monroe Will
Potts Christopher
Savva Manolis
Publication venue
Publication date: 01/01/2015
Field of study

The ability to map descriptions of scenes to 3D geometric representations has many applications in areas such as art, education, and robotics. However, prior work on the text to 3D scene generation task has used manually specified object categories and language that identifies them. We introduce a dataset of 3D scenes annotated with natural language descriptions and learn from this data how to ground textual descriptions to physical objects. Our method successfully grounds a variety of lexical terms to concrete referents, and we show quantitatively that our method improves 3D scene generation over previous work using purely rule-based methods. We evaluate the fidelity and plausibility of 3D scenes generated with our grounding approach through human judgments. To ease evaluation on this task, we also introduce an automated metric that strongly correlates with human judgments.Comment: 10 pages, 7 figures, 3 tables. To appear in ACL-IJCNLP 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Ontology based Scene Creation for the Development of Automated Vehicles

Author: Bagschik Gerrit
Maurer Markus
Menzel Till
Publication venue
Publication date: 23/04/2018
Field of study

The introduction of automated vehicles without permanent human supervision demands a functional system description, including functional system boundaries and a comprehensive safety analysis. These inputs to the technical development can be identified and analyzed by a scenario-based approach. Furthermore, to establish an economical test and release process, a large number of scenarios must be identified to obtain meaningful test results. Experts are doing well to identify scenarios that are difficult to handle or unlikely to happen. However, experts are unlikely to identify all scenarios possible based on the knowledge they have on hand. Expert knowledge modeled for computer aided processing may help for the purpose of providing a wide range of scenarios. This contribution reviews ontologies as knowledge-based systems in the field of automated vehicles, and proposes a generation of traffic scenes in natural language as a basis for a scenario creation.Comment: Accepted at the 2018 IEEE Intelligent Vehicles Symposium, 8 pages, 10 figure

arXiv.org e-Print Archive

Crossref

A framework for dialogue detection in movies

Author: Kotropoulos C
Kotti M
Moschou V
Pitas I
Ziòlko B
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006
Field of study

In this paper, we investigate a novel framework for dialogue detection that is based on indicator functions. An indicator function defines that a particular actor is present at each time instant. Two dialogue detection rules are developed and assessed. The first rule relies on the value of the cross-correlation function at zero time lag that is compared to a threshold. The second rule is based on the cross-power in a particular frequency band that is also compared to a threshold. Experiments are carried out in order to validate the feasibility of the aforementioned dialogue detection rules by using ground-truth indicator functions determined by human observers from six different movies. A total of 25 dialogue scenes and another 8 non-dialogue scenes are employed. The probabilities of false alarm and detection are estimated by cross-validation, where 70% of the available scenes are used to learn the thresholds employed in the dialogue detection rules and the remaining 30% of the scenes are used for testing. An almost perfect dialogue detection is reported for every distinct threshold. © Springer-Verlag Berlin Heidelberg 2006

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

Playing for Data: Ground Truth from Computer Games

Author: A Geiger
B McCane
B Pepik
BKP Horn
D Vázquez
DJ Butler
GJ Brostow
J Shotton
J Shotton
J Tighe
J Xu
JL Barron
S Baker
T Akenine-Möller
Takafumi Saito
V Haltakov
Publication venue
Publication date: 07/08/2016
Field of study

Recent progress in computer vision has been driven by high-capacity models trained on large datasets. Unfortunately, creating large datasets with pixel-level labels has been extremely costly due to the amount of human effort required. In this paper, we present an approach to rapidly creating pixel-accurate semantic label maps for images extracted from modern computer games. Although the source code and the internal operation of commercial games are inaccessible, we show that associations between image patches can be reconstructed from the communication between the game and the graphics hardware. This enables rapid propagation of semantic labels within and across images synthesized by the game, with no access to the source code or the content. We validate the presented approach by producing dense pixel-level semantic annotations for 25 thousand images synthesized by a photorealistic open-world computer game. Experiments on semantic segmentation datasets show that using the acquired data to supplement real-world images significantly increases accuracy and that the acquired data enables reducing the amount of hand-labeled real-world data: models trained with game data and just 1/3 of the CamVid training set outperform models trained on the complete CamVid training set.Comment: Accepted to the 14th European Conference on Computer Vision (ECCV 2016

arXiv.org e-Print Archive

Crossref

ImageSpirit: Verbal Guided Image Parsing

Author: Cheng Ming-Ming
Crook Nigel
Lin Wen-Yan
Mitra Niloy
Sturgess Paul
Torr Philip
Vineet Vibhav
Warrell Jonathan
Zheng Shuai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images versus their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixel. In this paper we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interests enables a novel and natural interaction modality that can possibly be used to interact with new generation devices (e.g. smart phones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the tradeoffs compared to traditional mouse based interactions, results are reported for both a large scale quantitative evaluation and a user study.Comment: http://mmcheng.net/imagespirit

arXiv.org e-Print Archive

CiteSeerX

Institutional Knowledge at Singapore Management University

UCL Discovery

Oxford Brookes University: RADAR