    Multimodal Grounding for Language Processing

    This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefits of multimodal grounding for a variety of language processing tasks and the challenges that arise. We focus in particular on the multimodal grounding of verbs, which play a crucial role in the compositional power of language. Comment: The paper has been published in the Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018). Please refer to that version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197
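
    As a minimal sketch of two common strategies for combining multimodal representations of the kind the survey inventories: early fusion by concatenation, and gated fusion in a shared space. The vector dimensions and the random projection matrices below are illustrative assumptions, not details taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    text_vec = rng.standard_normal(300)    # e.g. a word embedding for a verb like "kick"
    image_vec = rng.standard_normal(2048)  # e.g. a CNN feature of a matching scene

    # 1) Early fusion: concatenate the two modalities into one representation.
    fused_concat = np.concatenate([text_vec, image_vec])

    # 2) Gated fusion: project both into a shared space, then let a gate
    #    (learned in practice; random placeholders here) weight each modality.
    W_t = 0.01 * rng.standard_normal((128, 300))
    W_v = 0.01 * rng.standard_normal((128, 2048))
    t, v = W_t @ text_vec, W_v @ image_vec
    gate = 1.0 / (1.0 + np.exp(-(t + v)))  # sigmoid gate in [0, 1]
    fused_gated = gate * v + (1.0 - gate) * t
    print(fused_concat.shape, fused_gated.shape)  # (2348,) (128,)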

    Sequence to Sequence -- Video to Text

    Real-world videos often have complex dynamics, and methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both the input (a sequence of frames) and the output (a sequence of words) to be of variable length. To approach this problem, we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames with a sequence of words in order to generate a description of the event in the video clip. Our model is naturally able to learn the temporal structure of the sequence of frames as well as a sequence model of the generated sentences, i.e. a language model. We evaluate several variants of our model that exploit different visual features on a standard set of YouTube videos and two movie description datasets (M-VAD and MPII-MD). Comment: ICCV 2015 camera-ready. Includes code, project page and LSMDC challenge results.
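
    A condensed sketch, in PyTorch, of an encoder-decoder video captioner in the spirit of the approach described above (frames in, words out). The layer sizes, feature dimension, and vocabulary size are placeholder assumptions, and this is not the authors' released code, which uses a somewhat different stacked-LSTM layout.

    import torch
    import torch.nn as nn

    class VideoCaptioner(nn.Module):
        def __init__(self, feat_dim=4096, hidden=512, vocab=10000):
            super().__init__()
            self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
            self.embed = nn.Embedding(vocab, hidden)
            self.out = nn.Linear(hidden, vocab)

        def forward(self, frames, captions):
            # frames: (batch, n_frames, feat_dim); captions: (batch, n_words)
            _, state = self.encoder(frames)      # summarize the frame sequence
            words = self.embed(captions)         # teacher-forced word inputs
            dec, _ = self.decoder(words, state)  # condition decoding on the video
            return self.out(dec)                 # per-step vocabulary logits

    model = VideoCaptioner()
    logits = model(torch.randn(2, 30, 4096), torch.randint(0, 10000, (2, 12)))
    print(logits.shape)  # torch.Size([2, 12, 10000])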

    Cognitive visual tracking and camera control

    Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real time, high-level information from an observed scene and to generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, will serve to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system, using SQL tables as virtual communication channels and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.
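
    A minimal sketch of the "SQL table as virtual communication channel" idea: one process writes high-level observations as rows, and a controller reads the newest row to derive a PTZ command. The schema, fields, and the toy centering policy are assumptions for illustration, not the paper's actual design.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE observations
                  (ts REAL, target TEXT, x REAL, y REAL, confidence REAL)""")

    # Tracker side: publish a high-level observation by inserting a row.
    db.execute("INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
               (17.3, "person_1", 0.82, 0.40, 0.91))

    # Controller side: poll for the newest observation and derive a command.
    target, x, y = db.execute("""SELECT target, x, y FROM observations
                                 ORDER BY ts DESC LIMIT 1""").fetchone()

    # Toy policy: pan/tilt toward the target's position in the normalized image.
    pan = "RIGHT" if x > 0.5 else "LEFT"
    tilt = "DOWN" if y > 0.5 else "UP"
    print(f"PTZ command for {target}: pan {pan}, tilt {tilt}")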

    The Semantic Grid: A future e-Science infrastructure

    e-Science offers a promising vision of how computer and communication technology can support and enhance the scientific process. It does this by enabling scientists to generate, analyse, share and discuss their insights, experiments and results in an effective manner. The underlying computer infrastructure that provides these facilities is commonly referred to as the Grid. At this time, a number of grid applications are being developed, and there is a whole raft of computer technologies that provide fragments of the necessary functionality. However, there is currently a major gap between these endeavours and the vision of e-Science, in which there is a high degree of easy-to-use and seamless automation and in which flexible collaborations and computations take place on a global scale. To bridge this practice-aspiration divide, this paper presents a research agenda whose aim is to move from the current state of the art in e-Science infrastructure to the future infrastructure that is needed to support the full richness of the e-Science vision. Here, the future e-Science research infrastructure is termed the Semantic Grid (the name is meant to connote a relationship to the Grid similar to the one between the Semantic Web and the Web). In particular, we present a conceptual architecture for the Semantic Grid. This architecture adopts a service-oriented perspective in which distinct stakeholders in the scientific process, represented as software agents, provide services to one another, under various service level agreements, in various forms of marketplace. We then focus predominantly on the issues concerned with the way knowledge is acquired and used in such environments, since we believe this is the key differentiator between current grid endeavours and those envisioned for the Semantic Grid.
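
    A toy sketch of the service-oriented view described above: stakeholders as agents offering services under service level agreements in a marketplace. All class names, fields, and the selection policy are illustrative assumptions, not a standard proposed by the paper.

    from dataclasses import dataclass

    @dataclass
    class ServiceLevelAgreement:
        max_latency_s: float   # how quickly results must be delivered
        cost_per_call: float   # price agreed in the marketplace

    @dataclass
    class ServiceOffer:
        provider: str          # the agent offering the service
        capability: str        # e.g. "sequence-alignment"
        sla: ServiceLevelAgreement

    def select_offer(offers, capability, deadline_s):
        """Pick the cheapest offer that satisfies the consumer's deadline."""
        ok = [o for o in offers if o.capability == capability
              and o.sla.max_latency_s <= deadline_s]
        return min(ok, key=lambda o: o.sla.cost_per_call) if ok else None

    market = [
        ServiceOffer("agent_A", "sequence-alignment", ServiceLevelAgreement(10.0, 0.05)),
        ServiceOffer("agent_B", "sequence-alignment", ServiceLevelAgreement(2.0, 0.20)),
    ]
    print(select_offer(market, "sequence-alignment", deadline_s=5.0))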

    Introduction: The Third International Workshop on Epigenetic Robotics

    This paper summarizes the paper and poster contributions to the Third International Workshop on Epigenetic Robotics. The focus of this workshop is on the cross-disciplinary interaction of developmental psychology and robotics: the general goal in this area is to create robotic models of the psychological development of various behaviors. The term "epigenetic" is used in much the same sense as the term "developmental"; while we could call our topic "developmental robotics", developmental robotics can be seen as having a broader interdisciplinary emphasis. Our focus in this workshop is on the interaction of developmental psychology and robotics, and we use the phrase "epigenetic robotics" to capture this focus.

    From images via symbols to contexts: using augmented reality for interactive model acquisition

    Systems that perform in real environments need to bind their internal state to externally perceived objects, events, or complete scenes. How to learn this correspondence has been a long-standing problem in computer vision as well as artificial intelligence. Augmented Reality provides an interesting perspective on this problem because a human user can directly relate displayed system results to real environments. In the following, we present a system that is able to bootstrap internal models from user-system interactions. Starting from pictorial representations, it learns symbolic object labels that provide the basis for storing observed episodes. In a second step, more complex relational information is extracted from the stored episodes, which enables the system to react to specific scene contexts.
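
    A schematic sketch of the two-step bootstrapping pipeline sketched above: detections carry symbolic labels (learned from user interaction in the paper), labeled observations are stored as an episode, and simple spatial relations are then mined from it. The data structures and the toy predicate are assumptions for illustration only.

    from dataclasses import dataclass

    @dataclass
    class Observation:
        label: str   # symbolic label, e.g. confirmed by the user via AR overlay
        box: tuple   # (x, y, w, h) in image coordinates

    def left_of(a: Observation, b: Observation) -> bool:
        """Toy relational predicate over two observations in one episode."""
        return a.box[0] + a.box[2] < b.box[0]

    episode = [Observation("cup", (40, 120, 30, 40)),
               Observation("plate", (110, 130, 60, 15))]

    # Second step: extract relational facts from the stored episode so the
    # system can later recognize and react to this scene context.
    facts = [(a.label, "left_of", b.label)
             for a in episode for b in episode
             if a is not b and left_of(a, b)]
    print(facts)  # [('cup', 'left_of', 'plate')]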

    Using conceptual metaphor and functional grammar to explore how language used in physics affects student learning

    This paper introduces a theory about the role of language in learning physics. The theory is developed in the context of physics students' and physicists' talking and writing about the subject of quantum mechanics. We found that physicists' language encodes different varieties of analogical models through the use of grammar and conceptual metaphor. We hypothesize that students categorize concepts into ontological categories based on the grammatical structure of physicists' language. We also hypothesize that students over-extend and misapply conceptual metaphors in physicists' speech and writing. Using our theory, we will show how, in some cases, we can explain student difficulties in quantum mechanics as difficulties with language. Comment: Accepted for publication in Phys. Rev. ST:PE.
