Multimodal Grounding for Language Processing
This survey discusses how recent developments in multimodal processing
facilitate conceptual grounding of language. We categorize the information flow
in multimodal processing with respect to cognitive models of human information
processing and analyze different methods for combining multimodal
representations. Based on this methodological inventory, we discuss the benefit
of multimodal grounding for a variety of language processing tasks and the
challenges that arise. We particularly focus on multimodal grounding of verbs,
which play a crucial role for the compositional power of language.
Comment: The paper has been published in the Proceedings of the 27th Conference
of Computational Linguistics. Please refer to this version for citations:
https://www.aclweb.org/anthology/papers/C/C18/C18-1197
Efficient contour-based shape representation and matching
This paper presents an efficient method for calculating the similarity between 2D closed shape contours. The proposed algorithm is invariant to translation, scale change and rotation. It can be used for database retrieval or for detecting regions with a particular shape in video sequences, and it is suitable for real-time applications. In the first stage of the algorithm, an ordered sequence of contour points approximating the shapes is extracted from the input binary images. The contours are normalized for translation and scale, and small sets of the most likely starting points for both shapes are extracted. In the second stage, the starting points from both shapes are assigned into pairs and rotation alignment is performed. The dissimilarity measure is based on the geometrical distances between corresponding contour points. A fast sub-optimal method for solving the correspondence problem between contour points from two shapes is proposed. The dissimilarity measure is calculated for each pair of starting points, and the lowest value is taken as the final dissimilarity measure between the two shapes. Three different experiments are carried out using the proposed approach: letter recognition using a web camera, our own simulation of Part B of the MPEG-7 core experiment "CE-Shape1", and detection of characters in cartoon video sequences. Results indicate that the proposed dissimilarity measure is aligned with human intuition.
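The two stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes both contours are already sampled to the same number of points and given as complex numbers (x + yj). It tries every cyclic shift as a starting point instead of a small set of likely ones, and it pairs points by index rather than using the paper's fast sub-optimal correspondence method. The names `normalize` and `dissimilarity` are hypothetical.

```python
import math


def normalize(contour):
    """Remove translation and scale: center on the centroid, unit RMS radius."""
    n = len(contour)
    centroid = sum(contour) / n
    centered = [p - centroid for p in contour]
    scale = math.sqrt(sum(abs(p) ** 2 for p in centered) / n)
    return [p / scale for p in centered]


def dissimilarity(contour_a, contour_b):
    """Lowest mean point-to-point distance over all starting points of contour_b.

    Assumption: both contours have the same number of points and the same
    traversal direction; points are paired by index after the shift.
    """
    a = normalize(contour_a)
    b = normalize(contour_b)
    n = len(a)
    if len(b) != n:
        raise ValueError("contours must have the same number of points")

    best = math.inf
    for start in range(n):
        shifted = b[start:] + b[:start]
        # Rotation alignment: the unit complex number that minimizes the
        # summed squared distance between paired points (closed form).
        corr = sum(p * q.conjugate() for p, q in zip(a, shifted))
        rot = corr / abs(corr) if abs(corr) > 1e-12 else 1.0
        mean_dist = sum(abs(p - rot * q) for p, q in zip(a, shifted)) / n
        best = min(best, mean_dist)
    return best
```

Under these assumptions, a contour compared with a translated, scaled, rotated and re-indexed copy of itself gives a dissimilarity close to zero, while a differently shaped contour gives a clearly larger value.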
Interaction Issues in Computer Aided Semantic Annotation of Multimedia
The CASAM project aims to provide a tool for more efficient and effective annotation of multimedia documents through collaboration between a user and a system performing an automated analysis of the media content. A critical part of the project is to develop a user interface that best supports both the user and the system through optimal human-computer interaction. In this paper we discuss the work undertaken, the proposed user interface, and the underlying interaction issues that drove its development.