Knowledge-rich Image Gist Understanding Beyond Literal Meaning
We investigate the problem of understanding the message (gist) conveyed by
images and their captions as found, for instance, on websites or news articles.
To this end, we propose a methodology to capture the meaning of image-caption
pairs on the basis of large amounts of machine-readable knowledge that has
previously been shown to be highly effective for text understanding. Our method
identifies the connotation of objects beyond their denotation: where most
approaches to image understanding focus on the denotation of objects, i.e.,
their literal meaning, our work addresses the identification of connotations,
i.e., iconic meanings of objects, to understand the message of images. We view
image understanding as the task of representing an image-caption pair on the
basis of a wide-coverage vocabulary of concepts such as the one provided by
Wikipedia, and cast gist detection as a concept-ranking problem with
image-caption pairs as queries. To enable a thorough investigation of the
problem of gist understanding, we produce a gold standard of over 300
image-caption pairs and over 8,000 gist annotations covering a wide variety of
topics at different levels of abstraction. We use this dataset to
experimentally benchmark the contribution of signals from heterogeneous
sources, namely image and text. The best result, a Mean Average Precision
(MAP) of 0.69, indicates that by combining both dimensions we are able to better
understand the meaning of our image-caption pairs than when using language or
vision information alone. We test the robustness of our gist detection approach
when receiving automatically generated input, i.e., using automatically
generated image tags or generated captions, and prove the feasibility of an
end-to-end automated process.
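The abstract above casts gist detection as concept ranking scored by Mean Average Precision. The following minimal sketch shows how MAP is commonly computed for ranked concept lists. The function names and the toy concepts are hypothetical, and the abstract does not state the exact evaluation protocol, so treat the normalization by the number of relevant concepts as an assumption.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked concept list against a gold set.

    Assumption: precision is accumulated at each rank where a relevant
    concept appears and normalized by the number of relevant concepts.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, concept in enumerate(ranked, start=1):
        if concept in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(queries):
    """Mean of average precision over (ranked, relevant) pairs.

    Each pair stands for one image-caption query (hypothetical data layout).
    """
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Toy usage: two image-caption queries with hypothetical concept rankings.
queries = [
    (["climate change", "polar bear", "ice"], {"climate change", "ice"}),
    (["dove", "peace"], {"peace"}),
]
score = mean_average_precision(queries)
```

A score of 1.0 would mean every gold concept is ranked ahead of every non-gold concept for every query.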
Towards Understanding User Preferences from User Tagging Behavior for Personalization
Personalizing image tags is a relatively new and growing area of research,
and in order to advance this research community, we must review and challenge
the de facto standard of defining tag importance. We believe that for greater
progress to be made, we must go beyond tags that merely describe objects that
are visually represented in the image, towards more user-centric and subjective
notions such as emotion, sentiment, and preferences.
We focus on the notion of user preferences and show that the order in which users
list tags on images is correlated with the order of preference over the tags that
they provided for the image. While this observation is not completely
surprising, to our knowledge, we are the first to explore this aspect of user
tagging behavior systematically and report empirical results to support this
observation. We argue that this observation can be exploited to help advance
the image tagging (and related) communities.
Our contributions include: (1) conducting a user study demonstrating this
observation, and (2) collecting a dataset in which user tag preferences are
explicitly recorded. Comment: 6 page
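The claimed link between tag listing order and preference order can be illustrated with a rank correlation. The sketch below uses Spearman's rho for rankings without ties. The abstract does not say which statistic the authors used, so the choice of Spearman, the function name, and the example tags are all assumptions made for illustration.

```python
def spearman_rho(listing_order, preference_order):
    """Spearman rank correlation between two orderings of the same tags.

    Assumes both lists contain exactly the same distinct tags (no ties).
    Returns a value in [-1, 1]; 1 means the orders are identical.
    """
    if set(listing_order) != set(preference_order):
        raise ValueError("both orderings must contain the same tags")
    n = len(listing_order)
    if n < 2:
        raise ValueError("need at least two tags")
    pref_rank = {tag: i for i, tag in enumerate(preference_order)}
    # Sum of squared rank differences between the two orderings.
    d_squared = sum((i - pref_rank[tag]) ** 2 for i, tag in enumerate(listing_order))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Toy usage: order in which a user typed tags vs. the stated preference.
listed = ["sunset", "beach", "family"]
preferred = ["sunset", "family", "beach"]
rho = spearman_rho(listed, preferred)
```

Values near 1 across many users and images would be consistent with the observation reported in the abstract.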
Developing a Formal Model for Mind Maps
A mind map is a graphical technique used to represent words, concepts, tasks, or other items connected to or arranged around a central topic or idea. Mind maps are widely used, and many software programs exist to create or edit them, yet there is no format for representing the model, let alone a standard one. This paper presents an effort to propose a formal mind map model that describes the structure, content, semantics, and social connections. The structure describes the basic mind map graph, consisting of a node set, an edge set, a cloud set, and a graphical connections set. The content comprises the set of texts and objects linked to the nodes. The social connections are the mind maps of other users, which form the neighborhood of the mind map owner in a social networking system. Finally, the mind map semantics is any true logical connection between the textual parts of a mind map and a concept. Each of these elements is formally described, building up the suggested mind map model. Establishing the model will support the application of algorithms and methods for extracting information from mind maps.
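The components listed in this abstract (node set, edge set, cloud set, graphical connections set, content, and social neighborhood) can be pictured as a small data structure. The sketch below is one possible reading, not the paper's formal definition. All class, field, and method names are hypothetical, and the semantics component is left out because the abstract gives no detail on how it is represented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MindMap:
    """Hypothetical container mirroring the components named in the abstract."""
    owner: str
    central_topic: str
    nodes: set = field(default_factory=set)                  # node set
    edges: set = field(default_factory=set)                  # (parent, child) pairs
    clouds: set = field(default_factory=set)                 # frozensets of grouped nodes
    graphical_connections: set = field(default_factory=set)  # extra (node, node) links
    content: dict = field(default_factory=dict)              # node -> linked texts/objects
    neighborhood: list = field(default_factory=list)         # mind maps of other users

    def __post_init__(self) -> None:
        # The central topic is always a node of the map.
        self.nodes.add(self.central_topic)

    def add_child(self, parent: str, child: str, text: Optional[str] = None) -> None:
        """Attach a child node to an existing parent, optionally with text."""
        if parent not in self.nodes:
            raise KeyError(f"unknown parent node: {parent}")
        self.nodes.add(child)
        self.edges.add((parent, child))
        if text is not None:
            self.content.setdefault(child, []).append(text)

    def is_well_formed(self) -> bool:
        """Check that edges, connections, clouds, and content refer to known nodes."""
        pairs = self.edges | self.graphical_connections
        endpoints_ok = all(a in self.nodes and b in self.nodes for a, b in pairs)
        clouds_ok = all(cloud <= self.nodes for cloud in self.clouds)
        content_ok = all(node in self.nodes for node in self.content)
        return endpoints_ok and clouds_ok and content_ok


# Toy usage with made-up node names.
m = MindMap(owner="alice", central_topic="thesis")
m.add_child("thesis", "methods", text="user study")
m.clouds.add(frozenset({"methods"}))
```

Keeping each component as a plain set or mapping makes it straightforward to run graph or text algorithms over the structure, which is the use the abstract anticipates.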
Learning Contextualized Semantics from Co-occurring Terms via a Siamese Architecture
One of the biggest challenges in multimedia information retrieval and
understanding is to bridge the semantic gap by properly modeling concept
semantics in context. The presence of out-of-vocabulary (OOV) concepts
exacerbates this difficulty. To address the semantic gap issues, we formulate the
problem of learning contextualized semantics from descriptive terms and propose
a novel Siamese architecture to model these semantics. By means of pattern
aggregation and probabilistic topic
models, our Siamese architecture captures contextualized semantics from the
co-occurring descriptive terms via unsupervised learning, which leads to a
concept embedding space of the terms in context. Furthermore, the co-occurring
OOV concepts can be easily represented in the learnt concept embedding space.
The main properties of the concept embedding space are demonstrated via
visualization. Using various settings in semantic priming, we have carried out
a thorough evaluation by comparing our approach to a number of state-of-the-art
methods on six annotation corpora in different domains, i.e., MagTag5K, CAL500
and Million Song Dataset in the music domain as well as Corel5K, LabelMe and
SUNDatabase in the image domain. Experimental results on semantic priming
suggest that our approach outperforms those state-of-the-art methods
considerably in various aspects.
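The defining trait of a Siamese architecture is that two inputs pass through the same encoder with shared weights, and their embeddings are then compared. The sketch below shows only that weight-sharing idea with a single tanh layer and Euclidean distance. It is not the architecture from the abstract: the pattern aggregation, topic models, and training procedure are omitted, and the function names, weights, and input vectors are made up for illustration.

```python
import math


def encode(weights, x):
    """Project an input vector with one linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def siamese_distance(weights, x1, x2):
    """Embed both inputs with the SAME weights and return their Euclidean distance."""
    e1 = encode(weights, x1)
    e2 = encode(weights, x2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))


# Toy usage: hypothetical 2x3 weight matrix and two term-context vectors.
shared_weights = [[0.5, -0.2, 0.1],
                  [0.3, 0.8, -0.5]]
term_a = [1.0, 0.0, 1.0]
term_b = [0.0, 1.0, 1.0]
distance = siamese_distance(shared_weights, term_a, term_b)
```

Because the weights are shared, the distance is symmetric in its two inputs and is zero when both inputs are identical.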