Knowledge-rich Image Gist Understanding Beyond Literal Meaning
We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or in news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: whereas most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts, such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result, a Mean Average Precision (MAP) of 0.69, indicates that by combining both dimensions we are able to understand the meaning of our image-caption pairs better than when using language or vision information alone. We test the robustness of our gist detection approach when it receives automatically generated input, i.e., automatically generated image tags or generated captions, and demonstrate the feasibility of an end-to-end automated process.
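Because gist detection is cast as concept ranking and evaluated with MAP, a minimal sketch of that metric may help readers interpret the reported 0.69. It assumes the common definition of average precision (precision summed at each relevant rank, divided by the number of gold concepts for the query); the abstract does not state the exact variant or any rank cutoff, and the concept names in the usage note are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP of one ranked concept list against the gold gist concepts of one query.

    Assumption: normalised by the total number of gold concepts, no rank cutoff.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, concept in enumerate(ranked, start=1):
        if concept in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: the mean of per-query AP over all image-caption queries."""
    aps = [average_precision(ranked, gold) for ranked, gold in runs]
    return sum(aps) / len(aps) if aps else 0.0
```

For a hypothetical query ranked as ["Climate change", "Polar bear", "Arctic", "Global warming"] with gold concepts {"Climate change", "Global warming"}, the relevant hits sit at ranks 1 and 4, so AP = (1/1 + 2/4) / 2 = 0.75. MAP then averages such values across all queries in the gold standard.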
Understanding the message of images
We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or in news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: whereas most approaches to image or image-text understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts, such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries.
Requirements elicitation towards a search engine for semantic multimedia content
We investigate user requirements regarding the interface design of a semantic multimedia search and retrieval system, based on a prototypical implementation of a search engine for multimedia content on the web. Unlike existing image and video search engines, we are interested in true multimedia content that combines different media assets into multimedia documents such as PowerPoint presentations and Flash files. In a user study with 20 participants, we conducted a formative evaluation based on the think-aloud method and semi-structured interviews in order to obtain requirements for a future web search engine for multimedia content. The interviews are complemented by a paper-and-pencil questionnaire to obtain quantitative information, and we present mockups demonstrating the user interface of a future multimedia search and retrieval engine.
fulgeo - Design of an Intuitive User Interface for a Multimedia Search Engine
Multimedia documents like PowerPoint presentations or Flash documents are widely adopted on the Internet and cover a wide range of topics. However, so far there is no user-friendly way to explore and search for this content. The aim of this work is to address this issue by developing a new, easy-to-use user interface approach and a prototype search engine. Our system, called fulgeo, specifically focuses on a suitable multimedia interface for visualizing the query results of Flash documents. The prototype is available online as a live demo at http://fulgeo.komsys.org
Fulgeo - towards an intuitive user interface for a semantics-enabled multimedia search engine
Multimedia documents like PowerPoint presentations or Flash documents are widely adopted on the Internet and cover a wide range of topics. However, so far there is no user-friendly way to explore and search for this content. The aim of this work is to address this issue by developing a new, easy-to-use user interface approach and a prototype search engine. Our system, called fulgeo, specifically focuses on a suitable multimedia interface for visualizing the query results of semantically enriched Flash documents.
Weakly supervised construction of a repository of iconic images
We present a first attempt at semi-automatically harvesting a dataset of iconic images, namely images that depict objects or scenes that arouse associations with abstract topics. Our method starts with representative topic-evoking images from Wikipedia, which are labeled with relevant concepts and entities found in their associated captions. These are used to query an online image repository (i.e., Flickr) to acquire additional examples of topic-specific iconic relations. To this end, we leverage a combination of visual similarity measures, image clustering and matching algorithms to acquire clusters of iconic images that are topically connected to the original seed images, while also allowing for various degrees of diversity. Our first results are promising in that they indicate the feasibility of the task and show that we are able to build a first version of our resource with minimal supervision.
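The abstract names visual similarity measures and matching as the means of linking retrieved Flickr images back to Wikipedia seed images without specifying them. The sketch below is therefore only an illustration of one plausible matching step, not the authors' pipeline: it assumes each image is already represented by a precomputed visual feature vector and uses cosine similarity with a hypothetical acceptance `threshold` to attach each candidate to its most similar seed. The function names `cosine` and `match_to_seeds` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_to_seeds(
    seeds: Dict[str, Sequence[float]],
    candidates: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical value; lower admits more diverse images
) -> Dict[str, List[str]]:
    """Attach each candidate image to its most similar seed image.

    Candidates whose best similarity falls below `threshold` are discarded.
    """
    clusters: Dict[str, List[str]] = {seed_id: [] for seed_id in seeds}
    for cand_id, cand_vec in candidates.items():
        best_seed: Optional[str] = None
        best_sim = -1.0
        for seed_id, seed_vec in seeds.items():
            sim = cosine(cand_vec, seed_vec)
            if sim > best_sim:
                best_seed, best_sim = seed_id, sim
        if best_seed is not None and best_sim >= threshold:
            clusters[best_seed].append(cand_id)
    return clusters
```

The single threshold is one simple way to trade topical coherence against the "various degrees of diversity" the abstract mentions: lowering it lets more visually distant images join a seed's cluster. A real system would likely add a proper clustering stage on top of this matching step.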
Data from the paper: Weakly supervised construction of a repository of iconic images
We present a first attempt at semi-automatically harvesting a dataset of iconic images. Iconic images depict objects or scenes that arouse associations with abstract topics. Our method starts with representative topic-evoking images from Wikipedia, which are labeled with relevant concepts and entities found in their associated captions. These are used to query an online image repository (i.e., Flickr) to acquire additional examples of topic-specific iconic relations. To this end, we leverage a combination of visual similarity measures, image clustering and matching algorithms to acquire clusters of iconic images that are topically connected to the original seed images, while also allowing for various degrees of diversity. Our first results are promising in that they indicate the feasibility of the task and show that we are able to build a first version of our resource with minimal supervision.
Image with a Message: Towards Detecting Non-Literal Image Usages by Visual Linking
A key task in understanding an image and its corresponding caption is not only to find out what is shown in the picture and described in the text, but also what the exact relationship between these two elements is. The long-term objective of our work is to be able to distinguish different types of relationship, including literal vs. non-literal usages, as well as fine-grained non-literal usages (i.e., symbolic vs. iconic). Here, we approach this challenging problem by answering the question: ‘How can we quantify the degrees of similarity between the literal meanings expressed within images and their captions?’ We formulate this problem as a ranking task, where links between entities and potential regions are created and ranked for relevance. Using a Ranking SVM allows us to leverage the preference ordering of the links, which helps us in the similarity calculation for cases of visual or textual ambiguity, as well as misclassified data. Our experiments show that aggregating different features using a supervised ranker achieves better results than a baseline knowledge-base method. However, much work still lies ahead, and we accordingly conclude the paper with a detailed discussion of a short- and long-term outlook on how to push our work on relationship classification one step further.
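To make the Ranking SVM idea concrete, the sketch below shows the standard pairwise transform: every pair of entity-region links with different relevance grades becomes a difference vector, and a linear model is trained so that preferred links score higher. It is a generic illustration under stated assumptions, not the paper's implementation: the two link features and their values are hypothetical, and the model is trained with plain stochastic subgradient descent on the regularised pairwise hinge loss instead of a dedicated SVM solver.

```python
import random
from typing import List, Sequence

Vector = List[float]


def pairwise_differences(features: Sequence[Vector], relevance: Sequence[int]) -> List[Vector]:
    """Build x_i - x_j for every pair of links where link i is preferred (rel_i > rel_j)."""
    diffs: List[Vector] = []
    for i, xi in enumerate(features):
        for j, xj in enumerate(features):
            if relevance[i] > relevance[j]:
                diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs


def train_ranking_svm(
    diffs: Sequence[Vector],
    dim: int,
    lam: float = 0.01,  # L2 regularisation strength (assumed value)
    lr: float = 0.1,  # step size (assumed value)
    epochs: int = 200,
    seed: int = 0,
) -> Vector:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - w.d)) by stochastic subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(diffs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            d = diffs[k]
            margin = sum(wi * di for wi, di in zip(w, d))
            for t in range(dim):
                # Hinge term contributes -d only when the pair violates the margin.
                grad = lam * w[t] - (d[t] if margin < 1.0 else 0.0)
                w[t] -= lr * grad
    return w


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear relevance score of one entity-region link."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

As a hypothetical usage, links described by [label_match, detector_confidence] such as A = [1.0, 0.9] (grade 2), B = [0.0, 0.8] (grade 1) and C = [0.0, 0.1] (grade 0) yield three difference vectors; after training, sorting the links by `score` recovers the order A, B, C. Learning from pairwise preferences rather than absolute labels is what lets the ranker exploit the preference ordering of links when individual detections are ambiguous or misclassified.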