Knowledge-rich Image Gist Understanding Beyond Literal Meaning
We investigate the problem of understanding the message (gist) conveyed by
images and their captions as found, for instance, on websites or news articles.
To this end, we propose a methodology to capture the meaning of image-caption
pairs on the basis of large amounts of machine-readable knowledge that has
previously been shown to be highly effective for text understanding. Our method
identifies the connotation of objects beyond their denotation: where most
approaches to image understanding focus on the denotation of objects, i.e.,
their literal meaning, our work addresses the identification of connotations,
i.e., iconic meanings of objects, to understand the message of images. We view
image understanding as the task of representing an image-caption pair on the
basis of a wide-coverage vocabulary of concepts such as the one provided by
Wikipedia, and cast gist detection as a concept-ranking problem with
image-caption pairs as queries. To enable a thorough investigation of the
problem of gist understanding, we produce a gold standard of over 300
image-caption pairs and over 8,000 gist annotations covering a wide variety of
topics at different levels of abstraction. We use this dataset to
experimentally benchmark the contribution of signals from heterogeneous
sources, namely image and text. The best result, a Mean Average Precision
(MAP) of 0.69, indicates that by combining both dimensions we are able to better
understand the meaning of our image-caption pairs than when using language or
vision information alone. We test the robustness of our gist detection approach
when receiving automatically generated input, i.e., using automatically
generated image tags or generated captions, and prove the feasibility of an
end-to-end automated process.
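The abstract above reports its results as Mean Average Precision over ranked concepts. As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), the example below scores each ranked list against a set of relevant concepts and averages the per-query scores. The function names, the toy queries, and the concept labels are all hypothetical, and each ranked list is assumed to contain no duplicates.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against its relevant items.

    Assumes `ranked` has no duplicates. Returns 0.0 when nothing is relevant.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item sits.
            precision_sum += hits / rank
    # Dividing by all relevant items penalises relevant items never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: List[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two image-caption "queries" with ranked concepts.
toy_runs = [
    (["Endangered species", "Polar bear", "Global warming"],
     {"Endangered species", "Global warming"}),
    (["Forest", "Deforestation"], {"Deforestation"}),
]
map_score = mean_average_precision(toy_runs)
```

In the first toy query the relevant concepts sit at ranks 1 and 3, and in the second the single relevant concept sits at rank 2, so the mean rewards rankings that place relevant concepts early.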
Applying digital content management to support localisation
The retrieval and presentation of digital content such as that on the World Wide Web (WWW) is a substantial area of research. While recent years have seen huge expansion in the size of web-based archives that can be searched efficiently by commercial search engines, the presentation of potentially relevant content is still limited to ranked document lists represented by simple text snippets or image keyframe surrogates. There is expanding interest in techniques to personalise the presentation of content to improve the richness and effectiveness of the user experience. One of the most significant challenges to achieving this is the increasingly multilingual nature of this data, and the need to provide suitably localised responses to users based on this content. The Digital Content Management (DCM) track of the Centre for Next Generation Localisation (CNGL) is seeking to develop technologies to support advanced personalised access and presentation of information by combining elements from the existing research areas of Adaptive Hypermedia and Information Retrieval. The combination of these technologies is intended to produce significant improvements in the way users access information. We review key features of these technologies and introduce early ideas for how these technologies can support localisation and localised content before concluding with some impressions of future directions in DCM.
The DIGMAP geo-temporal web gazetteer service
This paper presents the DIGMAP geo-temporal Web gazetteer service, a system providing access to names of places, historical periods, and associated geo-temporal information. Within the DIGMAP project, this gazetteer serves as the unified repository of geographic and temporal information, assisting in the recognition and disambiguation of geo-temporal expressions over text, as well as in resource searching and indexing. We describe the data integration methodology, the handling of temporal information and some of the applications that use the gazetteer. Initial evaluation results show that the proposed system can adequately support several tasks related to geo-temporal information extraction and retrieval.
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality.
Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence
Research
Peirce, meaning and the semantic web
The so-called "Semantic Web" is phase II of Tim Berners-Lee's original vision for the WWW, whereby resources would no longer be indexed merely "syntactically", via opaque character-strings, but via their meanings. We argue that one roadblock to Semantic Web development has been researchers' adherence to a Cartesian, "private" account of meaning, which has been dominant for the last 400 years, and which understands the meanings of signs as what their producers intend them to mean. It thus strives to build "silos of meaning" which explicitly and antecedently determine what signs on the Web will mean in all possible situations. By contrast, the field is moving forward insofar as it embraces Peirce's "public", evolutionary account of meaning, according to which the meaning of signs just is the way they are interpreted and used to produce further signs. Given the extreme interconnectivity of the Web, it is argued that silos of meaning are unnecessary as plentiful machine-understandable data about the meaning of Web resources exists already in the form of those resources themselves, for applications that are able to leverage it, and it is Peirce's account of meaning which can best make sense of the recent explosion in "user-defined content" on the Web, and its relevance to achieving Semantic Web goals.
Supporting online health queries by modeling patterns of creation, modification and retrieval of medical knowledge
We evaluated properties of knowledge resources that can be used for building new semantically and behaviorally motivated resources of health guidance and clinical decision making by modeling patterns of creation, modification and retrieval of medical knowledge. We evaluated statistical properties of Wikipedia articles of general terminology and medical terminology based on the 25 most common diagnosis names emerging in an electronic health records system. We also evaluated statistical properties of general terminology used in everyday life with respect to occurrence and importance to enable adaptive perspectives to medical knowledge. Our experiments exploit a conceptual co-occurrence network that we created based on a set of 93 medical texts about healthcare guidelines provided by The Finnish Medical Society Duodecim containing 57,679 unique conceptual links. We provide supplementing statistics of an extended range of Wikipedia articles and an n-gram analysis of the set of medical texts.
Peer reviewed