Retrieving with good sense
Although always present in text, word sense ambiguity only recently came to be regarded as a potentially solvable
problem for information retrieval. The growth of interest in word senses resulted from new directions taken in
disambiguation research. This paper first outlines this research and then surveys the resulting efforts in information
retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt
from the research, most notably an understanding of the circumstances under which disambiguation may prove useful to retrieval.
Automatic Query Image Disambiguation for Content-Based Image Retrieval
Query images presented to content-based image retrieval systems often have
several possible interpretations, making it difficult to identify the search
objective pursued by the user. We propose a technique for overcoming this
ambiguity, while keeping the amount of required user interaction at a minimum.
To achieve this, the neighborhood of the query image is divided into coherent
clusters from which the user may choose the relevant ones. A novel feedback
integration technique is then employed to re-rank the entire database with
regard to both the user feedback and the original query. We evaluate our
approach on the publicly available MIRFLICKR-25K dataset, where it leads to a
relative improvement in average precision of 23% over the baseline retrieval,
which does not distinguish between different image senses.
Comment: VISAPP 2018 paper, 8 pages, 5 figures. Source code:
https://github.com/cvjena/ai
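The abstract describes three steps: take the neighbourhood of the query image, split it into coherent clusters for the user to choose from, then re-rank the whole database using both the chosen clusters and the original query. The following is a minimal sketch of that loop, assuming images are plain feature vectors compared by Euclidean distance, k-means as the clustering method, and a simple weighted distance as the re-ranking rule. The paper's own feedback-integration technique is not reproduced; the names `neighbourhood`, `kmeans`, `rerank` and the weight `alpha` are hypothetical.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighbourhood(database, query, m):
    """The m database vectors closest to the query (the region that gets clustered)."""
    return sorted(database, key=lambda v: euclidean(v, query))[:m]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns centroids and the members of each cluster."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters

def rerank(database, query, chosen_centroids, alpha=0.3):
    """Rank all database indices by a blend of distance to the original query and
    distance to the nearest user-chosen cluster centre (lower cost ranks higher)."""
    def cost(v):
        d_query = euclidean(v, query)
        d_feedback = min(euclidean(v, c) for c in chosen_centroids)
        return alpha * d_query + (1 - alpha) * d_feedback
    return sorted(range(len(database)), key=lambda i: cost(database[i]))

# Usage: cluster the query's neighbourhood, then re-rank the whole database.
db = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [0.2, 0.1], [4.9, 5.2], [2.5, 2.5]]
query = [2.5, 2.5]
centroids, clusters = kmeans(neighbourhood(db, query, 6), k=2)
# [5.0, 5.0] stands in for the centre of the cluster the user selected.
order = rerank(db, query, [[5.0, 5.0]])
```

Keeping the original query in the cost (through `alpha`) means the re-ranking still reflects what was asked for even when the user selects a broad cluster.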
On the Impact of Entity Linking in Microblog Real-Time Filtering
Microblogging is a model of content sharing in which the temporal locality of
posts with respect to important events, either of foreseeable or unforeseeable
nature, makes applications of real-time filtering of great practical
interest. We propose the use of Entity Linking (EL) in order to improve the
retrieval effectiveness, by enriching the representation of microblog posts and
filtering queries. EL is the process of recognizing in an unstructured text the
mention of relevant entities described in a knowledge base. EL of short pieces
of text is a difficult task, but it is also a scenario in which the information
EL adds to the text can have a substantial impact on the retrieval process. We
implement a state-of-the-art filtering method, based on the best systems from
the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and
extend it with a Wikipedia-based EL method. Results show that the use of EL
significantly improves over non-EL based versions of the filtering methods.
Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain, April 13-17, 2015.
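The core idea, enriching both posts and filtering queries with linked entities so that they can match on entities as well as on words, can be illustrated with a toy linker. This is a sketch only: the paper uses a Wikipedia-based EL method on top of TREC Microblog filtering systems, neither of which is reproduced here. The anchor dictionary `ANCHORS`, the greedy longest-match linker and the term-overlap `filter_score` are hypothetical stand-ins.

```python
import re

# Hypothetical mention -> entity dictionary (stand-in for Wikipedia anchor texts).
ANCHORS = {
    "apple": "Apple_Inc.",
    "iphone": "IPhone",
    "tim cook": "Tim_Cook",
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def link_entities(text, anchors=ANCHORS, max_len=3):
    """Greedy longest-match lookup of n-grams (up to max_len tokens) in the dictionary."""
    tokens = tokenize(text)
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            if mention in anchors:
                entities.append(anchors[mention])
                i += n
                break
        else:  # no mention starts at token i
            i += 1
    return entities

def enrich(text):
    """Bag of words plus entity features; the 'ENT:' prefix keeps them apart from words."""
    return set(tokenize(text)) | {"ENT:" + e for e in link_entities(text)}

def filter_score(query, post):
    """Fraction of the enriched query features that also appear in the enriched post."""
    q, p = enrich(query), enrich(post)
    return len(q & p) / len(q) if q else 0.0

print(link_entities("Apple CEO Tim Cook on stage"))  # ['Apple_Inc.', 'Tim_Cook']
print(filter_score("Tim Cook keynote", "Apple CEO Tim Cook on stage"))  # 0.75
```

Because short posts share few words with a query, an entity feature such as `ENT:Tim_Cook` adds a second, more specific point of contact; in a real system the score would come from the filtering model rather than raw overlap.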
Word sense disambiguation and information retrieval
It has often been thought that word sense ambiguity is a cause of poor performance in Information Retrieval
(IR) systems. The belief is that if ambiguous words can be correctly disambiguated, IR performance will
increase. However, recent research into the application of a word sense disambiguator to an IR system failed
to show any performance increase. From these results it has become clear that more basic research is needed
to investigate the relationship between sense ambiguity, disambiguation, and IR.
Using a technique that introduces additional sense ambiguity into a collection, this paper presents research
that goes beyond previous work in this field to reveal the influence that ambiguity and disambiguation have
on a probabilistic IR system. We conclude that word sense ambiguity is a problem for an IR system only
when it is retrieving with very short queries. In addition, we argue that if a word sense disambiguator is to
be of any use to an IR system, the disambiguator must be able to resolve word senses to a high degree of
accuracy.
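The abstract does not name its technique for introducing additional ambiguity. One well-known way of doing so is the pseudo-word: every occurrence of each word in a group of unrelated words is replaced by a single combined token, which then behaves like a word with several "senses". The sketch below assumes that technique; `build_pseudowords`, `ambiguate` and the group `size` are hypothetical names chosen for illustration.

```python
import random

def build_pseudowords(vocab, size=2, seed=0):
    """Randomly partition the vocabulary into groups of `size` words; every word in a
    group maps to the same pseudo-word token (e.g. 'banana/door')."""
    words = sorted(vocab)          # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(words)
    mapping = {}
    for start in range(0, len(words) - size + 1, size):
        group = sorted(words[start:start + size])
        token = "/".join(group)
        for w in group:
            mapping[w] = token
    return mapping                 # leftover words (if any) stay unambiguous

def ambiguate(tokens, mapping):
    """Apply the same substitution to documents and queries alike."""
    return [mapping.get(t, t) for t in tokens]

vocab = {"bank", "river", "apple", "car", "door", "banana"}
mapping = build_pseudowords(vocab, size=2, seed=1)
doc = ambiguate(["the", "car", "by", "the", "river"], mapping)
```

Increasing `size` raises the level of artificial ambiguity, so retrieval effectiveness can be compared at several ambiguity levels on the same collection and queries.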
Multi modal multi-semantic image retrieval
PhD
The rapid growth in the volume of visual information, e.g. images and video, can
overwhelm users’ ability to find and access the specific visual information of interest
to them. In recent years, ontology knowledge-based (KB) image information retrieval
techniques have been adopted in an attempt to extract knowledge from these
images, enhancing the retrieval performance. A KB framework is presented to
promote semi-automatic annotation and semantic image retrieval using multimodal
cues (visual features and text captions). In addition, a hierarchical structure for the KB
allows the sharing of metadata that supports multi-semantics (polysemy) for concepts.
The framework builds up an effective knowledge base pertaining to a domain-specific
image collection, e.g. sports, and is able to disambiguate and assign high level
semantics to ‘unannotated’ images.
Local feature analysis of visual content, namely using Scale Invariant Feature
Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’
(BVW) model as an effective method to represent visual content information and to
enhance its classification and retrieval. Local features are more useful than global
features, e.g. colour, shape or texture, as they are invariant to image scale, orientation
and camera angle. An innovative approach is proposed for the representation,
annotation and retrieval of visual content using a hybrid technique based upon
unstructured visual words and a (structured) hierarchical ontology KB
model. The structural model facilitates the disambiguation of unstructured visual
words and a more effective classification of visual content, compared to a vector
space model, through exploiting local conceptual structures and their relationships.
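In the standard BVW model, each local descriptor (for SIFT, a 128-dimensional vector) is assigned to its nearest "visual word", i.e. a cluster centre learnt beforehand, and the image is represented as a histogram of visual-word counts. The sketch below assumes the codebook is already given (the thesis builds it with its own SLAC algorithm, which is not reproduced here) and uses toy 2-D descriptors for readability; the function names are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the closest visual word (cluster centre) by squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[i])))

def bovw_histogram(descriptors, codebook, normalise=True):
    """Count how many local descriptors fall on each visual word."""
    hist = [0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1
    if normalise and descriptors:  # turn counts into frequencies
        total = float(len(descriptors))
        hist = [h / total for h in hist]
    return hist

codebook = [[0, 0], [10, 10], [0, 10]]       # three toy visual words
descriptors = [[1, 1], [9, 9], [0, 9], [1, 0]]
print(bovw_histogram(descriptors, codebook, normalise=False))  # [2, 1, 1]
print(bovw_histogram(descriptors, codebook))                   # [0.5, 0.25, 0.25]
```

The histogram discards where keypoints lie in the image, which is one reason the thesis's clustering step also takes keypoint spatial locations into account.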
The key contributions of this framework in using local features for image
representation are as follows. First, a method generates visual words using the semantic
local adaptive clustering (SLAC) algorithm, which takes term weights and the spatial
locations of keypoints into account, so that semantic information is
preserved. Second, a technique detects the domain-specific ‘non-informative
visual words’, which are ineffective at representing the content of visual data and
degrade its categorisation ability. Third, a method to combine an ontology model with
a visual word model to resolve synonymy (visual heterogeneity) and polysemy
problems is proposed. The experimental results show that this approach can discover
semantically meaningful visual content descriptions and efficiently recognise specific
events, e.g. sports events, depicted in images.
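The abstract does not say how non-informative visual words are detected. One common heuristic, borrowed from stop-word removal in text retrieval and used here purely as an assumption, is to flag visual words that occur in almost every image of the domain collection, since they cannot discriminate between images. The threshold `max_df_ratio` and the function names are hypothetical.

```python
def document_frequency(histograms):
    """Number of images in which each visual word occurs at least once."""
    n_words = len(histograms[0])
    return [sum(1 for h in histograms if h[w] > 0) for w in range(n_words)]

def non_informative_words(histograms, max_df_ratio=0.9):
    """Visual words present in at least max_df_ratio of all images."""
    n_images = len(histograms)
    return [w for w, df in enumerate(document_frequency(histograms))
            if df / n_images >= max_df_ratio]

def drop_words(histograms, words):
    """Remove the given visual words from every histogram."""
    drop = set(words)
    return [[c for w, c in enumerate(h) if w not in drop] for h in histograms]

hists = [[3, 1, 0], [2, 0, 1], [5, 0, 0], [1, 2, 0]]  # 4 images, 3 visual words
noisy = non_informative_words(hists)                  # word 0 appears in every image
cleaned = drop_words(hists, noisy)
```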
Since discovering the semantics of an image is an extremely challenging problem, one
promising approach to enhance visual content interpretation is to use any associated
textual information that accompanies an image, as a cue to predict the meaning of an
image, by transforming this textual information into a structured annotation for an
image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct
types of information representation and modality, there are some strong, invariant,
implicit connections between images and any accompanying text information.
Semantic analysis of image captions can be used by image retrieval systems to
retrieve selected images more precisely. To do this, Natural Language Processing
(NLP) is first used to extract concepts from image captions. Next, an
ontology-based knowledge model is deployed in order to resolve natural language
ambiguities. To deal with the accompanying text information, two methods to extract
knowledge from textual information have been proposed. First, metadata can be
extracted automatically from text captions and restructured with respect to a semantic
model. Second, the use of Latent Semantic Indexing (LSI) in relation to a domain-specific ontology-based
knowledge model enables the combined framework to tolerate ambiguities and
variations (incompleteness) of metadata. The use of the ontology-based knowledge
model allows the system to find indirectly relevant concepts in image captions and
thus leverage these to represent the semantics of images at a higher level.
Experimental results show that the proposed framework significantly enhances image
retrieval and narrows the semantic gap between lower-level machine-derived
and higher-level human-understandable conceptualisation.
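The caption pipeline (extract concepts, resolve ambiguity with an ontology, then use related concepts for a higher-level description) can be illustrated on a toy scale. This is a sketch under stated assumptions: the ontology `PARENT` and lexicon `LEXICON` are hypothetical, simple word lookup stands in for the NLP step, and ambiguity is resolved by a basic heuristic (pick the sense whose ancestors overlap most with the caption's unambiguous concepts) rather than the thesis's LSI-based method.

```python
# Hypothetical toy sports ontology: child concept -> parent concept.
PARENT = {
    "Football": "TeamSport", "Tennis": "RacketSport",
    "TeamSport": "Sport", "RacketSport": "Sport",
    "Ball": "Equipment", "Racket": "Equipment", "BaseballBat": "Equipment",
    "Bat_animal": "Animal",
}
# Hypothetical lexicon: caption word -> candidate concepts ("bat" is polysemous).
LEXICON = {
    "football": ["Football"], "tennis": ["Tennis"], "ball": ["Ball"],
    "racket": ["Racket"], "bat": ["BaseballBat", "Bat_animal"],
}

def ancestors(concept):
    """All concepts above `concept` in the hierarchy, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

def disambiguate(candidates, context):
    """Choose the sense whose concept set (itself plus ancestors) overlaps most
    with those of the unambiguous concepts found elsewhere in the caption."""
    def overlap(c):
        mine = {c, *ancestors(c)}
        return sum(len(mine & {o, *ancestors(o)}) for o in context)
    return max(candidates, key=overlap)

def caption_concepts(caption):
    words = [w.strip(".,;:!?").lower() for w in caption.split()]
    context = [LEXICON[w][0] for w in words if len(LEXICON.get(w, [])) == 1]
    concepts = []
    for w in words:
        candidates = LEXICON.get(w, [])
        if len(candidates) == 1:
            concepts.append(candidates[0])
        elif candidates:
            concepts.append(disambiguate(candidates, context))
    # Expansion: ancestors are the indirectly relevant concepts an index can also use.
    expanded = set(concepts)
    for c in concepts:
        expanded.update(ancestors(c))
    return concepts, expanded

concepts, expanded = caption_concepts("A player swings the bat at the ball.")
# concepts == ['BaseballBat', 'Ball']; expanded also contains 'Equipment'
```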
A New Combination Method Based on Adaptive Genetic Algorithm for Medical Image Retrieval
Medical image retrieval can be based on the text describing the image, such as its caption or title. Using text terms to retrieve images has several disadvantages, such as term ambiguity. Recent studies show that representing text as semantic units (concepts) can improve the semantic representation of textual information. However, conceptual representation has its own problems, such as missing or erroneous semantic relations between concepts. Other studies show that combining textual and conceptual text representations leads to better accuracy. Typically, a score for the textual representation and a score for the conceptual representation are computed, and a combination function then merges them into a single score. Although many methods for combining two scores already exist, in this paper we propose a new combination method based on an adaptive version of the genetic algorithm. Experiments are carried out on the Medical Information Retrieval Task of ImageCLEF 2009 and 2010. The results confirm that combining the textual and conceptual scores yields the best accuracy. In addition, our approach outperforms the other combination methods.
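The abstract does not give the genetic algorithm's encoding, operators or fitness function, so the following is only a sketch under assumptions: the combination is a single weight w in a linear blend w * s_text + (1 - w) * s_concept, fitness is mean average precision over training queries, and "adaptive" is modelled by a simple rule that shrinks the mutation step while the best fitness improves and widens it when progress stalls. All names are hypothetical.

```python
import random

def average_precision(ranked, relevant):
    """AP of one ranked list of document ids against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def fitness(w, queries):
    """Mean AP when each document is scored w * text + (1 - w) * concept.
    queries: list of (text_scores, concept_scores, relevant_ids) triples."""
    aps = []
    for text, concept, relevant in queries:
        combined = {d: w * text[d] + (1 - w) * concept[d] for d in text}
        ranked = sorted(combined, key=combined.get, reverse=True)
        aps.append(average_precision(ranked, relevant))
    return sum(aps) / len(aps)

def adaptive_ga(queries, pop_size=20, generations=30, seed=0):
    """Evolve the weight w in [0, 1]; the mutation step adapts to progress."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    sigma, best_w, best_fit = 0.2, None, -1.0
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, queries), reverse=True)
        top_fit = fitness(ranked[0], queries)
        if top_fit > best_fit:
            best_w, best_fit = ranked[0], top_fit
            sigma = max(0.01, sigma * 0.8)   # improving: search more locally
        else:
            sigma = min(0.5, sigma * 1.5)    # stagnating: widen the search
        parents = ranked[: pop_size // 2]    # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2 + rng.gauss(0.0, sigma)  # arithmetic crossover + Gaussian mutation
            children.append(min(1.0, max(0.0, child)))   # clip to the valid range
        population = parents + children
    return best_w, best_fit

# Toy training query: document "b" is relevant, and only the conceptual score ranks it first.
queries = [({"a": 0.9, "b": 0.1, "c": 0.5}, {"a": 0.1, "b": 0.9, "c": 0.2}, {"b"})]
best_w, best_fit = adaptive_ga(queries)
```

A single weight could also be found by grid search; a genetic algorithm becomes more attractive when the combination function has several parameters or a non-smooth fitness landscape, as rank-based metrics like AP do.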
Multi Domain Semantic Information Retrieval Based on Topic Model
Over the last decades, there have been remarkable shifts in the area of Information Retrieval (IR) as a huge amount of information has accumulated on the Web. This information explosion increases the need for new tools that retrieve meaningful knowledge from various complex information sources. Thus, techniques to search for and extract important information from numerous database sources have become a key challenge for current IR systems.
Topic modeling is one of the most recent techniques that discover hidden thematic structures in large data collections without human supervision. Several topic models have been proposed in various fields of study and have been used extensively in many applications. Latent Dirichlet Allocation (LDA) is the best-known topic model; it generates topics from a large corpus of resources, such as text, images, and audio. It has been widely used in many areas of information retrieval and data mining, providing an efficient way of identifying latent topics in document collections. However, LDA has a drawback: topic cohesion within a concept is attenuated when estimating infrequently occurring words. Moreover, LDA does not seem to consider the meaning of words, but rather infers hidden topics based on a statistical approach. These limitations can either reduce the quality of topic words or lead to loosely related topics.
To solve these problems, we propose a domain-specific topic model that combines domain concepts with LDA. Two domain-specific algorithms are suggested to overcome the difficulties associated with LDA. The main strength of our proposed model is that it narrows semantic concepts from broad domain knowledge to a specific domain, which solves the unknown domain problem. Our proposed model is extensively tested on various applications, namely query expansion, classification, and summarization, to demonstrate its effectiveness. Experimental results show that the proposed model significantly increases the performance of these applications.
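Standard LDA, the baseline that the proposed model extends, is commonly fitted with collapsed Gibbs sampling: each token's topic is repeatedly resampled from p(z = k | rest) ∝ (n_dk + alpha) (n_kw + beta) / (n_k + V beta). The sketch below shows plain LDA only; the two domain-specific algorithms are not described in the abstract and are not reproduced. The hyperparameter values and function names are illustrative choices.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA.
    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (phi, ndk): topic-word distributions and document-topic counts."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # tokens of doc d assigned to topic k
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned to topic k
    nk = [0] * n_topics                                # tokens assigned to topic k
    z = []                                             # current topic of every token
    for d, doc in enumerate(docs):                     # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # take the token out of the counts
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k                            # put it back under the sampled topic
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in topics]
    return phi, ndk

def top_words(phi, k, n=3):
    """Ids of the n most probable words in topic k."""
    return sorted(range(len(phi[k])), key=lambda w: phi[k][w], reverse=True)[:n]

docs = [[0, 1, 2, 0, 1, 2] for _ in range(5)] + [[3, 4, 5, 3, 4, 5] for _ in range(5)]
phi, ndk = lda_gibbs(docs, n_topics=2, vocab_size=6, iters=50)
```

`phi[k]` is the word distribution of topic k. For query expansion, one option (an assumption, not the paper's method) is to append the top words of the topic that best matches the query.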
- …