An Information Retrieval Model Based On Word Concept
PACLIC 20 / Wuhan, China / 1-3 November, 2006
Enhancing Information Retrieval Through Concept-Based Language Modeling and Semantic Smoothing.
Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have already shown good performance, the word independence assumption seems unrealistic from a natural language point of view, which considers that terms are related to each other. Such an assumption therefore leads to two well-known problems in information retrieval (IR), namely polysemy (term mismatch) and synonymy. In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal concepts, or word relationships, but such models are estimated using simple n-gram or concept counting. In this paper, we address polysemy and synonymy mismatch with a concept-based language modeling approach that combines ontological concepts from external resources with frequently found collocations from the document collection. In addition, the concept-based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word-based model and the Markov Random Field model (using a Markov classifier).
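The semantic smoothing idea described in this abstract can be illustrated with a minimal sketch: interpolate a document's maximum-likelihood term estimate with a "translation" probability through semantically related terms (standing in for the paper's ontological subconcepts) and a collection background model. All function names, relation weights, and mixing parameters below are illustrative assumptions, not the paper's actual estimator.

```python
# Sketch of semantic smoothing in a language model (hypothetical
# weights and data; the real model is estimated quite differently).

def smoothed_prob(term, doc_counts, collection_counts,
                  related, alpha=0.6, beta=0.3):
    """p(term|doc) mixing ML estimate, concept translation, background."""
    doc_len = sum(doc_counts.values())
    coll_len = sum(collection_counts.values())
    p_ml = doc_counts.get(term, 0) / doc_len
    # "translate" through related terms: sum over c of p(c|d) * weight(term, c)
    p_trans = sum(doc_counts.get(c, 0) / doc_len * w
                  for c, w in related.get(term, {}).items())
    p_bg = collection_counts.get(term, 0) / coll_len
    return alpha * p_ml + beta * p_trans + (1 - alpha - beta) * p_bg

# toy document, collection, and relation weights (all hypothetical)
doc = {"tire": 2, "wheel": 3, "car": 5}
coll = {"tire": 20, "wheel": 30, "car": 100, "engine": 50}
rel = {"tire": {"wheel": 0.8}}
score = smoothed_prob("tire", doc, coll, rel)  # 0.202
```

Because the related-term component contributes even when a term is rare in a document, a query word can match a document that mostly uses its near-synonyms, which is the semantic-matching effect the abstract describes.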
Different Modes of Semantic Representation in Image Retrieval
Semantic representations of words can be acquired from both textual and perceptual information. Multimodal models integrate this information and outperform text-only semantic representations of words. In many contexts, they better reflect human concept acquisition. A common model for semantic representation is the semantic vector, a list of decimal numbers representing the clusters in which a word appears in text. Studies have shown that if two words have similar vectors, they are likely to have similar meanings, or at least be relevant to each other. Other approaches entail inserting sentences, made up of caption words from an image set, into text, to modify the vectors corresponding to each word in a textual corpus's vocabulary, and thus form different semantic representations. These techniques have also suggested that whereas concrete terms' meanings tend to improve with propagation, abstract terms tend to become less accurate when too much information from their more concrete counterparts is propagated to them. In this study, I have therefore utilized different techniques for comparing words' meanings to implement an image retrieval system. Even if a word w does not directly tag an image, the system retrieves images whose captions contain words that have the most similar vector representations to that of w. We then examine the extent to which a word's semantic representation has improved, based on improvements in corresponding retrieval results from this system.
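The retrieval step this abstract describes, scoring images by the vector similarity between a query word and caption words, can be sketched with cosine similarity. The vectors, captions, and filenames below are toy assumptions, not data from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# hypothetical toy semantic vectors and caption index
vectors = {"dog": [0.9, 0.1, 0.0], "puppy": [0.8, 0.2, 0.1],
           "car": [0.0, 0.1, 0.9]}
images = {"img1.jpg": ["puppy"], "img2.jpg": ["car"]}

def retrieve(query, k=1):
    """Return the k images whose captions best match the query vector."""
    qv = vectors[query]
    scored = [(max(cosine(qv, vectors[w]) for w in caps), img)
              for img, caps in images.items()]
    return [img for _, img in sorted(scored, reverse=True)][:k]

retrieve("dog")  # "dog" never tags an image, yet "puppy"-captioned img1.jpg ranks first
```

This is the key property the abstract claims: a word that does not directly tag any image can still retrieve images through semantically close caption words.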
A Review of Semantic Search Methods to Retrieve Information from the Qur'an Corpus
The Holy Qur'an is the most important resource for the Islamic sciences and the Arabic language (Iqbal et al., 2013). Muslims believe that the Qur'an is a revelation from Allah that was given 1,356 years ago. The Qur'an contains about 80,000 words divided into 114 chapters (Atwell et al., 2011). A chapter consists of a varying number of verses. This holy book contains information on diverse topics, such as life, the history of humanity, and scientific knowledge (Alrehaili and Atwell, 2014). Corpus linguistics methods can be applied to study the lexical patterns in the Qur'an; for example, the Qur'an is one of the corpora available on the SketchEngine website. Qur'an researchers may want to go beyond word patterns to search for specific concepts and information. As a result, many Qur'anic search applications have been built to facilitate the retrieval of information from the Qur'an. Examples of these web applications are Qurany (Abbas, 2009), Qur'an Explorer (Explorer, 2005), Tanzil (Zarrabi-Zadeh, 2007), the Qur'anic Arabic Corpus (Dukes, 2013), and Quran.com. The techniques used to retrieve information from the Qur'an can be classified into two types: semantic-based and keyword-based. Semantic-based search techniques are concept-based: they retrieve results by matching the contextual meaning of terms as they appear in a user's query, whereas keyword-based search techniques return results according to the letters in the word(s) of a query (Sudeepthi et al., 2012). The majority of Qur'anic search tools employ the keyword search technique. The existing Qur'anic semantic search techniques include the ontology-based technique (concepts) (Yauri et al., 2013), the synonym-set technique (Shoaib et al., 2009), and the cross-language information retrieval (CLIR) technique (Yunus et al., 2010). The ontology-based technique searches for the concept(s) matching a user's query and then returns the verses related to these concept(s).
The synonym-set method produces all synonyms of the query word using WordNet and then returns all Qur'anic verses that contain words matching any synonym of the query word. Cross-language information retrieval (CLIR) translates the words of an input query into another language and then retrieves verses that contain words matching the translated words. On the other hand, keyword-based techniques include keyword matching, the morphologically-based technique (Al Gharaibeh et al., 2011), and the use of a chatbot (Abu Shawar and Atwell, 2004). The keyword matching method returns verses that contain any of the query words. The morphologically-based technique uses stems of query words to search the Qur'an corpus. In other words, this technique generates all other forms of the query words and then finds all Qur'anic verses matching those word forms. The chatbot selects the most important words, such as nouns or verbs, from a user query and then returns the Qur'anic verses that contain any words matching the selected words. There are several deficiencies in the Qur'anic verses (Aya'at) retrieved for a query using the existing keyword search technique. These problems include the following: some irrelevant verses are retrieved, some relevant verses are not retrieved, or the retrieved verses are not in the right order (Shoaib et al., 2009). Misunderstanding the exact meaning of the input words forming a query and neglecting some theories of information retrieval contribute significantly to the limitations of the keyword-based technique (Raza et al.). Additionally, Qur'anic keyword search tools use limited Islamic resources related to the Qur'an, which affects the accuracy of the retrieved results. Moreover, current Qur'anic semantic search techniques have limitations in their retrieved results.
The main causes of these limitations include the following: semantic search tools use a single source of Qur'anic ontology that does not cover all concepts in the Holy Qur'an, and Qur'anic ontologies are not aligned with each other, leading to inaccurate and incomplete Qur'anic ontology resources. To overcome the limitations of both semantic and keyword search techniques, we designed a framework for a new semantic search tool called the Qur'anic Semantic Search Tool (QSST). This search tool aims to employ both text-based and semantic search techniques. QSST aligns the existing Qur'anic ontologies to reduce ambiguity in the search results. QSST can be divided into four components: a natural language analyser (NLA), a semantic search model (SSM), a keyword search model (KSM), and a scoring and ranking model (SRM). The NLA tokenizes a user's query and then applies different natural language processing techniques to the tokenized query: spelling correction, stop word removal, stemming, and part-of-speech (POS) tagging. After that, the NLA uses WordNet to generate synonyms for the reformatted query words and sends these synonyms to the SSM and the KSM. The SSM searches the Qur'anic ontology database to find the concepts related to the normalised query and then returns results. At the same time, the KSM retrieves results based on words matching the input words. The SRM refines the results retrieved from both the KSM and the SSM by eliminating redundant verses. Next, the SRM ranks and scores the refined results. Finally, the SRM presents the results to the user.
References
Abbas, N. H. 2009. Quran 'search for a concept' tool and website. MRes thesis, University of Leeds.
Abu Shawar, B. and Atwell, E. 2004. An Arabic chatbot giving answers from the Qur'an. Proceedings of TALN. 4(2), pp.197-202.
Al Gharaibeh, A. et al. 2011. The usage of formal methods in Quran search system. In: Proceedings of the International Conference on Information and Communication Systems, Irbid, Jordan. pp.22-24.
Alrehaili, S. M. and Atwell, E. 2014. Computational ontologies for semantic tagging of the Quran: A survey of past approaches. In: LREC 2014 Proceedings.
Atwell, E. et al. 2011. An artificial intelligence approach to Arabic and Islamic content on the internet. In: Proceedings of the NITS 3rd National Information Technology Symposium.
Dukes, K. 2013. Statistical parsing by machine learning from a classical Arabic treebank. PhD thesis.
Explorer, Q. 2005. Quran Explorer [Online]. [Accessed 26 October 2014]. Available from: http://www.quranexplorer.com/Search/Default.aspx
Iqbal, R. et al. 2013. An experience of developing Quran ontology with contextual information support. Multicultural Education & Technology Journal. 7, pp.333-343.
Raza, S.A. et al. An essential framework for concept based evolutionary Quranic search engine (CEQSE).
Shoaib, M. et al. 2009. Relational WordNet model for semantic search in Holy Quran. In: International Conference on Emerging Technologies (ICET 2009). IEEE, pp.29-34.
Sudeepthi, G. et al. 2012. A survey on semantic web search engines. International Journal of Computer Science, 9.
Yauri, A. R. et al. 2013. Quranic verse extraction based on concepts using OWL-DL ontology. Research Journal of Applied Sciences, Engineering and Technology. 6, pp.4492-4498.
Yunus, M. et al. 2010. Semantic query for Quran documents results. In: IEEE Conference on Open Systems (ICOS 2010). IEEE, pp.1-5.
Zarrabi-Zadeh, H. 2007. Tanzil [Online]. [Accessed 26 October 2014]. Available from: http://tanzil.net
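The QSST pipeline described above (normalise the query, expand with synonyms, run keyword and semantic searches in parallel, then merge, de-duplicate, and rank) can be sketched in a few functions. Every helper, data structure, and scoring rule here is a hypothetical stand-in, not the authors' implementation.

```python
# Hedged sketch of a QSST-style pipeline; all data and scoring are toy
# assumptions for illustration only.

STOP_WORDS = {"the", "of", "in", "a"}

def normalise(query):
    """NLA step: tokenize, lowercase, drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]

def expand(tokens, synonyms):
    """NLA step: add synonyms (a WordNet stand-in) to the query terms."""
    expanded = set(tokens)
    for t in tokens:
        expanded.update(synonyms.get(t, []))
    return expanded

def keyword_search(terms, verses):
    """KSM: verses containing any query term."""
    return {vid for vid, text in verses.items()
            if any(t in text.lower().split() for t in terms)}

def semantic_search(terms, concept_index):
    """SSM: verses linked to matching ontology concepts."""
    hits = set()
    for t in terms:
        hits.update(concept_index.get(t, []))
    return hits

def qsst(query, verses, synonyms, concept_index):
    """SRM: merge both result sets, de-duplicate, rank by term overlap."""
    terms = expand(normalise(query), synonyms)
    merged = keyword_search(terms, verses) | semantic_search(terms, concept_index)
    return sorted(merged,
                  key=lambda vid: -sum(t in verses[vid].lower().split()
                                       for t in terms))

# toy verse collection, synonym table, and concept index
verses = {1: "patience and prayer", 2: "charity to the poor"}
synonyms = {"alms": ["charity"]}
concepts = {"charity": [2]}
qsst("alms of the poor", verses, synonyms, concepts)
```

Merging the two result sets into one set before ranking is what removes the redundant verses that both the keyword and semantic models retrieve.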
Semantic Relevance Analysis of Subject-Predicate-Object (SPO) Triples
The goal of this thesis is to explore and integrate several existing measurements for ranking the relevance of a set of subject-predicate-object (SPO) triples to a given concept. As we are inundated with information from multiple sources on the World Wide Web, SPO similarity measures play a progressively important role in information extraction, information retrieval, document clustering, and ontology learning. This thesis is applied in the cyber security domain for identifying and understanding the factors and elements of sociopolitical events relevant to cyberattacks. Our efforts are towards developing an algorithm that begins with an analysis of news articles, taking into account the semantic information and word order information in the SPOs extracted from the articles. The semantic cohesiveness of a user-provided concept and the extracted SPOs is then calculated using semantic similarity measures derived from 1) structured lexical databases; and 2) our own corpus statistics. The use of a lexical database enables our method to model human common sense knowledge, while the incorporation of our own corpus statistics allows our method to be adaptable to the cyber security domain. The model can be extended to other domains by simply changing the local corpus. The integration of different measures helps us triangulate the ranking of SPOs from multiple dimensions of semantic cohesiveness. Our results are compared to rankings gathered from surveys of human users, where each respondent ranks a list of SPOs based on their common knowledge and understanding of the SPOs' relevance to a given concept. The comparison demonstrates that our integrated SPO similarity ranking scheme closely reflects human common sense knowledge in the specific domain it addresses.
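The integration the thesis describes, combining a lexical-database measure with a corpus-statistics measure to rank triples, can be sketched as a weighted mixture averaged over the three SPO components. The similarity tables and weight below are hypothetical, not the thesis's measures.

```python
# Toy sketch of ranking SPO triples by a mixture of two similarity
# sources; the functions and data stand in for WordNet-style and
# corpus-derived measures.

def combined_similarity(spo, concept, lex_sim, corpus_sim, w=0.5):
    """Average subject/predicate/object scores, mixing both sources."""
    scores = [w * lex_sim(term, concept) + (1 - w) * corpus_sim(term, concept)
              for term in spo]
    return sum(scores) / len(scores)

def rank_spos(spos, concept, lex_sim, corpus_sim):
    """Rank triples by descending combined similarity to the concept."""
    return sorted(spos,
                  key=lambda s: combined_similarity(s, concept,
                                                    lex_sim, corpus_sim),
                  reverse=True)

# hypothetical similarity lookups with a small default for unseen pairs
LEX = {("hacker", "cyberattack"): 0.8, ("weather", "cyberattack"): 0.1}
def lex_sim(a, b): return LEX.get((a, b), 0.2)
def corpus_sim(a, b): return LEX.get((a, b), 0.2)  # reuse table for the demo

spos = [("hacker", "breached", "server"),
        ("weather", "delayed", "flight")]
ranked = rank_spos(spos, "cyberattack", lex_sim, corpus_sim)
```

Swapping `corpus_sim` for statistics computed over a different local corpus is what would make such a scheme portable to other domains, as the abstract suggests.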
Multi-modal multi-semantic image retrieval
PhD thesis. The rapid growth in the volume of visual information, e.g. image and video, can
overwhelm users' ability to find and access the specific visual information of interest
to them. In recent years, ontology knowledge-based (KB) image information retrieval
techniques have been adopted in order to attempt to extract knowledge from these
images, enhancing the retrieval performance. A KB framework is presented to
promote semi-automatic annotation and semantic image retrieval using multimodal
cues (visual features and text captions). In addition, a hierarchical structure for the KB
allows metadata to be shared that supports multi-semantics (polysemy) for concepts.
The framework builds up an effective knowledge base pertaining to a domain specific
image collection, e.g. sports, and is able to disambiguate and assign high level
semantics to 'unannotated' images.
Local feature analysis of visual content, namely using Scale Invariant Feature
Transform (SIFT) descriptors, has been deployed in the 'Bag of Visual Words'
model (BVW) as an effective method to represent visual content information and to
enhance its classification and retrieval. Local features are more useful than global
features, e.g. colour, shape or texture, as they are invariant to image scale, orientation
and camera angle. An innovative approach is proposed for the representation,
annotation and retrieval of visual content using a hybrid technique based upon the use
of an unstructured visual word and upon a (structured) hierarchical ontology KB
model. The structural model facilitates the disambiguation of unstructured visual
words and a more effective classification of visual content, compared to a vector
space model, through exploiting local conceptual structures and their relationships.
The key contributions of this framework in using local features for image
representation include: first, a method to generate visual words using the semantic
local adaptive clustering (SLAC) algorithm which takes term weight and spatial
locations of keypoints into account. Consequently, the semantic information is
preserved. Second, a technique is used to detect the domain-specific 'non-informative
visual words', which are ineffective at representing the content of visual data and
degrade its categorisation ability. Third, a method to combine an ontology model with
a visual word model to resolve synonymy (visual heterogeneity) and polysemy
problems is proposed. The experimental results show that this approach can discover
semantically meaningful visual content descriptions and recognise specific events,
e.g., sports events, depicted in images efficiently.
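The bag-of-visual-words representation at the core of this framework can be illustrated with a minimal sketch: quantise each local descriptor to its nearest visual word (cluster centre) and build a normalised histogram per image. The centres and descriptors below are toy values; a real system would learn centres by clustering SIFT descriptors (the thesis's SLAC algorithm additionally uses term weights and keypoint locations, which this sketch omits).

```python
# Minimal bag-of-visual-words sketch with toy 2-D "descriptors";
# real SIFT descriptors are 128-dimensional.

def nearest_word(desc, centres):
    """Index of the closest visual word by squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centres)), key=lambda i: dist2(desc, centres[i]))

def bovw_histogram(descriptors, centres):
    """Normalised visual-word histogram for one image."""
    hist = [0] * len(centres)
    for d in descriptors:
        hist[nearest_word(d, centres)] += 1
    total = sum(hist)
    return [h / total for h in hist]

centres = [[0.0, 0.0], [1.0, 1.0]]            # 2 visual words (toy)
descs = [[0.1, 0.2], [0.9, 1.1], [1.0, 0.8]]  # toy local features
hist = bovw_histogram(descs, centres)          # [1/3, 2/3]
```

An ontology layer, as the thesis proposes, would then disambiguate which concept a visual word's occurrences actually denote, something the flat histogram alone cannot do.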
Since discovering the semantics of an image is an extremely challenging problem, one
promising approach to enhance visual content interpretation is to use any associated
textual information that accompanies an image, as a cue to predict the meaning of an
image, by transforming this textual information into a structured annotation for an
image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct
types of information representation and modality, there are some strong, invariant,
implicit, connections between images and any accompanying text information.
Semantic analysis of image captions can be used by image retrieval systems to
retrieve selected images more precisely. To do this, Natural Language Processing
(NLP) is first exploited in order to extract concepts from image captions. Next, an
ontology-based knowledge model is deployed in order to resolve natural language
ambiguities. To deal with the accompanying text information, two methods to extract
knowledge from textual information have been proposed. First, metadata can be
extracted automatically from text captions and restructured with respect to a semantic
model. Second, the use of latent semantic indexing (LSI) in relation to a domain-specific ontology-based
knowledge model enables the combined framework to tolerate ambiguities and
variations (incompleteness) of metadata. The use of the ontology-based knowledge
model allows the system to find indirectly relevant concepts in image captions and
thus leverage these to represent the semantics of images at a higher level.
Experimental results show that the proposed framework significantly enhances image
retrieval and leads to a narrowing of the semantic gap between lower-level machine-derived
and higher-level human-understandable conceptualisation.
Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos
We propose a new zero-shot Event Detection method by Multi-modal
Distributional Semantic embedding of videos. Our model embeds object and action
concepts as well as other available modalities from videos into a
distributional semantic space. To our knowledge, this is the first Zero-Shot
event detection model that is built on top of distributional semantics and
extends it in the following directions: (a) semantic embedding of multimodal
information in videos (with focus on the visual modalities), (b) automatically
determining relevance of concepts/attributes to a free text query, which could
be useful for other applications, and (c) retrieving videos by free text event
query (e.g., "changing a vehicle tire") based on their content. We embed videos
into a distributional semantic space and then measure the similarity between
videos and the event query in a free text form. We validated our method on the
large TRECVID MED (Multimedia Event Detection) challenge. Using only the event
title as a query, our method outperformed the state-of-the-art approach that uses big
descriptions, improving the MAP metric from 12.6% to 13.5% and the ROC-AUC metric
from 0.73 to 0.83. It is also an order of magnitude faster.
Comment: To appear in AAAI 201
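The zero-shot scoring this abstract outlines can be sketched simply: pool the semantic vectors of the concepts detected in a video (weighted by detector confidence) and rank videos by cosine similarity to the free-text query's vector. The embeddings, detections, and confidences below are toy assumptions, not the paper's model.

```python
import math

# Hedged sketch of zero-shot event retrieval in a shared semantic
# space; all vectors and detector scores are illustrative.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def video_vector(detections, emb):
    """Confidence-weighted sum of the detected concepts' embeddings."""
    dim = len(next(iter(emb.values())))
    pooled = [0.0] * dim
    for concept, conf in detections.items():
        for i, x in enumerate(emb[concept]):
            pooled[i] += conf * x
    return pooled

emb = {"tire": [1.0, 0.0], "wheel": [0.9, 0.2], "cat": [0.0, 1.0],
       "changing a vehicle tire": [1.0, 0.1]}  # toy query embedding
videos = {"v1": {"tire": 0.9, "wheel": 0.7},   # concept: detector confidence
          "v2": {"cat": 0.95}}

def rank(query):
    """Order videos by similarity between pooled vector and query vector."""
    qv = emb[query]
    return sorted(videos,
                  key=lambda v: cosine(video_vector(videos[v], emb), qv),
                  reverse=True)

rank("changing a vehicle tire")
```

No event-specific training is involved: the event is matched purely through the distributional space, which is what makes the detection "zero-shot".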
- …