
    An Information Retrieval Model Based On Word Concept

    PACLIC 20 / Wuhan, China / 1-3 November, 2006

    Enhancing Information Retrieval Through Concept-Based Language Modeling and Semantic Smoothing.

    Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have already shown good performance, the word independence assumption is unrealistic from a natural language point of view, which considers that terms are related to each other. This assumption therefore leads to two well-known problems in information retrieval (IR), namely polysemy, or term mismatch, and synonymy. In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal concepts, or word relationships, but such models are estimated using simple n-gram or concept counting. In this paper, we address polysemy and synonymy mismatch with a concept-based language modeling approach that combines ontological concepts from external resources with frequently found collocations from the document collection. In addition, the concept-based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word-based model and the Markov Random Field model (using a Markov classifier).
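    A minimal sketch of the kind of interpolation such a concept-based model might use (the count dictionaries, the related-concept map and all parameter values below are illustrative assumptions, not the authors' estimator): the query likelihood mixes a Dirichlet-smoothed word model with a concept model that is itself semantically smoothed over related subconcepts.

    import math

    # Hedged sketch: interpolate a word-based and a concept-based query
    # likelihood; the concept model is smoothed over related subconcepts.
    # All inputs (count dicts, the `related` map) are hypothetical stand-ins.

    def p_word(w, word_counts, coll_word_p, mu=2000.0):
        """Dirichlet-smoothed unigram probability of word w in a document."""
        doc_len = sum(word_counts.values())
        return (word_counts.get(w, 0) + mu * coll_word_p.get(w, 1e-9)) / (doc_len + mu)

    def p_concept(c, concept_counts, related, coll_concept_p, alpha=0.3, mu=2000.0):
        """Concept probability, semantically smoothed with related subconcepts."""
        total = sum(concept_counts.values())
        direct = concept_counts.get(c, 0)
        neighbours = sum(concept_counts.get(r, 0) for r in related.get(c, []))
        smoothed = (1 - alpha) * direct + alpha * neighbours
        return (smoothed + mu * coll_concept_p.get(c, 1e-9)) / (total + mu)

    def score(query_words, query_concepts, doc, collection, related, lam=0.6):
        """Interpolated log-likelihood: weight lam on words, 1 - lam on concepts."""
        word_ll = sum(math.log(p_word(w, doc["words"], collection["words"]))
                      for w in query_words)
        concept_ll = sum(math.log(p_concept(c, doc["concepts"], related, collection["concepts"]))
                         for c in query_concepts)
        return lam * word_ll + (1 - lam) * concept_ll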

    Different Modes of Semantic Representation in Image Retrieval

    Semantic representations of words can be acquired from both textual and perceptual information. Multimodal models integrate this information and outperform text-only semantic representations of words; in many contexts, they better reflect human concept acquisition. A common model for semantic representation is the semantic vector, a list of real-valued numbers representing the clusters in which a word appears in text. Studies have shown that if two words have similar vectors, they are likely to have similar meanings, or at least to be relevant to each other. Other approaches insert sentences made up of caption words from an image set into text, modifying the vectors of each word in a textual corpus's vocabulary and thus forming different semantic representations. These techniques have also suggested that whereas concrete terms' meanings tend to improve with propagation, abstract terms tend to become less accurate when too much information from their more concrete counterparts is propagated to them. In this study, I have therefore utilized different techniques for comparing words' meanings to implement an image retrieval system. Even if a word w does not directly tag an image, the system retrieves images whose captions contain words with the most similar vector representations to that of w. We therefore examine the extent to which a word's semantic representation has improved, based on improvements in the corresponding retrieval results from this system.
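    A minimal sketch of the retrieval step described above, assuming precomputed word vectors and per-image caption word lists (both hypothetical inputs; the study's actual system may differ):

    import numpy as np

    # Rank images by the best cosine similarity between the query word's vector
    # and the vectors of each image's caption words, so that images can be
    # retrieved even when the query word does not tag them directly.

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def retrieve(query_word, word_vectors, image_captions, top_k=10):
        q = word_vectors[query_word]
        scored = []
        for image_id, caption_words in image_captions.items():
            sims = [cosine(q, word_vectors[w]) for w in caption_words if w in word_vectors]
            if sims:
                scored.append((max(sims), image_id))
        return [image_id for _, image_id in sorted(scored, reverse=True)[:top_k]]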

    A Review of Semantic Search Methods to Retrieve Information from the Qur’an Corpus

    The Holy Qur’an is the most important resource for the Islamic sciences and the Arabic language (Iqbal et al., 2013). Muslims believe that the Qur’an is a revelation from Allah that was given 1,356 years ago. The Qur’an contains about 80,000 words divided into 114 chapters (Atwell et al., 2011). A chapter consists of a varying number of verses. This holy book contains information on diverse topics, such as life, the history of humanity, and scientific knowledge (Alrehaili and Atwell, 2014). Corpus linguistics methods can be applied to study the lexical patterns in the Qur’an; for example, the Qur’an is one of the corpora available on the SketchEngine website. Qur’an researchers may want to go beyond word patterns to search for specific concepts and information. As a result, many Qur’anic search applications have been built to facilitate the retrieval of information from the Qur’an. Examples of these web applications are Qurany (Abbas, 2009), Qur’an Explorer (Explorer, 2005), Tanzil (Zarrabi-Zadeh, 2007), the Qur’anic Arabic Corpus (Dukes, 2013), and Quran.com.

    The techniques used to retrieve information from the Qur’an can be classified into two types: semantic-based and keyword-based. Semantic-based search techniques are concept-based: they retrieve results by matching the contextual meaning of the terms in a user’s query, whereas keyword-based search techniques return results according to the letters in the word(s) of a query (Sudeepthi et al., 2012). The majority of Qur’anic search tools employ the keyword search technique. The existing Qur’anic semantic search techniques include the ontology-based (concept) technique (Yauri et al., 2013), the synonym-set technique (Shoaib et al., 2009), and the cross-language information retrieval (CLIR) technique (Yunus et al., 2010). The ontology-based technique searches for the concept(s) matching a user’s query and then returns the verses related to these concept(s). The synonym-set method produces all synonyms of the query word using WordNet and then returns all Qur’anic verses that contain words matching any synonym of the query word. CLIR translates the words of an input query into another language and then retrieves verses that contain words matching the translated words. Keyword-based techniques, on the other hand, include keyword matching, the morphologically-based technique (Al Gharaibeh et al., 2011), and the use of a chatbot (Abu Shawar and Atwell, 2004). The keyword matching method returns verses that contain any of the query words. The morphologically-based technique uses stems of the query words to search the Qur’an corpus; in other words, it generates all other forms of the query words and then finds all Qur’anic verses matching those word forms. The chatbot selects the most important words, such as nouns or verbs, from a user query and then returns the Qur’anic verses that contain any words matching the selected words.

    There are several deficiencies in the Qur’anic verses (Aya’at) retrieved for a query using the existing keyword search technique: some irrelevant verses are retrieved, some relevant verses are not retrieved, or the retrieved verses are not in the right order (Shoaib et al., 2009). Misunderstanding the exact meaning of the input words forming a query and neglecting some theories of information retrieval contribute significantly to the limitations of the keyword-based technique (Raza et al.). Additionally, Qur’anic keyword search tools use limited Islamic resources related to the Qur’an, which affects the accuracy of the retrieved results. Moreover, current Qur’anic semantic search techniques have limitations in their retrieved results. The main causes of these limitations are that semantic search tools use a single source of Qur’anic ontology that does not cover all concepts in the Holy Qur’an, and that Qur’anic ontologies are not aligned to each other, leading to inaccurate and incomplete Qur’anic ontology resources.

    To overcome the limitations of both semantic and keyword search techniques, we designed a framework for a new semantic search tool called the Qur’anic Semantic Search Tool (QSST). This search tool employs both text-based and semantic search techniques, and aligns the existing Qur’anic ontologies to reduce ambiguity in the search results. QSST comprises four components: a natural language analyser (NLA), a semantic search model (SSM), a keyword search model (KSM), and a scoring and ranking model (SRM). The NLA tokenizes a user’s query and then applies several natural language processing techniques to the tokenized query: spelling correction, stop word removal, stemming, and part-of-speech (POS) tagging. The NLA then uses WordNet to generate synonyms for the reformulated query words and sends these synonyms to the SSM and the KSM. The SSM searches the Qur’anic ontology database to find the concepts related to the normalised query and then returns results, while the KSM retrieves results based on words matching the input words. The SRM refines the results retrieved from both the KSM and the SSM by eliminating redundant verses, then ranks and scores the refined results and presents them to the user (a hedged sketch of this pipeline is given after the reference list below).

    References
    Abbas, N. H. 2009. Quran 'search for a concept' tool and website. MRes thesis, University of Leeds.
    Abu Shawar, B. and Atwell, E. 2004. An Arabic chatbot giving answers from the Qur'an. Proceedings of TALN. 4(2), pp.197-202.
    Al Gharaibeh, A. et al. 2011. The usage of formal methods in Quran search system. In: Proceedings of the International Conference on Information and Communication Systems, Irbid, Jordan, pp.22-24.
    Alrehaili, S. M. and Atwell, E. 2014. Computational ontologies for semantic tagging of the Quran: A survey of past approaches. In: LREC 2014 Proceedings.
    Atwell, E. et al. 2011. An artificial intelligence approach to Arabic and Islamic content on the internet. In: Proceedings of the NITS 3rd National Information Technology Symposium.
    Dukes, K. 2013. Statistical parsing by machine learning from a classical Arabic treebank. PhD thesis.
    Explorer, Q. 2005. Quran Explorer [Online]. [Accessed 26 October 2014]. Available from: http://www.quranexplorer.com/Search/Default.aspx
    Iqbal, R. et al. 2013. An experience of developing Quran ontology with contextual information support. Multicultural Education & Technology Journal. 7, pp.333-343.
    Raza, S. A. et al. An essential framework for concept based evolutionary Quranic search engine (CEQSE).
    Shoaib, M. et al. 2009. Relational WordNet model for semantic search in Holy Quran. In: International Conference on Emerging Technologies (ICET 2009). IEEE, pp.29-34.
    Sudeepthi, G. et al. 2012. A survey on semantic web search engine. International Journal of Computer Science, 9.
    Yauri, A. R. et al. 2013. Quranic verse extraction based on concepts using OWL-DL ontology. Research Journal of Applied Sciences, Engineering and Technology. 6, pp.4492-4498.
    Yunus, M. et al. 2010. Semantic query for Quran documents results. In: IEEE Conference on Open Systems (ICOS 2010). IEEE, pp.1-5.
    Zarrabi-Zadeh, H. 2007. Tanzil [Online]. [Accessed 26 October 2014]. Available from: http://tanzil.net
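    A minimal, hypothetical sketch of the QSST pipeline described in the abstract above (NLA, SSM, KSM, SRM). NLTK's English tools and the ontology_lookup, keyword_lookup and score callables are illustrative stand-ins, not the authors' implementation; spelling correction is omitted.

    from nltk import word_tokenize, pos_tag
    from nltk.corpus import stopwords, wordnet
    from nltk.stem import PorterStemmer

    def analyse_query(query):
        """NLA stage: tokenize, drop stop words, stem, POS-tag, expand with WordNet synonyms."""
        tokens = [t.lower() for t in word_tokenize(query)]
        tokens = [t for t in tokens if t not in stopwords.words('english')]
        stems = {PorterStemmer().stem(t) for t in tokens}
        tagged = pos_tag(tokens)
        synonyms = {l.name() for t in tokens for s in wordnet.synsets(t) for l in s.lemmas()}
        return stems, tagged, synonyms

    def search(query, ontology_lookup, keyword_lookup, score):
        stems, tagged, synonyms = analyse_query(query)
        semantic_hits = ontology_lookup(synonyms)         # SSM: verses for matching concepts
        keyword_hits = keyword_lookup(stems | synonyms)   # KSM: verses with matching words
        merged = set(semantic_hits) | set(keyword_hits)   # SRM: remove redundant verses
        return sorted(merged, key=score, reverse=True)    # SRM: score, rank and return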

    Semantic Relevance Analysis of Subject-Predicate-Object (SPO) Triples

    The goal of this thesis is to explore and integrate several existing measurements for ranking the relevance of a set of subject-predicate-object (SPO) triples to a given concept. As we are inundated with information from multiple sources on the World Wide Web, SPO similarity measures play a progressively important role in information extraction, information retrieval, document clustering and ontology learning. This thesis is applied in the cyber security domain for identifying and understanding the factors and elements of sociopolitical events relevant to cyberattacks. Our efforts are directed towards developing an algorithm that begins with an analysis of news articles, taking into account the semantic information and word-order information in the SPOs extracted from the articles. The semantic cohesiveness of a user-provided concept and the extracted SPOs is then calculated using semantic similarity measures derived from 1) structured lexical databases and 2) our own corpus statistics. The use of a lexical database enables our method to model human common-sense knowledge, while the incorporation of our own corpus statistics allows our method to be adapted to the cyber security domain; the model can be extended to other domains by simply changing the local corpus. The integration of different measures helps us triangulate the ranking of SPOs from multiple dimensions of semantic cohesiveness. Our results are compared to rankings gathered from surveys of human users, where each respondent ranks a list of SPOs based on their common knowledge and their understanding of the relevance of each SPO to a given concept. The comparison demonstrates that our integrated SPO similarity ranking scheme closely reflects human common-sense knowledge in the specific domain it addresses.
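    A hedged sketch of the integration idea, assuming WordNet path similarity for the lexical-database signal and cosine similarity over precomputed domain vectors for the corpus-statistics signal (weights and helper names are illustrative, not the thesis's exact scheme):

    import numpy as np
    from nltk.corpus import wordnet

    def wordnet_sim(a, b):
        """Best WordNet path similarity between any synsets of words a and b."""
        scores = [s1.path_similarity(s2) or 0.0
                  for s1 in wordnet.synsets(a) for s2 in wordnet.synsets(b)]
        return max(scores, default=0.0)

    def corpus_sim(a, b, vectors):
        """Cosine similarity between precomputed domain-corpus vectors."""
        if a not in vectors or b not in vectors:
            return 0.0
        va, vb = vectors[a], vectors[b]
        return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9))

    def spo_relevance(spo, concept, vectors, w_lex=0.5, w_corpus=0.5):
        """Average, over subject, predicate and object, of the combined similarity to the concept."""
        parts = [w_lex * wordnet_sim(term, concept) + w_corpus * corpus_sim(term, concept, vectors)
                 for term in spo]
        return sum(parts) / len(parts)

    # e.g. rank candidate triples by spo_relevance(("hackers", "attack", "bank"), "cyberattack", vectors)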

    Multi modal multi-semantic image retrieval

    The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in order to extract knowledge from these images and enhance retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared and supports multiple semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’ (BVW) model as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon an unstructured visual word model and a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, by exploiting local conceptual structures and their relationships. The key contributions of this framework in using local features for image representation are: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weight and the spatial locations of keypoints into account so that semantic information is preserved. Second, a technique to detect the domain-specific ‘non-informative visual words’ which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with a visual word model to resolve the synonym (visual heterogeneity) and polysemy problems. The experimental results show that this approach can discover semantically meaningful visual content descriptions and recognise specific events, e.g. sports events, depicted in images efficiently. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image as a cue to predict its meaning, by transforming this textual information into a structured annotation, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are some strong, invariant, implicit connections between images and any accompanying text. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is first exploited to extract concepts from image captions. Next, an ontology-based knowledge model is deployed to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information are proposed. First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of LSI in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) of metadata. The ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and narrows the semantic gap between lower-level machine-derived and higher-level human-understandable conceptualisations.
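    A minimal bag-of-visual-words sketch along the lines described above, with plain k-means standing in for the thesis's SLAC algorithm (OpenCV and scikit-learn are assumed; the vocabulary size is arbitrary):

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def sift_descriptors(image_paths):
        """SIFT descriptors (one array of 128-d rows per image)."""
        sift = cv2.SIFT_create()
        per_image = []
        for path in image_paths:
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            _, desc = sift.detectAndCompute(img, None)
            per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
        return per_image

    def build_bovw(image_paths, n_words=500):
        """Cluster descriptors into a visual vocabulary and build per-image histograms."""
        per_image = sift_descriptors(image_paths)
        vocab = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(per_image))
        histograms = []
        for desc in per_image:
            words = vocab.predict(desc) if len(desc) else np.array([], dtype=int)
            hist, _ = np.histogram(words, bins=np.arange(n_words + 1))
            histograms.append(hist / max(hist.sum(), 1))   # normalised visual-word histogram
        return vocab, histograms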

    Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

    We propose a new zero-shot event detection method based on multi-modal distributional semantic embedding of videos. Our model embeds object and action concepts, as well as other available modalities, from videos into a distributional semantic space. To our knowledge, this is the first zero-shot event detection model built on top of distributional semantics, and it extends them in the following directions: (a) semantic embedding of multimodal information in videos (with a focus on the visual modalities), (b) automatically determining the relevance of concepts/attributes to a free-text query, which could be useful for other applications, and (c) retrieving videos by a free-text event query (e.g., "changing a vehicle tire") based on their content. We embed videos into a distributional semantic space and then measure the similarity between videos and the event query in free-text form. We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state of the art that uses long textual descriptions, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83. It is also an order of magnitude faster. Comment: To appear in AAAI 2016.
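    The core zero-shot scoring step can be sketched as follows (word embeddings and per-video concept-detector scores are assumed inputs; this is an illustrative reduction, not the authors' full model):

    import numpy as np

    def embed_text(words, word_vectors):
        """Average embedding of the words that are in the vocabulary."""
        vecs = [word_vectors[w] for w in words if w in word_vectors]
        return np.mean(vecs, axis=0) if vecs else None

    def embed_video(concept_scores, word_vectors):
        """Detection-score-weighted average of concept-name embeddings."""
        acc, total = None, 0.0
        for concept, score in concept_scores.items():
            vec = embed_text(concept.lower().split(), word_vectors)
            if vec is None:
                continue
            acc = score * vec if acc is None else acc + score * vec
            total += score
        return acc / total if total else None

    def rank_videos(query, videos, word_vectors):
        """Rank videos (a list of concept-score dicts) by cosine similarity to the query embedding."""
        q = embed_text(query.lower().split(), word_vectors)
        def sim(concept_scores):
            e = embed_video(concept_scores, word_vectors)
            if q is None or e is None:
                return -1.0
            return float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-9))
        return sorted(range(len(videos)), key=lambda i: sim(videos[i]), reverse=True)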
