719 research outputs found

    CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETRIEVED MEDICAL TEXT

    Get PDF
    Medical Subject Headings (MeSH) is a controlled vocabulary used by the National Library of Medicine to index medical articles, abstracts, and journals contained within the MEDLINE database. Although MeSH imposes uniformity and consistency on the indexing process, it has been shown that using MeSH indices results in only a small increase in precision over free-text indexing. Moreover, studies have shown that the use of controlled vocabularies in the indexing process is not an effective method of increasing semantic relevance in information retrieval. To address the need for semantic relevance, we present an ontology-based information retrieval system for the MEDLINE collection that results in a 37.5% increase in precision compared to free-text indexing systems. The presented system uses the ontology to provide an alternative to text representation for medical articles, to find relationships among co-occurring terms in abstracts, and to index terms that appear in text as well as the discovered relationships. The presented system is then compared to existing MeSH and free-text information retrieval systems. This dissertation provides a proof of concept for an online retrieval system capable of providing increased semantic relevance when searching through medical abstracts in MEDLINE.

    Proposal of an ontology based web search engine

    Get PDF
    When users search for information on a web site, sometimes they do not get what they want. Assuming that the scope where the search takes place works fine, some problems are caused by the way the user interacts with the system, others relate to characteristics of the language used, and others are caused by poor or nonexistent semantics in web documents. In this work, we propose a web search engine for a particular web site that uses ontologies and information retrieval techniques. Although the architecture we propose is applicable to any domain, the experimentation was done on a tourism web site. The results show a substantial improvement in the effectiveness of the search, with a gain of 33% in precision.
    Workshop de Ingeniería de Software y Bases de Datos (WISBD), Red de Universidades con Carreras en Informática (RedUNCI).

    Multi-modal multi-semantic image retrieval

    Get PDF
    The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in order to attempt to extract knowledge from these images, enhancing retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared and supports multiple semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’ (BVW) model as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of unstructured visual words and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, by exploiting local conceptual structures and their relationships.
The key contributions of this framework in using local features for image representation include: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weight and the spatial locations of keypoints into account; consequently, the semantic information is preserved. Second, a technique is used to detect the domain-specific ‘non-informative visual words’ which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with a visual word model to resolve synonym (visual heterogeneity) and polysemy problems is proposed. The experimental results show that this approach can discover semantically meaningful visual content descriptions and recognise specific events, e.g. sports events, depicted in images efficiently. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image as a cue to predict the meaning of the image, by transforming this textual information into a structured annotation for the image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are some strong, invariant, implicit connections between images and any accompanying text information. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is first exploited in order to extract concepts from image captions. Next, an ontology-based knowledge model is deployed in order to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information have been proposed.
First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of Latent Semantic Indexing (LSI) in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) in metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and leads to a narrowing of the semantic gap between lower-level machine-derived and higher-level human-understandable conceptualisation.
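The Bag of Visual Words pipeline described in this abstract can be illustrated with a minimal sketch: cluster local descriptors into a visual vocabulary, then represent an image as a normalised histogram of visual-word occurrences. The synthetic 8-D descriptors and the vocabulary size here are assumptions of this sketch (real SIFT descriptors are 128-D, and the thesis uses its own SLAC clustering rather than plain k-means):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic 8-D local descriptors stand in for SIFT keypoint descriptors
# pooled from a training collection (an assumption of this sketch).
all_desc = rng.normal(size=(200, 8))

# 1. Build the visual vocabulary by clustering all local descriptors.
k = 5
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_desc)

def bvw_histogram(descriptors):
    """Quantise an image's descriptors against the codebook and
    return the normalised visual-word occurrence histogram."""
    words = codebook.predict(descriptors)
    return np.bincount(words, minlength=k) / len(words)

# 2. Represent one "image" (here, its first 40 descriptors) as a histogram.
hist = bvw_histogram(all_desc[:40])
print(hist.shape, abs(hist.sum() - 1.0) < 1e-9)  # (5,) True
```

The resulting fixed-length histograms can then be compared with any vector-space similarity measure, which is the point at which the thesis layers its ontology model on top of the unstructured visual words.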

    Document Indexing Strategies in Big Data: A Survey

    Get PDF
    In the past few years, the operation of the Internet has grown significantly, and individuals and organizations were unaware of this data explosion. Because of the increasing quantity and diversity of digital documents available to end users, a mechanism for their effective and efficient retrieval is of the highest importance. One crucial aspect of this mechanism is indexing, which allows documents to be located quickly. The problem is that users want to retrieve on the basis of context, and individual words provide unreliable evidence about the contextual topic or meaning of a document. Hence, the available solutions cannot meet the processing needs of the growing heterogeneous data, which results in inefficient information retrieval or search query results. The design of indexing strategies that can support this need is required. There are various indexing strategies which are utilized for solving Big Data management issues and can also serve as a base for the design of more efficient indexing strategies. The aim is to explore document indexing strategies for Big Data manageability. Existing approaches such as Latent Semantic Indexing, Inverted Indexing, Semantic Indexing and the Vector Space Model have their own challenges: they demand high computational performance, consume more memory space, require longer data processing time, limit the search space, may not produce the exact answer, and can present wrong answers due to synonyms and polysemy unless the approach makes use of a formal ontology. This paper describes and compares the various indexing techniques and presents the characteristics and challenges involved.
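Of the strategies this survey names, the inverted index is the simplest to sketch: each term maps to the list of documents containing it, and a conjunctive query intersects those posting lists. The toy documents and integer ids below are assumptions of this sketch, not drawn from the survey:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Conjunctive (AND) query: ids of docs containing every query term."""
    postings = [set(index.get(t, ())) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "big data indexing survey",
    2: "semantic indexing with ontology",
    3: "vector space model for retrieval",
}
index = build_inverted_index(docs)
print(search(index, "indexing"))           # [1, 2]
print(search(index, "semantic indexing"))  # [2]
```

The survey's criticism applies directly to this structure: a query for "indexing" finds only documents that contain that exact token, so synonyms and polysemous uses are missed unless a semantic layer (e.g. a formal ontology) is added on top.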

    Russian word sense induction by clustering averaged word embeddings

    Full text link
    The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE-2018). Our team was ranked 2nd for the wiki-wiki dataset (containing mostly homonyms) and 5th for the bts-rnc and active-dict datasets (containing mostly polysemous words) among all 19 participants. The method we employed was extremely naive. It implied representing contexts of ambiguous words as averaged word embedding vectors, using off-the-shelf pre-trained distributional models. Then, these vector representations were clustered with mainstream clustering techniques, thus producing groups corresponding to the ambiguous word senses. As a side result, we show that word embedding models trained on small but balanced corpora can be superior to those trained on large but noisy data, not only in intrinsic evaluation but also in downstream tasks like word sense induction.
    Comment: Proceedings of the 24th International Conference on Computational Linguistics and Intellectual Technologies (Dialogue-2018).
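The two-step method this abstract describes, averaging context-word embeddings and then clustering, can be sketched in a few lines. The toy 2-D embeddings and the example contexts for the ambiguous word "bank" are assumptions of this sketch; the paper itself uses off-the-shelf pre-trained Russian models and compares several clustering algorithms:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy pre-trained embeddings: two clearly separated semantic regions
# stand in for a real distributional model (an assumption of this sketch).
emb = {
    "money":   np.array([1.0, 0.1]), "loan":  np.array([0.9, 0.0]),
    "deposit": np.array([1.1, 0.2]), "river": np.array([0.1, 1.0]),
    "water":   np.array([0.0, 0.9]), "shore": np.array([0.2, 1.1]),
}

def context_vector(context_words):
    """Represent a context as the average of its word embeddings."""
    return np.mean([emb[w] for w in context_words if w in emb], axis=0)

contexts = [
    ["money", "loan"],     # financial sense of "bank"
    ["deposit", "money"],  # financial sense
    ["river", "water"],    # river sense
    ["shore", "river"],    # river sense
]
X = np.vstack([context_vector(c) for c in contexts])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Contexts sharing a sense land in the same cluster.
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```

The induced cluster ids then serve as sense labels for the ambiguous word, which is exactly the output format the RUSSE-2018 task evaluates.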

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
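The first of the three matrix classes this survey describes, the term-document matrix, can be sketched directly: rows are terms, columns are documents, and document similarity falls out of comparing columns. The toy corpus and raw-count weighting are assumptions of this sketch (the survey discusses more refined weightings such as tf-idf):

```python
import numpy as np

docs = ["the cat sat", "the dog sat", "stock market news"]
vocab = sorted({w for d in docs for w in d.split()})

# Term-document matrix: rows are terms, columns are documents (raw counts).
M = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Documents sharing vocabulary ("the", "sat") score higher than
# documents with no words in common.
print(cosine(M[:, 0], M[:, 1]) > cosine(M[:, 0], M[:, 2]))  # True
```

The other two classes in the survey's taxonomy only change what the rows and columns index: word-context matrices compare rows to measure word similarity, and pair-pattern matrices compare word pairs by the patterns that connect them.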

    Improving Search Ranking Using a Composite Scoring Approach

    Get PDF
    In this thesis, improving the relevance of computerized search results is studied. Information search tools return ranked lists of documents ordered by the relevance of the documents to the user-supplied search. Using a small number of words and phrases to represent complex ideas and concepts causes user search queries to be information-sparse. This sparsity challenges search tools to locate relevant documents for users. A review of the challenges to information searches helps to identify the problems and offers suggestions for improving current information search tools. Using the suggestions put forth by the Strategic Workshop on Information Retrieval in Lorne (SWIRL), a composite scoring approach (Composite Scorer) is developed. The Composite Scorer considers various aspects of information needs to improve the ranked results of a search by returning records relevant to the user’s information need. The Florida Fusion Center (FFC), a local law enforcement agency, has a need for a more effective information search tool. Daily, the agency processes large numbers of police reports, typically written as text documents. Current information search methods require inordinate amounts of time and skill to identify relevant police reports from its large collection. An experiment conducted by FFC investigators contrasted the composite scoring approach against a common search scoring approach (TF/IDF). In the experiment, police investigators used a custom-built software interface to conduct several use-case scenarios, searching for documents related to various criminal investigations. Those expert users then evaluated the top ten ranked documents returned by both search scorers to measure their relevance. The evaluations were collected, and the measurements were used to evaluate the performance of the two scorers.
A search with many irrelevant documents imposes a cost on users, both in time and potentially in unsolved crimes. A cost function contrasted the difference in cost between the two scoring methods for the use cases. Mean Average Precision (MAP) is a common method used to evaluate the performance of ranked-list search results. MAP was computed for both scoring methods to provide a numeric value representing the accuracy of each scorer at returning relevant documents among the top ten documents of a ranked list of search results. The purpose of this study is to determine whether a composite scoring approach to ranked lists, one that considers multiple aspects of a user’s search, can improve the quality of search, returning greater numbers of relevant documents during an information search. This research contributes to the understanding of composite scoring methods for improving search results. Understanding the value of composite scoring methods allows researchers to evaluate, explore and possibly extend the approach, incorporating other information aspects such as word and document meaning.
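The MAP metric used in this evaluation has a standard definition: average precision (AP) for one query is the mean of precision@k taken at each rank k where a relevant document appears, and MAP is the mean of AP over queries. A minimal sketch with two hypothetical queries (the relevance judgments below are invented for illustration, not taken from the FFC experiment):

```python
def average_precision(ranked_relevance):
    """AP for one query: mean of precision@k at each relevant rank k.

    `ranked_relevance` is a list of 0/1 judgments for the ranked results,
    in rank order (index 0 = top-ranked document).
    """
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(queries):
    """MAP: mean of per-query average precision."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Hypothetical relevance of the top-5 results for two queries.
q1 = [1, 0, 1, 0, 0]  # AP = (1/1 + 2/3) / 2 ≈ 0.8333
q2 = [0, 1, 0, 0, 1]  # AP = (1/2 + 2/5) / 2 = 0.45
print(round(mean_average_precision([q1, q2]), 4))  # 0.6417
```

Comparing this single number between the Composite Scorer and the TF/IDF baseline is what lets the study summarise each scorer's top-ten accuracy across all the use-case searches.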