729 research outputs found

    Dealing with uncertain entities in ontology alignment using rough sets

    Get PDF
    This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.Ontology alignment facilitates exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, it remains a key challenge in dealing with uncertain entities for which the employed ontology alignment measures produce conflicting results on similarity of the mapped entities. This paper presents OARS, a rough-set based approach to ontology alignment which achieves a high degree of accuracy in situations where uncertainty arises because of the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated with the benchmark ontologies of the ontology alignment evaluation initiative (OAEI) 2010, and performs best in the aspect of recall in comparison with a number of alignment systems while generating a comparable performance in precision

    Hybrid Query Expansion on Ontology Graph in Biomedical Information Retrieval

    Get PDF
    Nowadays, biomedical researchers publish thousands of papers and journals every day. Searching through biomedical literature to keep up with the state of the art is a task of increasing difficulty for many individual researchers. The continuously increasing amount of biomedical text data has resulted in high demands for an efficient and effective biomedical information retrieval (BIR) system. Though many existing information retrieval techniques can be directly applied in BIR, BIR distinguishes itself in the extensive use of biomedical terms and abbreviations which present high ambiguity. First of all, we studied a fundamental yet simpler problem of word semantic similarity. We proposed a novel semantic word similarity algorithm and related tools called Weighted Edge Similarity Tools (WEST). WEST was motivated by our discovery that humans are more sensitive to the semantic difference due to the categorization than that due to the generalization/specification. Unlike most existing methods which model the semantic similarity of words based on either the depth of their Lowest Common Ancestor (LCA) or the traversal distance of between the word pair in WordNet, WEST also considers the joint contribution of the weighted distance between two words and the weighted depth of their LCA in WordNet. Experiments show that weighted edge based word similarity method has achieved 83.5% accuracy to human judgments. Query expansion problem can be viewed as selecting top k words which have the maximum accumulated similarity to a given word set. It has been proved as an effective method in BIR and has been studied for over two decades. However, most of the previous researches focus on only one controlled vocabulary: MeSH. In addition, early studies find that applying ontology won\u27t necessarily improve searching performance. In this dissertation, we propose a novel graph based query expansion approach which is able to take advantage of the global information from multiple controlled vocabularies via building a biomedical ontology graph from selected vocabularies in Metathesaurus. We apply Personalized PageRank algorithm on the ontology graph to rank and identify top terms which are highly relevant to the original user query, yet not presented in that query. Those new terms are reordered by a weighted scheme to prioritize specialized concepts. We multiply a scaling factor to those final selected terms to prevent query drifting and append them to the original query in the search. Experiments show that our approach achieves 17.7% improvement in 11 points average precision and recall value against Lucene\u27s default indexing and searching strategy and by 24.8% better against all the other strategies on average. Furthermore, we observe that expanding with specialized concepts rather than generalized concepts can substantially improve the recall-precision performance. Furthermore, we have successfully applied WEST from the underlying WordNet graph to biomedical ontology graph constructed by multiple controlled vocabularies in Metathesaurus. Experiments indicate that WEST further improve the recall-precision performance. Finally, we have developed a Graph-based Biomedical Search Engine (G-Bean) for retrieving and visualizing information from literature using our proposed query expansion algorithm. G-Bean accepts any medical related user query and processes them with expanded medical query to search for the MEDLINE database

    Principles of content analysis for information retrieval systems: an overview

    Full text link
    "Unquestionably, the content analysis which has emerged as part of Information Retrieval Systems (IRS, e.g. literature databases) over the past 20 years has much in common with the content analysis used by linguists or in the social sciences. However, its intrinsic value stems from the special context in which it is used: a) Close interdependencies link the selected content analysis with the retrieval situation. The user’s retrieval strategies, which are intended to obtain information relevant to the current problem situation, and the available aids (e.g. expansion lists or user-friendly browsing tools) affect the efficacy of some analysis techniques (e.g. noun phrase analysis from computer linguistics) to a considerable extent. b) Normally, a commercial IRS handles mass data, thus necessitating the use of a reduced content analysis even today. Full morphological, syntactic, semantic and pragmatic text analyses are unthinkable simply for efficiency reasons but also for knowledge reasons. Content analysis in IRS is therefore a component part of a special type of restricted system which obeys its own laws. Against the backdrop of these considerations, forms of content analysis in present-day commercial retrieval systems are studied and promising expansions and alternatives are proposed." (author's abstract
    • …
    corecore