4,890 research outputs found

    Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™

    Motivation: With the tremendous growth in scientific literature, search engines need to improve on standard pattern matching. Semantic NLP may be the solution to this problem. Cognition Search (CSIR) is a natural language technology. It is best used by asking a simple question that might be answered in the textual data being queried, such as MEDLINE. CSIR has a large English dictionary and semantic database. Cognition’s semantic map enables the search process to be based on meaning rather than statistical word pattern matching and therefore returns more complete and relevant results. The Cognition Search engine uses downward reasoning and synonymy, which improve recall. It improves precision through phrase parsing and word sense disambiguation.
Results: We carried out several projects to "teach" the CSIR lexicon medical, biochemical and molecular biological language and acronyms from curated, free web-based sources. Vocabulary from the Alliance for Cell Signaling (AfCS), the HUGO Gene Nomenclature Committee (HGNC), the Unified Medical Language System (UMLS) Metathesaurus, and the International Union of Pure and Applied Chemistry (IUPAC) was introduced into the CSIR dictionary and curated. The resulting system was used to interpret MEDLINE abstracts. Where synonym information has been encoded, meaning-based search of MEDLINE abstracts yields high precision (estimated at >90%) and high recall (estimated at >90%). The present implementation can be found at http://MEDLINE.cognition.com.

    Dealing with uncertain entities in ontology alignment using rough sets

    This is the author's accepted manuscript. The final published article is available from the link below. Copyright © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    Ontology alignment facilitates the exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, dealing with uncertain entities, for which the employed similarity measures produce conflicting results, remains a key challenge. This paper presents OARS, a rough-set based approach to ontology alignment which achieves a high degree of accuracy in situations where uncertainty arises because of the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated with the benchmark ontologies of the Ontology Alignment Evaluation Initiative (OAEI) 2010. It performs best on recall in comparison with a number of alignment systems while achieving comparable precision.
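For readers unfamiliar with rough sets, the following minimal sketch computes the standard lower and upper approximations of a target set and treats their difference as the uncertain region. It is not the OARS algorithm. The candidate mappings `m1` to `m5`, the two boolean attributes (whether a lexical and a structural measure score high), and the reference set are all hypothetical.

```python
from collections import defaultdict

def approximations(objects, target):
    """Return (lower, upper) rough-set approximations of `target`.

    objects: dict mapping an object name to a tuple of attribute values.
    target:  set of object names known to belong to the concept.
    """
    # Objects with identical attribute tuples are indiscernible.
    classes = defaultdict(set)
    for name, attrs in objects.items():
        classes[attrs].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # whole class lies inside the target
            lower |= block
        if block & target:       # class overlaps the target
            upper |= block
    return lower, upper

# Hypothetical candidate mappings: (lexical_high, structural_high)
candidates = {
    "m1": (True, True),
    "m2": (True, True),
    "m3": (True, False),
    "m4": (True, False),
    "m5": (False, False),
}
reference_correct = {"m1", "m2", "m3"}  # assumed ground truth

lower, upper = approximations(candidates, reference_correct)
boundary = upper - lower  # mappings the attributes cannot decide
```

Here the measures agree on `m1` and `m2`, so those fall in the lower approximation, while `m3` and `m4` look identical to the measures yet differ in correctness, so they land in the boundary region.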

    Logic-based assessment of the compatibility of UMLS ontology sources

    Background: The UMLS Metathesaurus (UMLS-Meta) is currently the most comprehensive effort for integrating independently developed medical thesauri and ontologies. UMLS-Meta is being used in many applications, including PubMed and ClinicalTrials.gov. The integration of new sources combines automatic techniques, expert assessment, and auditing protocols. The automatic techniques currently in use, however, are mostly based on lexical algorithms and often disregard the semantics of the sources being integrated. Results: In this paper, we argue that UMLS-Meta’s current design and auditing methodologies could be significantly enhanced by taking into account the logic-based semantics of the ontology sources. We provide empirical evidence suggesting that UMLS-Meta in its 2009AA version contains a significant number of errors; these errors become immediately apparent if the rich semantics of the ontology sources is taken into account, manifesting themselves as unintended logical consequences that follow from the ontology sources together with the information in UMLS-Meta. We then propose general principles and specific logic-based techniques to effectively detect and repair such errors. Conclusions: Our results suggest that the methodologies employed in the design of UMLS-Meta are not only very costly in terms of human effort, but also error-prone. The techniques presented here can be useful both for reducing human effort in the design and maintenance of UMLS-Meta and for improving the quality of its contents.

    Local matching learning of large scale biomedical ontologies

    Large biomedical ontologies generally describe the same domain of interest, but use different modeling patterns and vocabularies. Aligning these complex and heterogeneous ontologies is a tedious task. Matching systems must deliver high-quality results while accounting for the large size of these resources. Ontology matching systems must solve two problems: (i) handling the large size of the ontologies, and (ii) automating the alignment process. Ontology matching is difficult because of the large size of the ontologies. Ontology matching systems combine different types of matchers to solve these problems. The main problems in aligning large biomedical ontologies are conceptual heterogeneity, the large search space, and the reduced quality of the resulting alignments. Ontology alignment systems combine different matchers in order to reduce heterogeneity. This combination should define the choice of matchers to combine and their weights. Different matchers handle different types of heterogeneity. Consequently, the tuning of a matcher should be automated by ontology alignment systems in order to obtain good matching quality. We proposed an approach called "local matching learning" to address both the large size of the ontologies and the automation problem. We divide a large alignment problem into a set of smaller local alignment problems. Each local alignment problem is aligned independently by a machine learning approach. We reduce the huge search space to a set of smaller local matching tasks.
We can efficiently align each local matching task to obtain better matching quality. Our partitioning approach is based on a new multi-cut strategy that generates partitions that are neither oversized nor isolated. Consequently, we can overcome the problem of conceptual heterogeneity. The new partitioning algorithm is based on hierarchical agglomerative clustering (HAC). This approach generates a set of local matching tasks with a sufficient coverage rate and no isolated partitions. Each local alignment task is aligned automatically using machine learning techniques. A local classifier aligns a single local alignment task. The local classifiers are based on element-level and structure-level features. The class attribute of each training set is labeled automatically using an external knowledge base. We applied a feature selection technique to each local classifier in order to select the appropriate matchers for each local alignment task. This approach reduces alignment complexity and increases overall precision compared with traditional learning methods. We showed that the partitioning approach is better than current approaches in terms of precision, coverage rate, and absence of isolated partitions. We evaluated the local matching learning approach through various experiments based on OAEI 2018 datasets. We concluded that it is advantageous to divide a large ontology alignment task into a set of local alignment tasks. The search space is reduced, which reduces the number of false negatives and false positives.
Applying feature selection techniques to each local classifier increases the recall value for each local alignment task.
Although a considerable body of research work has addressed the problem of ontology matching, few studies have tackled the large ontologies used in the biomedical domain. We introduce a fully automated local matching learning approach that breaks down a large ontology matching task into a set of independent local sub-matching tasks. This approach integrates a novel partitioning algorithm as well as a set of matching learning techniques. The partitioning method is based on hierarchical clustering and does not generate isolated partitions. The matching learning approach employs different techniques: (i) local matching tasks are independently and automatically aligned using their local classifiers, which are based on local training sets built from element-level and structure-level features, (ii) resampling techniques are used to balance each local training set, and (iii) feature selection techniques are used to automatically select the appropriate tuning parameters for each local matching context. Our local matching learning approach generates a set of combined alignments from each local matching task, and experiments show that a multiple local classifier approach outperforms conventional, state-of-the-art approaches, which use a single classifier for the whole ontology matching task. In addition, focusing on context-aware local training sets based on local feature selection and resampling techniques significantly enhances the obtained results.
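The search-space reduction described above can be illustrated with a small sketch. It assumes the partitioning step has already produced pairs of local tasks, and it swaps the abstract's learned local classifiers for a fixed string-similarity threshold from Python's `difflib`, purely to keep the example short. The concept labels, the task pairs, and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher

def match_local(source_labels, target_labels, threshold=0.85):
    """Align one local task by comparing every label pair inside it."""
    found = []
    for s in source_labels:
        for t in target_labels:
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score >= threshold:
                found.append((s, t, round(score, 2)))
    return found

def match_all(local_tasks, threshold=0.85):
    """Align each local task independently and count the comparisons made."""
    alignment, comparisons = [], 0
    for source_labels, target_labels in local_tasks:
        comparisons += len(source_labels) * len(target_labels)
        alignment.extend(match_local(source_labels, target_labels, threshold))
    return alignment, comparisons

# Hypothetical output of a partitioning step: two local tasks.
local_tasks = [
    (["Heart", "Cardiac valve"], ["heart", "Heart valve"]),
    (["Lung"], ["lung", "Pulmonary artery"]),
]

alignment, local_comparisons = match_all(local_tasks)
# Comparing every source label with every target label instead:
full_comparisons = 3 * 4
```

With these toy tasks the local strategy makes 6 comparisons against 12 for the unpartitioned problem. In the approach the abstract describes, each local task would instead be handled by its own trained classifier.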

    Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches

    While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use. One important requirement of domain ontologies is that they achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships, as well as difficulty in updating the ontology as domain knowledge changes. Methodologies developed in the fields of Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automating the enrichment of ontologies from free-text documents. In this dissertation, I extended these methodologies into biomedical ontology development. First, I reviewed existing methodologies and systems developed in the fields of NLP, IR, and IE, and discussed how existing methods can benefit the development of biomedical ontologies. This review, the first of its kind, was published in the Journal of Biomedical Informatics. Second, I compared the effectiveness of three methods from two different approaches, the symbolic (the Hearst method) and the statistical (the Church and Lin methods), using clinical free-text documents. Third, I developed a methodological framework for Ontology Learning (OL) evaluation and comparison. This framework permits evaluation of both types of OL approach, covering the three OL methods. The significance of this work is as follows: 1) The results from the comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method yielded average new concept acceptance rates of 21% and 11%, respectively. The Lin method produced a 74% acceptance rate for NCIT; the Church method, 53%.
As a result of this study (published in the journal Methods of Information in Medicine), many suggested candidates have been incorporated into the NCIT; 2) The evaluation framework is flexible and general enough that it can analyze the performance of ontology enrichment methods for many domains, thus expediting the process of automation and minimizing the likelihood that key concepts and relationships would be missed as domain knowledge evolves.
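The symbolic approach mentioned above relies on lexico-syntactic patterns in the style of Hearst. The sketch below is a deliberately simplified illustration of one such pattern ("X such as A, B and C"), not the dissertation's implementation: it takes only the single word before "such as" as the broader term, does no noun-phrase chunking, and runs on an invented sentence.

```python
import re

# One Hearst-style pattern: "<hypernym> such as <list of hyponyms>".
# Simplification: the hypernym is the single word before "such as".
SUCH_AS = re.compile(r"(\w+) such as ([^.;]+)")

# Separators inside the hyponym list: commas, "and", "or".
LIST_SEP = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")

def hearst_such_as(text):
    """Return (hyponym, hypernym) candidate pairs found in `text`."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        for item in LIST_SEP.split(match.group(2)):
            item = item.strip().lower()
            if item:
                pairs.append((item, hypernym))
    return pairs

# Invented example sentence.
sentence = "The scan excluded tumors such as glioma, meningioma and lymphoma."
candidates = hearst_such_as(sentence)
```

Each pair is only a candidate "is-a" relation. In a study like the one described, such candidates would still need review by domain experts before entering an ontology.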

    Concept graphs: Applications to biomedical text categorization and concept extraction

    As science advances, the underlying literature grows rapidly providing valuable knowledge mines for researchers and practitioners. The text content that makes up these knowledge collections is often unstructured and, thus, extracting relevant or novel information could be nontrivial and costly. In addition, human knowledge and expertise are being transformed into structured digital information in the form of vocabulary databases and ontologies. These knowledge bases hold substantial hierarchical and semantic relationships of common domain concepts. Consequently, automating learning tasks could be reinforced with those knowledge bases through constructing human-like representations of knowledge. This allows developing algorithms that simulate the human reasoning tasks of content perception, concept identification, and classification. This study explores the representation of text documents using concept graphs that are constructed with the help of a domain ontology. In particular, the target data sets are collections of biomedical text documents, and the domain ontology is a collection of predefined biomedical concepts and relationships among them. The proposed representation preserves those relationships and allows using the structural features of graphs in text mining and learning algorithms. Those features emphasize the significance of the underlying relationship information that exists in the text content behind the interrelated topics and concepts of a text document. The experiments presented in this study include text categorization and concept extraction applied on biomedical data sets. The experimental results demonstrate how the relationships extracted from text and captured in graph structures can be used to improve the performance of the aforementioned applications. 
The techniques discussed can be used in creating and maintaining digital libraries, by enhancing the indexing, retrieval, and management of documents, as well as in a broad range of domain-specific applications such as drug discovery, hypothesis generation, and the analysis of molecular structures in chemoinformatics.

    How do Ontology Mappings Change in the Life Sciences?

    Mappings between related ontologies are increasingly used to support data integration and analysis tasks. Changes in the ontologies also require the adaptation of ontology mappings. So far the evolution of ontology mappings has received little attention, although ontologies change continuously, especially in the life sciences. We therefore analyze how mappings between popular life science ontologies evolve for different match algorithms. We also evaluate which semantic ontology changes primarily affect the mappings. We further investigate alternatives to predict or estimate the degree of future mapping changes based on previous ontology and mapping transitions. Keywords: mapping evolution, ontology matching, ontology evolution
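One simple way to quantify how a mapping changes between two versions is a set difference over its correspondences. The sketch below is an assumption about how such a comparison could be set up, not the measures used in the paper. The concept identifiers are invented, and the "stability" ratio (share of old correspondences that survive) is a hypothetical metric chosen for illustration.

```python
def mapping_diff(old_mapping, new_mapping):
    """Compare two mapping versions given as iterables of (source, target) pairs."""
    old, new = set(old_mapping), set(new_mapping)
    kept = old & new
    return {
        "added": new - old,
        "removed": old - new,
        "kept": kept,
        # Share of old correspondences still present in the new version.
        "stability": len(kept) / len(old) if old else 1.0,
    }

# Invented correspondences between two ontology versions.
version_1 = [("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")]
version_2 = [("A1", "B1"), ("A2", "B2"), ("A3", "B9"), ("A5", "B5")]

change = mapping_diff(version_1, version_2)
```

Tracking this ratio across successive releases gives a rough history that could feed the kind of change estimation the abstract mentions.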

    Towards natural language question generation for the validation of ontologies and mappings

    Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) 2014/14890-
    The increasing number of open-access ontologies and their key role in several applications such as decision-support systems highlight the importance of their validation. Human expertise is crucial for the validation of ontologies from a domain point of view. However, the growing number of ontologies and their fast evolution over time make manual validation challenging. Methods: We propose a novel semi-automatic approach based on the generation of natural language (NL) questions to support the validation of ontologies and their evolution. The proposed approach includes the automatic generation, factorization and ordering of NL questions from medical ontologies. The final validation and correction are performed by submitting these questions to domain experts and automatically analyzing their feedback. We also propose a second approach for the validation of mappings impacted by ontology changes. The method exploits the context of the changes to propose correction alternatives presented as multiple-choice questions. Results: This research provides a question optimization strategy to maximize the validation of ontology entities with a reduced number of questions. We evaluate our approach for the validation of three medical ontologies. We also evaluate the feasibility and efficiency of our mapping validation approach in the context of ontology evolution. These experiments are performed with different versions of SNOMED-CT and ICD9. Conclusions: The experimental results suggest the feasibility and adequacy of our approach to support the validation of interconnected and evolving ontologies. Results also suggest that taking into account RDFS and OWL entailment helps reduce the number of questions and the validation time.
The application of our approach to validate mapping evolution also shows the difficulty of adapting mappings over time and highlights the importance of semi-automatic validation.