
    Towards more Challenging Problems for Ontology Matching Tools

    We motivate the need for challenging problems in the evaluation of ontology matching tools. To address this need, we propose mapping sets between well-known biomedical ontologies that are based on the UMLS Metathesaurus. These mappings could be used as a basis for a new track in future OAEI campaigns (http://oaei.ontologymatching.org/).
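The abstract does not say how the UMLS-based mapping sets are built. One common approach, assumed here rather than taken from the paper, treats two classes from different source vocabularies as equivalent when the Metathesaurus files them under the same concept identifier (CUI). The sketch below assumes the standard pipe-delimited `MRCONSO.RRF` column layout; the helper `fake_row` and all identifiers in the example rows are hypothetical, not real UMLS data.

```python
from collections import defaultdict
from itertools import product

# Field positions in a pipe-delimited MRCONSO.RRF row:
# CUI|LAT|TS|LUI|STT|SUI|ISPREF|AUI|SAUI|SCUI|SDUI|SAB|TTY|CODE|STR|SRL|SUPPRESS|CVF|
CUI, SAB, CODE, STR = 0, 11, 13, 14


def reference_mappings(rrf_lines, source_a, source_b):
    """Return (code_a, code_b) pairs whose concepts share a UMLS CUI."""
    codes_by_cui = defaultdict(lambda: {source_a: set(), source_b: set()})
    for line in rrf_lines:
        fields = line.rstrip("\n").split("|")
        if fields[SAB] in (source_a, source_b):
            codes_by_cui[fields[CUI]][fields[SAB]].add(fields[CODE])
    pairs = set()
    for groups in codes_by_cui.values():
        # Map every source_a code to every source_b code filed under the same CUI.
        pairs.update(product(groups[source_a], groups[source_b]))
    return pairs


def fake_row(cui, sab, code, label):
    """Build a synthetic MRCONSO-style row (hypothetical identifiers, not real UMLS data)."""
    fields = [""] * 18
    fields[CUI], fields[SAB], fields[CODE], fields[STR] = cui, sab, code, label
    return "|".join(fields) + "|"


rows = [
    fake_row("CUI_1", "SNOMEDCT_US", "S-100", "Example disorder"),
    fake_row("CUI_1", "NCI", "N-200", "Example disorder"),
    fake_row("CUI_2", "NCI", "N-300", "Unrelated concept"),
]
print(reference_mappings(rows, "SNOMEDCT_US", "NCI"))  # {('S-100', 'N-200')}
```

Shared CUIs give a cheap, large reference set, but the resulting mappings inherit any errors or ambiguities in the Metathesaurus itself.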

    Performance assessment of ontology matching systems for FAIR data

    © The Author(s) 2022. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/); the Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in the article, unless otherwise stated. Background: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies to annotate their data, which creates the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision. Results: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System (UMLS) Metathesaurus.
Then, we analyzed the top-level ancestors of matched classes to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of each ontology pair. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, and 0.55, respectively, against BioPortal, and 0.66, 0.53, and 0.58 against the UMLS. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that, on average, 90% of the mappings' classes belonged to top-level classes that matched. Conclusions: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data. Future research should focus on developing methods for evaluating the mappings used in such services, leading to their implementation in a FAIR data ecosystem.
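
The evaluation above compares system output with reference alignments by F1-score and combines systems through voting. A minimal sketch of both ideas follows, assuming alignments are sets of (source class, target class) pairs. The abstract does not state the voting threshold, so `min_votes` is a hypothetical parameter, and all class identifiers are made up.

```python
from collections import Counter


def precision_recall_f1(predicted, reference):
    """Score a set of (source, target) mappings against a reference alignment."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def consensus(alignments, min_votes):
    """Keep mappings proposed by at least `min_votes` of the given alignments."""
    votes = Counter(m for alignment in alignments for m in set(alignment))
    return {m for m, n in votes.items() if n >= min_votes}


# Toy data with hypothetical class identifiers.
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
system_x = {("a1", "b1"), ("a2", "b2"), ("a4", "b9")}
system_y = {("a1", "b1"), ("a3", "b3"), ("a5", "b8")}
system_z = {("a1", "b1"), ("a2", "b2"), ("a3", "b7")}

voted = consensus([system_x, system_y, system_z], min_votes=2)
print(sorted(voted))  # [('a1', 'b1'), ('a2', 'b2')]
print(round(precision_recall_f1(system_x, reference)[2], 2))  # 0.67
print(round(precision_recall_f1(voted, reference)[2], 2))  # 0.8
```

In this toy case, voting drops each system's idiosyncratic errors, so precision rises to 1.0 while recall stays at 2/3, which lifts F1 above that of any single system.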
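
The top-level analysis can be sketched as follows, assuming a mapping is considered plausible when at least one top-level ancestor of its source class has been manually matched to a top-level ancestor of its target class. The paper's exact criterion is not given in the abstract. The ontologies, the `(parents, roots)` representation, and the manual top-level mapping below are all hypothetical.

```python
def ancestors(cls, parents):
    """All ancestors of `cls` in a class hierarchy given as {child: [parent, ...]}."""
    seen, stack = set(), [cls]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def top_level(cls, parents, roots):
    """Top-level classes at or above `cls`."""
    return ({cls} | ancestors(cls, parents)) & roots


def hierarchy_consistent(mapping, onto_a, onto_b, top_level_map):
    """True if some top-level class of the source is matched to one of the target's."""
    source, target = mapping
    tops_a = top_level(source, *onto_a)
    tops_b = top_level(target, *onto_b)
    return any((ta, tb) in top_level_map for ta in tops_a for tb in tops_b)


# Hypothetical ontologies, each given as (parents, roots).
onto_a = ({"A:lung_cancer": ["A:cancer"], "A:cancer": ["A:disease"], "A:lung": ["A:anatomy"]},
          {"A:disease", "A:anatomy"})
onto_b = ({"B:lung_carcinoma": ["B:disorder"], "B:lung_structure": ["B:body_structure"]},
          {"B:disorder", "B:body_structure"})
# Manually matched top-level classes (hypothetical).
top_level_map = {("A:disease", "B:disorder"), ("A:anatomy", "B:body_structure")}

print(hierarchy_consistent(("A:lung_cancer", "B:lung_carcinoma"), onto_a, onto_b, top_level_map))  # True
print(hierarchy_consistent(("A:lung", "B:lung_carcinoma"), onto_a, onto_b, top_level_map))  # False
```

The second mapping links an anatomy class to a disorder class, so the check flags it as likely incorrect without consulting any reference alignment.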

    Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching

    Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and research on it is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations, including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for evaluating ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new Bio-ML track at OAEI 2022. Comment: Accepted paper in the 21st International Semantic Web Conference (ISWC-2022); DOI for Bio-ML Dataset: 10.5281/zenodo.651008
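
The abstract mentions an evaluation framework for ML-based systems without detailing it. Ranking measures such as Hits@K and mean reciprocal rank (MRR) are one common way to score a system that ranks candidate targets for each source class; the sketch below assumes each test case pairs one reference target with the scores a system assigns to a candidate list. This illustrates the general measures, not necessarily the exact Bio-ML protocol, and the candidate names and scores are made up.

```python
def rank_of(reference_target, scored_candidates):
    """1-based rank of the reference target among candidates sorted by descending score."""
    ranked = sorted(scored_candidates, key=scored_candidates.get, reverse=True)
    return ranked.index(reference_target) + 1


def hits_and_mrr(test_cases, k=1):
    """test_cases: list of (reference_target, {candidate: score}) pairs."""
    ranks = [rank_of(ref, candidates) for ref, candidates in test_cases]
    hits = sum(r <= k for r in ranks) / len(ranks)
    mrr = sum(1 / r for r in ranks) / len(ranks)
    return hits, mrr


# Hypothetical system scores for two source classes.
cases = [
    ("t1", {"t1": 0.9, "t2": 0.5, "t3": 0.1}),  # reference ranked first
    ("t5", {"t4": 0.8, "t5": 0.7, "t6": 0.2}),  # reference ranked second
]
print(hits_and_mrr(cases, k=1))  # (0.5, 0.75)
```

Ranking measures credit a system for placing the correct target near the top even when its top-1 choice is wrong, which suits ML systems that output similarity scores rather than a final alignment.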

    Connecting GOMMA with STROMA: an approach for semantic ontology mapping in the biomedical domain

    This thesis establishes a connection between GOMMA and STROMA, two ontology processing tools. As a result, a new workflow has been implemented that annotates a set of correspondences with five semantic relation types. Such a rich denotation is rarely discussed in the literature. The evaluation of the denotation shows that trivial correspondences are easy to recognize (F-measure tF > 90), whereas denoting non-trivial types remains challenging (30 < ntF < 70). A prerequisite of the implemented workflow is the extraction of semantic relations between concepts. These relations provide additional background knowledge for the enrichment tool STROMA and are integrated into the repository SemRep, which STROMA accesses. Thus, STROMA can determine a semantic type more precisely. UMLS was chosen as the biomedical knowledge source because it subsumes many different ontologies of this domain and therefore represents a rich resource. Nevertheless, only a small set of relations met the requirements imposed on SemRep relations. Further studies may analyze whether there is an appropriate way to integrate the missing relations as well. The connection of GOMMA with STROMA allows the semantic enrichment of a biomedical mapping. As a consequence, this thesis sheds light on two research topics. First, STROMA had previously been tested with general ontologies, which model common-sense knowledge. Within this thesis, STROMA was applied to domain ontologies, and the studies showed that, overall, STROMA handles such ontologies as well. However, some strategies of the enrichment process are based on assumptions that are misleading in the biomedical domain. Consequently, this thesis suggests further strategies that might improve the type denotation and may lead to an optimization of STROMA for biomedical data sets. A more thorough analysis will review their scope, also beyond the biomedical domain.
Second, because the workflow returns an enriched mapping, the established connection may lead to deeper investigations of the advantages of semantic enrichment in the biomedical domain. Despite the heterogeneity of the source and target ontologies, such a mapping improves interoperability at a finer level of granularity. The use of semantically rich correspondences in the biomedical domain is a worthwhile focus for future research.
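
To illustrate how background-knowledge relations let an enrichment step assign a semantic type to a correspondence, here is a minimal sketch. It is not STROMA's algorithm: the five type names (equal, is-a, inverse is-a, part-of, has-a), the triple store standing in for SemRep, the transitive treatment of is-a and part-of, and all concept labels are assumptions for illustration.

```python
# Hypothetical background-knowledge triples standing in for a SemRep-like repository.
BACKGROUND = {
    ("carcinoma", "is-a", "neoplasm"),
    ("neoplasm", "is-a", "disease"),
    ("left lung", "part-of", "lung"),
}


def holds(a, relation, b, kb):
    """True if `a relation b` follows from kb, treating the relation as transitive."""
    frontier, seen = [a], {a}
    while frontier:
        current = frontier.pop()
        for x, rel, y in kb:
            if x == current and rel == relation:
                if y == b:
                    return True
                if y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return False


def denote(source, target, kb):
    """Assign one of five assumed semantic types, or None if kb gives no evidence."""
    s, t = source.lower(), target.lower()
    if s == t:
        return "equal"
    if holds(s, "is-a", t, kb):
        return "is-a"
    if holds(t, "is-a", s, kb):
        return "inverse is-a"
    if holds(s, "part-of", t, kb):
        return "part-of"
    if holds(t, "part-of", s, kb):
        return "has-a"
    return None


print(denote("Carcinoma", "disease", BACKGROUND))  # is-a
print(denote("lung", "left lung", BACKGROUND))  # has-a
print(denote("lung", "disease", BACKGROUND))  # None
```

Missing background relations surface as `None`, which mirrors the thesis's observation that only a small set of UMLS relations could be integrated.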

    Results of the second evaluation of matching tools

    This deliverable reports on the results of the second SEALS evaluation campaign (for WP12, it is the third evaluation campaign), which was carried out in coordination with the OAEI 2011.5 campaign. In contrast to OAEI 2010 and 2011, the full set of OAEI tracks was executed with the help of SEALS technology. Nineteen systems participated, and five data sets were used. Two of these data sets are new and had not been used in previous OAEI campaigns. In this deliverable, we report on the data sets used in the campaign and on the execution of the campaign, and we present and discuss the evaluation results.

    Integrating phenotype ontologies with PhenomeNET

    Background: Integration and analysis of phenotype data from humans and model organisms is a key challenge in building our understanding of normal biology and pathophysiology. However, the range of phenotypes and anatomical details captured in clinical and model organism databases presents complex problems when attempting to match classes across species and across phenotypes as diverse as behaviour and neoplasia. We have previously developed PhenomeNET, a system for disease gene prioritization that includes, as one of its components, an ontology designed to integrate phenotype ontologies. While not applicable to matching arbitrary ontologies, PhenomeNET can be used to identify related phenotypes in different species, including human, mouse, zebrafish, nematode worm, fruit fly, and yeast. Results: Here, we apply PhenomeNET to identify related classes from two phenotype and two disease ontologies using automated reasoning. We demonstrate that we can identify a large number of mappings, some of which require automated reasoning and cannot easily be identified through lexical approaches alone. Combining automated reasoning with lexical matching further improves results in aligning ontologies. Conclusions: PhenomeNET can be used to align and integrate phenotype ontologies. The results can be used in biomedical analyses in which phenomena observed in model organisms serve to identify causative genes and mutations underlying human disease.
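
PhenomeNET reasons over OWL ontologies; the sketch below replaces that reasoning with a much simpler assumption to show why definition-based and lexical evidence complement each other. Each phenotype class is assumed to carry an optional (entity, quality) pair, and two classes match when their qualities are identical and their entities correspond across species via a hypothetical `entity_map`. Label matching on normalized text serves as a fallback. All class identifiers and labels are made up.

```python
import re


def normalize(label):
    """Lower-case a label and keep only its alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", label.lower()))


def match(classes_a, classes_b, entity_map):
    """
    classes_*: {class_id: (label, (entity, quality) or None)}.
    entity_map: cross-species entity correspondences (hypothetical).
    Returns {(a, b): reason}, where reason is "definition" or "lexical".
    """
    mappings = {}
    for a, (label_a, eq_a) in classes_a.items():
        for b, (label_b, eq_b) in classes_b.items():
            if eq_a and eq_b:
                entity_a, quality_a = eq_a
                entity_b, quality_b = eq_b
                if quality_a == quality_b and entity_map.get(entity_a, entity_a) == entity_b:
                    mappings[(a, b)] = "definition"
                    continue
            if normalize(label_a) == normalize(label_b):
                mappings[(a, b)] = "lexical"
    return mappings


mouse = {"mouse:1": ("enlarged heart", ("mouse_heart", "increased size")),
         "mouse:2": ("Abnormal gait", None)}
human = {"human:1": ("Cardiomegaly", ("human_heart", "increased size")),
         "human:2": ("abnormal gait", None)}
print(match(mouse, human, {"mouse_heart": "human_heart"}))
# {('mouse:1', 'human:1'): 'definition', ('mouse:2', 'human:2'): 'lexical'}
```

"Enlarged heart" and "Cardiomegaly" share no words, so only the definition-based step finds that pair, while the gait classes, which lack definitions here, are matched only lexically.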

    Ontology-driven and weakly supervised rare disease identification from clinical notes

    BACKGROUND: Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to identify because few cases are available for machine learning and data annotation requires domain experts. METHODS: We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bidirectional Transformers (e.g. BERT). The ontology-driven framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in the Unified Medical Language System (UMLS) with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach learns a phenotype confirmation model that improves Text-to-UMLS linking without annotated data from domain experts. We evaluated the approach on three annotated clinical datasets from two institutions in the US and the UK: MIMIC-III discharge summaries, MIMIC-III radiology reports, and NHS Tayside brain imaging reports. RESULTS: Precision improved markedly (by over 30% to 50% absolute score for Text-to-UMLS linking), with almost no loss of recall compared to the existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and NHS Tayside were consistent with those on the discharge summaries. The overall pipeline for processing clinical notes can extract rare disease cases, most of which are not captured in structured data (manually assigned ICD codes). CONCLUSION: The study provides empirical evidence for the task by applying a weakly supervised NLP pipeline to clinical notes.
By leveraging ontologies, NER+L tools, and contextual representations, the proposed weakly supervised deep learning approach requires no human annotation except for validation and testing. The study also demonstrates that Natural Language Processing (NLP) can complement traditional ICD-based approaches to better estimate rare diseases in clinical notes. We discuss the usefulness and limitations of the weak supervision approach and propose directions for future studies.
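
The two-step pipeline can be sketched in simplified form. In the paper, rules produce weak labels that train a confirmation model over contextual (BERT-style) representations; this sketch skips the learned model and applies hypothetical negation and uncertainty rules directly to each mention's context, then uses a lookup table as a stand-in for UMLS-to-ORDO matching. The rule patterns, `cui_to_ordo` table, and all identifiers are placeholders, not the authors' rules or real codes.

```python
import re

# Hypothetical weak-supervision rules over the text surrounding a mention.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)
UNCERTAINTY = re.compile(r"\b(rule out|suspected|possible)\b", re.IGNORECASE)


def weak_label(context):
    """Rule-based weak label: True if the mention looks like a confirmed phenotype."""
    return not (NEGATION.search(context) or UNCERTAINTY.search(context))


def rare_disease_cases(mentions, cui_to_ordo):
    """mentions: (doc_id, cui, context) triples. Returns {doc_id: set of ORDO ids}."""
    cases = {}
    for doc_id, cui, context in mentions:
        # Step (ii) stand-in: keep only concepts that map to a rare disease.
        if cui in cui_to_ordo and weak_label(context):
            cases.setdefault(doc_id, set()).add(cui_to_ordo[cui])
    return cases


mentions = [
    ("doc1", "CUI_A", "diagnosed with condition X"),
    ("doc2", "CUI_A", "no evidence of condition X"),
    ("doc3", "CUI_B", "suspected condition Y"),
    ("doc4", "CUI_C", "history of a common condition"),
]
cui_to_ordo = {"CUI_A": "ORDO_A", "CUI_B": "ORDO_B"}
print(rare_disease_cases(mentions, cui_to_ordo))  # {'doc1': {'ORDO_A'}}
```

In the described approach, labels like these would supervise a model rather than act as the final filter, which lets the model generalize beyond the exact cue words the rules encode.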
