108 research outputs found

    A counter example to a conjecture of D. J. Rose on minimum triangulation

    Source authenticity in the UMLS – A case study of the Minimal Standard Terminology

    As the UMLS integrates multiple source vocabularies, the integration process requires that certain adaptations be applied to the source. Our interest is in examining the relationship between the UMLS representation of a source vocabulary and the source vocabulary itself. We investigated the integration of the Minimal Standard Terminology (MST) into the UMLS in order to examine how close its UMLS representation is to the source MST. The MST was conceived as a “minimal” list of terms and structure intended for use within computer systems to facilitate standardized reporting of gastrointestinal endoscopic examinations. Although the MST has an overall schema and implied relationship structure, many of the UMLS integrated MST terms were found to be hierarchically orphaned, and with lateral relationships that do not closely adhere to the source MST. Thus, the MST representation within the UMLS significantly differs from that of the source MST. These representation discrepancies may affect the usability of the MST representation in the UMLS for knowledge acquisition. Furthermore, they pose a problem from the perspective of application developers. While these findings may not necessarily apply to other source terminologies, they highlight the conflict between preservation of authentic concept orientation and the UMLS's overall desire to provide fully specified names for all source terms.

    Detecting Role Errors in the Gene Hierarchy of the NCI Thesaurus

    Gene terminologies are playing an increasingly important role in the ever-growing field of genomic research. While errors in large, complex terminologies are inevitable, gene terminologies are even more susceptible to them due to the rapid growth of genomic knowledge and the nature of its discovery. It is therefore very important to establish quality-assurance protocols for such genomic-knowledge repositories. Different kinds of terminologies often require auditing methodologies adapted to their particular structures. In light of this, an auditing methodology tailored to the characteristics of the NCI Thesaurus’s (NCIT’s) Gene hierarchy is presented. The Gene hierarchy is of particular interest to the NCIT’s designers due to the primary role of genomics in current cancer research. This multiphase methodology focuses on detecting role errors, such as missing roles or roles with incorrect or incomplete target structures, occurring within that hierarchy. The methodology is based on two kinds of abstraction networks, called taxonomies, that highlight the role distribution among concepts within the IS-A (subsumption) hierarchy. These abstract views tend to highlight portions of the hierarchy having a higher concentration of errors. The errors found during an application of the methodology are reported. Hypotheses pertaining to the efficacy of our methodology are investigated.

    The cohesive metaschema: a higher-level abstraction of the UMLS Semantic Network

    The Unified Medical Language System (UMLS) joins together a group of established medical terminologies in a unified knowledge representation framework. Two major resources of the UMLS are its Metathesaurus, containing a large number of concepts, and the Semantic Network (SN), containing semantic types and forming an abstraction of the Metathesaurus. However, the SN itself is large and complex and may still be difficult to view and comprehend. Our structural partitioning technique partitions the SN into structurally uniform sets of semantic types based on the distribution of the relationships within the SN. An enhancement of the structural partition results in cohesive, singly rooted sets of semantic types. Each such set is named after its root, which represents the common nature of the group. These sets of semantic types are represented by higher-level components called meta-semantic types. A network called a metaschema, which consists of the meta-semantic types connected by hierarchical and semantic relationships, is obtained and provides an abstract view supporting orientation to the SN. The metaschema is utilized to audit the UMLS classifications. We present a set of graphical views of the SN based on the metaschema to help in user orientation to the SN. A study compares the cohesive metaschema to metaschemas derived semantically by UMLS experts.

    A chemical specialty semantic network for the Unified Medical Language System

    Background: Terms representing chemical concepts found in the Unified Medical Language System (UMLS) are used to derive an expanded semantic network with mutually exclusive semantic types. The UMLS Semantic Network (SN) is composed of a collection of broad categories called semantic types (STs) that are assigned to concepts. Within the UMLS’s coverage of the chemical domain, many concepts are assigned more than one ST. As a result, the extent of a given ST may contain concepts with varied semantics. A methodology for expanding the chemical subhierarchy of the SN into a finer-grained categorization of mutually exclusive types with semantically uniform extents is presented. We call this network a Chemical Specialty Semantic Network (CSSN). A CSSN is derived automatically from the existing chemical STs and their assignments. The methodology incorporates a threshold value governing the minimum size of a type’s extent needed for inclusion in the CSSN. Thus, different CSSNs can be created by choosing different threshold values based on varying requirements. Results: A complete CSSN with 68 STs is derived using a threshold value of 300. It is used effectively to provide high-level categorizations for a random sample of compounds from the “Chemical Entities of Biological Interest” (ChEBI) ontology. The effect of various threshold values between one and 500 on the size of the CSSN is shown. Conclusions: The methodology has several potential applications, including deriving a pre-coordinated guide for ST assignments to new UMLS chemical concepts, auditing existing concepts, inter-terminology mapping, and serving as an upper-level network for ChEBI.
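The role of the threshold in the abstract above can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes that each distinct combination of STs assigned to a concept is a candidate type (so the candidate types are mutually exclusive by construction), and that a candidate is kept only when its extent reaches the threshold. The function name `derive_types`, the concept identifiers, and the toy assignments are all hypothetical; the abstract does not say how concepts in below-threshold extents are handled, so the sketch simply leaves them out.

```python
from collections import defaultdict

def derive_types(assignments, threshold):
    """Group concepts by their exact set of assigned semantic types (STs).

    ASSUMPTION: one candidate type per distinct ST combination; a candidate
    is kept only if its extent holds at least `threshold` concepts.
    """
    extents = defaultdict(set)
    for concept, sts in assignments.items():
        extents[frozenset(sts)].add(concept)
    return {combo: members for combo, members in extents.items()
            if len(members) >= threshold}

# Hypothetical toy data: concept id -> STs assigned to it.
toy_assignments = {
    "C1": {"Organic Chemical", "Pharmacologic Substance"},
    "C2": {"Organic Chemical", "Pharmacologic Substance"},
    "C3": {"Organic Chemical"},
    "C4": {"Steroid", "Hormone"},
    "C5": {"Organic Chemical", "Pharmacologic Substance"},
}

# A lower threshold admits more types; a higher one admits fewer.
for t in (1, 2, 4):
    print(t, len(derive_types(toy_assignments, t)))
```

Because every concept falls into exactly one ST combination, no concept can appear in two extents, which is the mutual-exclusivity property the abstract describes.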

    Outlier concepts auditing methodology for a large family of biomedical ontologies

    Background: Summarization networks are compact summaries of ontologies. The “Big Picture” view offered by summarization networks makes it possible to identify sets of concepts that are more likely to have errors than control concepts. For ontologies that have outgoing lateral relationships, we have developed the partial-area taxonomy summarization network. Prior research has identified one kind of outlier concept: concepts in small partial-areas within partial-area taxonomies. Previously we have shown that the small partial-area technique works successfully for four ontologies (or their hierarchies). Methods: To improve the Quality Assurance (QA) scalability, a family-based QA framework, where one QA technique is potentially applicable to a whole family of ontologies with similar structural features, was developed. The 373 ontologies hosted at the NCBO BioPortal in 2015 were classified into a collection of families based on structural features. A meta-ontology represents this family collection, including one family of ontologies having outgoing lateral relationships. The process of updating the current meta-ontology is described. To conclude that one QA technique is applicable for at least half of the members of a family F, this technique should be demonstrated as successful for six out of six ontologies in F. We describe a hypothesis setting the condition required for a technique to be successful for a given ontology. The process of a study to demonstrate such success is described. This paper aims to demonstrate the scalability of the small partial-area technique. Results: We first updated the meta-ontology classifying 566 BioPortal ontologies. There were 371 ontologies in the family with outgoing lateral relationships. We demonstrated the success of the small partial-area technique for two ontology hierarchies which belong to this family, SNOMED CT’s Specimen hierarchy and NCIt’s Gene hierarchy. Together with the four previous ontologies from the same family, we fulfilled the “six out of six” condition required to show the scalability for the whole family. Conclusions: We have shown that the small partial-area technique can be potentially successful for the family of ontologies with outgoing lateral relationships in BioPortal, thus improving the scalability of this QA technique.
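The small partial-area idea in the abstract above can be sketched as follows. The abstract does not define areas or partial-areas, so the sketch assumes a common reading: an area collects the concepts that have exactly the same set of outgoing lateral relationship types, an area root is a concept with no IS-A parent in its own area, and a partial-area is a root together with its descendants inside that area. The function names, the size cutoff `max_size`, and the toy hierarchy are all hypothetical.

```python
from collections import defaultdict

def partial_areas(parents, rels):
    """Return {root: members} for each partial-area.

    parents: concept -> set of IS-A parents (every concept must be a key).
    rels:    concept -> set of outgoing lateral relationship types.
    ASSUMPTION: area = exact set of relationship types; a partial-area is
    an area root plus its descendants that stay inside the same area.
    """
    area_of = {c: frozenset(rels.get(c, ())) for c in parents}
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    # A root has no parent belonging to the same area.
    roots = [c for c in parents
             if not any(area_of[p] == area_of[c] for p in parents[c])]
    result = {}
    for root in roots:
        members, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(ch for ch in children[node]
                         if area_of[ch] == area_of[root])
        result[root] = members
    return result

def small_partial_area_concepts(pas, max_size):
    """Concepts in partial-areas with at most `max_size` members (hypothetical cutoff)."""
    small = set()
    for members in pas.values():
        if len(members) <= max_size:
            small |= members
    return small

# Hypothetical toy hierarchy.
toy_parents = {"C0": set(), "C1": {"C0"}, "C2": {"C0"},
               "C3": {"C2"}, "C4": {"C2"}, "C5": {"C1"}}
toy_rels = {"C0": set(), "C1": {"r1"}, "C2": {"r1"},
            "C3": {"r1"}, "C4": {"r1"}, "C5": {"r1", "r2"}}

pas = partial_areas(toy_parents, toy_rels)
print(sorted(small_partial_area_concepts(pas, 1)))
```

In a QA study of the kind the abstract describes, the concepts returned by `small_partial_area_concepts` would form the group reviewed for errors and compared against a control sample.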

    Missing lateral relationships in top‑level concepts of an ontology

    Background: Ontologies house various kinds of domain knowledge in formal structures, primarily in the form of concepts and the associative relationships between them. Ontologies have become integral components of many health information processing environments. Hence, quality assurance of the conceptual content of any ontology is critical. Relationships are foundational to the definition of concepts. Missing relationship errors (i.e., unintended omissions of important definitional relationships) can have a deleterious effect on the quality of an ontology. An abstraction network is a structure that overlays an ontology and provides an alternate, summarization view of its contents. One kind of abstraction network is called an area taxonomy, and a variation of it is called a subtaxonomy. A methodology based on these taxonomies for more readily finding missing relationship errors is explored. Methods: The area taxonomy and the subtaxonomy are deployed to help reveal concepts that have a high likelihood of exhibiting missing relationship errors. A specific top-level grouping unit found within the area taxonomy and subtaxonomy, when deemed to be anomalous, is used as an indicator that missing relationship errors are likely to be found among certain concepts. Two hypotheses pertaining to the effectiveness of our Quality Assurance approach are studied. Results: Our Quality Assurance methodology was applied to the Biological Process hierarchy of the National Cancer Institute thesaurus (NCIt) and SNOMED CT’s Eye/vision finding subhierarchy within its Clinical finding hierarchy. Many missing relationship errors were discovered and confirmed in our analysis. For both test-bed hierarchies, our Quality Assurance methodology yielded a statistically significantly higher number of concepts with missing relationship errors in comparison to a control sample of concepts. Two hypotheses are confirmed by these findings. 
Conclusions: Quality assurance is a critical part of an ontology’s lifecycle, and automated or semi-automated tools for supporting this process are invaluable. We introduced a Quality Assurance methodology targeted at missing relationship errors. Its successful application to the NCIt’s Biological Process hierarchy and SNOMED CT’s Eye/vision finding subhierarchy indicates that it can be a useful addition to the arsenal of tools available to ontology maintenance personnel.

    A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology

    The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechanisms it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standard-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020. As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein terms from the Protein Ontology, COVID-19-related phenotype terms from the Human Phenotype Ontology, and over 100 COVID-19 terms for vaccines (both authorized and in clinical trial) from the Vaccine Ontology. CIDO systematically represents variants of SARS-CoV-2 viruses and over 300 amino acid substitutions therein, along with over 300 diagnostic kits and methods. CIDO also describes hundreds of host-coronavirus protein-protein interactions (PPIs) and the drugs that target proteins in these PPIs. CIDO has been used to model COVID-19 related phenomena in areas such as epidemiology. The scope of CIDO was evaluated by visual analysis supported by a summarization network method. CIDO has been used in various applications such as term standardization, inference, natural language processing (NLP) and clinical data integration. We have applied the amino acid variant knowledge present in CIDO to analyze differences between SARS-CoV-2 Delta and Omicron variants. CIDO's integrative host-coronavirus PPIs and drug-target knowledge has also been used to support drug repurposing for COVID-19 treatment. CIDO represents entities and relations in the domain of coronavirus diseases with a special focus on COVID-19. It supports shared knowledge representation, data and metadata standardization and integration, and has been used in a range of applications.
