    Abstraction, extension and structural auditing with the UMLS semantic network

    The Unified Medical Language System (UMLS) is a two-level biomedical terminological knowledge base, consisting of the Metathesaurus (META) and the Semantic Network (SN), an upper-level ontology of broad categories called semantic types (STs). The two levels are related by assigning one or more STs to each concept of the META. Although the SN provides a high-level abstraction of the META, it is itself not compact enough for easy comprehension. Various metaschemas, which are compact higher-level abstraction networks of the SN, have therefore been derived. A methodology is presented to evaluate and compare two given metaschemas based on their structural properties. A consolidation algorithm is designed to yield a consolidated metaschema that retains the strengths and avoids the weaknesses of the two given metaschemas. The methodology and consolidation algorithm were applied to a pair of heuristic metaschemas, the top-down metaschema and the bottom-up metaschema, which were derived in two studies involving two groups of UMLS experts. The results show that the consolidated metaschema has better structural properties than either input metaschema. Better structural properties are expected to lead to better use of a metaschema for orientation in, and visualization of, the SN. Repetitive consolidation, which yields further structural improvements, is also shown. The META and SN were created in the absence of a comprehensive curated genomics terminology. The internal consistency of the SN's genomics-related categories is evaluated, and changes that improve its ability to express genomic knowledge are proposed. The completeness of the SN with respect to genomic concepts is evaluated, and corresponding extensions to the SN that fill the identified gaps are proposed. Due to the size and complexity of the UMLS, errors are inevitable. A group auditing methodology is presented in which the ST assignments of groups of similar concepts are audited.
The extent of an ST, which is the group of all concepts assigned this ST, is divided into groups of concepts that have been assigned exactly the same set of STs. An algorithm finds subgroups of suspicious concepts. The auditor is presented with these subgroups, whose concepts purportedly share the same semantics, which makes it easier to notice concepts with wrong or missing ST assignments. Another methodology partitions these groups into smaller, singly rooted, hierarchically organized sets that are used to audit the hierarchical relationships. The algorithmic methodologies were compared with a comprehensive manual audit and showed very high error recall, with much higher precision than the exhaustive manual review.
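The grouping step of the auditing methodology can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the author's implementation: the concept-to-ST assignments are made-up toy data rather than real UMLS content, and flagging very small groups for review (`small_groups`, `max_size`) is a hypothetical stand-in for the unspecified algorithm that finds suspicious subgroups.

```python
from collections import defaultdict

# Toy data (made up, not real UMLS content): concept -> set of assigned STs.
ASSIGNMENTS = {
    "Aspirin": {"Pharmacologic Substance", "Organic Chemical"},
    "Ibuprofen": {"Pharmacologic Substance", "Organic Chemical"},
    "Insulin": {"Pharmacologic Substance", "Hormone", "Amino Acid, Peptide, or Protein"},
    "Penicillin": {"Pharmacologic Substance", "Antibiotic", "Organic Chemical"},
    "Placebo": {"Pharmacologic Substance"},
}


def extent(st, assignments):
    """The extent of an ST: all concepts that have been assigned this ST."""
    return {concept for concept, sts in assignments.items() if st in sts}


def partition_extent(st, assignments):
    """Divide the extent of `st` into groups of concepts with exactly the same ST set."""
    groups = defaultdict(set)
    for concept in extent(st, assignments):
        groups[frozenset(assignments[concept])].add(concept)
    return dict(groups)


def small_groups(groups, max_size=1):
    """Hypothetical heuristic (not the source's algorithm): flag tiny groups for review."""
    return {sts: members for sts, members in groups.items() if len(members) <= max_size}


if __name__ == "__main__":
    groups = partition_extent("Pharmacologic Substance", ASSIGNMENTS)
    # Largest groups first; each line is one identical-ST-set group.
    for sts, members in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        print(sorted(sts), "->", sorted(members))
    print("flagged groups:", len(small_groups(groups)))
```

Keying each group by a `frozenset` of STs makes "exactly the same set of STs" a plain dictionary lookup, independent of the order in which STs were assigned.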

    Enriching and designing metaschemas for the UMLS semantic network

    The disparate terminologies used by various biomedical applications and professionals make communication between them more difficult. The Unified Medical Language System (UMLS) of the National Library of Medicine (NLM) is an attempt to integrate different medical terminologies into a unified representation framework in order to improve decision making, the quality of patient care, and research in the health-care field. The Metathesaurus (META) and the Semantic Network (SN) are the two main components of the UMLS, where the SN provides a high-level abstraction of the concepts in the META. This dissertation addresses three problems of the SN. First, the SN's two-tree structure is restrictive because it does not allow a semantic type to be a specialization of several other semantic types. This restriction leads to the omission of some subsumption knowledge from the SN. Second, the SN is too large and complex to be comprehended easily, and it does not come with a pictorial representation for users. As a partial solution to this problem, several metaschemas were previously built as higher-level abstractions of the SN to help orient users. Third, there is no efficient method to evaluate a metaschema, and no technique to obtain a consolidated metaschema acceptable to a majority of the UMLS's users. This dissertation addresses these problems with the following approaches. (1) The SN was expanded into the Enriched Semantic Network (ESN), a multiple subsumption structure whose IS-A hierarchy is a directed acyclic graph (DAG), allowing a semantic type to have multiple parents. New viable IS-A links were added as warranted, and two methodologies are presented for identifying them. The ESN serves as an extended high-level abstraction of the META. (2) The ESN's semantic relationship distribution and concept configuration were studied.
Rules were defined to derive the ESN's semantic relationship distribution from the current SN's semantic relationship distribution. A mapping function was defined to map the SN's concept configuration to the ESN's concept configuration while avoiding redundant classifications in the latter. (3) Several new metaschemas for the SN and the ESN were built and evaluated, based on several different partitioning techniques. Each of these metaschemas can serve as a higher-level abstraction of the SN (or the ESN).
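A DAG-shaped IS-A hierarchy of the kind the ESN uses, where a semantic type may have several parents, can be sketched as below. This is a minimal sketch, not the dissertation's code: the class and method names are hypothetical, the type names follow a simplified SN-style fragment, the second parent given to Antibiotic is an invented link used only to show multiple inheritance (it is not taken from the ESN), and treating an assignment as redundant when it is an ancestor of another assigned type is an assumed reading of "redundant classification".

```python
from collections import defaultdict


class ISAHierarchy:
    """Hypothetical DAG of semantic types in which a type may have several IS-A parents."""

    def __init__(self):
        self.parents = defaultdict(set)

    def ancestors(self, st):
        """All proper ancestors of `st`, collected by walking IS-A links upward."""
        seen, stack = set(), list(self.parents.get(st, ()))
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(self.parents.get(p, ()))
        return seen

    def add_isa(self, child, parent):
        """Add `child` IS-A `parent`, refusing any link that would close a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} IS-A {parent} would create a cycle")
        self.parents[child].add(parent)

    def without_redundant(self, assigned):
        """Assumed rule: drop assigned STs already implied, via IS-A, by another assigned ST."""
        assigned = set(assigned)
        implied = set().union(*(self.ancestors(st) for st in assigned))
        return assigned - implied


def example_hierarchy():
    """Simplified SN-style fragment plus one invented second parent (illustration only)."""
    h = ISAHierarchy()
    h.add_isa("Chemical Viewed Functionally", "Chemical")
    h.add_isa("Chemical Viewed Structurally", "Chemical")
    h.add_isa("Pharmacologic Substance", "Chemical Viewed Functionally")
    h.add_isa("Organic Chemical", "Chemical Viewed Structurally")
    h.add_isa("Antibiotic", "Pharmacologic Substance")
    h.add_isa("Antibiotic", "Organic Chemical")  # invented link, shows multiple parents
    return h


if __name__ == "__main__":
    h = example_hierarchy()
    print(sorted(h.ancestors("Antibiotic")))
    print(h.without_redundant({"Antibiotic", "Organic Chemical"}))
```

The cycle check in `add_isa` is what keeps the structure a DAG as new IS-A links are added; in a two-tree structure each type would instead have at most one entry in its parent set.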

    Modeling controlled vocabularies using OODBs and multilevel area diagrams

    A Controlled Vocabulary (CV) is a software system of domain knowledge that consolidates and unifies the terminology of a large application domain. With a common, centralized CV, costly and time-consuming translations between pairs of organizations and pairs of software systems can be eliminated. Unfortunately, the more knowledge we put into a CV, the harder it is to understand and maintain. In this dissertation, a comprehensive theoretical methodology for modeling CVs using Object-Oriented Database (OODB) technology is presented. We present two methods for representing a semantic network CV as an equivalent OODB, which we call an Object-Oriented Vocabulary Repository (OOVR). The first method, based on a structural analysis and partitioning of the CV, yields an OODB with a very concise schema, referred to as the OOVR schema. Because of its compact size, the schema can be displayed on one or a few computer screens and serves as an aid for comprehending and maintaining the CV. A program called the Object-Oriented Vocabulary Repository Generator (OOVR Generator) has been built to automatically generate an OOVR for a given semantic network CV. The second method results in a larger schema, which nevertheless serves as an important tool for browsing and navigating a CV. The OODB schemas created by both methods provide important abstract views of CVs. We have also defined a new type of semantic relationship, called IS-A', in the context of an OOVR representation. IS-A' relationships are defined on OOVR schemas to reflect certain important IS-A relationships in the underlying CV. The two OOVR representations exhibit several interesting theoretical characteristics, which are formally proven in this dissertation. To provide an environment with several abstract views of a CV, we also define a paradigm called Multilevel Area Diagrams (MLADs).
An MLAD is a collection of different partitions of a CV, of increasing detail and decreasing abstraction. Users can browse at one level and then switch to another level to continue their navigation. Examples of browsing sessions are presented to show that the MLAD paradigm provides processing capabilities beyond those of a traditional object-oriented representation of a vocabulary.
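The idea of browsing one level of an MLAD and then switching to another can be sketched in Python. This is a hedged illustration, not the dissertation's design: each level is modeled as a plain mapping from concept to area label, the function names and toy vocabulary are hypothetical, and requiring every level to refine the one above it is an assumption about how "increasing detail" is realized.

```python
from collections import defaultdict


def areas(level):
    """Invert one level {concept: area} into {area: set of concepts}."""
    out = defaultdict(set)
    for concept, area in level.items():
        out[area].add(concept)
    return dict(out)


def refines(fine, coarse):
    """True if every area of `fine` lies entirely inside one area of `coarse`."""
    return all(len({coarse[c] for c in members}) == 1 for members in areas(fine).values())


def build_mlad(levels):
    """Validate levels ordered from most abstract (index 0) to most detailed."""
    for coarse, fine in zip(levels, levels[1:]):
        if fine.keys() != coarse.keys() or not refines(fine, coarse):
            raise ValueError("each level must cover the same concepts and refine the level above")
    return list(levels)


def browse(mlad, concept, level):
    """Switch to `level`: return the area holding `concept` there and that area's members."""
    area = mlad[level][concept]
    return area, areas(mlad[level])[area]


# Toy vocabulary with three levels of decreasing abstraction (made-up labels).
LEVELS = [
    {"aspirin": "Chemicals", "ibuprofen": "Chemicals", "insulin": "Chemicals", "glucagon": "Chemicals"},
    {"aspirin": "Drugs", "ibuprofen": "Drugs", "insulin": "Hormones", "glucagon": "Hormones"},
    {"aspirin": "Salicylates", "ibuprofen": "Propionic acids", "insulin": "Hormones", "glucagon": "Hormones"},
]

if __name__ == "__main__":
    mlad = build_mlad(LEVELS)
    print(browse(mlad, "aspirin", 1))  # coarser view around the concept
    print(browse(mlad, "aspirin", 2))  # same concept, more detailed level
```

Because each level refines the one above it, switching levels for a fixed concept always moves to a nested area, which keeps the user's navigation context when zooming in or out.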

    Using structural and semantic methodologies to enhance biomedical terminologies

    Biomedical terminologies and ontologies underlie various Health Information Systems (HISs), Electronic Health Record (EHR) systems, Health Information Exchanges (HIEs) and health administrative systems. Moreover, the proliferation of interdisciplinary research efforts in the biomedical field is fueling the need to overcome terminological barriers when integrating knowledge from different fields into a unified research project. Therefore, well-developed and well-maintained terminologies are in high demand. Most biomedical terminologies are large and complex, which makes it impossible for human experts to manually detect and correct all errors and inconsistencies. Automated and semi-automated Quality Assurance methodologies that focus on areas more likely to contain errors and inconsistencies are therefore important. In this dissertation, structural and semantic methodologies are used to enhance biomedical terminologies. The work is divided into three major parts. The first part consists of structural auditing techniques for the Semantic Network of the Unified Medical Language System (UMLS), which serves as a vocabulary knowledge base for biomedical research in various applications. Techniques are presented for automatically identifying and preventing erroneous semantic type assignments to concepts. The Web-based adviseEditor system is introduced to help UMLS editors make correct multiple semantic type assignments to concepts. It has been made available to the National Library of Medicine for future use in maintaining the UMLS. The second part of this dissertation addresses how to enhance the conceptual content of SNOMED CT by methods of semantic harmonization. By 2015, SNOMED will become the standard terminology for EHR encoding of diagnoses and problem lists.
In order to enrich the semantics and coverage of SNOMED CT for clinical and research applications, the problem of semantic harmonization between SNOMED CT and six reference terminologies is approached by 1) comparing the vertical density of SNOMED CT with that of the reference terminologies to find potential concepts for export and import; and 2) categorizing the relationships between structurally congruent concepts from pairs of terminologies, with SNOMED CT being one terminology in each pair. Six kinds of configurations are observed, e.g., alternative classifications and suggested synonyms. For each configuration, a corresponding solution is presented for enhancing one or both of the terminologies. The third part applies Quality Assurance techniques based on “Abstraction Networks” to biomedical ontologies in BioPortal. The National Center for Biomedical Ontology provides BioPortal as a repository of over 350 biomedical ontologies covering a wide range of domains. It is extremely difficult to design a new Quality Assurance methodology for each ontology in BioPortal. Fortunately, groups of ontologies in BioPortal share common structural features, so they can be grouped into families based on combinations of these features. Designing one uniform Quality Assurance methodology for each family improves efficiency, which is critical given the limited Quality Assurance resources available to most ontology curators. In this dissertation, a family-based framework covering 186 BioPortal ontologies, together with accompanying Quality Assurance methods based on abstraction networks, is presented to tackle this problem.
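Grouping ontologies into families by their combination of structural features can be sketched as follows. This is a minimal illustration, not the framework itself: the ontology names (`onto_A` and so on) and the feature labels are hypothetical placeholders rather than real BioPortal ontologies or the dissertation's actual feature set.

```python
from collections import defaultdict

# Hypothetical ontologies and structural features, for illustration only.
ONTOLOGY_FEATURES = {
    "onto_A": {"hierarchy", "object_properties"},
    "onto_B": {"hierarchy", "object_properties"},
    "onto_C": {"hierarchy", "attribute_relationships"},
    "onto_D": {"hierarchy"},
}


def group_into_families(features_by_ontology):
    """Place ontologies sharing exactly the same feature combination in one family."""
    families = defaultdict(list)
    for name in sorted(features_by_ontology):
        families[frozenset(features_by_ontology[name])].append(name)
    return dict(families)


if __name__ == "__main__":
    for features, members in group_into_families(ONTOLOGY_FEATURES).items():
        # One QA method would be designed per family and reused for all its members.
        print(sorted(features), "->", members)
```

The payoff described in the text comes from the last step: a QA method is designed once per family key and then applied to every ontology listed under it, rather than once per ontology.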

    An expert study evaluating the UMLS lexical metaschema

    Objective: A metaschema is an abstraction network of the UMLS's semantic network (SN) obtained from a connected partition of its collection of semantic types. A lexical metaschema was previously derived from a lexical partition, which divided the SN into semantic-type groups using identical word usage between the names of semantic types and the definitions of their respective children. In this paper, a statistical analysis methodology is presented to evaluate the lexical metaschema, based on a study involving a group of established UMLS experts. Methods: In the study, each expert was asked to identify subject areas of the SN based on his or her understanding of the various semantic types. For this purpose, the expert scans the SN hierarchy top-down and selects as group roots those semantic types that are important and sufficiently different from their parent semantic types. From each expert's response, an expert metaschema is constructed. Because the experts' metaschemas can vary widely, additional metaschemas are obtained by aggregating the experts' responses. Of special interest is the consensus metaschema, which aggregates the responses of a simple majority of the experts. A statistical analysis comparing the lexical metaschema with the experts' metaschemas and with the consensus metaschema is presented. Results: The analysis shows that 17 of the 21 meta-semantic types in the lexical metaschema (about 81%) also appear in the consensus metaschema. A total of 107 semantic types (about 79%) are covered by identical meta-semantic types and refinements. These results show a high similarity between the two metaschemas. Furthermore, the statistical analysis shows that the lexical metaschema did not grossly underperform compared with the experts.
Conclusion: Our study shows that the lexical metaschema provides a good approximation of a partition of the SN into meaningful subject areas, when compared with the consensus metaschema capturing the aggregated opinions of a simple majority of the human experts. © 2005 Elsevier B.V. All rights reserved.
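The aggregation behind the consensus metaschema can be sketched in Python, assuming each expert's response is simply the set of semantic types chosen as group roots. This is a hedged sketch, not the study's analysis code: the SN fragment is simplified, the expert responses are hypothetical, assigning each type to its nearest chosen ancestor is an assumed way of turning roots into groups, and `shared_fraction` treats two meta-semantic types as "the same" when their roots coincide, which simplifies whatever matching criterion the study used. The last line only re-checks the reported 17-of-21 arithmetic.

```python
from collections import Counter

# Simplified fragment of the SN IS-A tree (child -> parent; None marks a tree root).
PARENT = {
    "Entity": None,
    "Physical Object": "Entity",
    "Organism": "Physical Object",
    "Plant": "Organism",
    "Animal": "Organism",
    "Conceptual Entity": "Entity",
    "Finding": "Conceptual Entity",
}


def consensus_roots(responses):
    """STs chosen as group roots by a simple majority (more than half) of the experts."""
    votes = Counter(st for roots in responses for st in set(roots))
    return {st for st, n in votes.items() if n > len(responses) / 2}


def partition(roots, parent=PARENT):
    """Put each ST in the group of its nearest ancestor-or-self root; tree roots always count."""
    roots = set(roots) | {st for st, p in parent.items() if p is None}
    groups = {}
    for st in parent:
        r = st
        while r not in roots:
            r = parent[r]
        groups.setdefault(r, set()).add(st)
    return groups


def shared_fraction(a_roots, b_roots):
    """Fraction of metaschema A's groups whose root is also a group root in B."""
    return len(set(a_roots) & set(b_roots)) / len(set(a_roots))


# Hypothetical expert responses (each is a set of chosen group roots).
RESPONSES = [
    {"Organism", "Conceptual Entity"},
    {"Organism", "Animal"},
    {"Organism", "Conceptual Entity", "Plant"},
]

if __name__ == "__main__":
    consensus = consensus_roots(RESPONSES)
    print(partition(consensus))
    print(f"{17 / 21:.0%}")  # reported overlap, 17 of 21 meta-semantic types
```

With three hypothetical experts, a type needs at least two votes to become a consensus root, so "Organism" and "Conceptual Entity" survive while single-vote choices such as "Animal" fold into their ancestor's group.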