    Structural indicators for effective quality assurance of SNOMED CT

    The Systematized Nomenclature of Medicine -- Clinical Terms (SNOMED CT, further abbreviated as SCT) has been endorsed as a premier clinical terminology by many national and international organizations. The US Government has chosen SCT to play a significant role in its initiative to promote Electronic Health Record (EHR) use country-wide. However, there is evidence suggesting that, at the moment, SCT is not optimally modeled for its intended use by healthcare practitioners. There is a need to perform quality assurance (QA) of SCT to help expedite its use as a reference terminology for clinical purposes as planned for EHR use. The central theme of this dissertation is to define a group-based auditing methodology to effectively identify concepts of SCT that require QA. To this end, similarity sets are introduced, which are groups of concepts that are lexically identical except for one word. Concepts in a similarity set are expected to be modeled in a consistent way. If not, the set is considered inconsistent and submitted for review by an auditor. Initial studies found 38% of such sets to be inconsistent. The effectiveness of these sets is further improved through the use of three structural indicators. Using indicators such as the number of parents, relationships, and role groups, up to 70% of the similarity sets and 32.6% of the concepts are found to exhibit inconsistencies. Furthermore, positional similarity sets, which are similarity sets with the same position of the differing word in the concepts' terms, are introduced to improve the likelihood of finding errors at the concept level. This strictness in the position of the differing word increases the lexical similarity between the concepts of a set, thereby sharpening the contrast between lexical similarity and modeling differences and increasing the likelihood of finding inconsistencies. The effectiveness of positional similarity sets in finding inconsistencies is further improved by using the same three structural indicators in the generation of these sets. An analysis of 50 sample sets with differences in the number of relationships reveals 41.6% of the concepts to be inconsistent. Moreover, a study is performed to fully automate the process of suggesting attributes to enhance the modeling of SCT concepts using positional similarity sets, together with a technique to automatically suggest the corresponding target values. An analysis of 50 sample concepts shows that, of the 103 suggested attributes, 67 are manually confirmed to be correct. Finally, a study is conducted to examine the readiness of the SCT problem list (PL) to support meaningful use of EHRs. The results show that the concepts in the PL suffer from the same issues as general SCT concepts, although to a slightly lesser extent, and do require further QA efforts. To support such efforts, structural indicators in the form of the number of parents and the number of words are shown to be effective in ferreting out potentially problematic concepts on which QA efforts should be focused. A structural indicator for finding concepts with synonymy problems is also presented, based on identifying pairs of SCT concepts that map to the same UMLS concept.
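
    To make the similarity-set idea concrete, here is a minimal sketch of how positional similarity sets could be generated by masking one word position at a time; the grouping key and the example concept terms are illustrative, not taken from the dissertation:

```python
from collections import defaultdict

def positional_similarity_sets(concept_terms):
    """Group concepts whose terms are identical except for the word
    at one fixed position (a positional similarity set)."""
    buckets = defaultdict(set)
    for cid, term in concept_terms.items():
        words = term.lower().split()
        for i in range(len(words)):
            # Mask word i: concepts sharing a key differ only at position i.
            key = (i, tuple(words[:i]), tuple(words[i + 1:]))
            buckets[key].add(cid)
    return [ids for ids in buckets.values() if len(ids) > 1]

# Illustrative terms, not actual SNOMED CT content.
terms = {
    1: "fracture of left femur",
    2: "fracture of right femur",
    3: "fracture of left tibia",
}
print(positional_similarity_sets(terms))  # [{1, 2}, {1, 3}]
```

    Sets whose member concepts disagree on a structural indicator (for example, different numbers of parents or role groups) would then be flagged for an auditor's review.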

    Auditing SNOMED CT Hierarchical Relations Based on Lexical Features of Concepts in Non-Lattice Subgraphs

    Objective—We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations. Methods—Our approach involves three stages. In stage 1, all non-lattice subgraphs of SNOMED CT's IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor concepts within the non-lattice subgraph. In stage 3, subset inclusion relations between the lexical attribute sets of each pair of concepts in each non-lattice subgraph are compared to existing IS-A relations in SNOMED CT. For concept pairs within each non-lattice subgraph, if a subset relation is identified but an IS-A relation is not present in the SNOMED CT IS-A transitive closure, then a missing IS-A relation is reported. The September 2017 release of SNOMED CT (US edition) was used in this investigation. Results—A total of 14,380 non-lattice subgraphs were extracted, from which we suggested a total of 41,357 missing IS-A relations. For evaluation purposes, 200 non-lattice subgraphs were randomly selected from 996 smaller subgraphs (of size 4, 5, or 6) within the “Clinical Finding” and “Procedure” sub-hierarchies. Two domain experts confirmed 185 of the 223 missing IS-A relations suggested in these subgraphs, a precision of 82.96%. Conclusions—Our results demonstrate that analyzing the lexical features of concepts in non-lattice subgraphs is an effective approach for auditing SNOMED CT.
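
    A minimal sketch of the stage-3 subset check, under the assumption that attribute enrichment (stage 2) and the IS-A transitive closure have already been computed; the data structures and names here are illustrative:

```python
from itertools import permutations

def missing_isa_candidates(enriched_attrs, isa_closure):
    """Suggest missing IS-A relations within one non-lattice subgraph.

    enriched_attrs: concept id -> enriched set of lexical attributes.
    isa_closure: set of (descendant, ancestor) pairs already in SNOMED CT.
    """
    suggestions = []
    for child, parent in permutations(enriched_attrs, 2):
        # A proper superset of lexical attributes marks the more specific
        # concept; if SNOMED CT lacks the corresponding IS-A, report it.
        if enriched_attrs[parent] < enriched_attrs[child] \
                and (child, parent) not in isa_closure:
            suggestions.append((child, parent))
    return suggestions

attrs = {"A": {"fracture", "femur"}, "B": {"fracture", "femur", "open"}}
print(missing_isa_candidates(attrs, set()))  # [('B', 'A')]: B IS-A A is missing
```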

    STRUCTURAL AND LEXICAL METHODS FOR AUDITING BIOMEDICAL TERMINOLOGIES

    Biomedical terminologies serve as knowledge sources for a wide variety of biomedical applications, including information extraction and retrieval, data integration and management, and decision support. Quality issues in biomedical terminologies, if not addressed, can affect all downstream applications that use them as knowledge sources. Terminology Quality Assurance (TQA) has therefore become an integral part of the terminology management lifecycle. However, identifying potential quality issues is challenging due to the ever-growing size and complexity of biomedical terminologies. Auditing them manually is time-consuming and labor-intensive, so automated TQA methods are highly desirable. In this dissertation, systematic and scalable methods are proposed to audit biomedical terminologies using their structural as well as lexical information. Two inference-based methods, two non-lattice-based methods, and a deep learning-based method are developed to identify potentially missing hierarchical (is-a) relations, erroneous is-a relations, and missing concepts in biomedical terminologies including the Gene Ontology (GO), the National Cancer Institute thesaurus (NCIt), and SNOMED CT.
    In the first inference-based method, GO concept names are represented using a set-of-words model and a sequence-of-words model, respectively. Inconsistencies derived between hierarchically linked and unlinked concept pairs are leveraged to detect potentially missing or erroneous is-a relations. The set-of-words model detects a total of 5,359 potential inconsistencies in the 03/28/2017 release of GO and the sequence-of-words model detects 4,959. Domain experts' evaluation shows that the set-of-words model achieves a precision of 53.78% (128 out of 238) and the sequence-of-words model a precision of 57.55% (122 out of 212) in identifying inconsistencies. In the second inference-based method, a Subsumption-based Sub-term Inference Framework (SSIF) is developed by introducing a novel term-algebra on top of a sequence-based representation of GO concepts. The sequence-based representation utilizes the part of speech of concept names, sub-concepts (concept names appearing inside another concept name), and antonyms appearing in concept names. Three conditional rules (monotonicity, intersection, and sub-concept rules) are developed for backward subsumption inference. Applying SSIF to the 10/03/2018 release of GO suggests 1,938 potentially missing is-a relations. Domain experts' evaluation of 210 randomly selected potentially missing is-a relations shows that SSIF achieves a precision of 60.61%, 60.49%, and 46.03% for the monotonicity, intersection, and sub-concept rules, respectively.
    In the first non-lattice-based method, lexical patterns of concepts in Non-Lattice Subgraphs (NLSs: graph fragments with a higher tendency to contain quality issues) are mined to detect potentially missing is-a relations and missing concepts in NCIt. Six lexical patterns are leveraged: containment, union, intersection, union-intersection, inference-contradiction, and inference-union. Each pattern indicates a specific potential type of error and suggests a potential type of remediation. This method identifies 809 NLSs exhibiting these patterns in the 16.12d version of NCIt, achieving a precision of 66% (33 out of 50). In the second non-lattice-based method, enriched lexical attributes from concept ancestors are leveraged to identify potentially missing is-a relations in NLSs. The lexical attributes of a concept are inherited in two ways: from ancestors within the NLS, and from all ancestors. For a pair of concepts without a hierarchical relation, if the lexical attributes of one concept are a subset of those of the other, a potentially missing is-a relation between the two concepts is suggested. This method identifies a total of 1,022 potentially missing is-a relations in the 19.01d release of NCIt, with a precision of 84.44% (76 out of 90) when inheriting lexical attributes from ancestors within the NLS and 89.02% (73 out of 82) when inheriting from all ancestors.
    For the non-lattice-based methods, similar NLSs may contain similar quality issues, so exhaustive examination of NLSs would involve redundant work. A hybrid method is introduced to identify similar NLSs and avoid redundant analyses. Given an input NLS, a graph isomorphism algorithm is used to obtain its structurally identical NLSs. A similarity score between the input NLS and each of its structurally identical NLSs is computed based on the semantic similarity between their corresponding concept names. To compute this similarity, the concept names are converted to vectors using the Doc2Vec document embedding model and the cosine similarity of the two vectors is computed. All structurally identical NLSs with a similarity score above 0.85 are considered similar to the input NLS. Applying this method to 10 different structures of NLSs in the 02/12/2018 release of GO reveals that 38.43% of these NLSs have at least one similar NLS.
    Finally, a deep learning-based method is explored to facilitate the suggestion of missing is-a relations in NCIt and SNOMED CT, focusing on concept pairs exhibiting the containment pattern. The problem is framed as a binary classification task: given a pair of concepts, the deep learning model learns to predict whether the two concepts have an is-a relation. Positive training samples are existing is-a relations in the terminology that exhibit the containment pattern; negative training samples are concept pairs without is-a relations that also exhibit it. A graph neural network model is constructed for this task and trained on subgraphs generated to enclose the pairs of concepts in the samples. To evaluate the model trained on each terminology, two evaluation sets are created using newer releases of each terminology as a partial reference standard. The model trained on NCIt achieves a precision of 0.5, a recall of 0.75, and an F1 score of 0.6; the model trained on SNOMED CT achieves a precision of 0.51, a recall of 0.64, and an F1 score of 0.56.
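
    As one concrete reading of the NLS-similarity scoring step, the sketch below embeds concept names with gensim's Doc2Vec and compares them by cosine similarity; the toy corpus, vector size, and training epochs are assumptions for illustration only:

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus of concept names; a real run would train on the whole terminology.
names = ["regulation of cell growth", "positive regulation of cell growth",
         "cell differentiation", "neuron differentiation"]
corpus = [TaggedDocument(n.split(), [i]) for i, n in enumerate(names)]
model = Doc2Vec(corpus, vector_size=32, min_count=1, epochs=50)

def name_similarity(name_a, name_b):
    """Cosine similarity between Doc2Vec embeddings of two concept names."""
    va = model.infer_vector(name_a.split())
    vb = model.infer_vector(name_b.split())
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Two structurally identical NLSs would count as similar when the scores of
# their corresponding concept names stay above the 0.85 threshold.
print(name_similarity(names[0], names[1]))
```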

    Front-Line Physicians' Satisfaction with Information Systems in Hospitals

    Day-to-day operations management in hospital units is difficult due to continuously varying situations, the many actors involved, and the vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with the existing information systems needed to support day-to-day operations management in hospitals. A cross-sectional survey was used, and data selected by stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information, nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer a single information system for accessing important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision-making process.

    DETAILED CLINICAL MODELS AND THEIR RELATION WITH ELECTRONIC HEALTH RECORDS

    The healthcare domain produces and consumes large quantities of people's health data. Although the need to exchange these data is the norm rather than the exception, being able to access all patient data is still far from achieved. Current developments such as personal health records will introduce even more data and complexity into the Electronic Health Record (EHR). Achieving semantic interoperability is one of the biggest challenges to overcome in order to benefit from all the information contained in the distributed EHR. This requires that the semantics of the information can be understood by all involved parties. It has been established that three layers are needed to achieve semantic interoperability: reference models, clinical models (archetypes), and clinical terminologies. As seen in the literature, information models (reference models and clinical models) lack methodologies and tools to improve existing EHR systems and to develop new systems that are semantically interoperable. The purpose of this thesis is to provide methodologies and tools for advancing the use of archetypes in three different scenarios:
    - Archetype definition over specifications without native support for the dual-model architecture. Any EHR architecture that directly or indirectly has the notion of detailed clinical models (such as HL7 CDA templates) can potentially be used as a reference model for archetype definition. This allows transforming single-model architectures (which contain only a reference model) into dual-model architectures (reference model with archetypes). A set of methodologies and tools has been developed to support the definition of archetypes over multiple reference models.
    - Data transformation. A complete methodology and tools are proposed to transform legacy data into XML documents compliant with both the archetype and the underlying reference model. If the reference model is a standard, then the transformation is a standardization process. The methodologies and tools support both the transformation of legacy data and the transformation of data between different EHR standards.
    - Automatic generation of implementation guides and reference materials from archetypes. A methodology is provided for the automatic generation of a set of reference materials useful for the development and use of EHR systems, including data validators, example instances, implementation guides, human-readable formal rules, sample forms, mind maps, etc. These reference materials can be combined and organized in different ways to suit different types of users (clinical or information technology staff), so that users can include the detailed clinical models in their organization's workflow and cooperate in their definition.
    These methodologies and tools make clinical models a key part of the system. Together, they ease the achievement of semantic interoperability by providing means for the semantic description, normalization, and validation of both existing and new systems.
    Boscá Tomás, D. (2016). Detailed Clinical Models and Their Relation with Electronic Health Records [Doctoral thesis, Universitat Politècnica de València]. https://doi.org/10.4995/Thesis/10251/62174
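
    As a purely hypothetical sketch of the data-transformation scenario, the snippet below wraps a legacy key-value record in archetype-tagged XML; the element layout is invented for illustration and does not reproduce the thesis's tooling (the archetype identifier merely follows the openEHR naming style):

```python
import xml.etree.ElementTree as ET

def to_archetype_xml(record, archetype_id):
    """Wrap a legacy key-value record in a minimal archetype-tagged XML doc."""
    root = ET.Element("composition", {"archetype_id": archetype_id})
    for field, value in record.items():
        element = ET.SubElement(root, "element", {"name": field})
        element.text = str(value)
    return ET.tostring(root, encoding="unicode")

legacy = {"systolic": 120, "diastolic": 80, "units": "mm[Hg]"}
print(to_archetype_xml(legacy, "openEHR-EHR-OBSERVATION.blood_pressure.v1"))
```

    A real transformation would additionally validate the instance against the archetype's constraints, which is what makes the output standardized rather than merely well-formed.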

    A framework for analyzing changes in health care lexicons and nomenclatures

    Ontologies play a crucial role in current web-based biomedical applications for capturing contextual knowledge in the domain of life sciences. Many of the so-called bio-ontologies and controlled vocabularies are known to be seriously defective from both terminological and ontological perspectives, and do not sufficiently comply with the standards to be considered formal ontologies. Therefore, they are continuously evolving in order to fix problems and provide valid knowledge. Moreover, many problems in ontology evolution originate from incomplete knowledge about the given domain. As our knowledge improves, the related definitions in the ontologies will be altered. This problem is inadequately addressed by available tools and algorithms, mostly due to the lack of suitable knowledge representation formalisms to deal with temporal abstract notations and the overreliance on human factors. Also, most current approaches have focused on changes within the internal structure of ontologies, while interactions with other existing ontologies have been widely neglected. In this research, after revealing and classifying some of the common alterations in a number of popular biomedical ontologies, we present a novel agent-based framework, RLR (Represent, Legitimate, and Reproduce), to semi-automatically manage the evolution of bio-ontologies with minimal human intervention, with emphasis on the FungalWeb Ontology. RLR assists and guides ontology engineers through the change management process in general, and aids in tracking and representing the changes, particularly through the use of category theory. Category theory has been used as a mathematical vehicle for modeling changes in ontologies and representing agents' interactions, independent of any specific choice of ontology language or particular implementation. We have also employed rule-based hierarchical graph transformation techniques to propose a more specific semantics for analyzing ontological changes and transformations between different versions of an ontology, as well as for tracking the effects of a change at different levels of abstraction. Thus, the RLR framework enables one to manage changes in ontologies not as standalone artifacts in isolation, but in contact with other ontologies in an openly distributed semantic web environment. The emphasis on generality and abstractness makes RLR more feasible in the multi-disciplinary domain of biomedical ontology change management.
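
    As a loose, assumption-laden simplification of the change-tracking idea (far short of RLR's category-theoretic machinery), the sketch below classifies is-a edge changes between two ontology versions:

```python
def diff_versions(old_edges, new_edges):
    """Classify is-a edge changes between two ontology versions.

    Each argument is a set of (child, parent) pairs; the result is a flat
    change record that a change-management agent could inspect or replay.
    """
    return {
        "added": sorted(new_edges - old_edges),
        "removed": sorted(old_edges - new_edges),
        "kept": sorted(old_edges & new_edges),
    }

# Invented fungal-ontology fragments for illustration.
v1 = {("hypha", "fungal structure"), ("spore", "fungal structure")}
v2 = {("hypha", "fungal structure"), ("conidium", "spore")}
print(diff_versions(v1, v2))
```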

    Ontology and text mining: methods and applications for hypertrophic cardiomyopathy and beyond

    In this thesis we describe a number of contributions across the deeply interlinked domains of ontology, text mining, and prognostic modelling. We explore and evaluate ontology interoperability, and develop new methods for synonym expansion and negation detection in biomedical text. In addition to evaluating these pieces of work individually, we use them to form the basis of a text mining pipeline that can identify and phenotype patients across a clinical text record; this pipeline reveals hundreds of University Hospitals Birmingham patients diagnosed with hypertrophic cardiomyopathy who were unknown to the specialist clinic. The work culminates in the text mining results being used to enable prognostic modelling of complication development in patients with hypertrophic cardiomyopathy, finding that routine blood markers, in addition to already well-known variables, are powerful predictors.
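
    To illustrate the negation-detection component in spirit, here is a minimal NegEx-style sketch; the trigger list and token window are assumptions, and the thesis's actual method may differ:

```python
import re

NEG_TRIGGERS = ["no", "denies", "without", "no evidence of", "negative for"]

def is_negated(text, concept, window=5):
    """Report whether a negation trigger appears within `window` tokens
    before the first mention of `concept` in `text`."""
    tokens = text.lower().split()
    target = concept.lower().split()
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            preceding = " ".join(tokens[max(0, i - window):i])
            return any(re.search(r"\b" + re.escape(t) + r"\b", preceding)
                       for t in NEG_TRIGGERS)
    return False

print(is_negated("patient denies chest pain and dyspnoea", "chest pain"))  # True
print(is_negated("longstanding chest pain on exertion", "chest pain"))     # False
```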