COHORT IDENTIFICATION FROM FREE-TEXT CLINICAL NOTES USING SNOMED CT’S SEMANTIC RELATIONS
In this paper, a new cohort identification framework that exploits the semantic hierarchy of SNOMED CT is proposed to overcome the limitations of supervised machine-learning-based approaches. Eligibility criteria descriptions and free-text clinical notes from the 2018 National NLP Clinical Challenge (n2c2) were processed to map them to relevant SNOMED CT concepts and to measure semantic similarity between the eligibility criteria and patients. A patient was deemed eligible if their similarity score exceeded a threshold cut-off value, which was set at the point where the best F1 score was achieved. The performance of the proposed system was evaluated on three eligibility criteria. The framework's macro-averaged F1 score across the three criteria was higher than the previously reported 2018 n2c2 results (0.933 vs. 0.889). This study demonstrates that SNOMED CT alone can be leveraged for cohort identification tasks without referring to external textual sources for training.
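The threshold-selection step described above can be sketched as follows; this is an illustrative reconstruction of the stated logic, not the authors' code, and the similarity scores and eligibility labels are hypothetical.

```python
# Sketch: choose the similarity cut-off that maximizes F1 on labeled
# data, then classify patients by comparing their score to it.

def f1(tp, fp, fn):
    # Harmonic mean of precision and recall; 0 when no true positives.
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, labels):
    # Try each observed similarity score as a candidate cut-off.
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        fn = sum((not p) and y for p, y in zip(preds, labels))
        score = f1(tp, fp, fn)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

scores = [0.2, 0.4, 0.55, 0.7, 0.9]        # hypothetical patient similarities
labels = [False, False, True, True, True]  # hypothetical eligibility
t, f = best_threshold(scores, labels)
print(t, f)  # → 0.55 1.0
```

At deployment time a patient is then flagged as eligible whenever their similarity score meets or exceeds `t`.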
An interactive retrieval system for clinical trial studies with context-dependent protocol elements.
A well-defined protocol for a clinical trial is key to a successful outcome report. When designing the protocol, most researchers refer to electronic databases and extract protocol elements using a keyword search. However, state-of-the-art database systems only offer text-based searches for user-entered keywords. In this study, we present a database system with a context-dependent, protocol-element-selection function for successfully designing a clinical trial protocol. To do this, we first introduce a database for a protocol retrieval system constructed from individual protocol data extracted from 184,634 clinical trials and 13,210 frame structures of clinical trial protocols. The database contains a variety of semantic information that allows protocols to be filtered during the search operation. Based on the database, we developed a web application called the clinical trial protocol database system (CLIPS; available at https://corus.kaist.edu/clips). This system enables an interactive search by utilizing protocol elements. To support an interactive search over combinations of protocol elements, CLIPS offers optional next-element selection according to the previous element, in the form of a connected tree. The validation results show that our method achieves better performance than existing databases in predicting phenotypic features.
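The "connected tree" of next-element options can be illustrated with a minimal sketch; the element names and transitions below are hypothetical and are not taken from CLIPS.

```python
# Hypothetical sketch of next-element selection: after each chosen
# protocol element, offer only the elements observed to follow it.

protocol_tree = {
    "condition:diabetes": {
        "intervention:drug": {
            "phase:Phase 2": {},
            "phase:Phase 3": {},
        },
        "intervention:behavioral": {
            "phase:Phase 2": {},
        },
    },
}

def next_options(path):
    # Walk the tree along the already-chosen elements and list children.
    node = protocol_tree
    for element in path:
        node = node[element]
    return sorted(node)

print(next_options(["condition:diabetes"]))
# → ['intervention:behavioral', 'intervention:drug']
```

Each selection narrows the tree, so the user only ever sees combinations that actually occur in registered protocols.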
A Learning Health System for Radiation Oncology
The proposed research aims to address the challenges faced by clinical data science researchers in radiation oncology accessing, integrating, and analyzing heterogeneous data from various sources. The research presents a scalable intelligent infrastructure, called the Health Information Gateway and Exchange (HINGE), which captures and structures data from multiple sources into a knowledge base with semantically interlinked entities. This infrastructure enables researchers to mine novel associations and gather relevant knowledge for personalized clinical outcomes.
The dissertation discusses the design framework and implementation of HINGE, which abstracts structured data from treatment planning systems, treatment management systems, and electronic health records. It utilizes disease-specific smart templates for capturing clinical information in a discrete manner. HINGE performs data extraction, aggregation, and quality and outcome assessment functions automatically, connecting seamlessly with local IT/medical infrastructure.
Furthermore, the research presents a knowledge graph-based approach to map radiotherapy data to an ontology-based data repository using FAIR (Findable, Accessible, Interoperable, Reusable) concepts. This approach ensures that the data is easily discoverable and accessible for clinical decision support systems. The dissertation explores the ETL (Extract, Transform, Load) process, data model frameworks, ontologies, and provides a real-world clinical use case for this data mapping.
To improve the efficiency of retrieving information from large clinical datasets, a search engine based on ontology-based keyword searching and synonym-based term matching tool was developed. The hierarchical nature of ontologies is leveraged to retrieve patient records based on parent and children classes. Additionally, patient similarity analysis is conducted using vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) to identify similar patients based on text corpus creation methods. Results from the analysis using these models are presented.
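The hierarchy-aware retrieval idea can be sketched as follows, with assumed subclass relations and toy patient records; a real system would query the ontology itself rather than a hand-written dictionary.

```python
# Illustrative sketch: use an ontology's is-a links to expand a query
# class to all of its descendants, then match records on any of them.

children = {  # hypothetical subclass relations
    "lung neoplasm": ["non-small cell lung cancer", "small cell lung cancer"],
    "non-small cell lung cancer": ["lung adenocarcinoma"],
}

def descendants(term):
    # Depth-first collection of the term and everything below it.
    found = {term}
    for child in children.get(term, []):
        found |= descendants(child)
    return found

records = {  # hypothetical patient records keyed by diagnosis text
    "patient-1": "lung adenocarcinoma",
    "patient-2": "small cell lung cancer",
    "patient-3": "breast carcinoma",
}

query_terms = descendants("lung neoplasm")
hits = sorted(p for p, dx in records.items() if dx in query_terms)
print(hits)  # → ['patient-1', 'patient-2']
```

Searching on the parent class thus retrieves patients coded at any level below it, which is the benefit of leveraging the hierarchy over plain keyword matching.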
The implementation of a learning health system for predicting radiation pneumonitis following stereotactic body radiotherapy is also discussed. 3D convolutional neural networks (CNNs) are utilized with radiographic and dosimetric datasets to predict the likelihood of radiation pneumonitis. DenseNet-121 and ResNet-50 models are employed for this study, along with integrated gradient techniques to identify salient regions within the input 3D image dataset. The predictive performance of the 3D CNN models is evaluated based on clinical outcomes.
Overall, the proposed Learning Health System provides a comprehensive solution for capturing, integrating, and analyzing heterogeneous data in a knowledge base. It offers researchers the ability to extract valuable insights and associations from diverse sources, ultimately leading to improved clinical outcomes. This work can serve as a model for implementing learning health systems in other medical specialties, advancing personalized and data-driven medicine.
Front-Line Physicians' Satisfaction with Information Systems in Hospitals
Day-to-day operations management in hospital units is difficult due to continuously varying situations, the many actors involved, and the vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with the existing information systems needed to support day-to-day operations management in hospitals. A cross-sectional survey was used, and data were collected in nine hospitals using stratified random sampling. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information, nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer a single information system to access important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision-making process.
The Translational Medicine Ontology and Knowledge Base: driving personalized medicine by bridging the gap between bench and bedside
Background: Translational medicine requires the integration of knowledge using heterogeneous data from health care to the life sciences. Here, we describe a collaborative effort to produce a prototype Translational Medicine Knowledge Base (TMKB) capable of answering questions relating to clinical practice and pharmaceutical drug discovery. Results: We developed the Translational Medicine Ontology (TMO) as a unifying ontology to integrate chemical, genomic and proteomic data with disease, treatment, and electronic health records. We demonstrate the use of Semantic Web technologies in the integration of patient and biomedical data, and reveal how such a knowledge base can aid physicians in providing tailored patient care and facilitate the recruitment of patients into active clinical trials. Thus, patients, physicians and researchers may explore the knowledge base to better understand therapeutic options, efficacy, and mechanisms of action. Conclusions: This work takes an important step in using Semantic Web technologies to facilitate integration of relevant, distributed, external sources and progress towards a computational platform to support personalized medicine. Availability: TMO can be downloaded from http://code.google.com/p/translationalmedicineontology and TMKB can be accessed at http://tm.semanticscience.org/sparql
Deep Risk Prediction and Embedding of Patient Data: Application to Acute Gastrointestinal Bleeding
Acute gastrointestinal bleeding is a common and costly condition, accounting for over 2.2 million hospital days and 19.2 billion dollars of medical charges annually. Risk stratification is a critical part of initial assessment of patients with acute gastrointestinal bleeding. Although all national and international guidelines recommend the use of risk-assessment scoring systems, they are not commonly used in practice, have sub-optimal performance, may be applied incorrectly, and are not easily updated. With the advent of widespread electronic health record adoption, longitudinal clinical data captured during the clinical encounter is now available. However, this data is often noisy, sparse, and heterogeneous. Unsupervised machine learning algorithms may be able to identify structure within electronic health record data while accounting for key issues with the data generation process: measurements missing-not-at-random and information captured in unstructured clinical note text. Deep learning tools can create electronic health record-based models that perform better than clinical risk scores for gastrointestinal bleeding and are well-suited for learning from new data. Furthermore, these models can be used to predict risk trajectories over time, leveraging the longitudinal nature of the electronic health record. The foundation of creating relevant tools is the definition of a relevant outcome measure; in acute gastrointestinal bleeding, a composite outcome of red blood cell transfusion, hemostatic intervention, and all-cause 30-day mortality is a relevant, actionable outcome that reflects the need for hospital-based intervention. However, epidemiological trends may affect the relevance and effectiveness of the outcome measure when applied across multiple settings and patient populations. 
Understanding trends in practice, potential areas of disparity, and the value proposition for using risk stratification in patients presenting to the Emergency Department with acute gastrointestinal bleeding is important for implementing a robust, generalizable risk stratification tool. Key findings include a decrease in the rate of red blood cell transfusion since 2014 and disparities in access to upper endoscopy for patients with upper gastrointestinal bleeding by race/ethnicity across urban and rural hospitals. Projected accumulated savings from consistent implementation of risk stratification tools for upper gastrointestinal bleeding total approximately $1 billion five years after implementation. Most current risk scores were designed for use based on the location of the bleeding source: upper or lower gastrointestinal tract. However, the location of the bleeding source is not always clear at presentation. I develop and validate electronic health record-based deep learning and machine learning tools for patients presenting with symptoms of acute gastrointestinal bleeding (e.g., hematemesis, melena, hematochezia), which is more relevant and useful in clinical practice. I show that they outperform the leading clinical risk scores for upper and lower gastrointestinal bleeding, the Glasgow-Blatchford score and the Oakland score. While the best-performing gradient-boosted decision tree model has overall performance equivalent to the fully connected feedforward neural network model, at the very low risk threshold of 99% sensitivity the deep learning model identifies more very low risk patients. Using another deep learning model that can capture longitudinal risk, a long short-term memory recurrent neural network, the need for red blood cell transfusion can be predicted at every 4-hour interval within the first 24 hours of intensive care unit stay for high-risk patients with acute gastrointestinal bleeding.
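The very-low-risk thresholding mentioned above can be sketched with hypothetical scores and outcomes; the function below simply picks the largest cut-off that keeps sensitivity at or above 99%, which is the standard way such operating points are chosen, not necessarily the thesis's exact procedure.

```python
# Sketch (hypothetical data): find the highest risk cut-off whose
# sensitivity on labeled data stays at or above the target; patients
# scored below that cut-off are labeled "very low risk".

def very_low_risk_cutoff(probs, outcomes, target_sensitivity=0.99):
    positive_scores = [p for p, y in zip(probs, outcomes) if y]
    best = min(probs)
    for t in sorted(set(probs)):
        detected = sum(p >= t for p in positive_scores)
        if detected / len(positive_scores) >= target_sensitivity:
            best = t  # this cut-off still meets the sensitivity floor
    return best

probs = [0.01, 0.02, 0.05, 0.3, 0.6, 0.9]  # hypothetical model outputs
outcomes = [0, 0, 0, 1, 1, 1]              # hypothetical composite outcome
cutoff = very_low_risk_cutoff(probs, outcomes)
print(cutoff, sum(p < cutoff for p in probs))  # → 0.3 3
```

A model that concentrates true positives at higher scores can push this cut-off up, which is exactly how one model "identifies more very low risk patients" than another at the same sensitivity.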
Finally, for implementation it is important to find patients with symptoms of acute gastrointestinal bleeding in real time and to characterize their risk using available data in the electronic health record. A decision-rule-based electronic health record phenotype has performance equivalent, as measured by positive predictive value, to deep learning and natural language processing-based models, and after live implementation it appears to have increased the use of the Acute Gastrointestinal Bleeding Clinical Care pathway. Patients with acute gastrointestinal bleeding and patients with other groups of disease concepts can be differentiated by directly mapping unstructured clinical text to a common ontology and treating the vector of concepts as signals on a knowledge graph; these patients can be separated using unbalanced diffusion earth mover's distances on the graph. For electronic health record data with data missing not at random, MURAL, an unsupervised random-forest-based method, handles missing values and generates visualizations that characterize patients with gastrointestinal bleeding. This thesis forms a basis for understanding the potential of machine learning and deep learning tools to characterize risk for patients with acute gastrointestinal bleeding. In the future, these tools may be critical in implementing integrated risk assessment to keep low-risk patients out of the hospital and to guide resuscitation and timely endoscopic procedures for patients at higher risk of clinical decompensation.
Impact of Terminology Mapping on Population Health Cohorts IMPaCt
Background and Objectives: The population health care delivery model uses phenotype algorithms in the electronic health record (EHR) system to identify patient cohorts targeted for clinical interventions such as laboratory tests and procedures. The standard terminology used to identify disease cohorts may contribute to significant variation in error rates for patient inclusion or exclusion. The United States requires EHR systems to support two diagnosis terminologies, the International Classification of Diseases (ICD) and the Systematized Nomenclature of Medicine (SNOMED). Terminology mapping enables the retrieval of diagnosis data using either terminology. There are no standards of practice by which to evaluate and report the operational characteristics of ICD and SNOMED value sets used to select patient groups for population health interventions. Establishing a best practice for terminology selection is a step forward in ensuring that the right patients receive the right intervention at the right time. The research question is, "How do the diagnosis retrieval terminology (ICD vs. SNOMED) and terminology map maintenance impact population health cohorts?" Aims 1 and 2 explore this question, and Aim 3 informs practice and policy for population health programs.
Methods
Aim 1: Quantify impact of terminology choice (ICD vs SNOMED)
ICD and SNOMED phenotype algorithms for diabetes, chronic kidney disease (CKD), and heart failure were developed using matched sets of codes from the Value Set Authority Center. The performance of the diagnosis-only phenotypes was compared to published reference standard that included diagnosis codes, laboratory results, procedures, and medications.
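The Aim 1 comparison reduces to a set calculation against the reference-standard cohort, sketched here with hypothetical patient identifiers; the real evaluation used the published reference standard described above.

```python
# Sketch (hypothetical patients): sensitivity of a diagnosis-code-only
# phenotype measured against a reference-standard cohort that also uses
# labs, procedures, and medications.

reference_cohort = {"p1", "p2", "p3", "p4", "p5"}  # assumed gold standard
code_phenotype_cohort = {"p1", "p2", "p4", "p9"}   # diagnosis codes only

true_positives = reference_cohort & code_phenotype_cohort
sensitivity = len(true_positives) / len(reference_cohort)
print(round(sensitivity, 3))  # → 0.6
```

Running the same calculation with the ICD value set and the SNOMED value set in turn yields the paired sensitivities reported in the results.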
Aim 2: Measure terminology maintenance impact on SNOMED cohorts
For each disease state, the performance of a single SNOMED algorithm before and after terminology updates was evaluated in comparison to a reference standard to identify and quantify cohort changes introduced by terminology maintenance.
Aim 3: Recommend methods for improving population health interventions
The socio-technical model for studying health information technology was used to inform best practice for the use of population health interventions.
Results
Aim 1: ICD-10 value sets had better sensitivity than SNOMED for diabetes (0.829 vs. 0.662) and CKD (0.242 vs. 0.225) (N=201,713, p
Aim 2: Following terminology maintenance, the SNOMED algorithm for diabetes increased in sensitivity from 0.662 to 0.683 (p
Aim 3: Based on observed social and technical challenges to population health programs, including and in addition to the development and measurement of phenotypes, a practical method was proposed for population health intervention development and reporting.
Ontology-based Semantic Harmonization of HIV-associated Common Data Elements for Integration of Diverse HIV Research Datasets
Analysis of integrated, diverse, Human Immunodeficiency Virus (HIV)-associated datasets can increase knowledge and guide the development of novel and effective interventions for disease prevention and treatment by increasing breadth of variables and statistical power, particularly for sub-group analyses. This topic has been identified as a National Institutes of Health research priority, but few efforts have been made to integrate data across HIV studies. Our aims were to: 1) Characterize the semantic heterogeneity (SH) in the HIV research domain; 2) Identify HIV-associated common data elements (CDEs) in empirically generated and knowledge-based resources; 3) Create a formal representation of HIV-associated CDEs in the form of an HIV-associated Entities in Research Ontology (HERO); 4) Assess the feasibility of using HERO to semantically harmonize HIV research data. Our approach was guided by information/knowledge theory and the DIKW (Data Information Knowledge Wisdom) hierarchical model.
Our systematized review of the literature revealed that the purposes of synergistic use of ontologies and CDEs included integration, interoperability, data exchange, and data standardization. Methods and tools included the use of experts for CDE identification, the Unified Medical Language System, natural language processing, Extensible Markup Language, Health Level 7, and ontology development tools (e.g., Protégé). Evaluation methods included expert assessment, quantification of mapping tasks between raters, assessment of interrater reliability, and comparison to established standards. We used these findings to inform our process for achieving the study aims.
For Aim 1, we analyzed eight disparate HIV-associated data dictionaries and developed a String Metric-assisted Assessment of Semantic Heterogeneity (SMASH) method, which aided identification of 127 (13%) homogeneous data element (DE) pairs and 1,048 (87%) semantically heterogeneous DE pairs. Most heterogeneous pairs (97%) were semantically-equivalent/syntactically-different, allowing us to determine that SH in the HIV research domain was high.
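A string-metric-assisted check in the spirit of SMASH might look like the sketch below; the actual method differs, and the field names, synonym table, and 0.9 similarity cut-off are all assumptions for illustration.

```python
# Sketch: classify a pair of data elements as homogeneous (near-identical
# strings), semantically-equivalent/syntactically-different (known
# synonyms), or heterogeneous. Hypothetical names and synonym list.
from difflib import SequenceMatcher

synonyms = {("dob", "date_of_birth")}  # assumed curated equivalences

def classify_pair(a, b):
    a, b = a.lower(), b.lower()
    ratio = SequenceMatcher(None, a, b).ratio()
    if ratio > 0.9:
        return "homogeneous"
    if (a, b) in synonyms or (b, a) in synonyms:
        return "equivalent/syntactically-different"
    return "heterogeneous"

print(classify_pair("date_of_birth", "Date_of_Birth"))  # → homogeneous
print(classify_pair("dob", "date_of_birth"))
# → equivalent/syntactically-different
```

Counting the pairs that land in each bucket across the data dictionaries gives heterogeneity proportions of the kind reported above.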
To achieve Aim 2, we used ClinicalTrials.gov, Google Search, and text mining in R to identify HIV-associated CDEs in HIV journal articles, HIV-associated datasets, the AIDSinfo HIV/AIDS Glossary, the AIDSinfo Drug Database, Logical Observation Identifiers Names and Codes (LOINC), the Systematized Nomenclature of Medicine (SNOMED), and RxNorm. Two HIV experts then manually reviewed DEs from the journal articles and data dictionaries to confirm DE commonality and resolved semantic discrepancies through discussion. Ultimately, we identified 2,179 unique CDEs. Of all CDEs, data-driven approaches identified 2,055 (94%) (999 from the HIV/AIDS Glossary, 398 from the Drug Database, 91 from journal articles, and a total of 567 from LOINC, SNOMED, and RxNorm cumulatively). Expert-based approaches identified 124 (6%) unique CDEs from data dictionaries and confirmed the 91 CDEs from journal articles.
In Aim 3, we used the Protégé suite of ontology development tools and the 2,179 CDEs to develop the HERO. We modeled the ontology using the semantic structure of the Medical Entities Dictionary, available hierarchical information from the CDE knowledge resources, and expert knowledge. The ontology fulfilled most relevant criteria from Cimino’s desiderata and OntoClean ontology engineering principles, and it successfully answered eight competency questions.
Finally, for Aim 4, we assessed the feasibility of using HERO to semantically harmonize and integrate the data dictionaries from two diverse HIV-associated datasets. Two HIV experts involved in the development of HERO independently assessed each data dictionary. Of the 367 DEs in data dictionary 1 (D1), 181 (49.32%) were identified as CDEs and 186 (50.68%) were not, and of the 72 DEs in data dictionary 2 (D2), 37 (51.39%) were CDEs and 35 (48.61%) were not. The HIV experts then traversed HERO's hierarchy to map CDEs from D1 and D2 to CDEs in HERO. Of the 181 CDEs in D1, 156 (86.19%) were found in HERO, and 25 (13.81%) were not. Similarly, of the 37 CDEs in D2, 32 (86.49%) were found in HERO, and 5 (13.51%) were not. Interrater reliability for CDE identification as measured by Cohen's Kappa was 0.900 for D1 and 0.892 for D2. Cohen's Kappas for CDEs in D1 and D2 that were also identified in HERO were 0.885 and 0.688, respectively.
Subsequently, to demonstrate the integration of the two HIV-associated datasets, a sample of semantically harmonized CDEs present in both datasets was selected by category (e.g., administrative, demographic, and behavioral), and D2 sample-size increases from the integrated datasets were calculated for race (e.g., White, African American/Black, Asian/Pacific Islander, Native American/Indian, and Hispanic/Latino) and for "intravenous drug use". The average sample-size increase across the six selected D2 CDEs was 1,928%.
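The reported sample-size increases are straightforward percent-gain calculations; the record counts in this sketch are hypothetical, not the study's.

```python
# Arithmetic behind a reported increase: the percent gain in usable
# records for a CDE after the two datasets are integrated.

def percent_increase(before, after):
    return 100 * (after - before) / before

# e.g. a CDE with 50 D2 records gains 950 matching D1 records
print(percent_increase(50, 50 + 950))  # → 1900.0
```

Averaging this quantity over the selected CDEs gives a summary figure like the 1,928% reported above.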
Despite the limitation of the HERO developers also serving as evaluators, the contributions of the study to the fields of informatics and HIV research were substantial. Confirmatory contributions include the identification of effective CDE/ontology tools and the use of data-driven and expert-based methods. Novel contributions include the development of SMASH and HERO, documentation that SH is high in HIV-associated datasets, identification of 2,179 HIV-associated CDEs, creation of two additional classifications of SH, and demonstration that using HERO for semantic harmonization of HIV-associated data dictionaries is feasible. Our future work will build upon this research by expanding the numbers and types of datasets, refining our methods and tools, and conducting an external evaluation.
Mapping of electronic health records in Spanish to the unified medical language system metathesaurus
This work presents a preliminary approach to annotating Spanish electronic health records with concepts from the Unified Medical Language System Metathesaurus. The prototype uses Apache Lucene to index the Metathesaurus and generate mapping candidates from input text, and relies on UKB to resolve ambiguities. The tool was evaluated by measuring its agreement with MetaMap on two English-Spanish parallel corpora: one consisting of titles and abstracts of papers in the clinical domain, and the other of real electronic health record excerpts.
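The candidate-generation step can be illustrated with a toy dictionary-based index; the prototype uses Apache Lucene, so this sketch only mirrors the idea, and the terms and concept identifiers below are assumptions rather than real Metathesaurus content.

```python
# Toy stand-in for candidate generation: rank Metathesaurus terms by
# token overlap with the input text (Lucene would do scored retrieval).

index = {  # hypothetical concept identifiers and terms
    "C-0001": "diabetes mellitus",
    "C-0002": "hypertension",
}

def candidates(text):
    tokens = set(text.lower().split())
    hits = []
    for cui, term in index.items():
        overlap = len(tokens & set(term.split()))
        if overlap:
            hits.append((overlap, cui, term))
    # Best-overlapping candidates first.
    return [cui for _, cui, _ in sorted(hits, reverse=True)]

print(candidates("patient with diabetes mellitus type 2"))  # → ['C-0001']
```

A disambiguation step (UKB in the prototype) would then choose among candidates when several concepts match the same span.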