Infectious Disease Ontology
Technological developments have resulted in tremendous increases in the volume and diversity of the data and information that must be processed in the course of biomedical and clinical research and practice. Researchers are at the same time under ever greater pressure to share data and to take steps to ensure that data resources are interoperable. The use of ontologies to annotate data has proven successful in supporting these goals and in providing new possibilities for the automated processing of data and information. In this chapter, we describe different types of vocabulary resources and emphasize those features of formal ontologies that make them most useful for computational applications. We describe current uses of ontologies and discuss future goals for ontology-based computing, focusing on its use in the field of infectious diseases. We review the largest and most widely used vocabulary resources relevant to the study of infectious diseases and conclude with a description of the Infectious Disease Ontology (IDO) suite of interoperable ontology modules that together cover the entire infectious disease domain
Disease Ontology: a backbone for disease semantic integration
The Disease Ontology (DO) database (http://disease-ontology.org) represents a comprehensive knowledge base of 8043 inherited, developmental and acquired human diseases (DO version 3, revision 2510). The DO web browser has been designed for speed, efficiency and robustness through the use of a graph database. Full-text contextual searching functionality using Lucene allows the querying of name, synonym, definition, DOID and cross-reference (xrefs) with complex Boolean search strings. The DO semantically integrates disease and medical vocabularies through extensive cross mapping and integration of MeSH, ICD, NCI's thesaurus, SNOMED CT and OMIM disease-specific terms and identifiers. The DO is utilized for disease annotation by major biomedical databases (e.g. Array Express, NIF, IEDB), as a standard representation of human disease in biomedical ontologies (e.g. IDO, Cell line ontology, NIFSTD ontology, Experimental Factor Ontology, Influenza Ontology), and as an ontological cross mappings resource between DO, MeSH and OMIM (e.g. GeneWiki). The DO project (http://diseaseontology.sf.net) has been incorporated into open source tools (e.g. Gene Answers, FunDO) to connect gene and disease biomedical data through the lens of human disease. The next iteration of the DO web browser will integrate DO's extended relations and logical definition representation along with these biomedical resource cross-mappings
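A minimal sketch of the kind of lookup the DO supports, assuming only a locally downloaded doid.obo file: plain-Python keyword matching against each term's name, synonyms, definition and xrefs (the cross-references to MeSH, ICD, OMIM and SNOMED CT mentioned above). This illustrates the data model, not the DO browser's Lucene backend, and the file path is an assumption.

```python
# Minimal sketch (not the DO web browser's Lucene backend): keyword search over a
# local doid.obo download, matching name, synonym, definition and xref fields of
# each [Term] stanza. The file path and fields scanned are illustrative assumptions.

def search_do(obo_path: str, query: str):
    """Yield (DOID, name, matching line) for stanzas whose searchable fields mention the query."""
    query = query.lower()
    term_id, name, fields = None, None, []
    with open(obo_path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if line == "[Term]":                       # start of a new stanza
                term_id, name, fields = None, None, []
            elif line.startswith("id: DOID:"):
                term_id = line[4:]
            elif line.startswith("name: "):
                name = line[6:]
            if line.startswith(("name:", "synonym:", "xref:", "def:")):
                fields.append(line)
            if line == "" and term_id:                 # stanza ended: test collected fields
                for field in fields:
                    if query in field.lower():
                        yield term_id, name, field
                        break
                term_id = None

if __name__ == "__main__":
    for doid, name, hit in search_do("doid.obo", "influenza"):
        print(doid, "|", name, "|", hit)
```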
Analyzing historical diagnosis code data from NIH N3C and RECOVER Programs using deep learning to determine risk factors for Long Covid
Post-acute sequelae of SARS-CoV-2 infection (PASC), or Long COVID, is an emerging medical condition that has been observed in some patients with a positive COVID-19 diagnosis. Historical electronic health record (EHR) data such as diagnosis codes, lab results and clinical notes have been analyzed using deep learning and used to predict future clinical events. In this paper, we propose an interpretable deep learning approach to analyze historical diagnosis code data from the National COVID Cohort Collaborative (N3C) to find the risk factors contributing to developing Long COVID. Using our deep learning approach, we are able to predict whether a patient is suffering from Long COVID from a temporally ordered list of diagnosis codes up to 45 days after the first positive COVID-19 test or diagnosis, with an accuracy of 70.48%. We then examine the trained model using Gradient-weighted Class Activation Mapping (GradCAM) to assign each input diagnosis a score. The highest-scoring diagnoses were deemed the most important for making a correct prediction for a patient. We also propose a way to summarize these top diagnoses for each patient in our cohort and examine their temporal trends to determine which codes contribute towards a positive Long COVID diagnosis.
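The abstract does not spell out the model architecture, so the following is a minimal sketch, assuming a simple embedding plus 1D-convolution classifier over the ordered diagnosis codes, of how a Grad-CAM-style score can be computed for each input code in PyTorch; all names and hyperparameters are illustrative, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of Grad-CAM-style attribution over a temporally ordered list of diagnosis
# codes. The architecture (embedding + 1D conv + global pooling) and hyperparameters
# are illustrative assumptions.

class DxSequenceClassifier(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 64, channels: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, channels, kernel_size=3, padding=1)
        self.fc = nn.Linear(channels, 1)           # binary: Long COVID vs not

    def forward(self, codes):                      # codes: (batch, seq_len) int64
        x = self.embed(codes).transpose(1, 2)      # (batch, emb_dim, seq_len)
        self.activations = F.relu(self.conv(x))    # kept for Grad-CAM
        self.activations.retain_grad()
        pooled = self.activations.mean(dim=2)      # global average pooling
        return self.fc(pooled).squeeze(-1)         # logits

def grad_cam_scores(model, codes):
    """Return one importance score per input diagnosis-code position."""
    model.zero_grad()
    logit = model(codes)
    logit.sum().backward()
    acts, grads = model.activations, model.activations.grad
    weights = grads.mean(dim=2, keepdim=True)      # channel weights (batch, C, 1)
    cam = F.relu((weights * acts).sum(dim=1))      # (batch, seq_len)
    return cam / (cam.max(dim=1, keepdim=True).values + 1e-8)

# Usage: high-scoring positions correspond to the diagnoses the model leaned on.
model = DxSequenceClassifier(vocab_size=5000)
codes = torch.randint(1, 5000, (2, 45))            # two patients, 45 codes each
print(grad_cam_scores(model, codes).shape)         # torch.Size([2, 45])
```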
Impact of Terminology Mapping on Population Health Cohorts (IMPaCt)
Background and Objectives: The population health care delivery model uses phenotype algorithms in the electronic health record (EHR) system to identify patient cohorts targeted for clinical interventions such as laboratory tests and procedures. The standard terminology used to identify disease cohorts may contribute to significant variation in error rates for patient inclusion or exclusion. The United States requires EHR systems to support two diagnosis terminologies, the International Classification of Disease (ICD) and the Systematized Nomenclature of Medicine (SNOMED). Terminology mapping enables the retrieval of diagnosis data using either terminology. There are no standards of practice by which to evaluate and report the operational characteristics of ICD and SNOMED value sets used to select patient groups for population health interventions. Establishing a best practice for terminology selection is a step forward in ensuring that the right patients receive the right intervention at the right time. The research question is, “How do the diagnosis retrieval terminology (ICD vs SNOMED) and terminology map maintenance impact population health cohorts?” Aims 1 and 2 explore this question, and Aim 3 informs practice and policy for population health programs.
Methods
Aim 1: Quantify impact of terminology choice (ICD vs SNOMED)
ICD and SNOMED phenotype algorithms for diabetes, chronic kidney disease (CKD), and heart failure were developed using matched sets of codes from the Value Set Authority Center. The performance of the diagnosis-only phenotypes was compared to a published reference standard that included diagnosis codes, laboratory results, procedures, and medications.
Aim 2: Measure terminology maintenance impact on SNOMED cohorts
For each disease state, the performance of a single SNOMED algorithm before and after terminology updates was evaluated in comparison to a reference standard to identify and quantify cohort changes introduced by terminology maintenance.
Aim 3: Recommend methods for improving population health interventions
The socio-technical model for studying health information technology was used to inform best practice for the use of population health interventions.
Results
Aim 1: ICD-10 value sets had better sensitivity than SNOMED for diabetes (.829, .662) and CKD (.242, .225) (N=201,713, p
Aim 2: Following terminology maintenance, the SNOMED algorithm for diabetes increased in sensitivity from .662 to .683 (p
Aim 3: Based on observed social and technical challenges to population health programs, including but not limited to the development and measurement of phenotypes, a practical method was proposed for population health intervention development and reporting.
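A minimal sketch of the Aim 1/2 style comparison, assuming diagnosis data as (patient, code) pairs: diagnosis-only cohorts are built from ICD and SNOMED value sets and scored for sensitivity against a reference-standard cohort. The codes and patients below are placeholders, not the study data.

```python
# Illustrative sketch (not the study's actual pipeline): diagnosis-only cohorts are
# built from ICD and SNOMED value sets and scored for sensitivity against a
# reference-standard cohort. All code values below are placeholders.

def cohort(diagnoses, value_set):
    """Patients with at least one diagnosis code in the value set."""
    return {pt for pt, code in diagnoses if code in value_set}

def sensitivity(candidate, reference):
    """Share of reference-standard patients captured by the candidate cohort."""
    return len(candidate & reference) / len(reference) if reference else 0.0

# (patient_id, diagnosis_code) pairs pulled from EHR problem lists / encounters
diagnoses = [("p1", "E11.9"), ("p2", "44054006"), ("p3", "I50.9"), ("p1", "N18.3")]

icd_diabetes = {"E11.9", "E11.65"}              # placeholder ICD-10-CM value set
snomed_diabetes = {"44054006", "73211009"}      # placeholder SNOMED CT value set
reference = {"p1", "p2"}                        # reference standard (dx + labs + meds)

print("ICD sensitivity:   ", sensitivity(cohort(diagnoses, icd_diabetes), reference))
print("SNOMED sensitivity:", sensitivity(cohort(diagnoses, snomed_diabetes), reference))
```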
Deep Risk Prediction and Embedding of Patient Data: Application to Acute Gastrointestinal Bleeding
Acute gastrointestinal bleeding is a common and costly condition, accounting for over 2.2 million hospital days and 19.2 billion dollars of medical charges annually. Risk stratification is a critical part of initial assessment of patients with acute gastrointestinal bleeding. Although all national and international guidelines recommend the use of risk-assessment scoring systems, they are not commonly used in practice, have sub-optimal performance, may be applied incorrectly, and are not easily updated. With the advent of widespread electronic health record adoption, longitudinal clinical data captured during the clinical encounter is now available. However, this data is often noisy, sparse, and heterogeneous. Unsupervised machine learning algorithms may be able to identify structure within electronic health record data while accounting for key issues with the data generation process: measurements missing-not-at-random and information captured in unstructured clinical note text. Deep learning tools can create electronic health record-based models that perform better than clinical risk scores for gastrointestinal bleeding and are well-suited for learning from new data. Furthermore, these models can be used to predict risk trajectories over time, leveraging the longitudinal nature of the electronic health record. The foundation of creating relevant tools is the definition of a relevant outcome measure; in acute gastrointestinal bleeding, a composite outcome of red blood cell transfusion, hemostatic intervention, and all-cause 30-day mortality is a relevant, actionable outcome that reflects the need for hospital-based intervention. However, epidemiological trends may affect the relevance and effectiveness of the outcome measure when applied across multiple settings and patient populations. Understanding the trends in practice, potential areas of disparities, and value proposition for using risk stratification in patients presenting to the Emergency Department with acute gastrointestinal bleeding is important in understanding how to best implement a robust, generalizable risk stratification tool. Key findings include a decrease in the rate of red blood cell transfusion since 2014 and disparities in access to upper endoscopy for patients with upper gastrointestinal bleeding by race/ethnicity across urban and rural hospitals. Projected accumulated savings of consistent implementation of risk stratification tools for upper gastrointestinal bleeding total approximately $1 billion 5 years after implementation. Most current risk scores were designed for use based on the location of the bleeding source: upper or lower gastrointestinal tract. However, the location of the bleeding source is not always clear at presentation. I develop and validate electronic health record based deep learning and machine learning tools for patients presenting with symptoms of acute gastrointestinal bleeding (e.g., hematemesis, melena, hematochezia), which is more relevant and useful in clinical practice. I show that they outperform leading clinical risk scores for upper and lower gastrointestinal bleeding, the Glasgow Blatchford Score and the Oakland score. While the best performing gradient boosted decision tree model has equivalent overall performance to the fully connected feedforward neural network model, at the very low risk threshold of 99% sensitivity the deep learning model identifies more very low risk patients. 
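A small sketch of the very-low-risk comparison described above, assuming only arrays of outcome labels and model scores: each model's threshold is chosen so that sensitivity for the composite outcome stays at or above 99%, and the patients scoring below that threshold are counted. The scores here are synthetic placeholders.

```python
import numpy as np

# Sketch of the "very low risk" comparison: pick each model's decision threshold so
# that sensitivity for the composite outcome is >= 99%, then count how many patients
# fall below it. Labels and scores are simulated placeholders.

def threshold_at_sensitivity(y_true, y_score, target=0.99):
    """Largest threshold whose sensitivity (recall on positives) is >= target."""
    pos_scores = np.sort(y_score[y_true == 1])
    k = int(np.floor((1.0 - target) * len(pos_scores)))  # positives allowed below threshold
    return pos_scores[k]

def very_low_risk_count(y_true, y_score, target=0.99):
    thr = threshold_at_sensitivity(y_true, y_score, target)
    return int(np.sum(y_score < thr)), float(thr)

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.2, size=5000)                         # composite outcome
nn_scores = np.clip(y_true * 0.50 + rng.normal(0.30, 0.2, 5000), 0, 1)
gbt_scores = np.clip(y_true * 0.45 + rng.normal(0.32, 0.2, 5000), 0, 1)

print("neural net very-low-risk patients:", very_low_risk_count(y_true, nn_scores))
print("gradient boosting very-low-risk:  ", very_low_risk_count(y_true, gbt_scores))
```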
Using another deep learning model that can capture longitudinal risk, the long short-term memory recurrent neural network, the need for red blood cell transfusion can be predicted at every 4-hour interval in the first 24 hours of intensive care unit stay for high-risk patients with acute gastrointestinal bleeding. Finally, for implementation it is important to find patients with symptoms of acute gastrointestinal bleeding in real time and characterize them by risk using available data in the electronic health record. A decision rule-based electronic health record phenotype has equivalent performance, as measured by positive predictive value, to deep learning and natural language processing-based models, and after live implementation appears to have increased the use of the Acute Gastrointestinal Bleeding Clinical Care pathway. Patients with acute gastrointestinal bleeding but with other groups of disease concepts can be differentiated by directly mapping unstructured clinical text to a common ontology and treating the vector of concepts as signals on a knowledge graph; these patients can be differentiated using unbalanced diffusion earth mover's distances on the graph. For electronic health record data with data missing not at random, MURAL, an unsupervised random forest-based method, handles data with missing values and generates visualizations that characterize patients with gastrointestinal bleeding. This thesis forms a basis for understanding the potential for machine learning and deep learning tools to characterize risk for patients with acute gastrointestinal bleeding. In the future, these tools may be critical in implementing integrated risk assessment to keep low-risk patients out of the hospital and guide resuscitation and timely endoscopic procedures for patients at higher risk for clinical decompensation.
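A minimal PyTorch sketch of the longitudinal idea, assuming per-interval feature vectors: an LSTM that emits a transfusion-risk estimate at each of the six 4-hour intervals in the first 24 hours of an ICU stay. The feature set, dimensions and model are illustrative, not the thesis model.

```python
import torch
import torch.nn as nn

# Sketch (assumed dimensions): an LSTM that emits a red-cell-transfusion risk at each
# of the six 4-hour intervals in the first 24 hours of an ICU stay, from whatever
# per-interval features are available.

class IntervalRiskLSTM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, 6 intervals, n_features)
        out, _ = self.lstm(x)                    # (batch, 6, hidden)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # risk per interval

model = IntervalRiskLSTM(n_features=32)
vitals_and_labs = torch.randn(4, 6, 32)          # 4 patients, 6 intervals, 32 features
print(model(vitals_and_labs).shape)              # torch.Size([4, 6])
```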
Decreases in Antimicrobial Use Associated With Multihospital Implementation of Electronic Antimicrobial Stewardship Tools.
Background: Antimicrobial stewards may benefit from comparative data to inform interventions that promote optimal inpatient antimicrobial use.
Methods: Antimicrobial stewards from 8 geographically dispersed Veterans Affairs (VA) inpatient facilities participated in the development of antimicrobial use visualization tools that allowed for comparison to facilities of similar complexity. The visualization tools consisted of an interactive web-based antimicrobial dashboard and, later, a standardized antimicrobial usage report updated at user-selected intervals. Stewards participated in monthly learning collaboratives. The percent change in average monthly antimicrobial use (all antimicrobial agents, anti-methicillin-resistant Staphylococcus aureus [anti-MRSA] agents, and antipseudomonal agents) was analyzed using a pre-post (January 2014-January 2016 vs July 2016-January 2018) design with segmented regression and external comparison with uninvolved control facilities (n = 118).
Results: Intervention sites demonstrated a 2.1% decrease (95% confidence interval [CI], -5.7% to 1.6%) in total antimicrobial use pre-post intervention vs a 2.5% increase (95% CI, 0.8% to 4.1%) in nonintervention sites (absolute difference, 4.6%; P = .025). Anti-MRSA antimicrobial use decreased 11.3% (95% CI, -16.0% to -6.3%) at intervention sites vs a 6.6% decrease (95% CI, -9.1% to -3.9%) at nonintervention sites (absolute difference, 4.7%; P = .092). Antipseudomonal antimicrobial use decreased 3.4% (95% CI, -8.2% to 1.7%) at intervention sites vs a 3.6% increase (95% CI, 0.8% to 6.5%) at nonintervention sites (absolute difference, 7.0%; P = .018).
Conclusions: Comparative data visualization tool use by stewards at 8 VA facilities was associated with significant reductions in overall antimicrobial and antipseudomonal use relative to uninvolved facilities.
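A hedged sketch of a segmented (interrupted time-series) regression of the kind described in the Methods, using statsmodels with level-change and slope-change terms; the monthly usage series below is simulated and the study's actual covariates are not reproduced.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Sketch of a segmented regression for a pre-post antimicrobial-use analysis.
# The monthly usage values are simulated placeholders.

months = np.arange(48)                      # Jan 2014 .. Dec 2017, roughly
post = (months >= 30).astype(int)           # intervention period begins ~Jul 2016
use = (620 - 0.5 * months - 12 * post - 1.0 * post * (months - 30)
       + np.random.default_rng(1).normal(0, 8, 48))

df = pd.DataFrame({
    "use": use,                             # e.g. days of therapy per 1,000 days present
    "time": months,
    "post": post,
    "time_post": post * (months - 30),      # change in slope after the intervention
})

fit = smf.ols("use ~ time + post + time_post", data=df).fit()
print(fit.params)                           # level change ('post') and slope change
```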
Discovering disease-disease associations using electronic health records in The Guideline Advantage (TGA) dataset
Certain diseases have strong comorbidity and co-occurrence with others. Understanding disease-disease associations can increase awareness among healthcare providers of co-occurring conditions and facilitate earlier diagnosis, prevention and treatment of patients. In this study, we utilized The Guideline Advantage (TGA), a large longitudinal electronic health record dataset from 70 outpatient clinics across the United States, to investigate potential disease-disease associations. Specifically, the 50 most prevalent disease diagnoses were manually identified from 165,732 unique patients. To investigate the co-occurrence or dependency associations among the 50 diseases, the categorical disease terms were first mapped into numerical vectors based on disease co-occurrence frequency in individual patients using the Word2Vec approach. Novel and interesting disease association clusters were then identified using correlation and clustering analyses in the numerical space. Moreover, the distribution of time delay (Δt) between pair-wise strongly associated diseases (correlation coefficients ≥ 0.5) was calculated to show the dependency among the diseases. The results can indicate the risk of disease comorbidity and complications, and facilitate disease prevention and optimal treatment decision-making.
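A small sketch of the embedding step, assuming each patient's diagnoses can be treated as a "sentence": gensim's Word2Vec maps diagnosis terms to vectors whose similarity reflects co-occurrence, after which correlation and clustering analyses can be run. The corpus below is a toy placeholder, not the TGA data.

```python
from gensim.models import Word2Vec

# Sketch of the co-occurrence embedding step: each patient's diagnosis list is a
# "sentence", Word2Vec maps diagnosis terms to vectors, and vector similarity
# approximates disease-disease association. The corpus is a toy placeholder.

patient_diagnoses = [
    ["hypertension", "type2_diabetes", "hyperlipidemia"],
    ["type2_diabetes", "chronic_kidney_disease", "hypertension"],
    ["heart_failure", "hypertension", "atrial_fibrillation"],
    ["hyperlipidemia", "coronary_artery_disease", "hypertension"],
]

model = Word2Vec(sentences=patient_diagnoses, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=200, seed=7)

print(model.wv.similarity("hypertension", "type2_diabetes"))
print(model.wv.most_similar("hypertension", topn=3))
```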
A Learning Health System for Radiation Oncology
The proposed research aims to address the challenges faced by clinical data science researchers in radiation oncology accessing, integrating, and analyzing heterogeneous data from various sources. The research presents a scalable intelligent infrastructure, called the Health Information Gateway and Exchange (HINGE), which captures and structures data from multiple sources into a knowledge base with semantically interlinked entities. This infrastructure enables researchers to mine novel associations and gather relevant knowledge for personalized clinical outcomes.
The dissertation discusses the design framework and implementation of HINGE, which abstracts structured data from treatment planning systems, treatment management systems, and electronic health records. It utilizes disease-specific smart templates for capturing clinical information in a discrete manner. HINGE performs data extraction, aggregation, and quality and outcome assessment functions automatically, connecting seamlessly with local IT/medical infrastructure.
Furthermore, the research presents a knowledge graph-based approach to map radiotherapy data to an ontology-based data repository using FAIR (Findable, Accessible, Interoperable, Reusable) concepts. This approach ensures that the data is easily discoverable and accessible for clinical decision support systems. The dissertation explores the ETL (Extract, Transform, Load) process, data model frameworks, ontologies, and provides a real-world clinical use case for this data mapping.
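A minimal rdflib sketch of expressing one abstracted radiotherapy record as ontology-linked triples, in the spirit of the FAIR knowledge-graph mapping described above; the namespace and predicate names are placeholders, not HINGE's actual ontology.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

# Sketch: one abstracted radiotherapy course expressed as ontology-linked triples.
# The namespace and predicates are invented placeholders for illustration only.

RT = Namespace("http://example.org/radonc#")
g = Graph()
g.bind("rt", RT)

course = RT["course/12345"]
g.add((course, RDF.type, RT.TreatmentCourse))
g.add((course, RT.hasModality, RT.SBRT))
g.add((course, RT.prescribedDose_Gy, Literal(50.0, datatype=XSD.decimal)))
g.add((course, RT.numberOfFractions, Literal(5, datatype=XSD.integer)))
# Link to a disease class (DOID:3908, non-small cell lung carcinoma) as an example.
g.add((course, RT.treatsDiagnosis, RT["dx/DOID_3908"]))

print(g.serialize(format="turtle"))
```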
To improve the efficiency of retrieving information from large clinical datasets, a search engine based on ontology-based keyword searching and synonym-based term matching was developed. The hierarchical nature of ontologies is leveraged to retrieve patient records based on parent and child classes. Additionally, patient similarity analysis is conducted using vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) to identify similar patients based on text corpus creation methods. Results from the analysis using these models are presented.
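A minimal sketch of the hierarchy-aware retrieval, assuming a parent-to-children map extracted from an ontology: a query class is expanded to all of its descendants, and patient records annotated with any class in that set are returned. The toy hierarchy and record store are placeholders.

```python
# Sketch of hierarchy-aware retrieval: expand a query class to its descendants and
# return patient records annotated with any class in that set. Toy data only.

children = {                                   # parent class -> direct child classes
    "thoracic_neoplasm": ["lung_cancer"],
    "lung_cancer": ["non_small_cell_lung_cancer", "small_cell_lung_cancer"],
    "non_small_cell_lung_cancer": ["lung_adenocarcinoma", "lung_squamous_cell_carcinoma"],
}

patient_annotations = {
    "pt-001": {"lung_adenocarcinoma"},
    "pt-002": {"small_cell_lung_cancer"},
    "pt-003": {"breast_cancer"},
}

def descendants(term):
    """The query term plus every class below it in the hierarchy."""
    found, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def retrieve(term):
    wanted = descendants(term)
    return sorted(pt for pt, terms in patient_annotations.items() if terms & wanted)

print(retrieve("lung_cancer"))                 # ['pt-001', 'pt-002']
```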
The implementation of a learning health system for predicting radiation pneumonitis following stereotactic body radiotherapy is also discussed. 3D convolutional neural networks (CNNs) are utilized with radiographic and dosimetric datasets to predict the likelihood of radiation pneumonitis. DenseNet-121 and ResNet-50 models are employed for this study, along with integrated gradient techniques to identify salient regions within the input 3D image dataset. The predictive performance of the 3D CNN models is evaluated based on clinical outcomes.
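A compact stand-in for this pipeline, assuming a two-channel (CT plus dose) input volume: a tiny 3D CNN in place of DenseNet-121/ResNet-50, with a manual integrated-gradients attribution over the input. Shapes and the model are illustrative only.

```python
import torch
import torch.nn as nn

# Tiny stand-in for the 3D CNN classifiers described above (DenseNet-121/ResNet-50
# are not reproduced here) plus a manual integrated-gradients attribution over the
# input volume. All shapes and the model itself are illustrative placeholders.

class Tiny3DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(2, 8, kernel_size=3, padding=1), nn.ReLU(),   # CT + dose channels
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(8, 1)

    def forward(self, x):                                  # x: (batch, 2, D, H, W)
        return self.classifier(self.features(x).flatten(1)).squeeze(-1)

def integrated_gradients(model, x, steps: int = 32):
    """Approximate integrated gradients against an all-zero baseline volume."""
    baseline = torch.zeros_like(x)
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        model.zero_grad()
        model(point).sum().backward()
        total += point.grad
    return (x - baseline) * total / steps                  # salient voxels score high

model = Tiny3DCNN()
volume = torch.randn(1, 2, 32, 64, 64)                     # one patient, CT + dose
print(integrated_gradients(model, volume).shape)           # torch.Size([1, 2, 32, 64, 64])
```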
Overall, the proposed Learning Health System provides a comprehensive solution for capturing, integrating, and analyzing heterogeneous data in a knowledge base. It offers researchers the ability to extract valuable insights and associations from diverse sources, ultimately leading to improved clinical outcomes. This work can serve as a model for implementing learning health systems in other medical specialties, advancing personalized and data-driven medicine.