239 research outputs found

    LexOWL: A Bridge from LexGrid to OWL

    Get PDF
    The Lexical Grid project is an on-going community driven initiative that provides a common terminology model to represent multiple vocabulary and ontology sources as well as a scalable and robust API for accessing such information. In order to add more powerful functionalities to the existing infrastructure and align LexGrid more closely with various Semantic Web technologies, we introduce the LexOWL project for representing the ontologies modeled within the LexGrid environment in OWL (Web Ontology Language). The crux of this effort is to create a “bridge” that functionally connects the LexBIG (a LexGrid API) and the OWL API (an interface that implements OWL) seamlessly. In this paper, we discuss the key aspects of designing and implementing the LexOWL bridge. We compared LexOWL with other OWL converting tools and conclude that LexOWL provides an OWL mapping and converting tool with well-defined interoperability for information in the biomedical domain

    The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction [preprint]

    Get PDF
    Background: The majority of U.S. reports of COVID-19 clinical characteristics, disease course, and treatments are from single health systems or focused on one domain. Here we report the creation of the National COVID Cohort Collaborative (N3C), a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative U.S. cohort of COVID-19 cases and controls to date. This multi-center dataset supports robust evidence-based development of predictive and diagnostic tools and informs critical care and policy. Methods and Findings: In a retrospective cohort study of 1,926,526 patients from 34 medical centers nationwide, we stratified patients using a World Health Organization COVID-19 severity scale and demographics; we then evaluated differences between groups over time using multivariable logistic regression. We established vital signs and laboratory values among COVID-19 patients with different severities, providing the foundation for predictive analytics. The cohort included 174,568 adults with severe acute respiratory syndrome associated with SARS-CoV-2 (PCR \u3e99% or antigen Conclusions: This is the first description of an ongoing longitudinal observational study of patients seen in diverse clinical settings and geographical regions and is the largest COVID-19 cohort in the United States. Such data are the foundation for ML models that can be the basis for generalizable clinical decision support tools. The N3C Data Enclave is unique in providing transparent, reproducible, easily shared, versioned, and fully auditable data and analytic provenance for national-scale patient-level EHR data. The N3C is built for intensive ML analyses by academic, industry, and citizen scientists internationally. Many observational correlations can inform trial designs and care guidelines for this new disease

    LexOWL: A Bridge from LexGrid to OWL

    Get PDF
    The Lexical Grid project is an on-going community driven initiative that provides a common terminology model to represent multiple vocabulary and ontology sources as well as a scalable and robust API for accessing such information. In order to add more powerful functionalities to the existing infrastructure and align LexGrid more closely with various Semantic Web technologies, we introduce the LexOWL project for representing the ontologies modeled within the LexGrid environment in OWL (Web Ontology Language). The crux of this effort is to create a “bridge” that functionally connects the LexBIG (a LexGrid API) and the OWL API (an interface that implements OWL) seamlessly. In this paper, we discuss the key aspects of designing and implementing the LexOWL bridge. We compared LexOWL with other OWL converting tools and conclude that LexOWL provides an OWL mapping and converting tool with well-defined interoperability for information in the biomedical domain

    A Genome-Wide Association Study of Red Blood Cell Traits Using the Electronic Medical Record

    Get PDF
    The Electronic Medical Record (EMR) is a potential source for high throughput phenotyping to conduct genome-wide association studies (GWAS), including those of medically relevant quantitative traits. We describe use of the Mayo Clinic EMR to conduct a GWAS of red blood cell (RBC) traits in a cohort of patients with peripheral arterial disease (PAD) and controls without PAD.Results for hemoglobin level, hematocrit, RBC count, mean corpuscular volume, mean corpuscular hemoglobin, and mean corpuscular hemoglobin concentration were extracted from the EMR from January 1994 to September 2009. Out of 35,159 RBC trait values in 3,411 patients, we excluded 12,864 values in 1,165 patients that had been measured during hospitalization or in the setting of hematological disease, malignancy, or use of drugs that affect RBC traits, leaving a final genotyped sample of 3,012, 80% of whom had ≥2 measurements. The median of each RBC trait was used in the genetic analyses, which were conducted using an additive model that adjusted for age, sex, and PAD status. We identified four genomic loci that were associated (P<5 × 10(-8)) with one or more of the RBC traits (HBLS1/MYB on 6q23.3, TMPRSS6 on 22q12.3, HFE on 6p22.1, and SLC17A1 on 6p22.2). Three of these loci (HBLS1/MYB, TMPRSS6, and HFE) had been identified in recent GWAS and the allele frequencies, effect sizes, and the directions of effects of the replicated SNPs were similar to the prior studies.Our results demonstrate feasibility of using the EMR to conduct high throughput genomic studies of medically relevant quantitative traits

    Domain-specific language models and lexicons for tagging

    Get PDF
    AbstractAccurate and reliable part-of-speech tagging is useful for many Natural Language Processing (NLP) tasks that form the foundation of NLP-based approaches to information retrieval and data mining. In general, large annotated corpora are necessary to achieve desired part-of-speech tagger accuracy. We show that a large annotated general-English corpus is not sufficient for building a part-of-speech tagger model adequate for tagging documents from the medical domain. However, adding a quite small domain-specific corpus to a large general-English one boosts performance to over 92% accuracy from 87% in our studies. We also suggest a number of characteristics to quantify the similarities between a training corpus and the test data. These results give guidance for creating an appropriate corpus for building a part-of-speech tagger model that gives satisfactory accuracy results on a new domain at a relatively small cost

    Analyzing historical diagnosis code data from NIH N3C and RECOVER Programs using deep learning to determine risk factors for Long Covid

    Full text link
    Post-acute sequelae of SARS-CoV-2 infection (PASC) or Long COVID is an emerging medical condition that has been observed in several patients with a positive diagnosis for COVID-19. Historical Electronic Health Records (EHR) like diagnosis codes, lab results and clinical notes have been analyzed using deep learning and have been used to predict future clinical events. In this paper, we propose an interpretable deep learning approach to analyze historical diagnosis code data from the National COVID Cohort Collective (N3C) to find the risk factors contributing to developing Long COVID. Using our deep learning approach, we are able to predict if a patient is suffering from Long COVID from a temporally ordered list of diagnosis codes up to 45 days post the first COVID positive test or diagnosis for each patient, with an accuracy of 70.48\%. We are then able to examine the trained model using Gradient-weighted Class Activation Mapping (GradCAM) to give each input diagnoses a score. The highest scored diagnosis were deemed to be the most important for making the correct prediction for a patient. We also propose a way to summarize these top diagnoses for each patient in our cohort and look at their temporal trends to determine which codes contribute towards a positive Long COVID diagnosis

    National Center for Biomedical Ontology: Advancing biomedicine through structured organization of scientific knowledge

    Get PDF
    The National Center for Biomedical Ontology is a consortium that comprises leading informaticians, biologists, clinicians, and ontologists, funded by the National Institutes of Health (NIH) Roadmap, to develop innovative technology and methods that allow scientists to record, manage, and disseminate biomedical information and knowledge in machine-processable form. The goals of the Center are (1) to help unify the divergent and isolated efforts in ontology development by promoting high quality open-source, standards-based tools to create, manage, and use ontologies, (2) to create new software tools so that scientists can use ontologies to annotate and analyze biomedical data, (3) to provide a national resource for the ongoing evaluation, integration, and evolution of biomedical ontologies and associated tools and theories in the context of driving biomedical projects (DBPs), and (4) to disseminate the tools and resources of the Center and to identify, evaluate, and communicate best practices of ontology development to the biomedical community. Through the research activities within the Center, collaborations with the DBPs, and interactions with the biomedical community, our goal is to help scientists to work more effectively in the e-science paradigm, enhancing experiment design, experiment execution, data analysis, information synthesis, hypothesis generation and testing, and understand human disease
    • …
    corecore