Desiderata for an ontology of diseases for the annotation of biological datasets.
Many disease ontologies are available, all potentially useful for the annotation of biological datasets. We define seven desirable features for such ontologies and examine whether these features are supported by eleven disease ontologies. The four ontologies most closely aligned with our desiderata are Disease Ontology, SNOMED CT, NCI Thesaurus, and UMLS.
Integrating historical clinical and financial data for pharmacological research
Background: Retrospective research requires longitudinal data, and repositories derived from electronic health records (EHR) can be sources of such data. With Health Information Technology for Economic and Clinical Health (HITECH) Act meaningful use provisions, many institutions are expected to adopt EHRs, but may be left with large amounts of financial and historical clinical data, which can differ significantly from data obtained from newer systems, due to lack or inconsistent use of controlled medical terminologies (CMT) in older systems. We examined different approaches for semantic enrichment of financial data with CMT, and integration of clinical data from disparate historical and current sources for research. Methods: Snapshots of financial data from 1999, 2004 and 2009 were mapped automatically to the current inpatient pharmacy catalog, and enriched with RxNorm. Administrative metadata from financial and dispensing systems, RxNorm and two commercial pharmacy vocabularies were used to integrate data from current and historical inpatient pharmacy modules, and the outpatient EHR. Data integration approaches were compared using percentages of automated matches, and effects on cohort size of a retrospective study. Results: During 1999-2009, 71.52%-90.08% of items in use from the financial catalog were enriched using RxNorm; 64.95%-70.37% of items in use from the historical inpatient system were integrated using RxNorm, 85.96%-91.67% using a commercial vocabulary, 87.19%-94.23% using financial metadata, and 77.20%-94.68% using dispensing metadata. During 1999-2009, 48.01%-30.72% of items in use from the outpatient catalog were integrated using RxNorm, and 79.27%-48.60% using a commercial vocabulary. In a cohort of 16304 inpatients obtained from clinical systems, 4172 (25.58%) were found exclusively through integration of historical clinical data, while 15978 (98%) could be identified using semantically enriched financial data.
Conclusions: Data integration using metadata from financial/dispensing systems performed comparably to data integration using pharmacy vocabularies. Given the current state of EHR adoption, semantic enrichment of financial data and integration of historical clinical data would allow the repurposing of these data for research. With the push for HITECH meaningful use, institutions that are transitioning to newer EHRs will be able to use their older financial and clinical data for research using these methods.
Ontology-based Semantic Harmonization of HIV-associated Common Data Elements for Integration of Diverse HIV Research Datasets
Analysis of integrated, diverse, Human Immunodeficiency Virus (HIV)-associated datasets can increase knowledge and guide the development of novel and effective interventions for disease prevention and treatment by increasing breadth of variables and statistical power, particularly for sub-group analyses. This topic has been identified as a National Institutes of Health research priority, but few efforts have been made to integrate data across HIV studies. Our aims were to: 1) Characterize the semantic heterogeneity (SH) in the HIV research domain; 2) Identify HIV-associated common data elements (CDEs) in empirically generated and knowledge-based resources; 3) Create a formal representation of HIV-associated CDEs in the form of an HIV-associated Entities in Research Ontology (HERO); 4) Assess the feasibility of using HERO to semantically harmonize HIV research data. Our approach was guided by information/knowledge theory and the DIKW (Data Information Knowledge Wisdom) hierarchical model.
Our systematized review of the literature revealed that synergistic use of both ontologies and CDEs included integration, interoperability, data exchange, and data standardization. Moreover, methods and tools included use of experts for CDE identification, the Unified Medical Language System, natural language processing, Extensible Markup Language, Health Level 7, and ontology development tools (e.g., Protégé). Additionally, evaluation methods included expert assessment, quantification of mapping tasks between raters, assessment of interrater reliability, and comparison to established standards. We used these findings to inform our process for achieving the study aims.
For Aim 1, we analyzed eight disparate HIV-associated data dictionaries and developed a String Metric-assisted Assessment of Semantic Heterogeneity (SMASH) method, which aided identification of 127 (13%) homogeneous data element (DE) pairs and 1,048 (87%) semantically heterogeneous DE pairs. Most heterogeneous pairs (97%) were semantically-equivalent/syntactically-different, allowing us to determine that SH in the HIV research domain was high.
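The abstract does not say which string metric SMASH relies on or how pairs were classified. As a rough illustration only, the sketch below scores data element names from two hypothetical data dictionaries with Python's standard-library `difflib.SequenceMatcher` and keeps pairs above an assumed threshold as candidates for expert review. The element names, the normalization step, and the 0.6 threshold are all assumptions, not details from the study.

```python
from difflib import SequenceMatcher
from itertools import product


def normalize(label: str) -> str:
    """Lower-case a label and treat underscores and repeated spaces as one space."""
    return " ".join(label.lower().replace("_", " ").split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_pairs(dict_a, dict_b, threshold=0.6):
    """List cross-dictionary pairs whose similarity meets the (assumed) threshold."""
    pairs = []
    for a, b in product(dict_a, dict_b):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    # Highest-scoring candidates first, so reviewers see likely matches early.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical data element names from two data dictionaries.
dict_a = ["date_of_birth", "hiv_viral_load", "gender"]
dict_b = ["Date of Birth", "viral load", "sex"]

pairs = candidate_pairs(dict_a, dict_b)
for a, b, score in pairs:
    print(f"{a!r} ~ {b!r}: {score:.2f}")
```

In this sketch "gender" and "sex" are not paired, because they share almost no characters. A string metric can surface syntactically similar names, but semantically equivalent elements with different wording still need expert review, which is consistent with the "assisted" framing in the abstract.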
To achieve Aim 2, we used ClinicalTrials.gov, Google Search, and text mining in R to identify HIV-associated CDEs in HIV journal articles, HIV-associated datasets, AIDSinfo HIV/AIDS Glossary, AIDSinfo Drug Database, Logical Observation Identifiers Names and Codes (LOINC), Systematized Nomenclature of Medicine (SNOMED), and RxNorm (understood as prescription normalization). Two HIV experts then manually reviewed DEs from the journal articles and data dictionaries to confirm DE commonality and resolved semantic discrepancies through discussion. Ultimately, we identified 2,179 unique CDEs. Of all CDEs, data-driven approaches identified 2,055 (94%) (999 from the HIV/AIDS Glossary, 398 from the Drug Database, 91 from journal articles, and a total of 567 from LOINC, SNOMED, and RxNorm cumulatively). Expert-based approaches identified 124 (6%) unique CDEs from data dictionaries and confirmed the 91 CDEs from journal articles.
In Aim 3, we used the Protégé suite of ontology development tools and the 2,179 CDEs to develop the HERO. We modeled the ontology using the semantic structure of the Medical Entities Dictionary, available hierarchical information from the CDE knowledge resources, and expert knowledge. The ontology fulfilled most relevant criteria from Cimino’s desiderata and OntoClean ontology engineering principles, and it successfully answered eight competency questions.
Finally, for Aim 4, we assessed the feasibility of using HERO to semantically harmonize and integrate the data dictionaries from two diverse HIV-associated datasets. Two HIV experts involved in the development of HERO independently assessed each data dictionary. Of the 367 DEs in data dictionary 1 (D1), 181 (49.32%) were identified as CDEs and 186 (50.68%) were not CDEs, and of the 72 DEs in data dictionary 2 (D2), 37 (51.39%) were CDEs and 35 (48.61%) were not CDEs. The HIV experts then traversed HERO’s hierarchy to map CDEs from D1 and D2 to CDEs in HERO. Of the 181 CDEs in D1, 156 (86.19%) were found in HERO, and 25 (13.81%) were not. Similarly, of the 37 CDEs in D2, 32 (86.48%) were found in HERO, and 5 (13.51%) were not. Interrater reliability for CDE identification as measured by Cohen’s Kappa was 0.900 for D1 and 0.892 for D2. Cohen’s Kappas for CDEs in D1 and D2 that were also identified in HERO were 0.885 and 0.688, respectively.
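For readers unfamiliar with the interrater statistic reported above, the following is a minimal sketch of Cohen's Kappa for two raters, computed as (observed agreement - chance agreement) / (1 - chance agreement). The ratings are invented for illustration (1 = "is a CDE", 0 = "is not a CDE") and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten data elements (not taken from the study).
expert_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
expert_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohens_kappa(expert_1, expert_2)
print(f"kappa = {kappa:.3f}")
```

Here the raters agree on 8 of 10 items (observed agreement 0.8) while chance agreement is 0.5, so Kappa is 0.6. Because Kappa corrects for chance agreement, it is lower than the raw agreement rate.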
Subsequently, to demonstrate the integration of the two HIV-associated datasets, a sample of semantically harmonized CDEs in both datasets was categorically selected (e.g. administrative, demographic, and behavioral), and D2 sample size increases were calculated for race (e.g., White, African American/Black, Asian/Pacific Islander, Native American/Indian, and Hispanic/Latino) and for “intravenous drug use” from the integrated datasets. The average increase of D2 CDEs for six selected CDEs was 1,928%.
Despite the limitation of HERO developers also serving as evaluators, the contributions of the study to the fields of informatics and HIV research were substantial. Confirmatory contributions include: identification of effective CDE/ontology tools, and use of data-driven and expert-based methods. Novel contributions include: development of SMASH and HERO; and new contributions include documenting that SH is high in HIV-associated datasets, identifying 2,179 HIV-associated CDEs, creating two additional classifications of SH, and showing that using HERO for semantic harmonization of HIV-associated data dictionaries is feasible. Our future work will build upon this research by expanding the numbers and types of datasets, refining our methods and tools, and conducting an external evaluation
The role of ontologies in biological and biomedical research: a functional perspective.
Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications. This is the final version of the article. It first appeared from Oxford University Press via http://dx.doi.org/10.1093/bib/bbv01
Towards a Reference Terminology for Ontology Research and Development in the Biomedical Domain
Ontology is a burgeoning field, involving researchers from the computer science, philosophy, data and software engineering, logic, linguistics, and terminology domains. Many ontology-related terms with precise meanings in one of these domains have different meanings in others. Our purpose here is to initiate a path towards disambiguation of such terms. We draw primarily on the literature of biomedical informatics, not least because the problems caused by unclear or ambiguous use of terms have been there most thoroughly addressed. We advance a proposal resting on a distinction of three levels too often run together in biomedical ontology research: 1. the level of reality; 2. the level of cognitive representations of this reality; 3. the level of textual and graphical artifacts. We propose a reference terminology for ontology research and development that is designed to serve as common hub into which the several competing disciplinary terminologies can be mapped. We then justify our terminological choices through a critical treatment of the ‘concept orientation’ in biomedical terminology research
Advancing translational research with the Semantic Web
Background: A fundamental goal of the U.S. National Institutes of Health (NIH) "Roadmap" is to strengthen Translational Research, defined as the movement of discoveries in basic research to application at the clinical level. A significant barrier to translational research is the lack of uniformly structured data across related biomedical domains. The Semantic Web is an extension of the current Web that enables navigation and meaningful use of digital resources by automatic processes. It is based on common formats that support aggregation and integration of data drawn from diverse sources. A variety of technologies have been built on this foundation that, together, support identifying, representing, and reasoning across a wide range of biomedical data. The Semantic Web Health Care and Life Sciences Interest Group (HCLSIG), set up within the framework of the World Wide Web Consortium, was launched to explore the application of these technologies in a variety of areas. Subgroups focus on making biomedical data available in RDF, working with biomedical ontologies, prototyping clinical decision support systems, working on drug safety and efficacy communication, and supporting disease researchers navigating and annotating the large amount of potentially relevant literature.
Results: We present a scenario that shows the value of the information environment the Semantic Web can support for aiding neuroscience researchers. We then report on several projects by members of the HCLSIG, in the process illustrating the range of Semantic Web technologies that have applications in areas of biomedicine.
Conclusion: Semantic Web technologies present both promise and challenges. Current tools and standards are already adequate to implement components of the bench-to-bedside vision. On the other hand, these technologies are young. Gaps in standards and implementations still exist and adoption is limited by typical problems with early technology, such as the need for a critical mass of practitioners and installed base, and growing pains as the technology is scaled up. Still, the potential of interoperable knowledge sources for biomedicine, at the scale of the World Wide Web, merits continued work.
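The idea that a common format supports aggregation of data from diverse sources can be pictured with subject-predicate-object triples, the data model underlying RDF. The sketch below uses plain Python tuples rather than any RDF library. The URIs, predicates, and source names are hypothetical and chosen only for illustration; they are not taken from the HCLSIG projects.

```python
# Each statement is a (subject, predicate, object) triple.
# All URIs below are hypothetical placeholders.
EX = "http://example.org/"

# Statements published by an assumed basic-research source.
lab_source = {
    (EX + "gene/APP", EX + "associatedWith", EX + "disease/Alzheimers"),
}

# Statements published by an assumed clinical source.
clinic_source = {
    (EX + "drug/Donepezil", EX + "treats", EX + "disease/Alzheimers"),
}

# Because both sources share one data model and identifier scheme,
# aggregation is a set union; no schema mapping step is needed.
graph = lab_source | clinic_source


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def genes_and_drugs(triples):
    """Join across sources: genes and drugs linked through a shared disease URI."""
    results = []
    for gene, _, disease in match(triples, p=EX + "associatedWith"):
        for drug, _, _ in match(triples, p=EX + "treats", o=disease):
            results.append((gene, drug))
    return sorted(results)


print(genes_and_drugs(graph))
```

The join works only because both sources refer to the disease with the same identifier, which is the role shared vocabularies and ontologies play in the scenario the abstract describes.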
Supporting Clinical Decision Making in Cancer Care Delivery
Background: Cancer treatment and management require complicated clinical decision making to provide the highest quality of care for an individual patient. This is facilitated in part with ever-increasing availability of medications and treatments but hindered due to barriers such as access to care, cost of medications, clinician knowledge, and patient preferences or clinical factors. Although guidelines for cancer treatment and many symptoms have been developed to inform clinical practice, implementation of these guidelines into practice is often delayed or does not occur. Informatics-based approaches, such as clinical decision support, may be an effective tool to improve guideline implementation by delivering patient-specific and evidence-based knowledge to the clinician at the point of care to allow shared decision making with a patient and their family. The large amount of data in the electronic health record can be utilized to develop, evaluate, and implement automated approaches; however, the quality of the data must first be examined and evaluated.
Methods: This dissertation addresses gaps in the literature about clinical decision making for cancer care delivery. Specifically, following an introduction and review of the literature for relevant topics to this dissertation, the researcher presents three studies. In Study One, the researcher explores the use of clinical decision support in cancer therapeutic decision making by conducting a systematic review of the literature. In Study Two, the researcher conducts a quantitative study to describe the rate of guideline concordant care provided for prevention of acute chemotherapy-induced nausea and vomiting (CINV) and to identify predictors of receiving guideline concordant care. In Study Three, the researcher conducts a mixed-methods study to evaluate the completeness, concordance, and heterogeneity of clinician documentation of CINV. The final chapter of this dissertation is comprised of key findings of each study, the strengths and limitations, clinical and research implications, and future research.
Results: In Study One, the systematic review, the researcher identified ten studies that prospectively studied clinical decision support systems or tools in a cancer setting to guide therapeutic decision making. There was variability in these studies, including study design, outcomes measured, and results. There was a trend toward benefit, both in process and patient-specific outcomes. Importantly, few studies were integrated into the electronic health record.
In Study Two, of 180 patients age 26 years or less, 36% received guideline concordant care as defined by pediatric or adult guidelines, as appropriate. Factors associated with receiving guideline concordant care included receiving a cisplatin-based regimen, being treated in adult oncology compared to pediatric oncology, and solid tumor diagnosis.
In Study Three, of the 127 patient records reviewed for the documentation of chemotherapy-induced nausea and vomiting, 75% had prescriber assessment documented and 58% had nursing assessment documented. Of those who had documented assessments by both prescriber and nurse, 72% were in agreement on the presence/absence of chemotherapy-induced nausea and vomiting. After mapping the concept through the Unified Medical Language System and developing a post-coordinated expression to identify chemotherapy-induced nausea and vomiting in the text, 85% of prescriber documentation and 100% of nurse documentation could be correctly categorized as present/absent. Further descriptors of the symptoms, such as severity or temporality, however, were infrequently reported.
Conclusion: In summary, this dissertation provides new knowledge about decision making in cancer care delivery. Specifically, in Study One the researcher describes that clinical decision support, one potential implementation strategy to improve guideline concordant care, is understudied or underpublished but a promising potential intervention. In Study Two, I identified factors that were associated with receipt of guideline concordant care for CINV, and these should be further explored to develop interventions. Finally, in Study Three, I report on the limitations of the data quality of CINV documentation in the electronic health record. Future work should focus on validating these results on a multi-institutional level.