7 research outputs found

    METADATA MANAGEMENT FOR CLINICAL DATA INTEGRATION

    Clinical data have been continuously collected and are growing with the wide adoption of electronic health records (EHR). Clinical data provide the foundation for state-of-the-art research such as artificial intelligence in medicine. At the same time, it has become a challenge to integrate, access, and explore study-level patient data from large volumes of data in heterogeneous databases. Effective, fine-grained, cross-cohort data exploration and semantically enabled approaches and systems are needed. To build semantically enabled systems, we need to leverage existing terminology systems and ontologies. Numerous ontologies have been developed recently, and they play an important role in semantically enabled applications. Because they contain valuable codified knowledge, the management of these ontologies, as metadata, also requires systematic approaches. Moreover, in most clinical settings, patient data are collected with the help of a data dictionary. Knowledge of the relationships between an ontology and a related data dictionary is important for semantic interoperability. Such relationships are represented and maintained by mappings. Mappings store how data source elements and domain ontology concepts are linked, as well as how domain ontology concepts are linked between different ontologies. While mappings are crucial to the maintenance of relationships between an ontology and a related data dictionary, they are commonly captured in CSV files with limited capabilities for sharing, tracking, and visualization. The management of mappings requires an innovative, interactive, and collaborative approach. Metadata management serves to organize data that describes other data. In computer science and information science, an ontology is metadata consisting of the representation, naming, and definition of the hierarchies, properties, and relations between concepts.
A structured, scalable, and computer-understandable approach to metadata management is critical to developing systems with fine-grained data exploration capabilities. This dissertation presents a systematic approach called MetaSphere, which uses metadata and ontologies to support the management and integration of clinical research data through an ontology-based metadata management system for multiple domains. MetaSphere is a general framework that aims to manage domain-specific metadata, provide a fine-grained data exploration interface, and store patient data in data warehouses. Moreover, MetaSphere provides a dedicated mapping interface called the Interactive Mapping Interface (IMI) to map a data dictionary to well-recognized, standardized ontologies. MetaSphere has been applied successfully to three domains: sleep (X-search), pressure ulcer injuries and deep tissue pressure (SCIPUDSphere), and cancer. Specifically, MetaSphere stores each domain ontology in a structured form in databases. Patient data in the corresponding domains are also stored in databases as data warehouses. MetaSphere provides a powerful query interface to enable interaction between users and the actual patient data. The query interface allows researchers to compose complex queries that pinpoint specific cohorts within a large amount of patient data. The results of the three instantiations are as follows. X-search is publicly available at https://www.x-search.net with nine sleep domain datasets consisting of over 26,000 unique subjects. The canonical data dictionary contains over 900 common data elements across the datasets. X-search has received over 1800 cross-cohort queries by users from 16 countries. SCIPUDSphere has integrated a total of 268,562 records containing 282 ICD-9 codes related to pressure ulcer injuries among 36,626 individuals with spinal cord injuries. IMI is publicly available at http://epi-tome.com/.
Using IMI, we have successfully mapped the North American Association of Central Cancer Registries (NAACCR) data dictionary to National Cancer Institute Thesaurus (NCIt) concepts.
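
The abstract above describes mappings as records linking data dictionary elements to ontology concepts, commonly kept in CSV files. As a rough illustration only, the sketch below loads such a CSV into typed records and indexes them by source element. The column names, ontology names, and concept identifiers are all hypothetical placeholders; none of them come from MetaSphere, IMI, NAACCR, or NCIt.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One link from a data dictionary element to an ontology concept."""
    source_element: str  # element name in the data dictionary
    ontology: str        # target ontology (placeholder names below)
    concept_id: str      # concept identifier (placeholder values below)
    relation: str        # kind of link, e.g. "exact" or "broader"


# Hypothetical CSV content; the header must match the Mapping field names.
CSV_TEXT = """source_element,ontology,concept_id,relation
element_a,ONTO_X,X:0001,exact
element_b,ONTO_X,X:0002,broader
element_a,ONTO_Y,Y:0042,exact
"""


def load_mappings(text):
    """Parse CSV text into a list of Mapping records."""
    reader = csv.DictReader(io.StringIO(text))
    return [Mapping(**row) for row in reader]


def index_by_element(mappings):
    """Group mappings by source element, preserving file order."""
    index = {}
    for m in mappings:
        index.setdefault(m.source_element, []).append(m)
    return index


mappings = load_mappings(CSV_TEXT)
by_element = index_by_element(mappings)
```

A flat file like this carries no authorship, history, or hierarchy, which is the limitation the abstract attributes to CSV-based mappings.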

    Web-Based Interactive Mapping from Data Dictionaries to Ontologies, with an Application to Cancer Registry

    BACKGROUND: The Kentucky Cancer Registry (KCR) is a central cancer registry for the state of Kentucky that receives data about incident cancer cases from all healthcare facilities in the state within 6 months of diagnosis. Like all other U.S. and Canadian cancer registries, KCR uses a data dictionary provided by the North American Association of Central Cancer Registries (NAACCR) for standardized data entry. The NAACCR data dictionary is not an ontological system. Mapping between the NAACCR data dictionary and the National Cancer Institute (NCI) Thesaurus (NCIt) will facilitate the enrichment, dissemination, and utilization of cancer registry data. We introduce a web-based system, called Interactive Mapping Interface (IMI), for creating mappings from data dictionaries to ontologies, in particular from NAACCR to NCIt. METHOD: IMI has been designed as a general approach with three components: (1) ontology library; (2) mapping interface; and (3) recommendation engine. The ontology library provides a list of ontologies as targets for building mappings. The mapping interface consists of six modules: project management, mapping dashboard, access control, logs and comments, hierarchical visualization, and result review and export. The built-in recommendation engine automatically identifies a list of candidate concepts to facilitate the mapping process. RESULTS: We report the architecture design and interface features of IMI. To validate our approach, we implemented an IMI prototype and pilot-tested its features by using the IMI interface to map a sample set of NAACCR data elements to NCIt concepts. Forty-seven of 301 NAACCR data elements have been mapped to NCIt concepts. Five branches of the hierarchical tree have been identified from these mapped concepts for visual inspection. CONCLUSIONS: IMI provides an interactive, web-based interface for building mappings from data dictionaries to ontologies.
Although the scope of our pilot testing is limited, our results demonstrate the feasibility of using IMI for semantic enrichment of cancer registry data by mapping NAACCR data elements to NCIt concepts.
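
The abstract above does not say how the recommendation engine ranks candidate concepts. Purely as an illustration of the general idea, the sketch below ranks concept labels by string similarity to a data element name using Python's standard `difflib`. This is a stand-in technique chosen for the example, not IMI's actual method, and the concept identifiers are hypothetical placeholders.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recommend(element_name, concept_labels, top_k=3):
    """Return up to top_k concept ids, best label match first.

    concept_labels maps a concept id to its human-readable label.
    Ties are broken by concept id so the ordering is deterministic.
    """
    scored = [
        (similarity(element_name, label), concept_id)
        for concept_id, label in concept_labels.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [concept_id for _score, concept_id in scored[:top_k]]


# Hypothetical concept ids with illustrative labels.
concepts = {
    "X:1": "Primary Site",
    "X:2": "Date of Diagnosis",
    "X:3": "Laterality",
}

# An exact (case-insensitive) label match scores 1.0 and ranks first.
best = recommend("primary site", concepts, top_k=1)
```

In an interactive setting, a ranked list like this would only be a starting point for a human curator to confirm or reject each candidate.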

    X-search: An Open Access Interface for Cross-Cohort Exploration of the National Sleep Research Resource

    Background: The National Sleep Research Resource (NSRR) is a large-scale, openly shared data repository of de-identified, highly curated clinical sleep data from multiple NIH-funded epidemiological studies. Although many data repositories allow users to browse their content, few support fine-grained, cross-cohort query and exploration at study-subject level. We introduce a cross-cohort query and exploration system, called X-search, to enable researchers to query patient cohort counts across a growing number of completed, NIH-funded studies in NSRR and explore the feasibility or likelihood of reusing the data for research studies. Methods: X-search has been designed as a general framework with two loosely coupled components: a semantically annotated data repository and a cross-cohort exploration engine. The semantically annotated data repository is comprised of a canonical data dictionary, data sources with a data dictionary, and mappings between each individual data dictionary and the canonical data dictionary. The cross-cohort exploration engine consists of five modules: query builder, graphical exploration, case-control exploration, query translation, and query execution. The canonical data dictionary serves as the unified metadata to drive the visual exploration interfaces and facilitate query translation through the mappings. Results: X-search is publicly available at https://www.x-search.net/ with nine NSRR datasets consisting of over 26,000 unique subjects. The canonical data dictionary contains over 900 common data elements across the datasets. X-search has received over 1800 cross-cohort queries by users from 16 countries. Conclusions: X-search provides a powerful cross-cohort exploration interface for querying and exploring heterogeneous datasets in the NSRR data repository, so as to enable researchers to evaluate the feasibility of potential research studies and generate potential hypotheses using the NSRR data.
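
The query translation step described above, where a query written against the canonical data dictionary is rewritten for each dataset through the mappings, can be sketched roughly as follows. This is a minimal illustration under assumed names: the dataset names, variable names, and sample rows are invented, and the real X-search engine is not shown in the abstract.

```python
import operator

# Hypothetical mappings: canonical element -> dataset-specific variable name.
MAPPINGS = {
    "dataset_a": {"age": "age_years", "bmi": "bmi_s1"},
    "dataset_b": {"age": "AGE", "bmi": "BMI"},
}

# Invented sample rows standing in for each dataset's subject-level data.
DATA = {
    "dataset_a": [
        {"age_years": 54, "bmi_s1": 31.0},
        {"age_years": 40, "bmi_s1": 24.5},
    ],
    "dataset_b": [
        {"AGE": 67, "BMI": 33.2},
        {"AGE": 71, "BMI": 22.0},
        {"AGE": 58, "BMI": 30.1},
    ],
}

OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def translate(query, dataset):
    """Rewrite canonical (element, op, value) terms into dataset variables."""
    mapping = MAPPINGS[dataset]
    return [(mapping[element], op, value) for element, op, value in query]


def count_cohort(query, dataset):
    """Count rows in one dataset that satisfy every term of the query."""
    local_terms = translate(query, dataset)
    return sum(
        all(OPS[op](row[var], value) for var, op, value in local_terms)
        for row in DATA[dataset]
    )


# One canonical query, evaluated against every dataset.
query = [("age", ">=", 50), ("bmi", ">", 30)]
counts = {name: count_cohort(query, name) for name in MAPPINGS}
```

Keeping the query in canonical terms means a new dataset only needs a new mapping entry, not a new query.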

    Individualized Clinical Practice Guidelines for Pressure Injury Management: Development of an Integrated Multi-Modal Biomedical Information Resource

    Background: Pressure ulcers (PU) and deep tissue injuries (DTI), collectively known as pressure injuries, are serious complications causing staggering costs and human suffering, with over 200 reported risk factors from many domains. Primary pressure injury prevention seeks to prevent the first incidence, while secondary PU/DTI prevention aims to decrease chronic recurrence. Clinical practice guidelines (CPG) combine evidence-based practice and expert opinion to aid clinicians in achieving best practices for primary and secondary prevention. The correction of all risk factors can be both overwhelming and impractical to implement in clinical practice. There is a need to develop practical clinical tools to prioritize the multiple recommendations of CPG, but there is limited guidance on how to prioritize based on individual cases. Bioinformatics platforms enable data management to support clinical decision support and user-interface development for complex clinical challenges such as pressure injury prevention care planning. Objective: The central hypothesis of the study is that the individual’s risk factor profile can provide the basis for adaptive, personalized care planning for PU prevention based on CPG prioritization. The study objective is to develop the Spinal Cord Injury Pressure Ulcer and Deep Tissue Injury (SCIPUD+) Resource to support personalized care planning for primary and secondary PU/DTI prevention. Methods: The study is employing a retrospective electronic health record (EHR) chart review of over 75 factors known to be relevant for pressure injury risk in individuals with a spinal cord injury (SCI) and routinely recorded in the EHR. We also perform tissue health assessments of a selected sub-group. A systems approach is being used to develop and validate the SCIPUD+ Resource, incorporating the many risk factor domains associated with PU/DTI primary and secondary prevention, ranging from the individual’s environment to local tissue health.
Our multiscale approach will leverage the strength of bioinformatics applied to an established national EHR system. A comprehensive model is being used to relate the primary outcome of interest (PU/DTI development) with over 75 PU/DTI risk factors using a retrospective chart review of 5000 individuals selected from the study cohort of more than 36,000 persons with SCI. A Spinal Cord Injury Pressure Ulcer and Deep Tissue Injury Ontology (SCIPUDO) is being developed to enable robust text-mining for data extraction from free-form notes. Results: The results from this study are pending. Conclusions: PU/DTI remains a highly significant source of morbidity for individuals with SCI. Personalized interactive care plans may decrease both initial PU formation and readmission rates for high-risk individuals. The project is using established EHR data to build a comprehensive, structured model of environmental, social, and clinical pressure injury risk factors. The comprehensive SCIPUD+ health care tool will be used to relate the primary outcome of interest (pressure injury development) with covariates including environmental, social, clinical, personal, and tissue health profiles, as well as possible interactions among some of these covariates. The study will result in a validated tool for personalized implementation of CPG recommendations and has great potential to change the standard of care for pressure injury (PrI) clinical practice by enabling clinicians to provide personalized application of CPG priorities tailored to the needs of each at-risk individual with SCI.

    REFORM: REFACTORIZED ELECTRONIC WEB FORMS - LARGE SCALE SURVEY DATA CAPTURE AND WORKFLOW CONTROL FRAMEWORK

    No full text

    X-search: an open access interface for cross-cohort exploration of the National Sleep Research Resource

    Background: The National Sleep Research Resource (NSRR) is a large-scale, openly shared, data repository of de-identified, highly curated clinical sleep data from multiple NIH-funded epidemiological studies. Although many data repositories allow users to browse their content, few support fine-grained, cross-cohort query and exploration at study-subject level. We introduce a cross-cohort query and exploration system, called X-search, to enable researchers to query patient cohort counts across a growing number of completed, NIH-funded studies in NSRR and explore the feasibility or likelihood of reusing the data for research studies. Methods: X-search has been designed as a general framework with two loosely-coupled components: semantically annotated data repository and cross-cohort exploration engine. The semantically annotated data repository is comprised of a canonical data dictionary, data sources with a data dictionary, and mappings between each individual data dictionary and the canonical data dictionary. The cross-cohort exploration engine consists of five modules: query builder, graphical exploration, case-control exploration, query translation, and query execution. The canonical data dictionary serves as the unified metadata to drive the visual exploration interfaces and facilitate query translation through the mappings. Results: X-search is publicly available at https://www.x-search.net/ with nine NSRR datasets consisting of over 26,000 unique subjects. The canonical data dictionary contains over 900 common data elements across the datasets. X-search has received over 1800 cross-cohort queries by users from 16 countries. Conclusions: X-search provides a powerful cross-cohort exploration interface for querying and exploring heterogeneous datasets in the NSRR data repository, so as to enable researchers to evaluate the feasibility of potential research studies and generate potential hypotheses using the NSRR data.