48 research outputs found

    Understanding Views Around the Creation of a Consented, Donated Databank of Clinical Free Text to Develop and Train Natural Language Processing Models for Research: Focus Group Interviews With Stakeholders

    BACKGROUND: Information stored within electronic health records is often recorded as unstructured text. Special computerized natural language processing (NLP) tools are needed to process this text; however, complex governance arrangements make such data in the National Health Service hard to access, and therefore, it is difficult to use for research in improving NLP methods. The creation of a donated databank of clinical free text could provide an important opportunity for researchers to develop NLP methods and tools and may circumvent delays in accessing the data needed to train the models. However, to date, there has been little or no engagement with stakeholders on the acceptability and design considerations of establishing a free-text databank for this purpose. OBJECTIVE: This study aimed to ascertain stakeholder views around the creation of a consented, donated databank of clinical free text to help create, train, and evaluate NLP for clinical research and to inform the potential next steps for adopting a partner-led approach to establish a national, funded databank of free text for use by the research community. METHODS: Web-based in-depth focus group interviews were conducted with 4 stakeholder groups (patients and members of the public, clinicians, information governance leads and research ethics members, and NLP researchers). RESULTS: All stakeholder groups were strongly in favor of the databank and saw great value in creating an environment where NLP tools can be tested and trained to improve their accuracy. Participants highlighted a range of complex issues for consideration as the databank is developed, including communicating the intended purpose, the approach to access and safeguarding the data, who should have access, and how to fund the databank. 
Participants recommended that a small-scale, gradual approach be adopted to start to gather donations and encouraged further engagement with stakeholders to develop a road map and set of standards for the databank. CONCLUSIONS: These findings provide a clear mandate to begin developing the databank and a framework for stakeholder expectations, which we would aim to meet with the databank delivery.

    UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER

    Objective: Electronic health records (EHRs) are a rich source of information on human diseases, but the information is variably structured, fragmented, curated using different coding systems, and collected for purposes other than medical research. We describe an approach for developing, validating, and sharing reproducible phenotypes from national structured EHR in the United Kingdom with applications for translational research. Materials and Methods: We implemented a rule-based phenotyping framework, with up to 6 approaches of validation. We applied our framework to a sample of 15 million individuals in a national EHR data source (population-based primary care, all ages) linked to hospitalization and death records in England. Data comprised continuous measurements (for example, blood pressure), medication information, and coded diagnoses, symptoms, procedures, and referrals, recorded using 5 controlled clinical terminologies: (1) Read (primary care, subset of SNOMED-CT [Systematized Nomenclature of Medicine Clinical Terms]), (2) International Classification of Diseases–Ninth Revision and Tenth Revision (secondary care diagnoses and cause of mortality), (3) Office of Population Censuses and Surveys Classification of Surgical Operations and Procedures, Fourth Revision (hospital surgical procedures), and (4) DM+D prescription codes. Results: Using the CALIBER phenotyping framework, we created algorithms for 51 diseases, syndromes, biomarkers, and lifestyle risk factors and provide up to 6 validation approaches. The EHR phenotypes are curated in the open-access CALIBER Portal (https://www.caliberresearch.org/portal) and have been used by 40 national and international research groups in 60 peer-reviewed publications. Conclusions: We describe a UK EHR phenomics approach within the CALIBER EHR data platform with initial evidence of validity and use, as an important step toward international use of UK EHR data for health research.

    Evaluating the Quality of Research into a Single Prognostic Biomarker: A Systematic Review and Meta-analysis of 83 Studies of C-Reactive Protein in Stable Coronary Artery Disease

    Background: Systematic evaluations of the quality of research on a single prognostic biomarker are rare. We sought to evaluate the quality of prognostic research evidence for the association of C-reactive protein (CRP) with fatal and nonfatal events among patients with stable coronary disease. Methods and Findings: We searched MEDLINE (1966 to 2009) and EMBASE (1980 to 2009) and selected prospective studies of patients with stable coronary disease, reporting a relative risk for the association of CRP with death and nonfatal cardiovascular events. We included 83 studies, reporting 61,684 patients and 6,485 outcome events. No study reported a prespecified statistical analysis protocol; only two studies reported the time elapsed (in months or years) between initial presentation of symptomatic coronary disease and inclusion in the study. Studies reported a median of seven items (of 17) from the REMARK reporting guidelines, with no evidence of change over time. The pooled relative risk for the top versus bottom third of CRP distribution was 1.97 (95% confidence interval [CI] 1.78–2.17), with substantial heterogeneity (I² = 79.5). Only 13 studies adjusted for conventional risk factors (age, sex, smoking, obesity, diabetes, and low-density lipoprotein [LDL] cholesterol) and these had a relative risk of 1.65 (95% CI 1.39–1.96), I² = 33.7. Studies reported ten different ways of comparing CRP values, with weaker relative risks for those based on continuous measures. Adjusting for publication bias (for which there was strong evidence, Egger's p<0.001) using a validated method reduced the relative risk to 1.19 (95% CI 1.13–1.25). Only two studies reported a measure of discrimination (c-statistic). In 20 studies the detection rate for subsequent events could be calculated and was 31% for a 10% false positive rate, and the calculated pooled c-statistic was 0.61 (0.57–0.66).
Conclusion: Multiple types of reporting bias, and publication bias, make the magnitude of any independent association between CRP and prognosis among patients with stable coronary disease sufficiently uncertain that no clinical practice recommendations can be made. Publication of prespecified statistical analytic protocols and prospective registration of studies, among other measures, might help improve the quality of prognostic biomarker research.

    Prospective study design and data analysis in UK Biobank

    Population-based prospective studies, such as UK Biobank, are valuable for generating and testing hypotheses about the potential causes of human disease. We describe how UK Biobank's study design, data access policies, and approaches to statistical analysis can help to minimize error and improve the interpretability of research findings, with implications for other population-based prospective studies being established worldwide.

    Advancing social equity in and through marine conservation

    Substantial efforts and investments are being made to increase the scale and improve the effectiveness of marine conservation globally. Though it is mandated by international law and central to conservation policy, less attention has been given to how to operationalize social equity in and through the pursuit of marine conservation. In this article, we aim to bring greater attention to this topic through reviewing how social equity can be better integrated in marine conservation policy and practice. Advancing social equity in marine conservation requires directing attention to: recognition through acknowledgment and respect for diverse peoples and perspectives; fair distribution of impacts through maximizing benefits and minimizing burdens; procedures through fostering participation in decision-making and good governance; management through championing and supporting local involvement and leadership; the environment through ensuring the efficacy of conservation actions and adequacy of management to ensure benefits to nature and people; and the structural barriers to and institutional roots of inequity in conservation. We then discuss the role of various conservation organizations in advancing social equity in marine conservation and identify the capacities these organizations need to build. We urge the marine conservation community, including governments, non-governmental organizations and donors, to commit to the pursuit of socially equitable conservation.

    Subsequent Event Risk in Individuals with Established Coronary Heart Disease: Design and Rationale of the GENIUS-CHD Consortium

    BACKGROUND: The "GENetIcs of sUbSequent Coronary Heart Disease" (GENIUS-CHD) consortium was established to facilitate discovery and validation of genetic variants and biomarkers for risk of subsequent CHD events, in individuals with established CHD. METHODS: The consortium currently includes 57 studies from 18 countries, recruiting 185,614 participants with either acute coronary syndrome, stable CHD or a mixture of both at baseline. All studies collected biological samples and followed up study participants prospectively for subsequent events. RESULTS: Enrollment into the individual studies took place from 1985 to the present day, with duration of follow-up ranging from 9 months to 15 years. Within each study, participants with CHD are predominantly of self-reported European descent (38%-100%), mostly male (44%-91%) with mean ages at recruitment ranging from 40 to 75 years. Initial feasibility analyses, using a federated analysis approach, yielded expected associations between age (HR 1.15, 95% CI 1.14-1.16, per 5-year increase), male sex (HR 1.17, 95% CI 1.13-1.21) and smoking (HR 1.43, 95% CI 1.35-1.51) with risk of subsequent CHD death or myocardial infarction, and differing associations with other individual and composite cardiovascular endpoints. CONCLUSIONS: GENIUS-CHD is a global collaboration seeking to elucidate genetic and non-genetic determinants of subsequent event risk in individuals with established CHD, in order to improve residual risk prediction and identify novel drug targets for secondary prevention. Initial analyses demonstrate the feasibility and reliability of a federated analysis approach. The consortium now plans to initiate and test novel hypotheses as well as supporting replication and validation analyses for other investigators.