
    Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis.

    OBJECTIVES: 1) To use a data-driven method to examine clinical codes (risk factors) of a medical condition in primary care electronic health records (EHRs) that can accurately predict a diagnosis of the condition in secondary care EHRs. 2) To develop and validate a disease phenotyping algorithm for rheumatoid arthritis (RA) using primary care EHRs. METHODS: This study linked routine primary and secondary care EHRs in Wales, UK. A machine-learning-based scheme was used to identify patients with rheumatoid arthritis from primary care EHRs via the following steps: i) selection of variables by comparing the relative frequencies of Read codes in the primary care dataset between disease cases and non-disease controls (disease/non-disease based on the secondary care diagnosis); ii) reduction of predictors/associated variables using a Random Forest method; iii) induction of decision rules from a decision tree model. The proposed method was then extensively validated on an independent dataset, and its performance was compared with that of two existing deterministic algorithms for RA which had been developed using expert clinical knowledge. RESULTS: Primary care EHRs were available for 2,238,360 patients over the age of 16, and of these 20,667 were also linked in the secondary care rheumatology clinical system. In the linked dataset, 900 predictors (out of a total of 43,100 variables) in the primary care record were found more frequently in those with RA than in those without. These variables were reduced to 37 groups of related clinical codes, which were used to develop a decision tree model. The final algorithm identified 8 predictors related to diagnostic codes for RA, medication codes, such as those for disease-modifying anti-rheumatic drugs, and absence of alternative diagnoses such as psoriatic arthritis. The proposed data-driven method performed as well as the methods based on expert clinical knowledge. CONCLUSION: Data-driven schemes, such as ensemble machine learning methods, have the potential to identify the most informative predictors in a cost-effective and rapid way, and so to classify rheumatoid arthritis or other complex medical conditions accurately and reliably in primary care EHRs.
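Step i) of the method described in this abstract, screening codes by their relative frequency in cases versus controls, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `screen_codes`, the `min_ratio` threshold, the additive smoothing, and the placeholder code labels (`RA_DX`, `DMARD`, `COUGH`) are all assumptions made for illustration, and the toy records are invented.

```python
from collections import Counter


def screen_codes(cases, controls, min_ratio=2.0, smoothing=0.5):
    """Return (code, ratio) pairs for codes relatively more frequent in cases.

    `cases` and `controls` are lists of patient records, each record being a
    list of code labels. A code counts at most once per patient. The additive
    smoothing (an assumption of this sketch) avoids division by zero for codes
    that never appear in controls.
    """
    case_counts = Counter(code for record in cases for code in set(record))
    control_counts = Counter(code for record in controls for code in set(record))

    selected = []
    for code, n_with_code in case_counts.items():
        p_case = (n_with_code + smoothing) / (len(cases) + 2 * smoothing)
        p_control = (control_counts.get(code, 0) + smoothing) / (
            len(controls) + 2 * smoothing
        )
        ratio = p_case / p_control
        if ratio >= min_ratio:
            selected.append((code, ratio))

    # Most strongly enriched codes first.
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Toy data with hypothetical code labels (not real Read codes).
cases = [["RA_DX", "DMARD"], ["RA_DX", "COUGH"], ["DMARD"], ["RA_DX", "DMARD", "COUGH"]]
controls = [["COUGH"], ["COUGH"], [], ["DMARD", "COUGH"]]

for code, ratio in screen_codes(cases, controls):
    print(f"{code}: {ratio:.2f}")
```

In a pipeline like the one the abstract outlines, the codes retained by such a screen would then be passed to a Random Forest for further reduction and to a decision tree for rule induction; those later steps are not shown here.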

    Genetic associations with sporadic cerebral small vessel disease

    Background: Cerebral small vessel disease (SVD) causes substantial cognitive, psychiatric and physical disabilities. Despite its common nature, SVD pathogenesis and molecular mechanisms remain poorly understood, and prevention and treatment are probably suboptimal. Identifying the genetic determinants of SVD will improve understanding and may help identify novel treatment targets. The aim of this thesis is to better understand genetic associations with SVD through investigating its pathological, radiological and clinical phenotypes. Methods: To unravel the genetic associations with SVD, I used three complementary approaches. First, I performed a systematic review looking at existing intracerebral haemorrhage (ICH) classification systems and their reliability, to help inform future studies of ICH genetics. Second, I performed a series of systematic reviews and meta-analyses, investigating associations between genetic polymorphisms and histopathologically confirmed cerebral amyloid angiopathy (CAA). Third, I performed meta-analyses of existing genome-wide datasets to determine associations of >1000 common single nucleotide polymorphisms (SNP) in the COL4A1/COL4A2 genomic region with clinico-radiological SVD phenotypes: ICH and its subtypes, ischaemic stroke and its subtypes, and white matter hyperintensities. Results: The reliability of existing ICH classification systems appeared excellent in eight studies conducted in specialist centres with experienced raters, although these existing systems have several limitations. In my systematic evaluation of CAA genetics, meta-analyses of 24 studies including 3520 participants showed robust evidence for a dose-dependent association between APOE ɛ4 and histopathological CAA. There was, however, no convincing association between APOE ɛ2 and presence of CAA in a meta-analysis of 11 studies including 1640 participants. 
Meta-analyses of five studies including 497 participants showed, contrary to an existing popular hypothesis, that while APOE ɛ4 may increase the risk of developing severe CAA vasculopathy, there is no clear evidence to support a role of ɛ2. There were few data about the role of APOE in hereditary CAA, but in the three studies that had looked at this, there was no evidence for an association between APOE ɛ4 and CAA severity. There were too few studies and participants to draw firm conclusions about the effect of non-APOE ɛ2/ɛ3/ɛ4 genetic polymorphisms on CAA, but there were positive associations with the TGF-β1, TOMM40 and CR1 genes in four studies. Finally, in my meta-analyses of the COL4A1/COL4A2 genomic region, three intronic SNPs in COL4A2 were associated with SVD phenotypes: significantly with deep ICH, and suggestively with lacunar ischaemic stroke and white matter hyperintensities (WMH). Conclusions: I have shown that while existing ICH classification systems appear to have very good reliability, further research is needed to determine their performance in different settings. For large population-based prospective studies of ICH genetics, anatomical systems are likely to be more feasible, scalable and appropriate, although they have limitations and will need to be further developed. Using systematic reviews and meta-analyses, I have confirmed a dose-related association between APOE ɛ4 and histopathological CAA, but also demonstrated that, despite popular acceptance, there are insufficient data to draw firm conclusions about the association with APOE ɛ2. I found some positive associations with CAA in other genes, which merit replication in further larger studies, and showed that there are currently insufficient data about the role of APOE in hereditary CAA. Finally, I identified a novel association between a locus in a known hereditary SVD gene, COL4A2, and sporadic SVD. This highlights a new and successful approach for selecting candidate genes, which can be expanded in future studies to include other known hereditary SVD genes.

    Creating a Single Application and Approval Process to Enable Research; an example using CPRD Primary Care Data and Public Health England Cancer Registry Data

    Objectives: To enhance the research value and capability of its primary care database, the Clinical Practice Research Datalink (CPRD) has collaborated with Public Health England (PHE)’s National Cancer Registration and Analysis Service to facilitate access to linked cancer registration data for use in research, pharmacovigilance, drug monitoring and health outcomes analysis. Since 2009, access to this linked resource has been co-managed by CPRD and PHE through two parallel, independent approval processes: (a) the MHRA Independent Scientific Advisory Committee (ISAC) and (b) the PHE Office for Data Release (ODR). In upholding the Office for Life Sciences Ministerial Industry Strategy Group (Health Data Programme)’s vision to minimise process barriers to accessing real-world data, CPRD and PHE have worked together to unify and streamline these two processes into a single end-to-end application and approval process. Approach: Each organisation reviewed the other’s approval processes to improve mutual understanding of the respective governance approaches, the risk-based assessments applied to disclosure risk, risk appetites and policies, with the goal of harmonising these into a single approval process. Results: CPRD and PHE are finalising a contract establishing a clear operating framework that allows CPRD to grant approval to researchers for the use of linked cancer registry data. The contract names CPRD as a joint data controller and sets out the purposes for processing, the manner of processing and the means by which joint data controller responsibilities will be satisfied. An associated service level agreement is under discussion, which will enable robust timelines and performance management for both organisations. These developments are important milestones towards achieving the single approval process, because they allow CPRD to review applications for cancer registry data in-house, in parallel with the ISAC review. Conclusion: The strong relationship built between CPRD and PHE, and their willingness to develop a single application and approval process, will strengthen and streamline access to these data, whilst assuring patients and the public that scientific integrity is maintained and proportionate information governance checks are in place. Upon completion of this work, applicants will experience faster review and feedback times, ultimately leading to faster approvals. Researchers wishing to use these linked data will soon be able to submit one application to ISAC, with one point of contact and one approval.