48 research outputs found

    Mapping the Read2/CTV3 controlled clinical terminologies to Phecodes in UK Biobank primary care electronic health records: implementation and evaluation

    OBJECTIVE: To establish and validate mappings between primary care clinical terminologies (Read Version 2, Clinical Terms Version 3) and Phecodes. METHODS: We processed 123,662,421 primary care events from 230,096 UK Biobank (UKB) participants. We assessed the validity of the primary care-derived Phecodes by conducting PheWAS analyses for seven pre-selected SNPs in the UKB and compared with estimates from BioVU. RESULTS: We mapped 92% of Read2 (n=10,834) and 91% of CTV3 (n=21,988) to 1,449 and 1,490 Phecodes. UKB PheWAS using Phecodes from primary care EHR and hospitalizations replicated all (n=22) previously-reported genotype-phenotype associations. When limiting Phecodes to primary care EHR, replication was 81% (n=18). CONCLUSION: We introduced a first version of mappings from Read2/CTV3 to Phecodes. The reference list of diseases provided by Phecodes can be extended, enabling researchers to leverage primary care EHR for high-throughput discovery research
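
The mapping step in the abstract above can be pictured as a lookup from source clinical codes to Phecodes, followed by a coverage count. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `map_events`, the record layout, and every code value (`R2_A`, `PHE_1`, and so on) are hypothetical placeholders rather than real Read2, CTV3, or Phecode identifiers.

```python
from collections import defaultdict


def map_events(events, code_to_phecodes):
    """Map (participant_id, source_code) events to Phecodes.

    Returns a dict of participant_id -> set of Phecodes, plus the fraction
    of distinct source codes seen in the events that had at least one mapping.
    """
    phecodes = defaultdict(set)
    seen_codes = set()
    mapped_codes = set()
    for participant_id, code in events:
        seen_codes.add(code)
        targets = code_to_phecodes.get(code, ())
        if targets:
            mapped_codes.add(code)
            phecodes[participant_id].update(targets)
    coverage = len(mapped_codes) / len(seen_codes) if seen_codes else 0.0
    return dict(phecodes), coverage


# Hypothetical placeholder data: none of these are real codes.
example_mapping = {"R2_A": ["PHE_1"], "R2_B": ["PHE_1", "PHE_2"]}
example_events = [(1, "R2_A"), (1, "R2_B"), (2, "R2_C"), (2, "R2_A")]

phecodes_by_participant, coverage = map_events(example_events, example_mapping)
```

A source code can map to more than one Phecode, which is why the lookup returns a list and the per-participant result is a set.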

    Identifying subtypes of heart failure from three electronic health record sources with machine learning: an external, prognostic, and genetic validation study

    BACKGROUND: Machine learning has been used to analyse heart failure subtypes, but not across large, distinct, population-based datasets, across the whole spectrum of causes and presentations, or with clinical and non-clinical validation by different machine learning methods. Using our published framework, we aimed to discover heart failure subtypes and validate them upon population representative data. METHODS: In this external, prognostic, and genetic validation study we analysed individuals aged 30 years or older with incident heart failure from two population-based databases in the UK (Clinical Practice Research Datalink [CPRD] and The Health Improvement Network [THIN]) from 1998 to 2018. Pre-heart failure and post-heart failure factors (n=645) included demographic information, history, examination, blood laboratory values, and medications. We identified subtypes using four unsupervised machine learning methods (K-means, hierarchical, K-Medoids, and mixture model clustering) with 87 of 645 factors in each dataset. We evaluated subtypes for (1) external validity (across datasets); (2) prognostic validity (predictive accuracy for 1-year mortality); and (3) genetic validity (UK Biobank), association with polygenic risk score (PRS) for heart failure-related traits (n=11), and single nucleotide polymorphisms (n=12). FINDINGS: We included 188 800, 124 262, and 9573 individuals with incident heart failure from CPRD, THIN, and UK Biobank, respectively, between Jan 1, 1998, and Jan 1, 2018. After identifying five clusters, we labelled heart failure subtypes as (1) early onset, (2) late onset, (3) atrial fibrillation related, (4) metabolic, and (5) cardiometabolic. In the external validity analysis, subtypes were similar across datasets (c-statistics: THIN model in CPRD ranged from 0·79 [subtype 3] to 0·94 [subtype 1], and CPRD model in THIN ranged from 0·79 [subtype 1] to 0·92 [subtypes 2 and 5]). 
In the prognostic validity analysis, 1-year all-cause mortality after heart failure diagnosis (subtype 1 0·20 [95% CI 0·14-0·25], subtype 2 0·46 [0·43-0·49], subtype 3 0·61 [0·57-0·64], subtype 4 0·11 [0·07-0·16], and subtype 5 0·37 [0·32-0·41]) differed across subtypes in CPRD and THIN data, as did risk of non-fatal cardiovascular diseases and all-cause hospitalisation. In the genetic validity analysis the atrial fibrillation-related subtype showed associations with the related PRS. Late onset and cardiometabolic subtypes were the most similar and strongly associated with PRS for hypertension, myocardial infarction, and obesity (p<0·0009). We developed a prototype app for routine clinical use, which could enable evaluation of effectiveness and cost-effectiveness. INTERPRETATION: Across four methods and three datasets, including genetic data, in the largest study of incident heart failure to date, we identified five machine learning-informed subtypes, which might inform aetiological research, clinical risk prediction, and the design of heart failure trials. FUNDING: European Union Innovative Medicines Initiative-2

    Incidence, morbidity, mortality and disparities in dementia: A population linked electronic health records study of 4.3 million individuals

    INTRODUCTION: We report dementia incidence, comorbidities, reasons for health-care visits, mortality, and causes of death, and examine dementia patterns by relative deprivation in the UK. METHOD: A longitudinal cohort analysis of linked electronic health records from 4.3 million people in the UK was conducted to investigate dementia incidence and mortality. Reasons for hospitalization and causes of death were compared in individuals with and without dementia. RESULTS: From 1998 to 2016 we observed 145,319 (3.1%) individuals with incident dementia. Repeated hospitalizations among senior adults for infection, unknown morbidity, and multiple primary care visits for chronic pain were observed prior to dementia diagnosis. Multiple long-term conditions were present in half of the individuals at the time of diagnosis. Individuals living in high deprivation areas had higher dementia incidence and high fatality. DISCUSSION: There is considerable disparity in dementia, which informs priorities for prevention and provision of patient care

    Genome-wide analysis of health-related biomarkers in the UK Household Longitudinal Study reveals novel associations

    Serum biomarker levels are associated with the risk of complex diseases. Here, we aimed to gain insights into the genetic architecture of biomarker traits which can reflect health status. We performed genome-wide association analyses for twenty serum biomarkers involved in organ function and reproductive health. 9,961 individuals from the UK Household Longitudinal Study were genotyped using the Illumina HumanCoreExome array and variants imputed to the 1000 Genomes Project and UK10K haplotypes. We establish a polygenic heritability for all biomarkers, confirm associations of fifty-four established loci, and identify five novel, replicating associations at genome-wide significance. A low-frequency variant, rs28929474 (beta = 0.04, P = 2 × 10⁻¹⁰), was associated with levels of alanine transaminase, an indicator of liver damage. The variant is located in the gene encoding serine protease inhibitor, low levels of which are associated with alpha-1 antitrypsin deficiency which leads to liver disease. We identified novel associations (rs78900934, beta = 0.05, P = 6 × 10⁻¹²; rs2911280, beta = 0.09, P = 6 × 10⁻¹⁰) for dehydroepiandrosterone sulphate, a precursor to major sex-hormones, and for glycated haemoglobin (rs12819124, beta = -0.03, P = 4 × 10⁻⁹; rs761772, beta = 0.05, P = 5 × 10⁻⁹). rs12819124 is nominally associated with risk of type 2 diabetes. Our study offers insights into the genetic architecture of well-known and less well-studied biomarkers.

    Associations Between Measures of Sarcopenic Obesity and Risk of Cardiovascular Disease and Mortality: A Cohort Study and Mendelian Randomization Analysis Using the UK Biobank

    Background: The "healthy obese" hypothesis suggests the risks associated with excess adiposity are reduced in those with higher muscle quality (mass/strength). Alternative possibilities include loss of muscle quality as people become unwell (reverse causality) or unmeasured confounding. Methods and Results: We conducted a cohort study using the UK Biobank (n=452 931). Baseline body mass index (BMI) was used to quantify adiposity and handgrip strength (HGS) was used for muscle quality. Outcomes were fatal and non-fatal cardiovascular disease, and mortality. As a secondary analysis we used waist-hip-ratio or fat mass percentage instead of BMI, and skeletal muscle mass index instead of HGS. In a subsample, we used gene scores for BMI, waist-hip-ratio and HGS in a Mendelian randomization (MR). BMI-defined obesity was associated with an increased risk of all outcomes (hazard ratio [HR] range 1.10-1.82). Low HGS was associated with increased risks of cardiovascular and all-cause mortality (HR range 1.39-1.72). HRs for the association between low HGS and cardiovascular disease events were smaller (HR range 1.05-1.09). There was no suggestion of an interaction between HGS and BMI to support the healthy obese hypothesis. Results using other adiposity metrics were similar. There was no evidence of an association between skeletal muscle mass index and any outcome. Factorial Mendelian randomization confirmed no evidence for an interaction. Low genetically predicted HGS was associated with an increased risk of mortality (HR range 1.08-1.19). Conclusions: Our analyses do not support the healthy obese concept, with no evidence that the adverse effect of obesity on outcomes was reduced by improved muscle quality. Lower HGS was associated with increased risks of mortality in both observational and MR analyses, suggesting reverse causality may not be the sole explanation

    UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER

    Objective: Electronic health records (EHRs) are a rich source of information on human diseases, but the information is variably structured, fragmented, curated using different coding systems, and collected for purposes other than medical research. We describe an approach for developing, validating, and sharing reproducible phenotypes from national structured EHR in the United Kingdom with applications for translational research. Materials and Methods: We implemented a rule-based phenotyping framework, with up to 6 approaches of validation. We applied our framework to a sample of 15 million individuals in a national EHR data source (population-based primary care, all ages) linked to hospitalization and death records in England. Data comprised continuous measurements (for example, blood pressure), medication information, and coded diagnoses, symptoms, procedures, and referrals, recorded using 5 controlled clinical terminologies: (1) Read (primary care, subset of SNOMED-CT [Systematized Nomenclature of Medicine Clinical Terms]), (2) International Classification of Diseases–Ninth Revision and Tenth Revision (secondary care diagnoses and cause of mortality), (3) Office of Population Censuses and Surveys Classification of Surgical Operations and Procedures, Fourth Revision (hospital surgical procedures), and (4) DM+D prescription codes. Results: Using the CALIBER phenotyping framework, we created algorithms for 51 diseases, syndromes, biomarkers, and lifestyle risk factors and provide up to 6 validation approaches. The EHR phenotypes are curated in the open-access CALIBER Portal (https://www.caliberresearch.org/portal) and have been used by 40 national and international research groups in 60 peer-reviewed publications. Conclusions: We describe a UK EHR phenomics approach within the CALIBER EHR data platform with initial evidence of validity and use, as an important step toward international use of UK EHR data for health research

    Genome-wide association study of primary tooth eruption identifies pleiotropic loci associated with height and craniofacial distances

    Twin and family studies indicate that the timing of primary tooth eruption is highly heritable, with estimates typically exceeding 80%. To identify variants involved in primary tooth eruption we performed a population based genome-wide association study of ‘age at first tooth’ and ‘number of teeth’ using 5998 and 6609 individuals respectively from the Avon Longitudinal Study of Parents and Children (ALSPAC) and 5403 individuals from the 1966 Northern Finland Birth Cohort (NFBC1966). We tested 2,446,724 SNPs imputed in both studies. Analyses were controlled for the effect of gestational age, sex and age of measurement. Results from the two studies were combined using fixed effects inverse variance meta-analysis. We identified a total of fifteen independent loci, with ten loci reaching genome-wide significance (p < 5 × 10⁻⁸) for ‘age at first tooth’ and eleven loci for ‘number of teeth’. Together these associations explain 6.06% of the variation in ‘age of first tooth’ and 4.76% of the variation in ‘number of teeth’. The identified loci included eight previously unidentified loci, some containing genes known to play a role in tooth and other developmental pathways, including a SNP in the protein-coding region of BMP4 (rs17563, P = 9.080 × 10⁻¹⁷). Three of these loci, containing the genes HMGA2, AJUBA and ADK, also showed evidence of association with craniofacial distances, particularly those indexing facial width. Our results suggest that the genome-wide association approach is a powerful strategy for detecting variants involved in tooth eruption, and potentially craniofacial growth and more generally organ development
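
The abstract above combines two cohorts with a fixed effects inverse variance meta-analysis. As a rough illustration of that technique only, the sketch below weights each study's effect estimate by the inverse of its squared standard error. The function name `fixed_effects_meta` and the input numbers are assumptions made for this example and are not values from the study.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Fixed-effects inverse-variance meta-analysis of per-study estimates.

    Each study gets weight 1 / se^2. Returns the pooled effect, its standard
    error, and the corresponding z-score.
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical per-SNP estimates from two cohorts (made-up numbers).
pooled_beta, pooled_se, z_score = fixed_effects_meta([1.0, 3.0], [1.0, 1.0])
```

With equal standard errors the pooled effect is simply the mean of the two estimates, and the pooled standard error shrinks by a factor of the square root of the number of studies.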

    Metabolomic Profiling of Statin Use and Genetic Inhibition of HMG-CoA Reductase

    BACKGROUND: Statins are first-line therapy for cardiovascular disease prevention, but their systemic effects across lipoprotein subclasses, fatty acids, and circulating metabolites remain incompletely characterized. OBJECTIVES: This study sought to determine the molecular effects of statin therapy on multiple metabolic pathways. METHODS: Metabolic profiles based on serum nuclear magnetic resonance metabolomics were quantified at 2 time points in 4 population-based cohorts from the United Kingdom and Finland (N = 5,590; 2.5 to 23.0 years of follow-up). Concentration changes in 80 lipid and metabolite measures during follow-up were compared between 716 individuals who started statin therapy and 4,874 persistent nonusers. To further understand the pharmacological effects of statins, we used Mendelian randomization to assess associations of a genetic variant known to mimic inhibition of HMG-CoA reductase (the intended drug target) with the same lipids and metabolites for 27,914 individuals from 8 population-based cohorts. RESULTS: Starting statin therapy was associated with numerous lipoprotein and fatty acid changes, including substantial lowering of remnant cholesterol (80% relative to low-density lipoprotein cholesterol [LDL-C]), but only modest lowering of triglycerides (25% relative to LDL-C). Among fatty acids, omega-6 levels decreased the most (68% relative to LDL-C); other fatty acids were only modestly affected. No robust changes were observed for circulating amino acids, ketones, or glycolysis-related metabolites. The intricate metabolic changes associated with statin use closely matched the association pattern with rs12916 in the HMGCR gene (R² = 0.94, slope 1.00 ± 0.03). CONCLUSIONS: Statin use leads to extensive lipid changes beyond LDL-C and appears efficacious for lowering remnant cholesterol. Metabolomic profiling, however, suggested minimal effects on amino acids. The results exemplify how detailed metabolic characterization of genetic proxies for drug targets can inform indications, pleiotropic effects, and pharmacological mechanisms. © 2016 by the American College of Cardiology Foundation.

    Causal Associations of Adiposity and Body Fat Distribution With Coronary Heart Disease, Stroke Subtypes, and Type 2 Diabetes Mellitus

    Background: Implications of different adiposity measures on cardiovascular disease aetiology remain unclear. In this paper we quantify and contrast causal associations of central adiposity (waist:hip ratio adjusted for BMI (WHRadjBMI)) and general adiposity (body mass index (BMI)) with cardiometabolic disease. Methods: 97 independent single nucleotide polymorphisms (SNPs) for BMI and 49 SNPs for WHRadjBMI were used to conduct Mendelian randomization analyses in 14 prospective studies supplemented with CHD data from CARDIoGRAMplusC4D (combined total 66,842 cases), stroke from METASTROKE (12,389 ischaemic stroke cases), type 2 diabetes (T2D) from DIAGRAM (34,840 cases), and lipids from GLGC (213,500 participants) consortia. Primary outcomes were CHD, T2D, and major stroke subtypes; secondary analyses included 18 cardiometabolic traits. Results: Each one standard deviation (SD) higher WHRadjBMI (1 SD ≈ 0.08 units) was associated with a 48% excess risk of CHD (odds ratio [OR] for CHD: 1.48; 95%CI: 1.28-1.71), similar to findings for BMI (1 SD ≈ 4.6 kg/m²; OR for CHD: 1.36; 95%CI: 1.22-1.52). Only WHRadjBMI increased risk of ischaemic stroke (OR 1.32; 95%CI 1.03-1.70). For T2D, both measures had large effects: OR 1.82 (95%CI 1.38-2.42) and OR 1.98 (95%CI 1.41-2.78) per 1 SD higher WHRadjBMI and BMI respectively. Both WHRadjBMI and BMI were associated with higher left ventricular hypertrophy, glycaemic traits, interleukin-6, and circulating lipids. WHRadjBMI was also associated with higher carotid intima-media thickness (39%; 95%CI: 9%-77% per 1 SD). Conclusions: Both general and central adiposity have causal effects on CHD and T2D. Central adiposity may have a stronger effect on stroke risk. Future estimates of the burden of adiposity on health should include measures of central and general adiposity
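
The odds ratios per 1 SD reported above come from Mendelian randomization using many SNPs as instruments. The abstract does not say which estimator was used, so the sketch below shows one common choice, the inverse-variance weighted (IVW) estimator on summary statistics, purely as an illustration. The function names `ivw_mr` and `odds_ratio_ci` and all input numbers are hypothetical.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted Mendelian randomization estimate.

    beta_exposure: per-SNP effects on the exposure (e.g. SD units of BMI)
    beta_outcome:  per-SNP effects on the outcome (e.g. log odds of disease)
    se_outcome:    standard errors of the outcome effects
    Returns the causal estimate (log odds per unit exposure) and its
    standard error, ignoring uncertainty in the exposure effects.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have one entry per SNP")
    rows = list(zip(beta_exposure, beta_outcome, se_outcome))
    numerator = sum(bx * by / se ** 2 for bx, by, se in rows)
    denominator = sum(bx ** 2 / se ** 2 for bx, _, se in rows)
    return numerator / denominator, math.sqrt(1.0 / denominator)


def odds_ratio_ci(log_or, se, z=1.96):
    """Convert a log odds ratio and its standard error to an OR with 95% CI."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Made-up summary statistics for two SNPs, for illustration only.
log_or, log_or_se = ivw_mr([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
or_point, or_low, or_high = odds_ratio_ci(log_or, log_or_se)
```

In this toy input every SNP has an outcome effect exactly half its exposure effect, so the pooled estimate is a log odds ratio of 0.5 per unit of exposure.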

    Phenome-wide association analysis of LDL-cholesterol lowering genetic variants in PCSK9

    Background: We characterised the phenotypic consequence of genetic variation at the PCSK9 locus and compared findings with recent trials of pharmacological inhibitors of PCSK9. Methods: Published and individual participant level data (300,000+ participants) were combined to construct a weighted PCSK9 gene-centric score (GS). Seventeen randomized placebo controlled PCSK9 inhibitor trials were included, providing data on 79,578 participants. Results were scaled to a one mmol/L lower LDL-C concentration. Results: The PCSK9 GS (comprising 4 SNPs) associations with plasma lipid and apolipoprotein levels were consistent in direction with treatment effects. The GS odds ratio (OR) for myocardial infarction (MI) was 0.53 (95% CI 0.42; 0.68), compared to a PCSK9 inhibitor effect of 0.90 (95% CI 0.86; 0.93). For ischemic stroke ORs were 0.84 (95% CI 0.57; 1.22) for the GS, compared to 0.85 (95% CI 0.78; 0.93) in the drug trials. ORs with type 2 diabetes mellitus (T2DM) were 1.29 (95% CI 1.11; 1.50) for the GS, as compared to 1.00 (95% CI 0.96; 1.04) for incident T2DM in PCSK9 inhibitor trials. No genetic associations were observed for cancer, heart failure, atrial fibrillation, chronic obstructive pulmonary disease, or Alzheimer’s disease – outcomes for which large-scale trial data were unavailable. Conclusions: Genetic variation at the PCSK9 locus recapitulates the effects of therapeutic inhibition of PCSK9 on major blood lipid fractions and MI. While indicating an increased risk of T2DM, no other possible safety concerns were shown, although precision was moderate