8 research outputs found

    Genomic architecture and prediction of censored time-to-event phenotypes with a Bayesian genome-wide analysis

    While recent advancements in computation and modelling have improved the analysis of complex traits, our understanding of the genetic basis of the time at symptom onset remains limited. Here, we develop a Bayesian approach (BayesW) that provides probabilistic inference of the genetic architecture of age-at-onset phenotypes in a sampling scheme that facilitates biobank-scale time-to-event analyses. We show in extensive simulation work the benefits BayesW provides in terms of number of discoveries, model performance and genomic prediction. In the UK Biobank, we find many thousands of common genomic regions underlying the age-at-onset of high blood pressure (HBP), cardiac disease (CAD), and type-2 diabetes (T2D), and evidence that the genetic basis of onset reflects the underlying genetic liability to disease. Age-at-menopause and age-at-menarche are also highly polygenic, but with higher variance contributed by low-frequency variants. Genomic prediction into the Estonian Biobank data shows that BayesW gives higher prediction accuracy than other approaches.

    Multi-method genome- and epigenome-wide studies of inflammatory protein levels in healthy older adults

    The molecular factors that control circulating levels of inflammatory proteins are not well understood. Furthermore, association studies between molecular probes and human traits are often performed by linear model-based methods, which may fail to account for complex structure and interrelationships within molecular datasets. In this study, we perform genome- and epigenome-wide association studies (GWAS/EWAS) on the levels of 70 plasma-derived inflammatory protein biomarkers in healthy older adults (Lothian Birth Cohort 1936; n = 876; Olink® inflammation panel). We employ a Bayesian framework (BayesR+) which can account for issues pertaining to data structure and unknown confounding variables (with sensitivity analyses using ordinary least squares- (OLS) and mixed model-based approaches). We identified 13 SNPs associated with 13 proteins (n = 1 SNP each) concordant across OLS and Bayesian methods. We identified 3 CpG sites spread across 3 proteins (n = 1 CpG each) that were concordant across OLS, mixed-model and Bayesian analyses. Tagged genetic variants accounted for up to 45% of variance in protein levels (for MCP2, 36% of variance alone attributable to 1 polymorphism). Methylation data accounted for up to 46% of variation in protein levels (for CXCL10). Up to 66% of variation in protein levels (for VEGFA) was explained using genetic and epigenetic data combined. We demonstrated putative causal relationships between CD6 and IL18R1 with inflammatory bowel disease and between IL12B and Crohn’s disease. Our data may aid understanding of the molecular regulation of the circulating inflammatory proteome as well as causal relationships between inflammatory mediators and disease.

    Improving GWAS discovery and genomic prediction accuracy in biobank data

    Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency–linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy R² was 47% in a UK Biobank holdout sample, which was 76% of the estimated h²_SNP. We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average χ² value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.

    Improving genome-wide association discovery and genomic prediction accuracy in biobank data

    Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency–linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy R² was 47% in a UK Biobank holdout sample, which was 76% of the estimated h²_SNP. We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average χ² value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.

    Probabilistic inference of the genetic architecture underlying functional enrichment of complex traits

    We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10 kb upstream of genes, while 12–25% is attributed to coding regions, 32–44% to introns, and 22–28% to distal 10–500 kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data.

    Additional file 2 of multi-method genome- and epigenome-wide studies of inflammatory protein levels in healthy older adults

    Additional file 2: Supplementary Tables. The association of pre-adjusted protein levels with biological and technical covariates. Protein levels were adjusted for age, sex, array plate and four genetic principal components (population structure) prior to analyses. Significant associations are emboldened. (Table S1). pQTLs associated with inflammatory biomarker levels from Bayesian penalised regression model (Posterior Inclusion Probability > 95%). (Table S2). All pQTLs associated with inflammatory biomarker levels from ordinary least squares regression model (P  95%). (Table S12). CpGs associated with inflammatory protein biomarkers as identified by linear model (limma) at P < 5.14 × 10⁻¹⁰. (Table S13). CpGs associated with inflammatory protein biomarkers as identified by mixed linear model (OSCA) at P < 5.14 × 10⁻¹⁰. (Table S14). Estimate of variance explained for blood protein levels by DNA methylation as well as proportion of explained attributable to different prior mixtures - BayesR+. (Table S15). Comparison of variance in protein levels explained by genome-wide DNA methylation data by mixed linear model (OSCA) and Bayesian penalised regression model (BayesR+). (Table S16). Variance in circulating inflammatory protein biomarker levels explained by common genetic and methylation data (joint and conditional estimates from BayesR+). Ordered by combined variance explained by genetic and epigenetic data - smallest to largest. Significant results from t-tests comparing distributions for variance explained by methylation or genetics alone versus combined estimate are emboldened. (Table S17). Genetic and epigenetic factors identified by BayesR+ when conditioning on all SNPs and CpGs together. (Table S18). Mendelian Randomisation analyses to assess whether proteins with concordantly identified genetic signals are causally associated with Alzheimer’s disease risk. (Table S19).

    Incarceration history is associated with HIV infection among community-recruited people who inject drugs in Europe: a propensity-score matched analysis of cross-sectional studies

    Aims: We measured the association between a history of incarceration and HIV positivity among people who inject drugs (PWID) across Europe. Design, Setting and Participants: This was a cross-sectional, multi-site, multi-year propensity-score matched analysis conducted in Europe. Participants comprised community-recruited PWID who reported a recent injection (within the last 12 months). Measurements: Data on incarceration history, demographics, substance use, sexual behavior and harm reduction service use originated from cross-sectional studies among PWID in Europe. Our primary outcome was HIV status. Generalized linear mixed models and propensity-score matching were used to compare HIV status between ever- and never-incarcerated PWID. Findings: Among 43 807 PWID from 82 studies surveyed (in 22 sites and 13 countries), 58.7% reported having ever been in prison and 7.16% (n = 3099) tested HIV-positive. Incarceration was associated with 30% higher odds of HIV infection [adjusted odds ratio (aOR) = 1.32, 95% confidence interval (CI) = 1.09–1.59]; the association between a history of incarceration and HIV infection was strongest among PWID with the lowest estimated propensity-score for having a history of incarceration (aOR = 1.78, 95% CI = 1.47–2.16). Additionally, mainly injecting cocaine and/or opioids (aOR = 2.16, 95% CI = 1.33–3.53), increased duration of injecting drugs (per 8 years aOR = 1.31, 95% CI = 1.16–1.48), ever sharing needles/syringes (aOR = 1.91, 95% CI = 1.59–2.28) and increased income inequality among the general population (measured by the Gini index, aOR = 1.34, 95% CI = 1.18–1.51) were associated with higher odds of HIV infection. Older age (per 8 years aOR = 0.84, 95% CI = 0.76–0.94), male sex (aOR = 0.77, 95% CI = 0.65–0.91) and reporting pharmacies as the main source of clean syringes (aOR = 0.72, 95% CI = 0.59–0.88) were associated with lower odds of HIV positivity. Conclusions: A history of incarceration appears to be independently associated with HIV infection among people who inject drugs (PWID) in Europe, with a stronger effect among PWID with lower probability of incarceration.