30 research outputs found

    The flashfm approach for fine-mapping multiple quantitative traits

    This is the final version. Available on open access from Nature Research via the DOI in this record.
    Data availability: Detailed flashfm multi-trait fine-mapping results and FINEMAP single-trait fine-mapping results for the Ugandan cardiometabolic traits are provided in Supplementary Data 3 and 4, respectively; summary fine-mapping results are provided in Supplementary Data 2.pdf. The Uganda GWAS data used in this study are available in the GWAS Catalogue under PubMed ID 31675503 (https://www.ebi.ac.uk/gwas/publications/31675503#study_panel). The Ugandan genotype data are from the European Genome-phenome Archive (EGA) under accession numbers EGAS00001001558/EGAD00010000965 and EGAS00001000545/EGAD00001001639. The phenotype data used in this study are not under restricted access and requests for access to data may be directed to [email protected]. The CEU population 1000 Genomes phase 3 haplotype data that were used in our simulations are available from http://grch37.ensembl.org/Homo_sapiens/Tools/DataSlicer.
    Code availability: Our proposed multi-trait fine-mapping method, Flexible and shared information fine-mapping (flashfm), is freely available as an R library at https://jennasimit.github.io/flashfm/ (DOI: 10.5281/zenodo.552291544). Single-trait fine-mapping was performed with FINEMAP 1.4 (http://www.christianbenner.com/), as well as our extended version of JAM (based on JAM from R2BGLiMS; https://github.com/pjnewcombe/R2BGLiMS) that is included in the flashfm package. Custom code for the analysis of the Ugandan data is available at https://github.com/nicolashernandezb/flashfm-analysis. The annotation tools we used are HaploReg v4.1 (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php) and Ensembl Variant Effect Predictor (VEP) GRCh37 (https://grch37.ensembl.org/info/docs/tools/vep/index.html). We simulated genotype data with hapgen2 (http://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html).
    Joint fine-mapping that leverages information between quantitative traits could improve accuracy and resolution over single-trait fine-mapping. Using summary statistics, flashfm (flexible and shared information fine-mapping) fine-maps signals for multiple traits, allowing for missing trait measurements and use of related individuals. In a Bayesian framework, prior model probabilities are formulated to favour model combinations that share causal variants to capitalise on information between traits. Simulation studies demonstrate that flashfm and single-trait fine-mapping produce broadly equivalent results when traits have no shared causal variants. When traits share at least one causal variant, flashfm reduces the number of potential causal variants by 30% compared with single-trait fine-mapping. In a Ugandan cohort with 33 cardiometabolic traits, flashfm gave a 20% reduction in the total number of potential causal variants from single-trait fine-mapping. Here we show flashfm is computationally efficient and can easily be deployed across publicly available summary statistics for signals in up to six traits.
    Medical Research Council (MRC); Wellcome Trust; National Institute for Health Research (NIHR); Research Englan

    Whole exome re-sequencing implicates CCDC38 and cilia structure and function in resistance to smoking related airflow obstruction

    Chronic obstructive pulmonary disease (COPD) is a leading cause of global morbidity and mortality and, whilst smoking remains the single most important risk factor, COPD risk is heritable. Of 26 independent genomic regions showing association with lung function in genome-wide association studies, eleven have been reported to show association with airflow obstruction. Although the main risk factor for COPD is smoking, some individuals are observed to have a high forced expired volume in 1 second (FEV1) despite many years of heavy smoking. We hypothesised that these "resistant smokers" may harbour variants which protect against lung function decline caused by smoking and provide insight into the genetic determinants of lung health. We undertook whole exome re-sequencing of 100 heavy smokers who had healthy lung function given their age, sex, height and smoking history and applied three complementary approaches to explore the genetic architecture of smoking resistance. Firstly, we identified novel functional variants in the "resistant smokers" and looked for enrichment of these novel variants within biological pathways. Secondly, we undertook association testing of all exonic variants individually with two independent control sets. Thirdly, we undertook gene-based association testing of all exonic variants. Our strongest signal of association with smoking resistance for a non-synonymous SNP was for rs10859974 (P = 2.34 × 10⁻⁴) in CCDC38, a gene which has previously been reported to show association with FEV1/FVC, and we demonstrate moderate expression of CCDC38 in bronchial epithelial cells. We identified an enrichment of novel putatively functional variants in genes related to cilia structure and function in resistant smokers. Ciliary function abnormalities are known to be associated with both smoking and reduced mucociliary clearance in patients with COPD. We suggest that genetic influences on the development or function of cilia in the bronchial epithelium may affect growth of cilia or the extent of damage caused by tobacco smoke.

    A two-stage inter-rater approach for enrichment testing of variants associated with multiple traits

    Shared genetic aetiology may explain the co-occurrence of diseases in individuals more often than expected by chance. On identifying associated variants shared between two traits, one objective is to determine whether such overlap may be explained by specific genomic characteristics (e.g., functional annotation). In clinical studies, inter-rater agreement approaches assess concordance among expert opinions on the presence/absence of a complex disease for each subject. We adapt a two-stage inter-rater agreement model to the genetic association setting to identify features predictive of overlap variants, while accounting for their marginal trait associations. The resulting corrected overlap and marginal enrichment test (COMET) also assesses enrichment at the individual trait level. Multiple categories may be tested simultaneously and the method is computationally efficient, not requiring permutations to assess significance. In an extensive simulation study, COMET identifies features predictive of enrichment with high power and has well-calibrated type I error. In contrast, testing for overlap with a single-trait enrichment test has inflated type I error. COMET is applied to three glycaemic traits using a set of functional annotation categories as predictors, followed by further analyses that focus on tissue-specific regulatory variants. The results support previous findings that regulatory variants in pancreatic islets are enriched for fasting glucose-associated variants, and give insight into differences/similarities between characteristics of variants associated with glycaemic traits. Also, although regulatory variants in pancreatic islets are enriched for variants that are marginally associated with fasting glucose and fasting insulin, there is no enrichment of shared variants between the traits.

    Statistical Guidance for Experimental Design and Data Analysis of Mutation Detection in Rare Monogenic Mendelian Diseases by Exome Sequencing

    Recently, whole-genome sequencing, especially exome sequencing, has successfully led to the identification of causal mutations for rare monogenic Mendelian diseases. However, it is unclear whether this approach can be generalized and effectively applied to other Mendelian diseases with high locus heterogeneity. Moreover, the current exome sequencing approach has limitations such as false positive and false negative rates of mutation detection due to sequencing errors and other artifacts, but the impact of these limitations on experimental design has not been systematically analyzed. To address these questions, we present a statistical modeling framework to calculate the power, the probability of identifying truly disease-causing genes, under various inheritance models and experimental conditions, providing guidance for both proper experimental design and data analysis. Based on our model, we found that the exome sequencing approach is well-powered for mutation detection in recessive, but not dominant, Mendelian diseases with high locus heterogeneity. A disease gene responsible for as low as 5% of the disease population can be readily identified by sequencing just 200 unrelated patients. Based on these results, for identifying rare Mendelian disease genes, we propose that a viable approach is to combine, sequence, and analyze patients with the same disease together, leveraging the statistical framework presented in this work
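
    The sample-size claim above can be given a rough intuition with a simple binomial calculation. The sketch below is not the statistical framework of the paper: it ignores sequencing errors, inheritance model and background variation, and only asks how likely it is that at least a given number of patients carry mutations in a gene responsible for a fixed fraction of cases. The function name `prob_at_least` and the threshold of five carriers are illustrative assumptions.

```python
from math import comb


def prob_at_least(n_patients: int, gene_fraction: float, min_carriers: int) -> float:
    """P(X >= min_carriers) for X ~ Binomial(n_patients, gene_fraction).

    gene_fraction is the share of patients whose disease is caused by the
    gene of interest (0.05 for the 5% scenario in the abstract).
    """
    # Sum the probabilities of seeing fewer than min_carriers carriers,
    # then take the complement.
    below = sum(
        comb(n_patients, k)
        * gene_fraction ** k
        * (1 - gene_fraction) ** (n_patients - k)
        for k in range(min_carriers)
    )
    return 1.0 - below


# Expected number of patients carrying a mutation in the gene.
expected_carriers = 200 * 0.05

# Hypothetical detection rule: at least 5 carriers among 200 patients.
p_detect = prob_at_least(200, 0.05, 5)
print(f"expected carriers: {expected_carriers:.1f}, P(>=5 carriers): {p_detect:.3f}")
```

    Under these simplified assumptions about ten carriers are expected among 200 patients, which is why a gene explaining only 5% of cases can still stand out from the background.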

    Large-scale exome array summary statistics resources for glycemic traits to aid effector gene prioritization.

    BACKGROUND: Genome-wide association studies for glycemic traits have identified hundreds of loci associated with these biomarkers of glucose homeostasis. Despite this success, the challenge remains to link variant associations to genes, and underlying biological pathways. METHODS: To identify coding variant associations which may pinpoint effector genes at both novel and previously established genome-wide association loci, we performed meta-analyses of exome-array studies for four glycemic traits: glycated hemoglobin (HbA1c, up to 144,060 participants), fasting glucose (FG, up to 129,665 participants), fasting insulin (FI, up to 104,140) and 2hr glucose post-oral glucose challenge (2hGlu, up to 57,878). In addition, we performed network and pathway analyses. RESULTS: Single-variant and gene-based association analyses identified coding variant associations at more than 60 genes, which when combined with other datasets may be useful to nominate effector genes. Network and pathway analyses identified pathways related to insulin secretion, zinc transport and fatty acid metabolism. HbA1c associations were strongly enriched in pathways related to blood cell biology. CONCLUSIONS: Our results provided novel glycemic trait associations and highlighted pathways implicated in glycemic regulation. Exome-array summary statistic results are being made available to the scientific community to enable further discoveries

    Transancestral fine-mapping of four type 2 diabetes susceptibility loci highlights potential causal regulatory mechanisms

    To gain insight into potential regulatory mechanisms through which the effects of variants at four established type 2 diabetes (T2D) susceptibility loci (CDKAL1, CDKN2A-B, IGF2BP2 and KCNQ1) are mediated, we undertook transancestral fine-mapping in 22,086 cases and 42,539 controls of East Asian, European, South Asian, African American and Mexican American descent. Through high-density imputation and conditional analyses, we identified seven distinct association signals at these four loci, each with allelic effects on T2D susceptibility that were homogeneous across ancestry groups. By leveraging differences in the structure of linkage disequilibrium between diverse populations, and increased sample size, we localised the variants most likely to drive each distinct association signal. We demonstrated that integration of these genetic fine-mapping data with genomic annotation can highlight potential causal regulatory elements in T2D-relevant tissues. These analyses provide insight into the mechanisms through which T2D association signals are mediated, and suggest future routes to understanding the biology of specific disease susceptibility loci.

    ARIEL and AMELIA: testing for an accumulation of rare variants using next-generation sequencing data.

    OBJECTIVES: There is increasing evidence that rare variants play a role in some complex traits, but their analysis is not straightforward. Locus-based tests become necessary due to low power in rare variant single-point association analyses. In addition, variant quality scores are available for sequencing data, but are rarely taken into account. Here, we propose two locus-based methods that incorporate variant quality scores: a regression-based collapsing approach and an allele-matching method. METHODS: Using simulated sequencing data we compare four locus-based tests of trait association under different scenarios of data quality. We test two collapsing-based approaches and two allele-matching-based approaches, in each case with one version that takes variant quality scores into account and one that ignores them. We implement the collapsing and allele-matching approaches accounting for variant quality in the freely available ARIEL and AMELIA software. RESULTS: The incorporation of variant quality scores in locus-based association tests has power advantages over weighting each variant equally. The allele-matching methods are robust to the presence of both protective and risk variants in a locus, while collapsing methods exhibit a dramatic loss of power in this scenario. CONCLUSIONS: The incorporation of variant quality scores should be a standard protocol when performing locus-based association analysis on sequencing data. The ARIEL and AMELIA software implement collapsing and allele-matching locus association analysis methods, respectively, that allow the incorporation of variant quality scores.
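
    As a rough illustration of what a quality-weighted collapsing test can look like, the sketch below collapses the rare variants in a locus into one score per individual, down-weighting variants with lower quality, and then regresses the trait on that score. This is a generic sketch and not the ARIEL implementation: the function names, the assumption of one quality weight in [0, 1] per variant, and the toy data are all hypothetical.

```python
def weighted_burden(genotypes, quality):
    """Collapse a locus into one quality-weighted score per individual.

    genotypes: one row per individual, holding rare-allele counts per variant.
    quality:   one weight per variant (assumed here to lie in [0, 1]).
    """
    return [sum(g * q for g, q in zip(row, quality)) for row in genotypes]


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data: four individuals, three rare variants in the locus.
genotypes = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0]]
quality = [1.0, 0.5, 0.9]  # hypothetical per-variant quality weights

burden = weighted_burden(genotypes, quality)
trait = [2.0 * b + 1.0 for b in burden]  # noiseless trait, for illustration only
slope = ols_slope(burden, trait)
print(burden, slope)
```

    Because all variants are summed into a single score, protective and risk variants in the same locus cancel each other out, which is consistent with the loss of power the abstract reports for collapsing methods in that scenario.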

    Genome-Wide Association Analysis of Imputed Rare Variants: Application to Seven Common Complex Diseases

    Genome-wide association studies have been successful in identifying loci contributing effects to a range of complex human traits. The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability. It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation. However, genome-wide association study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. Nevertheless, we demonstrate here, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments. By application of this approach to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, we identify genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the major histocompatibility complex (MHC) with type 1 diabetes. The results of our analyses highlight that genome-wide association studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits. © 2012 Wiley Periodicals, Inc

    Stochastic search and joint fine-mapping increases accuracy and identifies previously unreported associations in immune-mediated diseases

    Thousands of genetic variants are associated with human disease risk, but linkage disequilibrium (LD) hinders fine-mapping the causal variants. Both lack of power, and joint tagging of two or more distinct causal variants by a single non-causal SNP, lead to inaccuracies in fine-mapping, with stochastic search more robust than stepwise. We develop a computationally efficient multinomial fine-mapping (MFM) approach that borrows information between diseases in a Bayesian framework. We show that MFM has greater accuracy than single disease analysis when shared causal variants exist, and negligible loss of precision otherwise. MFM analysis of six immune-mediated diseases reveals causal variants undetected in individual disease analysis, including in IL2RA where we confirm functional effects of multiple causal variants using allele-specific expression in sorted CD4+ T cells from genotype-selected individuals. MFM has the potential to increase fine-mapping resolution in related diseases enabling the identification of associated cellular and molecular phenotypes