203 research outputs found

    A two-step multiple-marker strategy for genome-wide association studies

    Genome-wide association studies raise study-design and analytical issues that are still being debated. One of them is how to reduce the number of markers to be genotyped without losing efficiency in identifying trait loci, which would lower the cost of studies and lessen the multiple testing problem. With this aim, we proposed a two-step strategy based on two analytical methods suited to examining sets of markers rather than single markers: the local score, which screens the genome to select candidate regions in Step 1, and FBAT-LC, a multiple-marker family-based association test used to obtain significance levels of regions in Step 2. The performance of this strategy was evaluated on all replicates of the Genetic Analysis Workshop 15 Problem 3 simulated data, using the answers to that problem. Overall, seven of the nine generated trait loci were detected in at least 87% of the replicates using a framework designed to handle either association with the disease or association with the severity of disease. This multiple-marker strategy was compared to the single-marker approach. By considering regions instead of single markers, this strategy minimizes the multiple testing problem and the number of false-positive results.
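
The abstract above does not define the local score, so the following is only a minimal sketch under a common formulation: each marker contributes -log10(p) minus a penalty, and the local score of a chromosome is the largest sum over any contiguous run of markers. The function name `local_score` and the penalty `xi` are hypothetical and are not taken from the paper.

```python
import math


def local_score(p_values, xi=1.0):
    """Return (best_score, start, end) of the highest-scoring contiguous run.

    Assumption (not from the abstract): marker i contributes
    x_i = -log10(p_i) - xi, and the running score follows the recursion
    h_i = max(0, h_{i-1} + x_i). All p-values must be > 0.
    """
    best, best_start, best_end = 0.0, None, None
    running, start = 0.0, 0
    for i, p in enumerate(p_values):
        if running == 0.0:
            start = i  # a new candidate region begins at this marker
        running = max(0.0, running + (-math.log10(p) - xi))
        if running > best:
            best, best_start, best_end = running, start, i
    return best, best_start, best_end


# Toy single-marker p-values along one chromosome (illustrative only).
toy_p = [0.5, 0.5, 1e-3, 1e-2, 1e-3, 0.5, 0.5]
score, first, last = local_score(toy_p, xi=1.0)
print(score, first, last)
```

In this toy input the three adjacent small p-values form one candidate region, so a single region-level test would follow in the second step instead of three single-marker tests.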

    Inflated Type I Error Rates When Using Aggregation Methods to Analyze Rare Variants in the 1000 Genomes Project Exon Sequencing Data in Unrelated Individuals: Summary Results from Group 7 at Genetic Analysis Workshop 17

    As part of Genetic Analysis Workshop 17 (GAW17), our group considered the application of novel and standard approaches to the analysis of genotype-phenotype association in next-generation sequencing data. Our group identified a major issue in the analysis of the GAW17 next-generation sequencing data: type I error and false-positive report probability rates higher than those expected based on empirical type I error levels (as high as 90%). Two main causes emerged: population stratification and long-range correlation (gametic phase disequilibrium) between rare variants. Population stratification was expected because of the diverse sample. Correlation between rare variants was attributable to both random causes (e.g., nearly 10,000 of 25,000 markers were private variants, and the sample size was small [n = 697]) and nonrandom causes (more correlation was observed than was expected by random chance). Principal components analysis was used to control for population structure and helped to minimize type I errors, but this was at the expense of identifying fewer causal variants. A novel multiple regression approach showed promise to handle correlation between markers. Further work is needed, first, to identify best practices for the control of type I errors in the analysis of sequencing data and then to explore and compare the many promising new aggregating approaches for identifying markers associated with disease phenotypes

    Combining effects from rare and common genetic variants in an exome-wide association study of sequence data

    Recent breakthroughs in next-generation sequencing technologies allow cost-effective methods for measuring a growing list of cellular properties, including DNA sequence and structural variation. Next-generation sequencing has the potential to revolutionize complex trait genetics by directly measuring common and rare genetic variants within a genome-wide context. Because for a given gene both rare and common causal variants can coexist and have independent effects on a trait, strategies that model the effects of both common and rare variants could enhance the power of identifying disease-associated genes. To date, little work has been done on integrating signals from common and rare variants into powerful statistics for finding disease genes in genome-wide association studies. In this analysis of the Genetic Analysis Workshop 17 data, we evaluate various strategies for association of rare, common, or a combination of both rare and common variants on quantitative phenotypes in unrelated individuals. We show that the analysis of common variants only using classical approaches can achieve higher power to detect causal genes than recently proposed rare variant methods and that strategies that combine association signals derived independently in rare and common variants can slightly increase the power compared to strategies that focus on the effect of either the rare variants or the common variants

    Exploring Genome-Wide – Dietary Heme Iron Intake Interactions and the Risk of Type 2 Diabetes

    Aims/hypothesis: Genome-wide association studies have identified over 50 new genetic loci for type 2 diabetes (T2D). Several studies conclude that higher dietary heme iron intake increases the risk of T2D. Therefore we assessed whether the relation between genetic loci and T2D is modified by dietary heme iron intake. Methods: We used Affymetrix Genome-Wide Human 6.0 array data [681,770 single nucleotide polymorphisms (SNPs)] and dietary information collected in the Health Professionals Follow-up Study (n = 725 cases; n = 1,273 controls) and the Nurses’ Health Study (n = 1,081 cases; n = 1,692 controls). We assessed whether genome-wide SNPs or iron metabolism SNPs interacted with dietary heme iron intake in relation to T2D, testing for associations in each cohort separately and then meta-analyzing to pool the results. Finally, we created 1,000 synthetic pathways matched to an iron metabolism pathway on number of genes, and number of SNPs in each gene. We compared the iron metabolic pathway SNPs with these synthetic SNP assemblies in their relation to T2D to assess if the pathway as a whole interacts with dietary heme iron intake. Results: Using a genomic approach, we found no significant gene–environment interactions with dietary heme iron intake in relation to T2D at a Bonferroni corrected genome-wide significance level of $7.33 \times 10^{-8}$ (top SNP in pooled analysis: intergenic rs10980508; $p = 1.03 \times 10^{-6}$). Furthermore, no SNP in the iron metabolic pathway significantly interacted with dietary heme iron intake at a Bonferroni corrected significance level of $2.10 \times 10^{-4}$ (top SNP in pooled analysis: rs1805313; $p = 1.14 \times 10^{-3}$). Finally, neither the main genetic effects (pooled empirical p by SNP = 0.41), nor gene – dietary heme–iron interactions (pooled empirical p-value for the interactions = 0.72) were significant for the iron metabolic pathway as a whole. Conclusions: We found no significant interactions between dietary heme iron intake and common SNPs in relation to T2D.
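
The genome-wide threshold reported in this abstract is consistent with a plain Bonferroni correction of a 0.05 family-wise error rate over the 681,770 SNPs on the array. The abstract does not state the alpha level, so 0.05 is an assumption here, and the helper names are hypothetical. A short check of the arithmetic:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_value: float, threshold: float) -> bool:
    """A test passes only if its p-value is below the corrected threshold."""
    return p_value < threshold


ALPHA = 0.05       # assumed family-wise error rate (not stated in the abstract)
N_SNPS = 681_770   # number of SNPs reported in the abstract

genome_wide = bonferroni_threshold(ALPHA, N_SNPS)
print(f"{genome_wide:.2e}")

# Top SNPs reported in the abstract, compared with the reported thresholds.
top_genome_wide_hit = is_significant(1.03e-6, genome_wide)
top_pathway_hit = is_significant(1.14e-3, 2.10e-4)
print(top_genome_wide_hit, top_pathway_hit)
```

Both comparisons come out negative, which matches the abstract's conclusion that no interaction reached its corrected significance level.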

    lme4qtl: linear mixed models with flexible covariance structure for genetic studies of related individuals.

    BACKGROUND: Quantitative trait locus (QTL) mapping in genetic data often involves analysis of correlated observations, which need to be accounted for to avoid false association signals. This is commonly performed by modeling such correlations as random effects in linear mixed models (LMMs). The R package lme4 is a well-established tool that implements major LMM features using sparse matrix methods; however, it is not fully adapted for QTL mapping association and linkage studies. In particular, two LMM features are lacking in the base version of lme4: the definition of random effects by custom covariance matrices; and parameter constraints, which are essential in advanced QTL models. Apart from applications in linkage studies of related individuals, such functionalities are of high interest for association studies in situations where multiple covariance matrices need to be modeled, a scenario not covered by many genome-wide association study (GWAS) software packages. RESULTS: To address the aforementioned limitations, we developed a new R package, lme4qtl, as an extension of lme4. First, lme4qtl contributes new models for genetic studies within a single tool integrated with lme4 and its companion packages. Second, lme4qtl offers a flexible framework for scenarios with multiple levels of relatedness and becomes efficient when covariance matrices are sparse. We showed the value of our package using real family-based data in the Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT2) project. CONCLUSIONS: Our software lme4qtl enables QTL mapping models with a versatile structure of random effects and efficient computation for sparse covariances. lme4qtl is available at https://github.com/variani/lme4qtl

    An Approach to Identify Gene-Environment interactions and Reveal New Biological insight in Complex Traits

    There is a long-standing debate about the magnitude of the contribution of gene-environment interactions to phenotypic variations of complex traits, owing to the low statistical power and few reported interactions to date. To address this issue, the Gene-Lifestyle Interactions Working Group within the Cohorts for Heart and Aging Research in Genetic Epidemiology Consortium has been spearheading efforts to investigate G × E in large and diverse samples through meta-analysis. Here, we present a powerful new approach to screen for interactions across the genome, an approach that shares substantial similarity to the Mendelian randomization framework. We identify and confirm 5 loci (6 independent signals) that interact with either cigarette smoking or alcohol consumption for serum lipids, and empirically demonstrate that interaction and mediation are the major contributors to genetic effect size heterogeneity across populations. The estimated lower bound of the interaction and environmentally mediated heritability is significant (P < 0.02) for low-density lipoprotein cholesterol and triglycerides in cross-population data. Our study improves the understanding of the genetic architecture and environmental contributions to complex traits.

    Screening for interaction effects in gene expression data

    Expression quantitative trait locus (eQTL) studies are a powerful tool for identifying genetic variants that affect levels of messenger RNA. Since gene expression is controlled by a complex network of gene-regulating factors, one way to identify these factors is to search for interaction effects between genetic variants and mRNA levels of transcription factors (TFs) and their respective target genes. However, identification of interaction effects in gene expression data poses a variety of methodological challenges, and it has become clear that such analyses should be conducted and interpreted with caution. Investigating the validity and interpretability of several interaction tests when screening for eQTL SNPs whose effect on the target gene expression is modified by the expression level of a transcription factor, we characterized two important methodological issues. First, we stress the scale-dependency of interaction effects and highlight that commonly applied transformations of gene expression data can induce or remove interactions, making interpretation of results more challenging. We then demonstrate that, in the setting of moderate to strong interaction effects on the order of what may be reasonably expected for eQTL studies, standard interaction screening can be biased due to heteroscedasticity induced by true interactions. Using simulation and real data analysis, we outline a set of reasonable minimum conditions and sample size requirements for reliable detection of variant-by-environment and variant-by-TF interactions using the heteroscedasticity-consistent covariance-based approach.

    Fungal microbiota dysbiosis in IBD.

    The bacterial intestinal microbiota plays major roles in human physiology and IBDs. Although some data suggest a role of the fungal microbiota in IBD pathogenesis, the available data are scarce. The aim of our study was to characterise the faecal fungal microbiota in patients with IBD. Bacterial and fungal composition of the faecal microbiota of 235 patients with IBD and 38 healthy subjects (HS) was determined using 16S and ITS2 sequencing, respectively. The obtained sequences were analysed using the Qiime pipeline to assess composition and diversity. Bacterial and fungal taxa associated with clinical parameters were identified using multivariate association with linear models. Correlation between bacterial and fungal microbiota was investigated using Spearman's test and distance correlation. We observed that fungal microbiota is skewed in IBD, with an increased Basidiomycota/Ascomycota ratio, a decreased proportion of Saccharomyces cerevisiae and an increased proportion of Candida albicans compared with HS. We also identified disease-specific alterations in diversity, indicating that a Crohn's disease-specific gut environment may favour fungi at the expense of bacteria. The concomitant analysis of bacterial and fungal microbiota showed a dense and homogenous correlation network in HS but a dramatically unbalanced network in IBD, suggesting the existence of disease-specific inter-kingdom alterations. Besides bacterial dysbiosis, our study identifies a distinct fungal microbiota dysbiosis in IBD characterised by alterations in biodiversity and composition. Moreover, we unravel here disease-specific inter-kingdom network alterations in IBD, suggesting that, beyond bacteria, fungi might also play a role in IBD pathogenesis

    The association between serum lipids and intraocular pressure in two large UK cohorts

    PURPOSE: Serum lipids are modifiable, routinely collected blood tests associated with cardiovascular health. We examined the association of commonly collected serum lipid measures (total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C) and triglycerides (TG)) with intraocular pressure (IOP). DESIGN: Cross-sectional study in the UK Biobank and EPIC-Norfolk cohorts. PARTICIPANTS: We included 94 323 participants of UK Biobank (mean age 57 years) and 6 230 participants of EPIC-Norfolk (mean age 68 years) with data on TC, HDL-C, LDL-C and TG collected between 2006 and 2009. METHODS: Multivariable linear regression adjusting for demographic, lifestyle, anthropometric, medical and ophthalmic covariables was used to examine the associations of serum lipids with corneal-compensated IOP (IOPcc). MAIN OUTCOME MEASURES: IOPcc. RESULTS: Higher levels of TC, HDL-C and LDL-C were independently associated with higher IOPcc in both cohorts after adjustment for key demographic, medical and lifestyle factors. For each standard deviation increase in TC, HDL-C, and LDL-C, IOPcc (mmHg) was higher by 0.09 (95% CI: 0.06-0.11; P<0.001), 0.11 (95% CI: 0.08-0.13; P<0.001), and 0.07 (95% CI: 0.05-0.09; P<0.001), respectively, in the UK Biobank cohort. In the EPIC-Norfolk cohort, each additional standard deviation in TC, HDL-C, and LDL-C was associated with a higher IOPcc (mmHg) by 0.19 (95% CI: 0.07-0.31; P=0.001), 0.14 (95% CI: 0.03-0.25; P=0.016), and 0.17 (95% CI: 0.06-0.29; P=0.003), respectively. An inverse association between TG and IOP in the UK Biobank (-0.05, 95% CI: -0.08 to -0.03; P<0.001) was not replicated in the EPIC-Norfolk cohort (P=0.30). CONCLUSION: Our findings suggest that serum TC, HDL-C and LDL-C are positively associated with IOP in two UK cohorts and that TG may be negatively associated. Future research is required to assess whether these associations are causal in nature.