71 research outputs found
Testing gene-environment interactions in gene-based association studies
Gene-based and single-nucleotide polymorphism (SNP) set association studies provide an important complement to SNP analysis. Kernel-based nonparametric regression has recently emerged as a powerful and flexible tool for this purpose. Our goal is to explore whether this approach can be extended to incorporate and test for interaction effects, especially for genes containing rare variant SNPs. Here, we construct nonparametric regression models that can be used to include a gene-environment interaction effect under the framework of the least-squares kernel machine and examine the performance of the proposed method on the Genetic Analysis Workshop 17 unrelated individuals data set. Two hundred simulated replicates were used to explore the power for detecting interaction. We demonstrate through a genome scan of the quantitative phenotype Q1 that the simulated gene-environment interaction effect in the data can be detected with reasonable power by using the least-squares kernel machine method
Discovering joint associations between disease and gene pairs with a novel similarity test
Genes in a functional pathway can have complex interactions. A gene might activate or suppress another gene, so it is of interest to test joint associations of gene pairs. To simultaneously detect the joint association between disease and two genes (or two chromosomal regions), we propose a new test with the use of genomic similarities. Our test is designed to detect epistasis in the absence of main effects, main effects in the absence of epistasis, or the presence of both main effects and epistasis. Results: The simulation results show that our similarity test with the matching measure is more powerful than the Pearson's chi(2) test when the disease mutants were introduced at common haplotypes, but is less powerful when the disease mutants were introduced at rare haplotypes. Our similarity tests with the counting measures are more sensitive to marker informativity and linkage disequilibrium patterns, and thus are often inferior to the similarity test with the matching measure and the Pearson 's chi(2) test. Conclusions: In detecting joint associations between disease and gene pairs, our similarity test is a complementary method to the Pearson's chi(2) test
A Markov blanket-based method for detecting causal SNPs in GWAS
<p>Abstract</p> <p>Background</p> <p>Detecting epistatic interactions associated with complex and common diseases can help to improve prevention, diagnosis and treatment of these diseases. With the development of genome-wide association studies (GWAS), designing powerful and robust computational method for identifying epistatic interactions associated with common diseases becomes a great challenge to bioinformatics society, because the study of epistatic interactions often deals with the large size of the genotyped data and the huge amount of combinations of all the possible genetic factors. Most existing computational detection methods are based on the classification capacity of SNP sets, which may fail to identify SNP sets that are strongly associated with the diseases and introduce a lot of false positives. In addition, most methods are not suitable for genome-wide scale studies due to their computational complexity.</p> <p>Results</p> <p>We propose a new Markov Blanket-based method, DASSO-MB (Detection of ASSOciations using Markov Blanket) to detect epistatic interactions in case-control GWAS. Markov blanket of a target variable T can completely shield T from all other variables. Thus, we can guarantee that the SNP set detected by DASSO-MB has a strong association with diseases and contains fewest false positives. Furthermore, DASSO-MB uses a heuristic search strategy by calculating the association between variables to avoid the time-consuming training process as in other machine-learning methods. We apply our algorithm to simulated datasets and a real case-control dataset. We compare DASSO-MB to other commonly-used methods and show that our method significantly outperforms other methods and is capable of finding SNPs strongly associated with diseases.</p> <p>Conclusions</p> <p>Our study shows that DASSO-MB can identify a minimal set of causal SNPs associated with diseases, which contains less false positives compared to other existing methods. Given the huge size of genomic dataset produced by GWAS, this is critical in saving the potential costs of biological experiments and being an efficient guideline for pathogenesis research.</p
Malaria is a cause of iron deficiency in African children
Malaria and iron deficiency (ID) are common and interrelated public health problems in African children. Observational data suggest that interrupting malaria transmission reduces the prevalence of ID1. To test the hypothesis that malaria might cause ID, we used sickle cell trait (HbAS, rs334), a genetic variant that confers specific protection against malaria2, as an instrumental variable in Mendelian randomization analyses. HbAS was associated with a 30% reduction in ID among children living in malaria-endemic countries in Africa (n = 7,453), but not among individuals living in malaria-free areas (n = 3,818). Genetically predicted malaria risk was associated with an odds ratio of 2.65 for ID per unit increase in the log incidence rate of malaria. This suggests that an intervention that halves the risk of malaria episodes would reduce the prevalence of ID in African children by 49%
Modifier Effects between Regulatory and Protein-Coding Variation
Genome-wide associations have shown a lot of promise in dissecting the genetics of complex traits in humans with single variants, yet a large fraction of the genetic effects is still unaccounted for. Analyzing genetic interactions between variants (epistasis) is one of the potential ways forward. We investigated the abundance and functional impact of a specific type of epistasis, namely the interaction between regulatory and protein-coding variants. Using genotype and gene expression data from the 210 unrelated individuals of the original four HapMap populations, we have explored the combined effects of regulatory and protein-coding single nucleotide polymorphisms (SNPs). We predict that about 18% (1,502 out of 8,233 nsSNPs) of protein-coding variants are differentially expressed among individuals and demonstrate that regulatory variants can modify the functional effect of a coding variant in cis. Furthermore, we show that such interactions in cis can affect the expression of downstream targets of the gene containing the protein-coding SNP. In this way, a cis interaction between regulatory and protein-coding variants has a trans impact on gene expression. Given the abundance of both types of variants in human populations, we propose that joint consideration of regulatory and protein-coding variants may reveal additional genetic effects underlying complex traits and disease and may shed light on causes of differential penetrance of known disease variants
A genetic ensemble approach for gene-gene interaction identification
<p>Abstract</p> <p>Background</p> <p>It has now become clear that gene-gene interactions and gene-environment interactions are ubiquitous and fundamental mechanisms for the development of complex diseases. Though a considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, the accurate identification of those genetic interactions has been proven to be very challenging.</p> <p>Methods</p> <p>In this paper, we propose a new approach for identifying such gene-gene and gene-environment interactions underlying complex diseases. This is a hybrid algorithm and it combines genetic algorithm (GA) and an ensemble of classifiers (called genetic ensemble). Using this approach, the original problem of SNP interaction identification is converted into a data mining problem of combinatorial feature selection. By collecting various single nucleotide polymorphisms (SNP) subsets as well as environmental factors generated in multiple GA runs, patterns of gene-gene and gene-environment interactions can be extracted using a simple combinatorial ranking method. Also considered in this study is the idea of combining identification results obtained from multiple algorithms. A novel formula based on pairwise <it>double fault </it>is designed to quantify the degree of complementarity.</p> <p>Conclusions</p> <p>Our simulation study demonstrates that the proposed genetic ensemble algorithm has comparable identification power to Multifactor Dimensionality Reduction (MDR) and is slightly better than Polymorphism Interaction Analysis (PIA), which are the two most popular methods for gene-gene interaction identification. More importantly, the identification results generated by using our genetic ensemble algorithm are highly complementary to those obtained by PIA and MDR. Experimental results from our simulation studies and real world data application also confirm the effectiveness of the proposed genetic ensemble algorithm, as well as the potential benefits of combining identification results from different algorithms.</p
Neural networks for modeling gene-gene interactions in association studies
<p>Abstract</p> <p>Background</p> <p>Our aim is to investigate the ability of neural networks to model different two-locus disease models. We conduct a simulation study to compare neural networks with two standard methods, namely logistic regression models and multifactor dimensionality reduction. One hundred data sets are generated for each of six two-locus disease models, which are considered in a low and in a high risk scenario. Two models represent independence, one is a multiplicative model, and three models are epistatic. For each data set, six neural networks (with up to five hidden neurons) and five logistic regression models (the null model, three main effect models, and the full model) with two different codings for the genotype information are fitted. Additionally, the multifactor dimensionality reduction approach is applied.</p> <p>Results</p> <p>The results show that neural networks are more successful in modeling the structure of the underlying disease model than logistic regression models in most of the investigated situations. In our simulation study, neither logistic regression nor multifactor dimensionality reduction are able to correctly identify biological interaction.</p> <p>Conclusions</p> <p>Neural networks are a promising tool to handle complex data situations. However, further research is necessary concerning the interpretation of their parameters.</p
A two-stage association study identifies methyl-CpG-binding domain protein 2 gene polymorphisms as candidates for breast cancer susceptibility
Genome-wide association studies for breast cancer have identified over 40 single-nucleotide polymorphisms (SNPs), a subset of which remains statistically significant after genome-wide correction. Improved strategies for mining of genome-wide association data have been suggested to address heritable component of genetic risk in breast cancer. In this study, we attempted a two-stage association design using markers from a genome-wide study (stage 1, Affymetrix Human SNP 6.0 array, cases=302, controls=321). We restricted our analysis to DNA repair/modifications/metabolism pathway related gene polymorphisms for their obvious role in carcinogenesis in general and for their known protein–protein interactions vis-à-vis, potential epistatic effects. We selected 22 SNPs based on linkage disequilibrium patterns and high statistical significance. Genotyping assays in an independent replication study of 1178 cases and 1314 controls were attempted using Sequenom iPLEX Gold platform (stage 2). Six SNPs (rs8094493, rs4041245, rs7614, rs13250873, rs1556459 and rs2297381) showed consistent and statistically significant associations with breast cancer risk in both stages, with allelic odds ratios (and P-values) of 0.85 (0.0021), 0.86 (0.0026), 0.86 (0.0041), 1.17 (0.0043), 1.20 (0.0103) and 1.13 (0.0154), respectively, in combined analysis (N=3115). Of these, three polymorphisms were located in methyl-CpG-binding domain protein 2 gene regions and were in strong linkage disequilibrium. The remaining three SNPs were in proximity to RAD21 homolog (S. pombe), O-6-methylguanine-DNA methyltransferase and RNA polymerase II-associated protein 1. The identified markers may be relevant to breast cancer susceptibility in populations if these findings are confirmed in independent cohorts
- …