37 research outputs found

    Mutual Information for Testing Gene-Environment Interaction

    Despite current enthusiasm for investigation of gene-gene interactions and gene-environment interactions, the essential issue of how to define and detect gene-environment interactions remains unresolved. In this report, we define gene-environment interactions as a stochastic dependence in the context of the effects of the genetic and environmental risk factors on the cause of phenotypic variation among individuals. We use mutual information, which is widely used in communication and complex system analysis, to measure gene-environment interactions. We investigate how gene-environment interactions generate the large difference in the information measure of gene-environment interactions between the general population and a diseased population, which motivates us to develop mutual information-based statistics for testing gene-environment interactions. We validated the null distribution and calculated the type 1 error rates for the mutual information-based statistics to test gene-environment interactions using extensive simulation studies. We found that the new test statistics were more powerful than the traditional logistic regression under several disease models. Finally, in order to further evaluate the performance of our new method, we applied the mutual information-based statistics to three real examples. Our results showed that P-values for the mutual information-based statistics were much smaller than those obtained by other approaches, including logistic regression models.
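The idea of measuring gene-environment dependence with mutual information can be illustrated with a minimal sketch. It is not the test statistic from the paper: the function name `mutual_information` and both joint tables are hypothetical, chosen only to show that mutual information is zero when genotype and exposure are independent and positive when they are dependent.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint probability table (rows x columns)."""
    row = [sum(r) for r in joint]            # marginal distribution of the row variable
    col = [sum(c) for c in zip(*joint)]      # marginal distribution of the column variable
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:                        # cells with zero probability contribute nothing
                mi += p * math.log(p / (row[i] * col[j]))
    return mi

# Hypothetical joint distributions of genotype (rows) and exposure (columns).
population = [[0.35, 0.35], [0.15, 0.15]]    # assumed: independent in the general population
cases = [[0.20, 0.30], [0.10, 0.40]]         # assumed: dependent among diseased individuals

mi_population = mutual_information(population)
mi_cases = mutual_information(cases)
print(f"population: {mi_population:.4f} nats, cases: {mi_cases:.4f} nats")
```

Under these assumed tables, the gap between the two values is the kind of population-versus-cases difference the abstract describes; a real analysis would estimate the tables from data and calibrate the statistic against its null distribution.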

    Examination of polymorphic glutathione S-transferase (GST) genes, tobacco smoking and prostate cancer risk among Men of African Descent: A case-control study

    Background: Polymorphisms in glutathione S-transferase (GST) genes may influence response to oxidative stress and modify prostate cancer (PCA) susceptibility. These enzymes generally detoxify endogenous and exogenous agents, but also participate in the activation and inactivation of oxidative metabolites that may contribute to PCA development. Genetic variations within selected GST genes may influence PCA risk following exposure to carcinogenic compounds found in cigarette smoke and decrease the ability to detoxify them. Thus, we evaluated the effects of polymorphic GSTs (M1, T1, and P1) alone and combined with cigarette smoking on PCA susceptibility. Methods: In order to evaluate the effects of GST polymorphisms in relation to PCA risk, we used TaqMan allelic discrimination assays along with a multi-faceted statistical strategy involving conventional and advanced statistical methodologies (e.g., Multifactor Dimensionality Reduction and Interaction Graphs). Genetic profiles collected from 873 men of African descent (208 cases and 665 controls) were utilized to systematically evaluate the single and joint modifying effects of GSTM1 and GSTT1 gene deletions, GSTP1 105 Val and cigarette smoking on PCA risk. Results: We observed a moderately significant association between PCA risk and possession of at least one variant GSTP1 105 Val allele (OR = 1.56; 95%CI = 0.95-2.58; p = 0.049), which was confirmed by MDR permutation testing (p = 0.001). We did not observe any significant single gene effects among GSTM1 (OR = 1.08; 95%CI = 0.65-1.82; p = 0.718) and GSTT1 (OR = 1.15; 95%CI = 0.66-2.02; p = 0.622) on PCA risk among all subjects. Although the GSTM1-GSTP1 pairwise combination was selected as the best two factor LR and MDR models (p = 0.01), assessment of the hierarchical entropy graph suggested that the observed synergistic effect was primarily driven by the GSTP1 Val marker. Notably, the GSTM1-GSTP1 axis did not provide additional information gain when compared to either locus alone based on a hierarchical entropy algorithm and graph. Smoking status did not significantly modify the relationship between the GST SNPs and PCA. Conclusion: A moderately significant association was observed between PCA risk and possession of at least one variant GSTP1 105 Val allele (p = 0.049) among men of African descent. We also observed a 2.1-fold increase in PCA risk associated with men possessing the GSTP1 (Val/Val) and GSTM1 (*1/*1 + *1/*0) alleles. MDR analysis validated these findings, detecting GSTP1 105 Val (p = 0.001) as the best single factor for predicting PCA risk. Our findings emphasize the importance of utilizing a combination of traditional and advanced statistical tools to identify and validate single gene and multi-locus interactions in relation to cancer susceptibility.
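The "additional information gain" of a pair of markers over each marker alone can be sketched with the interaction information commonly used in entropy-based interaction graphs, IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C). Whether the study used exactly this estimator is an assumption; the helper names and the toy data below are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of a sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_info(x, y):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def interaction_gain(a, b, c):
    """Information about c gained from a and b jointly, beyond their separate contributions."""
    joint = list(zip(a, b))
    return mutual_info(joint, c) - mutual_info(a, c) - mutual_info(b, c)

# Toy data (hypothetical): status follows an XOR of two binary markers,
# so neither marker is informative alone but the pair is fully informative.
marker_a = [0, 0, 1, 1]
marker_b = [0, 1, 0, 1]
status = [0, 1, 1, 0]
synergy = interaction_gain(marker_a, marker_b, status)

# Toy data (hypothetical): the second marker duplicates the first,
# so the pair adds nothing beyond either marker (negative gain, redundancy).
redundancy = interaction_gain(marker_a, marker_a, marker_a)
print(synergy, redundancy)
```

A positive value indicates synergy and a value at or below zero indicates that the pair carries no information beyond the individual markers, which corresponds to the GSTM1-GSTP1 observation reported above.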

    Interaction among apoptosis-associated sequence variants and joint effects on aggressive prostate cancer

    Background: Molecular and epidemiological evidence demonstrates that altered gene expression and single nucleotide polymorphisms in the apoptotic pathway are linked to many cancers. Yet, few studies emphasize the interaction of variant apoptotic genes and their joint modifying effects on prostate cancer (PCA) outcomes. An exhaustive assessment of all the possible two-, three- and four-way gene-gene interactions is computationally burdensome. This statistical conundrum stems from the prohibitive amount of data needed to account for multiple hypothesis testing. Methods: To address this issue, we systematically prioritized and evaluated individual effects and complex interactions among 172 apoptotic SNPs in relation to PCA risk and aggressive disease (i.e., Gleason score ≥ 7 and tumor stages III/IV). Single and joint modifying effects on PCA outcomes among European-American men were analyzed using statistical epistasis networks coupled with multi-factor dimensionality reduction (SEN-guided MDR). The case-control study design included 1,175 incident PCA cases and 1,111 controls from the prostate, lung, colo-rectal, and ovarian (PLCO) cancer screening trial. Moreover, a subset analysis of PCA cases consisted of 688 aggressive and 488 non-aggressive PCA cases. SNP profiles were obtained using the NCI Cancer Genetic Markers of Susceptibility (CGEMS) data portal. Main effects were assessed using logistic regression (LR) models. Prior to modeling interactions, SEN was used to pre-process our genetic data. SEN used network science to reduce our analysis from > 36 million to < 13,000 SNP interactions. Interactions were visualized, evaluated, and validated using entropy-based MDR. All parametric and non-parametric models were adjusted for age, family history of PCA, and multiple hypothesis testing. Results: Following LR modeling, eleven and thirteen sequence variants were associated with PCA risk and aggressive disease, respectively. However, none of these markers remained significant after we adjusted for multiple comparisons. Nevertheless, we detected a modest synergistic interaction between AKT3 rs2125230-PRKCQ rs571715 and disease aggressiveness using SEN-guided MDR (p = 0.011). Conclusions: In summary, entropy-based SEN-guided MDR facilitated the logical prioritization and evaluation of apoptotic SNPs in relation to aggressive PCA. The suggestive interaction between AKT3-PRKCQ and aggressive PCA requires further validation using independent observational studies.
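The "> 36 million" figure is consistent with counting every unordered two-, three- and four-way combination of the 172 SNPs. That reading is an assumption, since the abstract does not state how the number was derived; the short check below only does the arithmetic.

```python
from math import comb

n_snps = 172  # number of apoptotic SNPs stated in the abstract

# Number of unordered k-way SNP combinations for k = 2, 3, 4.
counts = {k: comb(n_snps, k) for k in (2, 3, 4)}
total = sum(counts.values())

for k, c in counts.items():
    print(f"{k}-way combinations: {c:,}")
print(f"total: {total:,}")
```

Nearly all of the total comes from the four-way term, which is why exhaustive search becomes impractical as the interaction order grows.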

    A role for CETP TaqIB polymorphism in determining susceptibility to atrial fibrillation: a nested case control study

    BACKGROUND: Studies investigating the genetic and environmental characteristics of atrial fibrillation (AF) may provide new insights in the complex development of AF. We aimed to investigate the association between several environmental factors and loci of candidate genes, which might be related to the presence of AF. METHODS: A nested case-control study within the PREVEND cohort was conducted. Standard 12 lead electrocardiograms were recorded and AF was defined according to Minnesota codes. For every case, an age and gender matched control was selected from the same population (n = 194). In addition to logistic regression analyses, the multifactor-dimensionality reduction (MDR) method and interaction entropy graphs were used for the evaluation of gene-gene and gene-environment interactions. Polymorphisms in genes from the Renin-angiotensin, Bradykinin and CETP systems were included. RESULTS: Subjects with AF had a higher prevalence of electrocardiographic left ventricular hypertrophy, ischemic heart disease, hypertension, renal dysfunction, elevated levels of C-reactive protein (CRP) and increased urinary albumin excretion as compared to controls. The polymorphisms of the Renin-angiotensin system and Bradykinin gene did not show a significant association with AF (p > 0.05). The TaqIB polymorphism of the CETP gene was significantly associated with the presence of AF (p < 0.05). Using the MDR method, the best genotype-phenotype models included the combination of micro- or macroalbuminuria and CETP TaqIB polymorphism, CRP >3 mg/L and CETP TaqIB polymorphism, renal dysfunction and the CETP TaqIB polymorphism, and ischemic heart disease and CETP TaqIB polymorphism (1000 fold permutation testing, P < 0.05). Interaction entropy graph showed that the combination of albuminuria and CETP TaqIB polymorphism removed the most entropy. 
CONCLUSION: CETP TaqIB polymorphism is significantly associated with the presence of AF in the context of micro- or macroalbuminuria, elevated C-reactive protein, renal dysfunction, and ischemic heart disease

    A Novel Statistic for Genome-Wide Interaction Analysis

    Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible way to find heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet the challenges raised by genome-wide interaction analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001<FDR<0.003, respectively, which were seen in two independent studies of psoriasis. These included five interacting pairs of SNPs in genes LST1/NCR3, CXCR5/BCL9L, and GLS2, some of which were located in the target sites of miR-324-3p, miR-433, and miR-382, as well as 15 pairs of interacting SNPs that had nonsynonymous substitutions. Our results demonstrated that genome-wide interaction analysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search for significant interactions between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.

    Multiple-input multiple-output causal strategies for gene selection

    Traditional strategies for selecting variables in high dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may be effective in terms of generalization accuracy, they often do not reveal direct causes. The latter is essentially related to the fact that high correlation (or relevance) does not imply causation. In this study, we show how to efficiently incorporate causal information into gene selection by moving from a single-input single-output to a multiple-input multiple-output setting.

    Multiple Analytical Approaches Reveal Distinct Gene-Environment Interactions in Smokers and Non Smokers in Lung Cancer

    Complex diseases such as cancer result from interactions of multiple genetic and environmental factors. Studying these factors singly cannot explain the underlying pathogenetic mechanism of the disease. A multi-analytical approach, including logistic regression (LR), classification and regression tree (CART) and multifactor dimensionality reduction (MDR), was applied in 188 lung cancer cases and 290 controls to explore high order interactions among xenobiotic metabolizing genes and environmental risk factors. Smoking was identified as the predominant risk factor by all three analytical approaches. Individually, CYP1A1*2A polymorphism was significantly associated with increased lung cancer risk (OR = 1.69; 95%CI = 1.11–2.59, p = 0.01), whereas EPHX1 Tyr113His and SULT1A1 Arg213His conferred reduced risk (OR = 0.40; 95%CI = 0.25–0.65, p<0.001 and OR = 0.51; 95%CI = 0.33–0.78, p = 0.002 respectively). In smokers, EPHX1 Tyr113His and SULT1A1 Arg213His polymorphisms reduced the risk of lung cancer, whereas CYP1A1*2A, CYP1A1*2C and GSTP1 Ile105Val imparted increased risk in non-smokers only. While exploring non-linear interactions through CART analysis, smokers carrying the combination of EPHX1 113TC (Tyr/His), SULT1A1 213GG (Arg/Arg) or AA (His/His) and GSTM1 null genotypes showed the highest risk for lung cancer (OR = 3.73; 95%CI = 1.33–10.55, p = 0.006), whereas combined effect of CYP1A1*2A 6235CC or TC, SULT1A1 213GG (Arg/Arg) and betel quid chewing showed maximum risk in non-smokers (OR = 2.93; 95%CI = 1.15–7.51, p = 0.01). MDR analysis identified two distinct predictor models for the risk of lung cancer in smokers (tobacco chewing, EPHX1 Tyr113His, and SULT1A1 Arg213His) and non-smokers (CYP1A1*2A, GSTP1 Ile105Val and SULT1A1 Arg213His) with testing balanced accuracy (TBA) of 0.6436 and 0.6677 respectively. Interaction entropy interpretations of MDR results showed non-additive interactions of tobacco chewing with SULT1A1 Arg213His and EPHX1 Tyr113His in smokers and SULT1A1 Arg213His with GSTP1 Ile105Val and CYP1A1*2C in non-smokers. These results identified distinct gene-gene and gene-environment interactions in smokers and non-smokers, which confirms the importance of multifactorial interaction in risk assessment of lung cancer.
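The core MDR step and the balanced accuracy measure mentioned above can be sketched as follows. This is a simplified illustration, not the MDR software used in the study: it pools each multi-locus genotype cell into a high-risk or low-risk group by comparing its case:control ratio with the overall ratio, then scores the resulting classifier. The function names and toy data are hypothetical, and the score is computed on the same data used to label the cells, whereas a testing balanced accuracy would come from held-out cross-validation folds.

```python
from collections import Counter

def mdr_high_risk_cells(genotypes, status):
    """Return the genotype cells whose case:control ratio exceeds the overall ratio.

    Assumes status is 1 for cases and 0 for controls, with both classes present.
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    threshold = sum(cases.values()) / sum(controls.values())
    cells = set(cases) | set(controls)
    # Cross-multiplied comparison avoids dividing by zero for cells without controls.
    return {cell for cell in cells if cases[cell] > threshold * controls[cell]}

def balanced_accuracy(genotypes, status, high_risk):
    """Mean of sensitivity and specificity for the high-risk/low-risk classifier."""
    tp = fn = tn = fp = 0
    for g, s in zip(genotypes, status):
        predicted = 1 if g in high_risk else 0
        if s == 1 and predicted == 1:
            tp += 1
        elif s == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

# Toy two-locus data (hypothetical): a noisy XOR-like pattern, 4 subjects per cell.
genotypes = [(0, 0)] * 4 + [(0, 1)] * 4 + [(1, 0)] * 4 + [(1, 1)] * 4
status = [0, 0, 0, 1,  1, 1, 1, 0,  1, 1, 1, 0,  0, 0, 0, 1]

high_risk = mdr_high_risk_cells(genotypes, status)
accuracy = balanced_accuracy(genotypes, status, high_risk)
print(sorted(high_risk), accuracy)
```

Balanced accuracy is preferred over plain accuracy here because case and control groups are often unequal in size, as with the 188 cases and 290 controls above.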

    Neural networks for modeling gene-gene interactions in association studies

    Background: Our aim is to investigate the ability of neural networks to model different two-locus disease models. We conduct a simulation study to compare neural networks with two standard methods, namely logistic regression models and multifactor dimensionality reduction. One hundred data sets are generated for each of six two-locus disease models, which are considered in a low and in a high risk scenario. Two models represent independence, one is a multiplicative model, and three models are epistatic. For each data set, six neural networks (with up to five hidden neurons) and five logistic regression models (the null model, three main effect models, and the full model) with two different codings for the genotype information are fitted. Additionally, the multifactor dimensionality reduction approach is applied. Results: The results show that neural networks are more successful in modeling the structure of the underlying disease model than logistic regression models in most of the investigated situations. In our simulation study, neither logistic regression nor multifactor dimensionality reduction is able to correctly identify biological interaction. Conclusions: Neural networks are a promising tool to handle complex data situations. However, further research is necessary concerning the interpretation of their parameters.

    Applying Discrete PCA in Data Analysis

    Methods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper we explore a number of extensions to the common theory, and present some applications of these methods to some common statistical tasks. We show that these methods can be interpreted as a discrete version of ICA. We develop a hierarchical version yielding components at different levels of detail, and additional techniques for Gibbs sampling. We compare the algorithms on a text prediction task using support vector machines, and to information retrieval.