63 research outputs found

    Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions

    Risk of complex disorders is thought to be multifactorial, involving interactions between risk factors. However, many genetic studies assess association between disease status and markers one single-nucleotide polymorphism (SNP) at a time, due to the high-dimensional nature of the search space of all possible interactions. Three ensemble methods have been recently proposed for use in high-dimensional data (Monte Carlo logic regression, random forests, and generalized boosted regression). An intuitive way to detect an association between genetic markers and disease status is to use variable importance measures, even though the stability of these measures in the context of a whole-genome association study is unknown. For the simulated data of Problem 3 in the Genetic Analysis Workshop 15 (GAW15), we examined the variability of both rankings and magnitude of variable importance measures using 10 variables simulated to participate in gene × gene and gene × environment interactions. We conducted 500 analyses per method on one randomly selected replicate, tallying the rankings and importance measures for each of the 10 variables of interest. When the simulated effect size was strong, all three methods showed stable rankings and estimates of variable importance. However, under conditions more commonly expected to be encountered in complex diseases, random forests and generalized boosted regression showed more stable estimates of variable importance and variable rankings. Individuals endeavoring to apply statistical learning methods to detect interaction in complex disease studies should perform repeated analyses to ensure that variable importance measures and rankings do not vary greatly, even for statistical learning algorithms that are thought to be stable.
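
The tallying step described in this abstract (repeat the analysis many times, then count how often each variable lands at each rank) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the variable names, effect sizes, and noise level are hypothetical, and the Gaussian noise merely stands in for refitting a stochastic learner 500 times.

```python
import random
import statistics
from collections import Counter

def rank_variables(importances):
    """Return {variable: rank}, where rank 1 is the largest importance."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: pos + 1 for pos, name in enumerate(ordered)}

def summarize_stability(runs):
    """Tally ranks and summarize the spread of importance across repeated runs."""
    rankings = [rank_variables(run) for run in runs]
    summary = {}
    for name in runs[0]:
        scores = [run[name] for run in runs]
        summary[name] = {
            "rank_counts": Counter(r[name] for r in rankings),
            "mean_importance": statistics.mean(scores),
            "sd_importance": statistics.stdev(scores),
        }
    return summary

# Hypothetical stand-in for 500 repeated analyses of one data replicate:
# each run perturbs an assumed "true" importance with Gaussian noise.
rng = random.Random(0)
assumed_effect = {"strong_snp": 1.0, "weak_snp": 0.2, "noise_snp": 0.15}
runs = [
    {name: effect + rng.gauss(0, 0.05) for name, effect in assumed_effect.items()}
    for _ in range(500)
]
summary = summarize_stability(runs)
for name, stats in summary.items():
    print(name, dict(stats["rank_counts"]), round(stats["sd_importance"], 3))
```

A variable whose rank counts concentrate on a single rank is stable in the sense used above; counts spread over several ranks indicate that a single analysis could be misleading.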

    The behaviour of random forest permutation-based variable importance measures under predictor correlation

    Background: Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies where predictor correlation is frequently observed. Recent works on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results. Results: When predictor correlation was present and predictors were associated with the outcome (H_A), the unconditional RF VIM attributed a higher share of importance to correlated predictors, while under the null hypothesis that no predictors are associated with the outcome (H_0) the unconditional RF VIM was unbiased. Conditional VIMs showed a decrease in VIM values for correlated predictors versus the unconditional VIMs under H_A and were unbiased under H_0. Scaled VIMs were clearly biased under H_A and H_0. Conclusions: Unconditional unscaled VIMs are a computationally tractable choice for large datasets and are unbiased under the null hypothesis. Whether the observed increased VIMs for correlated predictors may be considered a "bias" - because they do not directly reflect the coefficients in the generating model - or if it is a beneficial attribute of these VIMs is dependent on the application. For example, in genetic association studies, where correlation between markers may help to localize the functionally relevant variant, the increased importance of correlated predictors may be an advantage. On the other hand, we show examples where this increased importance may result in spurious signals.
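
As a rough illustration of what an unconditional, unscaled permutation importance measures, the sketch below shuffles one predictor column at a time and records the mean rise in squared error. It is not a random forest and not the study's simulation design: the fixed `model` function is a hypothetical stand-in for a fitted learner that has spread its weight across two correlated predictors, and the data-generating settings are assumptions chosen for the example.

```python
import random

def mse(model, X, y):
    """Mean squared error of model predictions over rows of X."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def permutation_importance(model, X, y, column, n_repeats=20, seed=0):
    """Unscaled, unconditional VIM: mean rise in error after shuffling one column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)  # marginal shuffle: also breaks correlation with other columns
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(model, X_perm, y) - baseline)
    return sum(increases) / n_repeats

# Assumed toy data: x0 drives the outcome, x1 is strongly correlated with x0,
# and x2 is independent noise.
rng = random.Random(1)
X, y = [], []
for _ in range(200):
    x0 = rng.gauss(0, 1)
    x1 = x0 + rng.gauss(0, 0.1)
    x2 = rng.gauss(0, 1)
    X.append([x0, x1, x2])
    y.append(2.0 * x0 + rng.gauss(0, 0.1))

def model(row):
    # Hypothetical fitted model that splits its weight between x0 and x1.
    return row[0] + row[1]

importances = [permutation_importance(model, X, y, col) for col in range(3)]
print(importances)
```

In this toy setting x1 receives a clearly positive importance even though it does not appear in the generating equation, which mirrors the "higher share of importance to correlated predictors" described in the abstract. A conditional VIM would instead permute within strata of the correlated predictors; that is not shown here.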

    Comparison of type I error for multiple test corrections in large single-nucleotide polymorphism studies using principal components versus haplotype blocking algorithms

    Although permutation testing has been the gold standard for assessing significance levels in studies using multiple markers, it is time-consuming. A Bonferroni correction to the nominal p-value that uses the underlying pair-wise linkage disequilibrium (LD) structure among the markers to determine the number of effectively independent tests has recently been proposed. We propose using the number of independent LD blocks plus the number of independent single-nucleotide polymorphisms for correction. Using the Collaborative Study on the Genetics of Alcoholism LD data for chromosome 21, we simulated 1,000 replicates of parent-child trio data under the null hypothesis with two levels of LD: moderate and high. Assuming haplotype blocks were independent, we calculated the number of independent statistical tests using three haplotype blocking algorithms. We then compared the type I error rates using a principal components-based method, the three blocking methods, a traditional Bonferroni correction, and the unadjusted p-values obtained from FBAT. Under high LD conditions, the PC method and one of the blocking methods were slightly conservative, whereas the two other blocking methods exceeded the target type I error rate. Under conditions of moderate LD, we show that the blocking algorithm corrections are closest to the desired type I error, although still slightly conservative, with the principal components-based method being almost as conservative as the traditional Bonferroni correction.
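
The correction proposed in this abstract divides the nominal significance level by the number of LD blocks plus the number of SNPs that fall outside any block. A minimal sketch under assumed inputs follows; the SNP labels and block assignments are hypothetical, and the haplotype blocking algorithm that would produce them is not shown.

```python
def effective_number_of_tests(blocks, unblocked_snps):
    """Count each LD block once and each SNP outside any block once."""
    return len(blocks) + len(unblocked_snps)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Hypothetical output of a blocking algorithm for 10 genotyped SNPs.
blocks = [["snp1", "snp2", "snp3"], ["snp4", "snp5"]]
unblocked_snps = ["snp6", "snp7", "snp8", "snp9", "snp10"]

n_snps = sum(len(block) for block in blocks) + len(unblocked_snps)
n_effective = effective_number_of_tests(blocks, unblocked_snps)

traditional = bonferroni_threshold(0.05, n_snps)
block_based = bonferroni_threshold(0.05, n_effective)
print(n_snps, n_effective, traditional, block_based)
```

Because the effective count can never exceed the number of SNPs, the block-based threshold is never stricter than the traditional one. Whether it holds the type I error at the target level depends on how well the blocks capture the LD structure, which is the question the study examines.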

    Genome-wide association study of antidepressant treatment resistance in a population-based cohort using health service prescription data and meta-analysis with GENDEP

    Antidepressants demonstrate modest response rates in the treatment of major depressive disorder (MDD). Despite previous genome-wide association studies (GWAS) of antidepressant treatment response, the underlying genetic factors are unknown. Using prescription data in a population and family-based cohort (Generation Scotland: Scottish Family Health Study; GS:SFHS), we sought to define a measure of (a) antidepressant treatment resistance and (b) stages of antidepressant resistance by inferring antidepressant switching as non-response to treatment. GWAS were conducted separately for antidepressant treatment resistance in GS:SFHS and the Genome-based Therapeutic Drugs for Depression (GENDEP) study and then meta-analysed (meta-analysis n = 4213, cases = 358). For stages of antidepressant resistance, a GWAS on GS:SFHS only was performed (n = 3452). Additionally, we conducted gene-set enrichment, polygenic risk scoring (PRS) and genetic correlation analysis. We did not identify any significant loci, genes or gene sets associated with antidepressant treatment resistance or stages of resistance. Significant positive genetic correlations of antidepressant treatment resistance and stages of resistance with neuroticism, psychological distress, schizotypy and mood disorder traits were identified. These findings suggest that larger sample sizes are needed to identify the genetic architecture of antidepressant treatment response, and that population-based observational studies may provide a tractable approach to achieving the necessary statistical power.

    Functional Polymorphisms in PRODH Are Associated with Risk and Protection for Schizophrenia and Fronto-Striatal Structure and Function

    PRODH, encoding proline oxidase (POX), has been associated with schizophrenia through linkage, association, and the 22q11 deletion syndrome (Velo-Cardio-Facial syndrome). Here, we show in a family-based sample that functional polymorphisms in PRODH are associated with schizophrenia, with protective and risk alleles having opposite effects on POX activity. Using a multimodal imaging genetics approach, we demonstrate that haplotypes constructed from these risk and protective functional polymorphisms have dissociable correlations with structure, function, and connectivity of striatum and prefrontal cortex, impacting critical circuitry implicated in the pathophysiology of schizophrenia. Specifically, the schizophrenia risk haplotype was associated with decreased striatal volume and increased striatal-frontal functional connectivity, while the protective haplotype was associated with decreased striatal-frontal functional connectivity. Our findings suggest a role for functional genetic variation in POX on neostriatal-frontal circuits mediating risk and protection for schizophrenia.

    Age at first birth in women is genetically associated with increased risk of schizophrenia

    Previous studies have shown an increased risk for mental health problems in children born to both younger and older parents compared to children of average-aged parents. We previously used a novel design to reveal a latent mechanism of genetic association between schizophrenia and age at first birth in women (AFB). Here, we use independent data from the UK Biobank (N = 38,892) to replicate the finding of an association between predicted genetic risk of schizophrenia and AFB in women, and to estimate the genetic correlation between schizophrenia and AFB in women stratified into younger and older groups. We find evidence for an association between predicted genetic risk of schizophrenia and AFB in women (P-value = 1.12E-05), and we show genetic heterogeneity between younger and older AFB groups (P-value = 3.45E-03). The genetic correlation between schizophrenia and AFB in the younger AFB group is -0.16 (SE = 0.04) while that between schizophrenia and AFB in the older AFB group is 0.14 (SE = 0.08). Our results suggest that early, and perhaps also late, age at first birth in women is associated with increased genetic risk for schizophrenia in the UK Biobank sample. These findings contribute new insights into factors contributing to the complex bio-social risk architecture underpinning the association between parental age and offspring mental health. Note: Prof. Paunio is a member of the PGC. Peer reviewed.

    Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects

    Copy number variants (CNVs) have been strongly implicated in the genetic etiology of schizophrenia (SCZ). However, genome-wide investigation of the contribution of CNV to risk has been hampered by limited sample sizes. We sought to address this obstacle by applying a centralized analysis pipeline to a SCZ cohort of 21,094 cases and 20,227 controls. A global enrichment of CNV burden was observed in cases (OR = 1.11, P = 5.7 × 10⁻¹⁵), which persisted after excluding loci implicated in previous studies (OR = 1.07, P = 1.7 × 10⁻⁶). CNV burden was enriched for genes associated with synaptic function (OR = 1.68, P = 2.8 × 10⁻¹¹) and neurobehavioral phenotypes in mouse (OR = 1.18, P = 7.3 × 10⁻⁵). Genome-wide significant evidence was obtained for eight loci, including 1q21.1, 2p16.3 (NRXN1), 3q29, 7q11.2, 15q13.3, distal 16p11.2, proximal 16p11.2 and 22q11.2. Suggestive support was found for eight additional candidate susceptibility and protective loci, which consisted predominantly of CNVs mediated by non-allelic homologous recombination.

    No Reliable Association between Runs of Homozygosity and Schizophrenia in a Well-Powered Replication Study

    It is well known that inbreeding increases the risk of recessive monogenic diseases, but it is less certain whether it contributes to the etiology of complex diseases such as schizophrenia. One way to estimate the effects of inbreeding is to examine the association between disease diagnosis and genome-wide autozygosity estimated using runs of homozygosity (ROH) in genome-wide single nucleotide polymorphism arrays. Using data for schizophrenia from the Psychiatric Genomics Consortium (n = 21,868), Keller et al. (2012) estimated that the odds of developing schizophrenia increased by approximately 17% for every additional percent of the genome that is autozygous (β = 16.1, CI(β) = [6.93, 25.7], Z = 3.44, p = 0.0006). Here we describe replication results from 22 independent schizophrenia case-control datasets from the Psychiatric Genomics Consortium (n = 39,830). Using the same ROH calling thresholds and procedures as Keller et al. (2012), we were unable to replicate the significant association between ROH burden and schizophrenia in the independent PGC phase II data, although the effect was in the predicted direction, and the combined (original + replication) dataset yielded an attenuated but significant relationship between F_ROH and schizophrenia (β = 4.86, CI(β) = [0.90, 8.83], Z = 2.40, p = 0.02). Since Keller et al. (2012), several studies reported inconsistent association of ROH burden with complex traits, particularly in case-control data. These conflicting results might suggest that the effects of autozygosity are confounded by various factors, such as socioeconomic status, education, urbanicity, and religiosity, which may be associated with both real inbreeding and the outcome measures of interest.
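
The link between the reported β values and the "approximately 17%" figure can be checked with a short calculation. The sketch below assumes that β is a logistic-regression coefficient (log-odds) per unit of the autozygous proportion of the genome measured on a 0 to 1 scale, so that one additional percent corresponds to a change of 0.01. That scale is an assumption here, although it is consistent with the percentage quoted in the abstract.

```python
import math

def percent_odds_increase(beta, delta=0.01):
    """Percent change in odds for a `delta` increase in the predictor,
    assuming beta is a log-odds coefficient per unit of that predictor."""
    return (math.exp(beta * delta) - 1.0) * 100.0

# Coefficients quoted in the abstract.
original = percent_odds_increase(16.1)   # Keller et al. (2012) estimate
combined = percent_odds_increase(4.86)   # original + replication data

print(f"original: {original:.1f}% per additional percent autozygosity")
print(f"combined: {combined:.1f}% per additional percent autozygosity")
```

Under this assumption the original coefficient corresponds to an odds increase of about 17% per additional percent of autozygosity, and the attenuated combined estimate to roughly 5%.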