    A Generalized Approach for Testing the Association of a Set of Predictors with an Outcome: A Gene Based Test

    In many analyses, one has data on one level but wants to draw inference on another. For example, in genetic association studies, one observes units of DNA referred to as single nucleotide polymorphisms (SNPs) but wants to determine whether genes, which are composed of SNPs, are associated with disease. Although some approaches address this issue, they usually rely on parametric assumptions and are not easily generalizable. A statistical test is proposed for testing the association of a set of variables with an outcome of interest. No assumptions are made about the functional form relating the variables to the outcome. A general function is fit using any statistical learning algorithm, with the SuperLearner algorithm suggested. The parameter of interest is the cross-validated risk, which is compared to an expected risk. A Wald test is proposed that uses the influence curve of the cross-validated risk to obtain the variance. It is shown both theoretically and via simulation that the test maintains appropriate type I error control and is more powerful than parametric tests under more general alternatives. The test is applied to a multiple sclerosis (MS) candidate gene study, and three separate analyses are performed to highlight the flexibility of the approach.
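
    A minimal Python sketch of the idea, not the authors' implementation. It assumes squared-error loss for a quantitative outcome, uses a hypothetical saturated "lookup-table" learner in place of SuperLearner, takes the risk of predicting the training-fold mean as the comparison ("expected") risk under no association, and approximates the influence-curve-based variance by the empirical variance of the per-observation loss differences. The names `lookup_learner` and `cv_risk_test` are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev


def lookup_learner(X_train, y_train):
    """Hypothetical saturated learner: predict the mean outcome within each
    observed genotype pattern, falling back to the overall training mean."""
    overall = fmean(y_train)
    cells = {}
    for x, y in zip(X_train, y_train):
        cells.setdefault(tuple(x), []).append(y)
    means = {pattern: fmean(ys) for pattern, ys in cells.items()}
    return lambda x: means.get(tuple(x), overall)


def cv_risk_test(X, y, learner=lookup_learner, n_folds=10, seed=0):
    """One-sided Wald-type test that the learner's cross-validated risk is
    below the risk of predicting the training mean (assumed null comparator)."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]
    diffs = [0.0] * n  # per-observation (null loss - learner loss)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        predict = learner([X[i] for i in train], [y[i] for i in train])
        null_prediction = fmean(y[i] for i in train)
        for i in fold:
            diffs[i] = (y[i] - null_prediction) ** 2 - (y[i] - predict(X[i])) ** 2
    delta = fmean(diffs)                 # estimated reduction in CV risk
    se = stdev(diffs) / n ** 0.5         # SE from the centered per-observation terms
    z = delta / se
    p_value = 1.0 - NormalDist().cdf(z)  # one-sided: learner beats the null
    return delta, z, p_value


# Simulated data: two SNPs coded 0/1/2, outcome driven by the first SNP only.
rng = random.Random(1)
X = [[rng.choice([0, 1, 2]), rng.choice([0, 1, 2])] for _ in range(300)]
y = [2.0 * x[0] + rng.gauss(0, 1) for x in X]
delta, z, p_value = cv_risk_test(X, y)
print(f"risk reduction={delta:.2f}, z={z:.1f}, p={p_value:.2g}")
```

    Any learner with the same `learner(X_train, y_train) -> predict` shape can be swapped in; the lookup table is used here only because it makes no assumption about the functional form for a small number of categorical SNPs.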

    An application of Random Forests to a genome-wide association dataset: Methodological considerations & new findings

    Background: As computational power improves, it becomes possible to apply more advanced machine learning techniques to large genome-wide association (GWA) datasets. While most traditional statistical methods can only elucidate main effects of genetic variants on disease risk, certain machine learning approaches are particularly suited to discovering higher-order and non-linear effects. One such approach is the Random Forests (RF) algorithm. The use of RF for SNP discovery related to human disease has grown in recent years; however, most work has focused on small datasets or simulation studies, which are of limited scope.

    Results: Using a multiple sclerosis (MS) case-control dataset comprising 300K SNP genotypes across the genome, we outline an approach and some considerations for optimally tuning the RF algorithm based on the empirical dataset. Importantly, the results show that typical default parameter values are not appropriate for large GWA datasets. Furthermore, gains can be made by sub-sampling the data, pruning based on linkage disequilibrium (LD), and removing strong effects from RF analyses. The new RF results overlap with findings from the original MS GWA study. In addition, RF analysis identifies four new candidate MS genes, MPHOSPH9, CTNNA3, PHACTR2, and IL7, which warrant further follow-up in independent studies.

    Conclusions: This study presents one of the first illustrations of successfully analyzing GWA data with a machine learning algorithm. It shows that RF is computationally feasible for GWA data and that the results make biological sense in light of previous studies. More importantly, new genes were identified as potentially associated with MS, suggesting new avenues of investigation for this complex disease.
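
    The abstract mentions LD-based pruning before running RF but does not describe the procedure. The sketch below is a simplified, hypothetical version: a greedy pass in map order that drops a SNP when its squared genotype correlation (r^2 between 0/1/2 dosage vectors) with a recently kept SNP reaches a threshold. It is similar in spirit to the pairwise pruning offered by tools such as PLINK, but the `threshold` and `window` values and the function names are assumptions, not the study's settings.

```python
from statistics import fmean


def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype-dosage vectors (0/1/2).
    Returns 0.0 when either SNP is monomorphic in the sample."""
    m1, m2 = fmean(g1), fmean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0
    return cov * cov / (v1 * v2)


def ld_prune(genotypes, threshold=0.5, window=50):
    """Greedy pruning in map order: keep SNP j only if its r^2 with every
    already-kept SNP at most `window` positions earlier is below `threshold`."""
    kept = []
    for j, g in enumerate(genotypes):
        nearby = [k for k in kept if j - k <= window]
        if all(r_squared(genotypes[k], g) < threshold for k in nearby):
            kept.append(j)
    return kept


# Toy genotypes for six individuals (rows are SNPs in map order).
snps = [
    [0, 1, 2, 0, 1, 2],  # SNP 0
    [0, 1, 2, 0, 1, 2],  # SNP 1: identical to SNP 0 (r^2 = 1), pruned
    [2, 1, 0, 2, 1, 0],  # SNP 2: perfectly anti-correlated (r^2 = 1), pruned
    [0, 0, 1, 1, 2, 2],  # SNP 3: r^2 = 0.25 with SNP 0, kept
]
print(ld_prune(snps))  # [0, 3]
```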

    Genetic variants in ARID5B and CEBPE are childhood ALL susceptibility loci in Hispanics.

    Recent genome-wide studies conducted in European Whites have identified novel susceptibility genes for childhood acute lymphoblastic leukemia (ALL). We sought to examine whether these loci are susceptibility genes among Hispanics, whose reported incidence of childhood ALL is the highest of all ethnic groups in California, and whether their effects differ between Hispanics and non-Hispanic Whites (NHWs). We genotyped 13 variants in these genes among 706 Hispanic (300 cases, 406 controls) and 594 NHW (225 cases, 369 controls) participants in a matched population-based case-control study in California. We found significant associations for the five studied ARID5B variants in both Hispanics (p values of 1.0 × 10^-9 to 0.004) and NHWs (p values of 2.2 × 10^-6 to 0.018). Risk estimates were in the same direction in both groups (ORs of 1.53-1.99 and 1.37-1.84, respectively) and strengthened when restricted to B-cell precursor high-hyperdiploid ALL (>50 chromosomes; ORs of 2.21-3.22 and 1.67-2.71, respectively). Similar results were observed for the single CEBPE variant. Hispanics and NHWs exhibited different susceptibility loci at CDKN2A. Although IKZF1 loci showed significant susceptibility effects among NHWs (p < 1 × 10^-5), their effects among Hispanics were in the same direction but nonsignificant, despite similar minor allele frequencies. Future studies should examine whether the observed effects vary by environmental, immunological, or lifestyle factors.
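
    For readers unfamiliar with how an odds ratio (OR) and its confidence interval arise from case-control counts, here is a hedged sketch of a crude allelic OR with a Woolf (log-scale Wald) interval. The counts are hypothetical, not data from this study. A crude allele-count OR ignores the matched design and assumes alleles are independent within individuals, so it is not the study's analysis; matched data would normally call for a matched method such as conditional logistic regression.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Crude odds ratio with a Woolf (log-scale Wald) confidence interval.
    a = risk alleles in cases, b = other alleles in cases,
    c = risk alleles in controls, d = other alleles in controls."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical allele counts, for illustration only.
or_, lo, hi = odds_ratio_ci(a=30, b=70, c=20, d=80)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```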

    Mendelian randomization shows a causal effect of low vitamin D on multiple sclerosis risk.

    Objective: We sought to estimate the causal effect of low serum 25(OH)D on multiple sclerosis (MS) susceptibility that is not confounded by environmental or lifestyle factors or subject to reverse causality.

    Methods: We conducted mendelian randomization (MR) analyses using an instrumental variable (IV) comprising 3 single nucleotide polymorphisms found to be associated with serum 25(OH)D levels at genome-wide significance. We analyzed the effect of the IV on MS risk and on both age at onset and disease severity in 2 separate populations, using logistic regression models that controlled for sex, year of birth, smoking, education, genetic ancestry, body mass index at age 18-20 years or in the 20s, a weighted genetic risk score for 110 known MS-associated variants, and the presence of one or more HLA-DRB1*15:01 alleles.

    Results: MR analyses using the IV showed that increasing levels of 25(OH)D are associated with a decreased risk of MS in both populations. In white, non-Hispanic members of Kaiser Permanente Northern California (1,056 MS cases and 9,015 controls), the odds ratio (OR) was 0.79 (p = 0.04, 95% confidence interval (CI): 0.64-0.99). In members of a Swedish population from the Epidemiological Investigation of Multiple Sclerosis and the Genes and Environment in Multiple Sclerosis case-control studies (6,335 cases and 5,762 controls), the OR was 0.86 (p = 0.03, 95% CI: 0.76-0.98). A meta-analysis of the 2 populations gave a combined OR of 0.85 (p = 0.003, 95% CI: 0.76-0.94). No association was observed for age at onset or disease severity.

    Conclusions: These results provide strong evidence that low serum 25(OH)D concentration is a cause of MS, independent of established risk factors.
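
    The combined estimate pools two study-specific ORs. One standard way to do this from published summaries is an inverse-variance fixed-effect meta-analysis on the log-OR scale, with each standard error back-calculated from the reported 95% CI. The sketch below assumes that method; the abstract does not say how the authors pooled, so treat it as an approximation.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(studies, level=0.95):
    """Inverse-variance fixed-effect pooling of odds ratios.
    Each study is (OR, CI lower, CI upper); the log-OR standard error is
    back-calculated from the CI width, assuming a symmetric Wald interval
    on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_ors, weights = [], []
    for or_, lower, upper in studies:
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        log_ors.append(math.log(or_))
        weights.append(1 / se ** 2)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se_pooled = 1 / math.sqrt(sum(weights))
    p_value = 2 * (1 - NormalDist().cdf(abs(pooled) / se_pooled))
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            p_value)


# Reported estimates: Kaiser Permanente Northern California, then Swedish studies.
pooled_or, lower, upper, p = fixed_effect_meta([(0.79, 0.64, 0.99), (0.86, 0.76, 0.98)])
print(f"pooled OR {pooled_or:.2f} (95% CI {lower:.2f}-{upper:.2f}), p = {p:.3f}")
```

    With these inputs the sketch gives a pooled OR of roughly 0.84 (95% CI about 0.75 to 0.94), close to but not identical to the reported 0.85 (0.76 to 0.94); rounding in the published inputs and the pooling method the authors actually used can account for the difference.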

    Rheumatoid Arthritis Naive T Cells Share Hypermethylation Sites With Synoviocytes.

    Objective: To determine whether differentially methylated CpGs in synovium-derived fibroblast-like synoviocytes (FLS) of patients with rheumatoid arthritis (RA) were also differentially methylated in RA peripheral blood (PB) samples.

    Methods: For this study, 371 genome-wide DNA methylation profiles were measured using Illumina HumanMethylation450 BeadChips in PB samples from 63 patients with RA and 31 unaffected control subjects, specifically in four cell subsets: CD14+ monocytes, CD19+ B cells, CD4+ memory T cells, and CD4+ naive T cells.

    Results: Of 5,532 hypermethylated FLS candidate CpGs, 1,056 were hypermethylated in CD4+ naive T cells from RA PB compared with control PB. In analyses of a second set of CpG candidates based on single-nucleotide polymorphisms from a genome-wide association study of RA, 1 significantly hypermethylated CpG was found in CD4+ memory T cells and 18 significant CpGs (6 hypomethylated, 12 hypermethylated) in CD4+ naive T cells. A prediction score based on the hypermethylated FLS candidates had an area under the curve (AUC) of 0.73 for association with RA case status, which compared favorably with the association of RA with the HLA-DRB1 shared epitope risk allele and with a validated RA genetic risk score.

    Conclusion: FLS-representative DNA methylation signatures derived from PB may prove to be valuable biomarkers for the risk of RA or for disease status.
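
    The reported AUC of 0.73 can be read as the probability that a randomly chosen RA case has a higher prediction score than a randomly chosen control. The sketch below computes the AUC in exactly that pairwise (Mann-Whitney) form, counting ties as one half. The abstract does not say how the score itself is built, so the scores here are hypothetical placeholders.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control;
    tied pairs count as 1/2 (the Mann-Whitney formulation)."""
    wins = 0.0
    for s in case_scores:
        for t in control_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-sample prediction scores, for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.4]
print(auc(cases, controls))
```

    The pairwise loop is quadratic in sample size, which is fine for cohorts of this size; larger datasets would typically use a rank-based computation instead.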