19 research outputs found

    Resampling procedures to identify important SNPs using a consensus approach

    Our goal is to identify common single-nucleotide polymorphisms (SNPs) (minor allele frequency > 1%) that add predictive accuracy above that gained by knowledge of easily measured clinical variables. We take an algorithmic approach to predict each phenotypic variable using a combination of phenotypic and genotypic predictors. We perform our procedure on the first simulated replicate and then validate against the others. Our procedure performs well when predicting Q1 but is less successful for the other outcomes. We use resampling procedures where possible to guard against false positives and to improve generalizability. The approach is based on finding a consensus regarding important SNPs by applying random forests and the least absolute shrinkage and selection operator (LASSO) on multiple subsamples. Random forests are used first to discard unimportant predictors, narrowing our focus to roughly 100 important SNPs. A cross-validated LASSO is then used to further select variables. We combine these procedures to guarantee that cross-validation can be used to choose a shrinkage parameter for the LASSO. If the clinical variables were unavailable, this prefiltering step would be essential. We perform the SNP-based analyses simultaneously rather than one at a time to estimate SNP effects in the presence of other causal variants. We analyzed the first simulated replicate of Genetic Analysis Workshop 17 without knowledge of the true model. Post-conference knowledge of the simulation parameters allowed us to investigate the limitations of our approach. We found that many of the false positives we identified were substantially correlated with genuine causal SNPs.
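
The consensus idea in this abstract (run a selector on many subsamples and keep the SNPs that are chosen repeatedly) can be sketched as follows. This is only an illustration under stated assumptions: the selector here is a simple top-k correlation screen standing in for the random forest and LASSO steps, and the function names, the 0.8 consensus threshold, the subsample fraction, and the noise-free toy data are all hypothetical rather than taken from the paper.

```python
import random
from collections import Counter


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_screen(genotypes, phenotype, k):
    """Stand-in selector (NOT the paper's method): the k SNPs most correlated with the trait."""
    scores = [abs_corr(col, phenotype) for col in genotypes]
    return sorted(range(len(genotypes)), key=lambda j: -scores[j])[:k]


def consensus_frequencies(genotypes, phenotype, selector,
                          n_subsamples=50, fraction=0.5, seed=0):
    """Fraction of random subsamples in which each SNP is selected.

    genotypes is a list of SNP columns, each a list of 0/1/2 allele counts.
    """
    rng = random.Random(seed)
    n = len(phenotype)
    m = int(n * fraction)
    counts = Counter()
    for _ in range(n_subsamples):
        rows = rng.sample(range(n), m)  # subsample individuals without replacement
        sub_g = [[col[i] for i in rows] for col in genotypes]
        sub_y = [phenotype[i] for i in rows]
        counts.update(selector(sub_g, sub_y))
    return [counts[j] / n_subsamples for j in range(len(genotypes))]


# Hypothetical toy data: 60 individuals, 4 SNPs, trait driven by SNP 0 only.
n = 60
genotypes = [
    [i % 3 for i in range(n)],
    [(i // 3) % 3 for i in range(n)],
    [(i // 9) % 3 for i in range(n)],
    [(i // 2) % 3 for i in range(n)],
]
phenotype = [1.0 + 2.0 * g for g in genotypes[0]]

freq = consensus_frequencies(genotypes, phenotype,
                             lambda g, y: top_k_screen(g, y, 2))
consensus = [j for j, f in enumerate(freq) if f >= 0.8]
print(freq, consensus)
```

Any function that maps a subsample to a list of selected SNP indices can be passed as `selector`, so a random forest prefilter followed by a cross-validated LASSO would slot into the same loop.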

    LASSO model selection with post-processing for a genome-wide association study data set

    Model selection procedures for simultaneous analysis of all single-nucleotide polymorphisms in genome-wide association studies are most suitable for making full use of the data for a complex disease study. In this paper we consider a penalized regression using the LASSO procedure and show that post-processing of the penalized-regression results with subsequent stepwise selection may lead to improved identification of causal single-nucleotide polymorphisms.
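
As a rough illustration of the penalized step only (the stepwise post-processing is not shown), the sketch below fits a LASSO by cyclic coordinate descent with soft-thresholding. The abstract does not say how the LASSO was fitted, so the algorithm choice, the function names, the penalty value, and the toy data are assumptions made for the example.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 over b.

    X is a list of rows (individuals) of coded genotypes; y is the trait.
    No intercept is fitted, so X and y are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date after every update
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ms[j] == 0.0:
                continue  # a constant (monomorphic) column carries no information
            # Covariance of column j with the residual that excludes its own fit.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_ms[j]
            step = new_bj - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                beta[j] = new_bj
    return beta


# Hypothetical toy data: two centred, orthogonal columns; only column 0 matters much.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.0 * a + 0.1 * b for a, b in X]
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

The weak effect on column 1 falls below the penalty and is set exactly to zero, while the retained coefficient is shrunk toward zero. That shrinkage is one reason a refit or a further selection step on the surviving variables can be useful.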

    Large-Scale Imputation of KIR Copy Number and HLA Alleles in North American and European Psoriasis Case-Control Cohorts Reveals Association of Inhibitory KIR2DL2 With Psoriasis

    Killer cell immunoglobulin-like receptors (KIR) regulate immune responses in NK and CD8+ T cells via interaction with HLA ligands. KIR genes, including KIR2DS1, KIR3DL1, and KIR3DS1, have previously been implicated in psoriasis susceptibility. However, these previous studies were constrained to small sample sizes, in part due to the time and expense required for direct genotyping of KIR genes. Here, we implemented KIR*IMP to impute KIR copy number from single-nucleotide polymorphisms (SNPs) on chromosome 19 in the discovery cohort (n=11,912) from the PAGE consortium, University of California San Francisco, and the University of Dundee, and in a replication cohort (n=66,357) from Kaiser Permanente Northern California. Stratified multivariate logistic regression that accounted for patient ancestry and high-risk HLA alleles revealed that KIR2DL2 copy number was significantly associated with psoriasis in the discovery cohort (p ≤ 0.05). The KIR2DL2 copy number association was replicated in the Kaiser Permanente replication cohort. This is the first reported association of KIR2DL2 copy number with psoriasis and highlights the importance of KIR genetics in the pathogenesis of psoriasis.

    Genome-wide association study of eosinophilic granulomatosis with polyangiitis reveals genomic loci stratified by ANCA status

    Eosinophilic granulomatosis with polyangiitis (EGPA) is a rare inflammatory disease of unknown cause. 30% of patients have anti-neutrophil cytoplasmic antibodies (ANCA) specific for myeloperoxidase (MPO). Here, we describe a genome-wide association study in 676 EGPA cases and 6809 controls that identifies 4 EGPA-associated loci through conventional case-control analysis, and 4 additional associations through a conditional false discovery rate approach. Many variants are also associated with asthma and six are associated with eosinophil count in the general population. Through Mendelian randomisation, we show that a primary tendency to eosinophilia contributes to EGPA susceptibility. Stratification by ANCA reveals that EGPA comprises two genetically and clinically distinct syndromes. MPO+ ANCA EGPA is an eosinophilic autoimmune disease sharing certain clinical features and an HLA-DQ association with MPO+ ANCA-associated vasculitis, while ANCA-negative EGPA may instead have a mucosal/barrier dysfunction origin. Four candidate genes are targets of therapies in development, supporting their exploration in EGPA.
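
The abstract does not state which Mendelian randomisation estimator was used. As a hedged illustration, the sketch below computes the widely used fixed-effect inverse-variance-weighted (IVW) estimate from per-variant summary statistics. The function name and all numbers are invented toy values, not results from the study.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted causal estimate and its standard error.

    Each variant j contributes a ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = (1.0 / total) ** 0.5
    return estimate, std_error


# Made-up summary statistics for three variants (illustration only).
bx = [0.1, 0.2, 0.4]       # variant effects on the exposure (e.g. eosinophil count)
by = [0.05, 0.10, 0.20]    # variant effects on the outcome (e.g. log-odds of disease)
se_by = [0.01, 0.02, 0.04]  # standard errors of the outcome effects

est, se = ivw_estimate(bx, by, se_by)
print(est, se)
```

In this toy input every variant implies the same ratio of 0.5, so the weighted average is 0.5 as well. With real data the per-variant ratios differ, and the spread between them is one informal check on the instrument assumptions.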

    Model selection procedures for high dimensional genomic data

    Many complex diseases are thought to be caused by multiple genetic variants. Recent advances in genotyping technology allowed investigators of a complex disease to obtain data for a massive number of candidate genetic variants. Typically each candidate variant is tested individually for an association with the disease. We approach the problem as one of model selection for high dimensional data. We propose a method whereby penalised maximum likelihood estimation provides a reasonably sized set of variants for inclusion in our model. We then perform stepwise regression on this set of variants to arrive at our model. Penalised maximum likelihood estimation is performed with both the lasso and a more recently developed method known as the hyperlasso, with smoothing parameters chosen by cross-validation. The hyperlasso has a penalty function that favours sparser solutions but with less shrinkage of those variables that are included in the model, when compared to the lasso; however, this comes at extra computational cost. We apply the above method to a large genomic data set from a previously published mice obesity study and use resample model averaging to assess model performance.
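
The stepwise stage of "Model selection procedures for high dimensional genomic data" can be pictured with the sketch below, which runs forward selection over a small candidate set such as one a penalised fit might return. The abstract does not give the stepwise criterion, so the BIC stopping rule, the no-intercept linear model, the function names, and the toy data are all assumptions.

```python
import math
from itertools import product


def ols_rss(X, y, cols):
    """Residual sum of squares from least squares of y on the chosen columns (no intercept)."""
    n, k = len(y), len(cols)
    if k == 0:
        return sum(v * v for v in y)
    # Augmented normal equations [X'X | X'y] for the chosen columns.
    M = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in cols]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in cols]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return float("inf")  # collinear columns: treat the model as unusable
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution for the coefficients.
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (M[r][k] - sum(M[r][c] * b[c] for c in range(r + 1, k))) / M[r][r]
    return sum((y[i] - sum(b[t] * X[i][cols[t]] for t in range(k))) ** 2
               for i in range(n))


def forward_stepwise(X, y, candidates):
    """Greedy forward selection: add the variable that lowers BIC most; stop when none does."""
    n = len(y)

    def bic(cols):
        rss = max(ols_rss(X, y, cols), 1e-12)  # guard against log(0) on a perfect fit
        return n * math.log(rss / n) + len(cols) * math.log(n)

    chosen, remaining = [], list(candidates)
    best = bic(chosen)
    while remaining:
        score, j = min((bic(chosen + [j]), j) for j in remaining)
        if score >= best:
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen


# Hypothetical toy data: full 2^3 factorial design; the trait depends on column 0
# plus a small term that is orthogonal to all three columns.
X = [list(row) for row in product([-1, 1], repeat=3)]
y = [3.0 * a + 0.05 * a * b * c for a, b, c in X]
chosen = forward_stepwise(X, y, [0, 1, 2])
print(chosen)
```

Here only column 0 is kept, because adding either of the other columns leaves the residual sum of squares essentially unchanged while the BIC penalty grows.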