
    Multilocus association mapping using generalized ridge logistic regression

    Background: In genome-wide association studies, it is widely accepted that multilocus methods are more powerful than testing single-nucleotide polymorphisms (SNPs) one at a time. Among statistical approaches considering many predictors simultaneously, scan statistics are an effective tool for detecting susceptibility genomic regions and mapping disease genes. In this study, inspired by the idea of scan statistics, we propose a novel sliding window-based method for identifying a parsimonious subset of contiguous SNPs that best predict disease status. Results: Within each sliding window, we apply a forward model selection procedure using generalized ridge logistic regression for model fitness in each step. In power simulations, we compare the performance of our method with that of five other methods in current use. Averaging power over all the conditions considered, our method dominates the others. We also present two published datasets where our method is useful in causal SNP identification. Conclusions: Our method can automatically combine genetic information in local genomic regions and allow for linkage disequilibrium between SNPs. It can overcome some defects of the scan statistics approach and will be very promising in genome-wide case-control association studies.

    Multilocus Genetic Analysis of Brain Images

    The quest to identify genes that influence disease is now being extended to find genes that affect biological markers of disease, or endophenotypes. Brain images, in particular, provide exquisitely detailed measures of anatomy, function, and connectivity in the living brain, and have identified characteristic features for many neurological and psychiatric disorders. The emerging field of imaging genomics is discovering important genetic variants associated with brain structure and function, which in turn influence disease risk and fundamental cognitive processes. Statistical approaches for testing genetic associations are not straightforward to apply to brain images because the data in brain images is spatially complex and generally high dimensional. Neuroimaging phenotypes typically include 3D maps across many points in the brain, fiber tracts, shape-based analyses, and connectivity matrices, or networks. These complex data types require new methods for data reduction and joint consideration of the image and the genome. Image-wide, genome-wide searches are now feasible, but they can be greatly empowered by sparse regression or hierarchical clustering methods that isolate promising features, boosting statistical power. Here we review the evolution of statistical approaches to assess genetic influences on the brain. We outline the current state of multivariate statistics in imaging genomics, and future directions, including meta-analysis. We emphasize the power of novel multivariate approaches to discover reliable genetic influences with small effect sizes.

    STATISTICAL METHODS IN GENETIC STUDIES

    This dissertation includes three chapters. A brief description of each chapter follows. In Chapter 1, we proposed a new method, called MF-TOWmuT, for genome-wide association studies with multiple genetic variants and multiple phenotypes using family samples. MF-TOWmuT uses a kinship matrix to account for sample relatedness. In simulations, we considered hidden polygenic effects and varied the proportion of variance they contribute when generating phenotypes. Simulation studies show that MF-TOWmuT preserves the type I error rate and is more powerful than several existing methods in different simulation scenarios. MF-TOWmuT is also quite robust to the proportion of variance explained by hidden polygenic effects and to the direction of effects of genetic variants. In Chapter 2, we proposed a fast and efficient low-rank penalized regression with the Elastic Net penalty for eQTL mapping, called LORSEN. By using the Elastic Net penalty instead of the L1 penalty, our method can overcome two crucial drawbacks of the L1 penalty, and it outperforms two commonly used methods for eQTL mapping, LORS and FastLORS, in many simulation scenarios in terms of average area under the curve (AUC). In Chapter 3, we proposed a bipartite network-based penalized regression model for eQTL mapping, called BiNetPeR. This method uses SNP-gene marginal association evidence to construct the SNP-gene bipartite network, then uses that network to obtain the projected SNP network. Based on the normalized Laplacian matrix of the projected SNP network, we then formulate eQTL mapping as a penalized regression model. Our simulation results show that the proposed method maintains an appropriate false positive rate and outperforms two competing methods for eQTL mapping, FastLORS and mtLasso2G.

    A data-driven medication score predicts 10-year mortality among aging adults

    Health differences among the elderly and the role of medical treatments are topical issues in aging societies. We demonstrate the use of modern statistical learning methods to develop a data-driven health measure based on 21 years of pharmacy purchase and mortality data of 12,047 aging individuals. The resulting score was validated with 33,616 individuals from two fully independent datasets and it is strongly associated with all-cause mortality (HR 1.18 per point increase in score; 95% CI 1.14-1.22; p=2.25e-16). When combined with the Charlson comorbidity index, individuals with elevated medication score and comorbidity index had over six times higher risk (HR 6.30; 95% CI 3.84-10.3; AUC=0.802) compared to individuals with a protective score profile. Alone, the medication score performs similarly to the Charlson comorbidity index and is associated with polygenic risk for coronary heart disease and type 2 diabetes.

    Polygenic risk scores for prediction of breast cancer risk in Asian populations.

    PURPOSE: Non-European populations are under-represented in genetics studies, hindering clinical implementation of breast cancer polygenic risk scores (PRSs). We aimed to develop PRSs using the largest available studies of Asian ancestry and to assess the transferability of PRS across ethnic subgroups. METHODS: The development data set comprised 138,309 women from 17 case-control studies. PRSs were generated using a clumping and thresholding method, lasso penalized regression, an Empirical Bayes approach, a Bayesian polygenic prediction approach, or linear combinations of multiple PRSs. These PRSs were evaluated in 89,898 women from 3 prospective studies (1592 incident cases). RESULTS: The best performing PRS (genome-wide set of single-nucleotide variations [formerly single-nucleotide polymorphism]) had a hazard ratio per unit SD of 1.62 (95% CI = 1.46-1.80) and an area under the receiver operating curve of 0.635 (95% CI = 0.622-0.649). Combined Asian and European PRSs (333 single-nucleotide variations) had a hazard ratio per SD of 1.53 (95% CI = 1.37-1.71) and an area under the receiver operating curve of 0.621 (95% CI = 0.608-0.635). The distribution of the latter PRS was different across ethnic subgroups, confirming the importance of population-specific calibration for valid estimation of breast cancer risk. CONCLUSION: PRSs developed in this study, from association data from multiple ancestries, can enhance risk stratification for women of Asian ancestry.
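
As a rough illustration of what a PRS is, the sketch below computes a score as a weighted sum of risk-allele dosages and converts a standardized score into a relative hazard. It is a minimal sketch, not the study's pipeline: the function names, dosages, and weights are hypothetical, and the relative-hazard step assumes a log-linear (Cox-type) model in which the reported hazard ratio of 1.62 applies per SD of the score.

```python
import math


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant).

    `weights` are per-variant effect sizes (e.g. log odds ratios); the
    values used in any example here are hypothetical.
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores using the sample mean and SD."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    sd = math.sqrt(variance)
    return [(s - mean) / sd for s in scores]


def relative_hazard(z_score, hr_per_sd):
    """Hazard relative to an average score, assuming a log-linear model."""
    return hr_per_sd ** z_score
```

Under that assumption, a woman two SDs above the mean would have a relative hazard of `relative_hazard(2, 1.62)`, about 2.62. Because the abstract reports that the score distribution differs across ethnic subgroups, the mean and SD passed through `standardize` would need to come from the relevant population.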

    Searching for the Genetic Basis of Hygienic Behavior and Overwintering in the Honeybee (Apis mellifera)

    The recent decline in honeybee populations can be mitigated through genomics and marker-assisted selection. Current techniques, such as chemical treatment to prevent disease, are only short-term solutions. The ability to breed honeybees that are disease- and winter-resistant would be ideal. Current breeding techniques lack knowledge of predictive markers that may improve these traits. Here we perform a genome-wide association study on 925 colonies by measuring hygienic and overwintering behavior of the colonies, followed by sequencing their genomes. L1 regression is a technique that selects the single nucleotide polymorphisms (SNPs) that best explain the variance in the phenotype. Using L1 regression, we found 27 SNPs for hygiene and 32 SNPs for overwintering behavior that could be used to breed for healthier and winter-hardy honeybees.
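
To show how L1 regression selects markers, here is a minimal sketch of the lasso fitted by coordinate descent with soft-thresholding. It is a generic textbook formulation, not the implementation used in the study; the function names are hypothetical, and the code assumes that genotype columns and the phenotype are already centered, so no intercept is fitted.

```python
def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam`; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of rows (samples) of centered genotype codes, y a list of
    centered phenotype values. Returns one coefficient per SNP; SNPs with a
    coefficient of exactly 0.0 are dropped from the model.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP carries no information
            # Covariance of SNP j with the residual that excludes SNP j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta
```

The soft-threshold step is what sets weak coefficients to exactly zero, so the SNPs with non-zero coefficients form the selected set. A larger `lam` yields fewer selected SNPs; in practice it would typically be chosen by cross-validation.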

    SNPsyn: detection and exploration of SNP–SNP interactions

    SNPsyn (http://snpsyn.biolab.si) is an interactive software tool for the discovery of synergistic pairs of single nucleotide polymorphisms (SNPs) from large genome-wide case-control association studies (GWAS) data on complex diseases. Synergy among SNPs is estimated using an information-theoretic approach called interaction analysis. SNPsyn is both a stand-alone C++/Flash application and a web server. The computationally intensive part is implemented in C++ and can run in parallel on a dedicated cluster or grid. The graphical user interface is written in Adobe Flash Builder 4 and can run in most web browsers or as a stand-alone application. The SNPsyn web server hosts the Flash application, receives GWAS data submissions, invokes the interaction analysis and serves result files. The user can explore details on identified synergistic pairs of SNPs, perform gene set enrichment analysis and interact with the constructed SNP synergy network.
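
The abstract does not spell out the estimator, so the following is only a sketch of one common information-theoretic definition of pairwise synergy: the information two SNPs carry about the phenotype jointly, minus what each carries alone. The function names are hypothetical, and whether SNPsyn computes exactly this quantity is an assumption.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def synergy(snp_a, snp_b, phenotype):
    """I(A, B; P) - I(A; P) - I(B; P).

    Positive values suggest the pair is more informative together than the
    two SNPs are separately; negative values suggest redundancy.
    """
    joint = list(zip(snp_a, snp_b))
    return (
        mutual_information(joint, phenotype)
        - mutual_information(snp_a, phenotype)
        - mutual_information(snp_b, phenotype)
    )


# Toy data (hypothetical): the phenotype is the XOR of two binary markers,
# so neither marker is informative alone but the pair determines it fully.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
phenotype = [0, 1, 1, 0]
print(synergy(snp_a, snp_b, phenotype))  # 1.0 bit
```

On real case-control data, plug-in entropy estimates like these are biased upward for small samples, so a permutation test or similar correction would usually accompany the raw score.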