
    A novel random effect model for GWAS meta‐analysis and its application to trans‐ethnic meta‐analysis

    Full text link
    Peer Reviewed

    Principal component analysis in high dimensional data: application for genomewide association studies

    Get PDF
    In genomewide association studies (GWAS), population stratification (PS) is a major confounding factor that causes spurious associations by inflating test statistics. PS refers to differences in allele frequencies by disease status due to systematic differences in ancestry, rather than to causal association of genes with disease. Principal component analysis (PCA) is commonly used to infer population structure by computing PC scores, which are subsequently used to control for population stratification. Even though PCA is now widely used for PS adjustment, effective PCA-based PS control still faces challenges. One common feature of genomic data is the strong local correlation among adjacent loci/markers caused by linkage disequilibrium (LD). This local correlation is known to distort estimated PC scores and to produce spurious PCs that do not truly reflect the underlying population structure. To address this problem, we have employed a shrinkage PCA approach in which coefficients are used to down-weight the contribution of highly correlated SNPs in the PCA. Another challenge in PC analysis is choosing which PCs to include as covariates to adjust for population stratification. While searching for a reasonable measure for PC selection, we have found a precise relationship between genotype principal components and the inflation of association test statistics. Based on this relationship, we propose a new approach, called EigenCorr, which selects principal components based on both their eigenvalues and their correlation with the (disease) phenotype. Our approach tends to select fewer principal components for stratification control than does testing of eigenvalues alone, providing substantial computational savings and improvements in power. Under many circumstances, it is also of interest to predict PC scores. Although PC score prediction is commonly used in practice, the characteristics of predicted PC scores have not been systematically studied. Under high-dimensional settings, we have found that naively predicted PC scores are systematically biased toward 0, and that this phenomenon is largely due to the inconsistency of the sample eigenvalues and eigenvectors. We have extended existing convergence results for sample eigenvalues and eigenvectors and derived asymptotic shrinkage factors. Based on these asymptotic results, we propose a bias-adjusted PC score prediction method.
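
    As a concrete illustration of the EigenCorr idea (selecting PCs by both their eigenvalue and their correlation with the phenotype), the following Python sketch computes PC scores from a genotype matrix and keeps only PCs that pass both criteria. The function name, thresholds, and simulated data are illustrative assumptions, not the authors' implementation or tuning.

    import numpy as np

    def eigencorr_select(G, y, n_pcs=20, eig_thresh=1.5, corr_thresh=0.1):
        """Keep PCs whose eigenvalue is large AND which correlate with the phenotype.

        G: n x p genotype matrix (0/1/2 allele counts); y: phenotype vector.
        The threshold values are hypothetical placeholders.
        """
        G = np.asarray(G, dtype=float)
        G = G[:, G.std(axis=0) > 0]                       # drop monomorphic SNPs
        G = (G - G.mean(axis=0)) / G.std(axis=0)          # column-standardize genotypes
        U, s, _ = np.linalg.svd(G, full_matrices=False)   # PCA via SVD
        scores = U[:, :n_pcs] * s[:n_pcs]                 # PC scores
        eigvals = s[:n_pcs] ** 2 / (G.shape[0] - 1)       # sample eigenvalues
        corrs = np.array([abs(np.corrcoef(scores[:, k], y)[0, 1]) for k in range(n_pcs)])
        keep = (eigvals > eig_thresh) & (corrs > corr_thresh)
        return scores[:, keep], eigvals, corrs

    # toy example with simulated genotypes and a quantitative phenotype
    rng = np.random.default_rng(0)
    G = rng.binomial(2, 0.3, size=(500, 1000))
    y = rng.normal(size=500)
    selected, eigvals, corrs = eigencorr_select(G, y)
    print("PCs selected for stratification control:", selected.shape[1])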

    Parameter Estimation of Type-I and Type-II Hybrid Censored Data from the Log-Logistic Distribution

    Get PDF
    In experiments on product lifetime and reliability testing, there are many practical situations in which researchers terminate the experiment and report the results before all items on test have failed, because of time or cost considerations. The most common and popular censoring schemes are type-I and type-II censoring. In the type-I censoring scheme the termination time is pre-fixed, but the number of observed failures is a random variable; if the mean lifetime of the experimental units is somewhat larger than the pre-fixed termination time, far fewer failures may be observed, which substantially reduces the efficiency of inferential procedures. In the type-II censoring scheme, by contrast, the number of observed failures is pre-fixed, but the duration of the experiment is a random variable; at least the pre-specified number of failures is obtained, but the unbounded termination time is clearly a disadvantage from the experimenter’s point of view. To overcome some of the drawbacks of these schemes, the hybrid censoring scheme, a mixture of the conventional type-I and type-II censoring schemes, has received much attention in recent years. In this paper, we consider the analysis of type-I and type-II hybrid censored data where the lifetimes of items follow a two-parameter log-logistic distribution. We present the maximum likelihood estimators of the unknown parameters and asymptotic confidence intervals, and a simulation study is conducted to evaluate the proposed methods.
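
    To make the estimation step concrete, here is a minimal Python sketch of maximum likelihood estimation for the two-parameter log-logistic distribution with right-censored lifetimes. It uses a generic fixed-time (type-I style) censoring of simulated data rather than the full hybrid schemes analyzed in the paper, and all parameter values and names are illustrative.

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_lik(params, t, delta):
        """delta[i] = 1 if t[i] is an observed failure, 0 if right-censored."""
        log_alpha, log_beta = params                   # optimize on the log scale
        alpha, beta = np.exp(log_alpha), np.exp(log_beta)
        z = (t / alpha) ** beta
        # log density: log f(t) = log(beta/alpha) + (beta-1) log(t/alpha) - 2 log(1 + z)
        log_f = (np.log(beta) - np.log(alpha)
                 + (beta - 1) * (np.log(t) - np.log(alpha)) - 2 * np.log1p(z))
        log_S = -np.log1p(z)                           # survival: 1 / (1 + (t/alpha)^beta)
        return -(delta * log_f + (1 - delta) * log_S).sum()

    # simulated lifetimes, censored at a fixed termination time tau
    rng = np.random.default_rng(1)
    alpha_true, beta_true, n, tau = 2.0, 3.0, 200, 2.5
    u = rng.uniform(size=n)
    t = alpha_true * (u / (1 - u)) ** (1 / beta_true)  # log-logistic draws via inverse CDF
    delta = (t <= tau).astype(float)
    t = np.minimum(t, tau)

    fit = minimize(neg_log_lik, x0=[0.0, 0.0], args=(t, delta), method="Nelder-Mead")
    print("estimated (alpha, beta):", np.exp(fit.x))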

    Improving power for rare‐variant tests by integrating external controls

    Full text link
    Due to the drop in sequencing cost, the number of sequenced genomes is increasing rapidly. To improve the power of rare‐variant tests, these sequenced samples could be used as external control samples in addition to the control samples from the study itself. However, when using external controls, batch effects due to the use of different sequencing platforms or genotype-calling pipelines can dramatically increase type I error rates. To address this, we propose novel summary-statistics-based single-variant and gene‐ or region‐based rare‐variant tests that allow the integration of external controls while controlling the type I error rate. Our approach is based on the insight that batch effects on a given variant can be assessed by comparing odds-ratio estimates obtained using the internal controls only with those obtained using the combined internal and external control samples. Through simulation experiments and the analysis of data from age‐related macular degeneration and type 2 diabetes studies, we demonstrate that our method can substantially improve power while controlling the type I error rate.
    Peer Reviewed
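
    The core diagnostic idea, comparing a per-variant odds ratio computed with internal controls only against one computed with the pooled internal plus external controls, can be sketched as follows. The crude standardization and flagging rule below are illustrative assumptions for exposition, not the test statistic proposed in the paper.

    import numpy as np

    def log_or(case_alt, case_ref, ctrl_alt, ctrl_ref, eps=0.5):
        # Haldane-Anscombe correction so rare variants with zero cells stay finite
        return np.log(((case_alt + eps) * (ctrl_ref + eps)) /
                      ((case_ref + eps) * (ctrl_alt + eps)))

    def batch_effect_flags(case_alt, int_ctrl_alt, ext_ctrl_alt,
                           n_case, n_int, n_ext, z_cut=3.0):
        """Per-variant alternate-allele counts; flags variants whose internal-only
        and combined-control log odds ratios disagree strongly (possible batch effect)."""
        or_int = log_or(case_alt, 2 * n_case - case_alt,
                        int_ctrl_alt, 2 * n_int - int_ctrl_alt)
        comb_alt = int_ctrl_alt + ext_ctrl_alt
        or_all = log_or(case_alt, 2 * n_case - case_alt,
                        comb_alt, 2 * (n_int + n_ext) - comb_alt)
        diff = or_int - or_all
        z = (diff - diff.mean()) / diff.std()          # crude standardization across variants
        return np.abs(z) > z_cut

    # toy alternate-allele counts for 100 rare variants
    rng = np.random.default_rng(2)
    flags = batch_effect_flags(rng.poisson(3, 100), rng.poisson(3, 100), rng.poisson(30, 100),
                               n_case=1000, n_int=1000, n_ext=10000)
    print(int(flags.sum()), "variants flagged as possibly batch-affected")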

    Multi‐SKAT: General framework to test for rare‐variant association with multiple phenotypes

    Full text link
    In genetic association analysis, a joint test of multiple distinct phenotypes can increase power to identify sets of trait‐associated variants within genes or regions of interest. Existing multiphenotype tests for rare variants make specific assumptions about the patterns of association with the underlying causal variants, and violation of these assumptions can reduce power to detect association. Here, we develop a general framework for testing pleiotropic effects of rare variants on multiple continuous phenotypes using multivariate kernel regression (Multi‐SKAT). Multi‐SKAT models the effect sizes of variants on the phenotypes through a kernel matrix and performs a variance-component test of association. We show that many existing tests are equivalent to specific choices of kernel matrices within the Multi‐SKAT framework. To increase the power of detecting association across tests with different kernel matrices, we developed a fast and accurate approximation of the significance of the minimum observed P value across tests. To account for related individuals, our framework incorporates random effects based on the kinship matrix. Using simulated data and amino acid and exome‐array data from the METabolic Syndrome In Men (METSIM) study, we show that Multi‐SKAT can improve power over the single‐phenotype SKAT‐O test and existing multiple‐phenotype tests, while maintaining the type I error rate.
    Peer Reviewed
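
    The shape of such a statistic can be illustrated with a toy multi-phenotype variance-component test, Q = vec(R)' (Sigma_P kron K_G) vec(R), where R holds phenotype residuals, K_G is a weighted genotype kernel, and Sigma_P is a chosen phenotype kernel. The naive permutation p-value, the ad-hoc weighting, and the identity choice of Sigma_P below are stand-ins for illustration, not the approximations or kernels used in Multi-SKAT.

    import numpy as np

    def multi_pheno_Q(G, R, Sigma_P, weights):
        """G: n x m genotypes, R: n x k phenotype residuals, Sigma_P: k x k kernel."""
        K_G = (G * weights) @ G.T                 # weighted linear genotype kernel (n x n)
        return np.trace(R @ Sigma_P @ R.T @ K_G)  # equals vec(R)' (Sigma_P kron K_G) vec(R)

    rng = np.random.default_rng(3)
    n, m, k = 300, 20, 3
    G = rng.binomial(2, 0.02, size=(n, m)).astype(float)   # rare variants
    R = rng.normal(size=(n, k))                             # residuals after covariate adjustment
    weights = 1.0 / (G.mean(axis=0) / 2 + 1e-3)             # upweight rarer variants (illustrative)
    Sigma_P = np.eye(k)                                     # one possible phenotype-kernel choice

    Q_obs = multi_pheno_Q(G, R, Sigma_P, weights)
    perm_Q = np.array([multi_pheno_Q(G, R[rng.permutation(n)], Sigma_P, weights)
                       for _ in range(999)])
    print("naive permutation p-value:", (1 + (perm_Q >= Q_obs).sum()) / 1000)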

    Convergence of sample eigenvalues, eigenvectors, and principal component scores for ultra-high dimensional data

    Get PDF
    The development of high-throughput biomedical technologies has led to increased interest in the analysis of high-dimensional data, where the number of features is much larger than the sample size. In this paper, we investigate principal component analysis under the ultra-high dimensional regime, in which both the number of features and the sample size increase, with the ratio of the two quantities also increasing. We bridge the existing results from the finite and the high-dimension, low-sample-size regimes, embedding the two regimes in a more general framework. We also numerically demonstrate the universal applicability of the results from the finite regime.
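
    The kind of behavior at stake can be seen in a short simulation under a single-spike covariance model: as the feature-to-sample ratio grows, the leading sample eigenvalue inflates and the leading sample eigenvector drifts away from the truth, which is what biases naively predicted PC scores toward 0. The spike size and dimensions below are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(4)
    n, spike = 100, 25.0                              # sample size, true leading eigenvalue
    for p in (100, 1000, 10000):
        X = rng.normal(size=(n, p))
        X[:, 0] *= np.sqrt(spike)                     # spiked model: Cov = diag(spike, 1, ..., 1)
        _, s, Vt = np.linalg.svd(X, full_matrices=False)
        lam1 = s[0] ** 2 / n                          # leading sample eigenvalue
        align = abs(Vt[0, 0])                         # |<sample eigenvector, true eigenvector>|
        print(f"p={p:6d}  p/n={p / n:5.0f}  sample eigenvalue={lam1:7.2f}  "
              f"|cos(angle to truth)|={align:.3f}")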

    Optimal tests for rare variant effects in sequencing association studies

    Get PDF
    With the development of massively parallel sequencing technologies, there is a substantial need for powerful rare-variant association tests. Common approaches include burden and non-burden tests. Burden tests assume that all rare variants in the target region affect the phenotype in the same direction and with similar magnitude. The recently proposed sequence kernel association test (SKAT) (Wu, M. C., and others, 2011. Rare-variant association testing for sequencing data with the SKAT. The American Journal of Human Genetics 89, 82–93), an extension of the C-alpha test (Neale, B. M., and others, 2011. Testing for an unusual distribution of rare variants. PLoS Genetics 7, 161–165), provides a robust test that is particularly powerful in the presence of a mix of protective, deleterious, and null variants, but is less powerful than burden tests when a large number of variants in a region are causal and act in the same direction. As the underlying biological mechanisms are unknown in practice and vary from one gene to another across the genome, it is of substantial practical interest to develop a test that is optimal for both scenarios. In this paper, we propose a class of tests that includes burden tests and SKAT as special cases, and derive an optimal test within this class that maximizes power. We show that this optimal test outperforms burden tests and SKAT in a wide range of scenarios. The results are illustrated using simulation studies and triglyceride data from the Dallas Heart Study. In addition, we have derived sample size/power calculation formulas for SKAT with a new family of kernels to facilitate the design of new sequence association studies.
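
    The class of tests in question can be sketched as a one-parameter family Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden, with rho = 0 recovering a SKAT-type statistic and rho = 1 a burden-type statistic. The grid search with a naive permutation p-value below is a stand-in for the analytic optimal test; the weights, grid, and simulated data are illustrative.

    import numpy as np

    def q_rho(G, r, weights, rho):
        s = (G * weights).T @ r                 # weighted per-variant score vector
        return (1 - rho) * np.sum(s ** 2) + rho * np.sum(s) ** 2

    rng = np.random.default_rng(5)
    n, m = 500, 15
    G = rng.binomial(2, 0.01, size=(n, m)).astype(float)
    beta = np.zeros(m); beta[:5] = 0.8          # five causal variants, same direction
    y = G @ beta + rng.normal(size=n)
    r = y - y.mean()                            # residuals under the null (no covariates here)
    w = np.ones(m)

    rhos = np.linspace(0, 1, 11)
    obs = np.array([q_rho(G, r, w, rho) for rho in rhos])
    perm = np.array([[q_rho(G, r[rng.permutation(n)], w, rho) for rho in rhos]
                     for _ in range(499)])
    pvals = (1 + (perm >= obs).sum(axis=0)) / 500
    print("minimum p-value over the rho grid:", pvals.min())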

    Nonparametric Bayesian Variable Selection With Applications to Multiple Quantitative Trait Loci Mapping With Epistasis and Gene–Environment Interaction

    Get PDF
    The joint action of multiple genes is an important source of variation for complex traits and human diseases. However, mapping genes with epistatic effects and gene–environment interactions is a difficult problem because of relatively small sample sizes and the very large parameter spaces of quantitative trait locus models that include such interactions. Here we present a nonparametric Bayesian method to map multiple quantitative trait loci (QTL) while accounting for epistatic and gene–environment interactions. The proposed method is not restricted to pairwise interactions among genes, as is typically assumed in parametric QTL analysis. Rather than modeling each main and interaction term explicitly, our nonparametric Bayesian method measures the importance of each QTL, irrespective of whether its effect is mostly due to a main effect or to some interaction effect(s), via an unspecified function of the genotypes at all candidate QTL. A Gaussian process prior is assigned to this unknown function. In addition to the candidate QTL, nongenetic factors and covariates, such as age, gender, and environmental conditions, can also be included in the unspecified function. The importance of each genetic factor (QTL) and each nongenetic factor/covariate included in the function is estimated by a single hyperparameter, which enters the covariance function and captures any main or interaction effect associated with that factor/covariate. An initial evaluation of the performance of the proposed method is obtained via analysis of simulated and real data.
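
    A rough analogue of this construction is a Gaussian process with an automatic relevance determination (ARD) kernel, where each candidate QTL or covariate gets its own length-scale hyperparameter and a small fitted length-scale marks that factor as relevant, whether its contribution is a main effect or part of an interaction. The scikit-learn regression below is a frequentist stand-in for the Bayesian treatment in the paper; the simulated genotypes, effect sizes, and kernel choice are illustrative assumptions.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(6)
    n, d = 200, 8
    X = rng.binomial(2, 0.5, size=(n, d)).astype(float)     # genotypes at 8 candidate QTL
    # trait depends on QTL 0 (main effect) and an epistatic interaction between QTL 1 and 2
    y = 1.0 * X[:, 0] + 1.5 * X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=n)

    kernel = RBF(length_scale=np.ones(d)) + WhiteKernel(noise_level=1.0)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    relevance = 1.0 / gpr.kernel_.k1.length_scale           # larger => more relevant factor
    for j, rel in enumerate(relevance):
        print(f"QTL {j}: relevance {rel:.3f}")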