
    Powerful rare variant association testing in a copula-based joint analysis of multiple phenotypes

    In genetic association studies of rare variants, the low power of association tests is one of the main challenges. In this study, we propose a new single-marker association test called C-JAMP (Copula-based Joint Analysis of Multiple Phenotypes), which is based on a joint model of multiple phenotypes given genetic markers and other covariates. We evaluated its performance and compared its empirical type I error and power with existing univariate and multivariate single-marker and multi-marker rare-variant tests in extensive simulation studies. C-JAMP yielded unbiased genetic effect estimates and valid type I errors with an adjusted test statistic. When strongly dependent traits were jointly analyzed, C-JAMP had the highest power in all scenarios except when a high percentage of variants were causal with moderate/small effect sizes. When traits with weak or moderate dependence were analyzed, whether C-JAMP or competing approaches had higher power depended on the effect size. When C-JAMP was applied with a misspecified copula function, it still achieved high power in some of the scenarios considered. In a real-data application, we analyzed sequencing data using C-JAMP and performed the first genome-wide association studies of high-molecular-weight and medium-molecular-weight adiponectin plasma concentrations. C-JAMP identified 20 rare variants with p-values smaller than 10^−5, while all other tests resulted in the identification of fewer variants with higher p-values. In summary, the results indicate that C-JAMP is a powerful, flexible, and robust method for association studies, and we identified novel candidate markers for adiponectin. C-JAMP is implemented as an R package and freely available from https://cran.r-project.org/package=CJAMP

    Statistical Methods for Aggregation of Sequence Data and Multiple Testing Correction in Common and Rare Variant Analysis

    Over the last fifteen years, there have been substantial improvements in how we study the association between trait and genetic variations in the human genome. Genome-wide association studies (GWAS) now routinely test millions of variants in hundreds of thousands of individuals and the advance of genome sequencing technology allows us to examine the role of genetic variants across the full allele-frequency spectrum. However, with these changes come new challenges in analyzing and interpreting genetic results. In this dissertation, we present methods to aggregate sequence data and identify significant associations in common and rare variant analysis. In chapter two, we compare two strategies to aggregate sequence data from multiple studies: joint variant calling of all samples together versus calling each study individually and then combining the results using meta-analysis. Although joint calling is the gold standard, single-study calling can be more appealing due to fewer privacy restrictions and smaller computational burden. We use deep- and low-coverage sequence data on 2,250 samples from the GoT2D study to compare the two strategies in terms of variant detection sensitivity, genotype accuracy, and association power. We show single-study calling to be a viable alternative to joint calling for deep-coverage sequence data but show them to have noticeable discrepancies in rare variant calling and association results for low-coverage sequence data. In chapter three, we revisit the common variant P-value significance threshold of 5e-8 and explore the rates of true and false discoveries that can be expected using less restrictive P-value thresholds and three other multiple testing procedures: Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) for controlling false discovery rate (FDR), and Bayesian false discovery probability for controlling Bayesian FDR. 
    Using data from the Global Lipids and GIANT consortia, we show for large-sample common variant GWAS that using a less stringent P-value threshold of 5e-7 or the BH procedure at a target FDR threshold of 5% substantially increases the number of true positive discoveries while only modestly increasing false positive discoveries compared with the 5e-8 threshold. The latter threshold remains appropriate for modest-sized studies or for resource-intensive follow-ups such as constructing animal models where a stringently curated list of significant loci is desired from GWAS. In chapter four, we propose a Bayesian method for multiple testing correction in rare variant studies that calculates the posterior probabilities using an approximation of the Bayes factor and estimates prior parameters from summary statistics using an Expectation-Maximization algorithm. Using simulation analyses of ~400,000 individuals and ~107 million variants from the TOPMed-imputed UK Biobank study, we show that our Bayesian method discovers more true positive loci than P-value-based methods such as the P-value threshold, BH, and BY procedures at equivalent false positive rates. In addition, we show that the Bayesian method controls empirical FDR among discovered loci. Finally, we estimate the genome-wide significant P-value threshold for testing ~107 million variants from the TOPMed imputation reference panel to be 1e-9.
    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/162936/1/zhongshc_1.pd
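    The Benjamini-Hochberg (BH) step-up procedure named in this abstract can be illustrated with a minimal sketch. This is the textbook form of the procedure, not the dissertation's own code; the function name `benjamini_hochberg` and the toy P-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Textbook BH step-up procedure (illustrative sketch only).

    Returns a list of booleans, True where the hypothesis is rejected
    at the target false discovery rate `fdr`.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical toy P-values, not taken from any study.
toy_p = [0.02, 0.03, 0.035, 0.04]
bh_rejected = benjamini_hochberg(toy_p, fdr=0.05)
# Fixed per-test threshold (Bonferroni-style) for comparison.
fixed_rejected = [p <= 0.05 / len(toy_p) for p in toy_p]
```

    In this toy input the fixed per-test threshold rejects nothing, whereas BH rejects all four hypotheses because the largest P-value falls under its rank-scaled cutoff. That is the kind of gain in discoveries over a fixed threshold that the abstract describes, with the trade-off that the false discovery rate, rather than the family-wise error rate, is what is controlled.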

    Joint analysis of multiple phenotypes: summary of results and discussions from the Genetic Analysis Workshop 19

    For Genetic Analysis Workshop 19, 2 extensive data sets were provided, including whole genome and whole exome sequence data, gene expression data, and longitudinal blood pressure outcomes, together with nongenetic covariates. These data sets gave researchers the chance to investigate different aspects of more complex relationships within the data, and the contributions in our working group focused on statistical methods for the joint analysis of multiple phenotypes, which is part of the research field of data integration. The analysis of data from different sources poses challenges to researchers but provides the opportunity to model the real-life situation more realistically. Our 4 contributions all used the provided real data to identify genetic predictors for blood pressure. In the contributions, novel multivariate rare variant tests, copula models, structural equation models and a sparse matrix representation variable selection approach were applied. Each of these statistical models can be used to investigate specific hypothesized relationships, which are described together with their biological assumptions. The results showed that all methods are ready for application on a genome-wide scale and can be used or extended to include multiple omics data sets. The results provide potentially interesting genetic targets for future investigation and replication. Furthermore, all contributions demonstrated that the analysis of complex data sets could benefit from modeling correlated phenotypes jointly as well as by adding further bioinformatics information.

    Genetic association analysis based on a joint model of gene expression and blood pressure

    Recent work on genetic association studies suggests that much of the heritable variation in complex traits is unexplained, which indicates a need for using more biologically meaningful modeling approaches and appropriate statistical methods. In this study, we propose a biological framework and a corresponding statistical model incorporating multilevel biological measures, and illustrate it in the analysis of the real data provided by the Genetic Analysis Workshop (GAW) 19, which contains whole genome sequence (WGS), gene expression (GE), and blood pressure (BP) data. We investigate the direct effect of single-nucleotide variants (SNVs) on BP and GE, while considering the non-directional dependence between BP and GE, by using copula functions to jointly model BP and GE conditional on SNVs. We implement the method for analysis on a genome-wide scale, and illustrate it within an association analysis of 68,727 SNVs on chromosome 19 that lie in or around genes with available GE measures. Although there is no indication for inflated type I errors under the proposed method, our results show that the association tests have smaller p values than tests under univariate models for common and rare variants using single-variant tests and gene-based multimarker tests. Hence, considering multilevel biological measures and modeling the dependence structure between these measures by using a plausible graphical approach may lead to more informative findings than standard univariate tests of common variants and well-recognized gene-based rare variant tests

    General Approach for Combining Diverse Rare Variant Association Tests Provides Improved Robustness Across a Wider Range of Genetic Architectures

    The widespread availability of genome sequencing data made possible by way of next-generation technologies has yielded a flood of different gene-based rare variant association tests. Most of these tests have been published because they have superior power for particular genetic architectures. However, for applied researchers it is challenging to know which test to choose in practice when little is known a priori about genetic architecture. Recently, tests have been proposed which combine two particular individual tests (one burden and one variance components) to minimize power loss while improving robustness to a wider range of genetic architectures. In our analysis we propose an expansion of these approaches, yielding a general method that works for combining any number of individual tests. We demonstrate that running multiple different tests on the same dataset and using a Bonferroni correction for multiple testing is never better than combining tests using our general method. We also find that using a test statistic that is highly robust to the inclusion of non-causal variants (Joint-infinity) together with a previously published combined test (SKAT-O) provides improved robustness to a wide range of genetic architectures and should be considered for use in practice. Software for this approach is supplied. We support the increased use of combined tests in practice, as well as further exploration of novel combined testing approaches using the general framework provided here, to maximize robustness of rare-variant testing strategies against a wide range of genetic architectures.

    Filtering genetic variants and placing informative priors based on putative biological function

    High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure

    Modeling of Multivariate Longitudinal Phenotypes in Family Genetic Studies with Bayesian Multiplicity Adjustment

    Genetic studies often collect data on multiple traits. Most genetic association analyses, however, consider traits separately and ignore potential correlation among traits, partially because of difficulties in statistical modeling of multivariate outcomes. When multiple traits are measured in a pedigree longitudinally, additional challenges arise because in addition to correlation between traits, a trait is often correlated with its own measures over time and with measurements of other family members. We developed a Bayesian model for analysis of bivariate quantitative traits measured longitudinally in family genetic studies. For a given trait, family-specific and subject-specific random effects account for correlation among family members and repeated measures, respectively. Correlation between traits is introduced by incorporating multivariate random effects and allowing time-specific trait residuals to correlate as in seemingly unrelated regressions. The proposed model can examine multiple single-nucleotide variations simultaneously, as well as incorporate family-specific, subject-specific, or time-varying covariates. A Bayesian multiplicity technique is used to effectively control false positives. Genetic Analysis Workshop 18 simulated data illustrate the proposed approach's applicability in modeling longitudinal multivariate outcomes in family genetic association studies.