830 research outputs found

    Enrichment analysis of genetic association in genes and pathways by aggregating signals from both rare and common variants

    Get PDF
    New high-throughput sequencing technologies have brought forth opportunities for unbiased analysis of thousands of rare genomic variants in genome-wide association studies of complex diseases. Because it is hard to detect single rare variants with appreciable effect sizes at the population level, existing methods mostly aggregate effects of multiple markers by collapsing the rare variants in genes (or genomic regions). We hypothesize that a higher level of aggregation can further improve association signal strength. Using the Genetic Analysis Workshop 17 simulated data, we test a two-step strategy that first applies a collapsing method in a gene-level analysis and then aggregates the gene-level test results by performing an enrichment analysis in gene sets. We find that the gene set approach which combines signals across multiple genes outperforms testing individual genes separately and that the power of the gene set enrichment test is further improved by proper adjustment of statistics to account for gene-wise differences

    Methods for detecting associations between phenotype and aggregations of rare variants

    Get PDF
    Although genome-wide association studies have uncovered variants associated with more than 150 traits, the percentage of phenotypic variation explained by these associations remains small. This has led to the search for the dark matter that explains this missing genetic component of heritability. One potential explanation for dark matter is rare variants, and several statistics have been devised to detect associations resulting from aggregations of rare variants in relatively short regions of interest, such as candidate genes. In this paper we investigate the feasibility of extending this approach in an agnostic way, in which we consider all variants within a much broader region of interest, such as an entire chromosome or even the entire exome. Our method searches for subsets of variant sites using either Markov chain Monte Carlo or genetic algorithms. The analysis was performed with knowledge of the Genetic Analysis Workshop 17 answers

    A novel method to detect rare variants using both family and unrelated case-control data

    Get PDF
    To detect rare variants associated with a phenotype, we develop a novel statistical method that can use both family and unrelated case-control data. Unlike the currently existing methods, we first use family data to calculate weights to be given to rare variants, differentiating between concordantly affected and discordant sib pairs. These weights are then used in an association test applied to the unrelated case-control data. We applied the proposed method to the simulated sequencing data in Genetic Analysis Workshop 17 and identified two genes associated with the disease

    Enhancing the discovery of rare disease variants through hierarchical modeling

    Get PDF
    Advances in next-generation sequencing technology are enabling researchers to capture a comprehensive picture of genomic variation across large numbers of individuals with unprecedented levels of efficiency. The main analytic challenge in disease mapping is how to mine the data for rare causal variants among a sea of neutral variation. To achieve this goal, investigators have proposed a number of methods that exploit biological knowledge. In this paper, I propose applying a Bayesian stochastic search variable selection algorithm in this context. My multivariate method is inspired by the combined multivariate and collapsing method. In this proposed method, however, I allow an arbitrary number of different sources of biological knowledge to inform the model as prior distributions in a two-level hierarchical model. This allows rare variants with similar prior distributions to share evidence of association. Using the 1000 Genomes Project single-nucleotide polymorphism data provided by Genetic Analysis Workshop 17, I show that through biologically informative prior distributions, some power can be gained over noninformative prior distributions

    Two-stage analyses of sequence variants in association with quantitative traits

    Get PDF
    We propose a two-stage design for the analysis of sequence variants in which a proportion of genes that show some evidence of association are identified initially and then followed up in an independent data set. We compare two different approaches. In both approaches the same summary measure (total number of minor alleles) is used for each gene in the initial analysis. In the first (simple) approach the same summary measure is used in the analysis of the independent data set. In the second (alternative) approach a more specific hypothesis is formed for the second stage; the summary measure used is the count of minor alleles in only those variants that in the initial data showed the same direction of association as was seen overall. We applied the methods to the simulated quantitative traits of Genetic Analysis Workshop 17, blind to the simulation model, and then evaluated their performance once the underlying model was known. Performance was similar for most genes, but the simple strategy considerably out-performed the alternative strategy for one gene, where most of the effect was due to very rare variants; this suggests that the alternative approach would not be advisable when the effect is seen in very rare variants. Further simulations are needed to investigate the potential superior power of the alternative method when some variants within a gene have opposing effects. Overall, the power to detect associations was low; this was also true when using a more powerful joint analysis that combined the two stages of the study

    Detecting rare functional variants using a wavelet-based test on quantitative and qualitative traits

    Get PDF
    We conducted a genome-wide association study on the Genetic Analysis Workshop 17 simulated unrelated individuals data using a multilocus score test based on wavelet transformation that we proposed recently. Wavelet transformation is an advanced smoothing technique, whereas the currently popular collapsing methods are the simplest way to smooth multilocus genotypes. The wavelet-based test suppresses noise from the data more effectively, which results in lower type I error rates. We chose a level-dependent threshold for the wavelet-based test to suppress the optimal amount of noise according to the data. We propose several remedies to reduce the inflated type I error rate: using a window of fixed size rather than a gene; using the Bonferroni correction rather than comparing to the maxima of test values for multiple testing corrections; and removing the influence of other factors by using residuals for the association test. A wavelet-based test can detect multiple rare functional variants. Type I error rates can be controlled using the wavelet-based test combined with the mentioned remedies

    Genome-wide case-control study in GAW17 using coalesced rare variants

    Get PDF
    Genome-wide association studies have successfully identified numerous loci at which common variants influence disease risks or quantitative traits of interest. Despite these successes, the variants identified by these studies have generally explained only a small fraction of the variations in the phenotype. One explanation may be that many rare variants that are not included in the common genotyping platforms may contribute substantially to the genetic variations of the diseases. Next-generation sequencing, which would better allow for the analysis of rare variants, is now becoming available and affordable; however, the presence of a large number of rare variants challenges the statistical endeavor to stably identify these disease-causing genetic variants. We conduct a genome-wide association study of Genetic Analysis Workshop 17 case-control data produced by the next-generation sequencing technique and propose that collapsing rare variants within each genetic region through a supervised dimension reduction algorithm leads to several macrovariants constructed for rare variants within each genetic region. A simultaneous association of the phenotype to all common variants and macrovariants is undertaken using a linear discriminant analysis using the penalized orthogonal-components regression algorithm. The results suggest that the proposed analysis strategy shows promise but needs further development

    Evaluation of pooled association tests for rare variant identification

    Get PDF
    Genome-wide association studies have successfully identified many common variants associated with complex human diseases. However, a large portion of the remaining heritability cannot be explained by these common variants. Exploring rare variants associated with diseases is now catching more attention. Several methods have been recently proposed for identification of rare variants. Among them, the fixed-threshold, weighted-sum, and variable-threshold methods are effective in combining the information of multiple variants into a functional unit; these approaches are commonly used. We evaluate the performance of these three methods. Based on our analyses of the Genetic Analysis Workshop 17 data, we find that no method is universally better than the others. Furthermore, adjusting for potential covariates can not only increase the true-positive proportions but also reduce the false-positive proportions. Our study concludes that there is no uniformly most powerful test among the three methods we compared (the fixed-threshold, weighted-sum, and variable-threshold methods), and their performances depend on the underlying genetic architecture of a disease

    Improved power by collapsing rare and common variants based on a data-adaptive forward selection strategy

    Get PDF
    Genome-wide association studies have been used successfully to detect associations between common genetic variants and complex diseases, but common single-nucleotide polymorphisms (SNPs) detected by these studies explain only 5–10% of disease heritability. Alternatively, the common disease/rare variants hypothesis suggests that complex diseases are often caused by multiple rare variants with moderate to high effects. Under this hypothesis, the analysis of the cumulative effect of rare variants may thus help us discover the missing genetic variations. Collapsing all rare variants across a functional region is currently a popular method to find rare variants that may have a causal effect on certain diseases. However, the power of tests based on collapsing methods is often impaired by misclassification of functional variants. We develop a data-adaptive forward selection procedure that selectively chooses only variants that improve the association signal between functional regions and the disease risk. We apply our strategy to the Genetic Analysis Workshop 17 unrelated individuals data with quantitative traits. The type I error rate and the power of different collapsing functions are evaluated. The substantially higher power of the proposed strategy was demonstrated. The new method provides a useful strategy for the association study of sequencing data by taking advantage of the selection of rare variants
    corecore