    The pebbling threshold of the square of cliques

    Given an initial configuration of pebbles on a graph, one can move pebbles in pairs along edges, at the cost of one of the pebbles moved, with the objective of reaching a specified target vertex. The pebbling number of a graph is the minimum number of pebbles such that every configuration of that many pebbles can reach any chosen target. The pebbling threshold of a sequence of graphs is, roughly, the number of pebbles such that almost every (resp. almost no) configuration of asymptotically more (resp. fewer) pebbles can reach any chosen target. In this paper we find the pebbling threshold of the sequence of squares of cliques, improving upon an earlier result of Boyle and verifying an important instance of a probabilistic version of Graham's product conjecture.
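The definition of the pebbling number can be made concrete with a brute-force search. The sketch below is illustrative only: the function names are hypothetical, the search is exponential and usable only on tiny graphs, and it computes π(G) itself, not the pebbling threshold studied in the paper. It relies on monotonicity: any configuration of t + 1 pebbles contains a configuration of t pebbles, so once every t-pebble configuration is solvable, every larger one is too.

```python
from itertools import combinations_with_replacement


def can_reach(adj, config, target, need=1):
    """Search all configurations reachable by pebbling moves.

    A move removes two pebbles from a vertex u and adds one pebble to a
    neighbour v. Each move lowers the total by one, so the search terminates.
    """
    start = tuple(config)
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        if current[target] >= need:
            return True
        for u, count in enumerate(current):
            if count < 2:
                continue
            for v in adj[u]:
                nxt = list(current)
                nxt[u] -= 2
                nxt[v] += 1
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return False


def configurations(n, total):
    """Yield every distribution of `total` indistinguishable pebbles on n vertices."""
    for combo in combinations_with_replacement(range(n), total):
        counts = [0] * n
        for v in combo:
            counts[v] += 1
        yield counts


def pebbling_number(adj):
    """Smallest t such that every t-pebble configuration reaches every target."""
    n = len(adj)
    total = 1
    while True:
        if all(can_reach(adj, c, t)
               for c in configurations(n, total)
               for t in range(n)):
            return total
        total += 1


path3 = [[1], [0, 2], [1]]           # path a - b - c (adjacency lists)
triangle = [[1, 2], [0, 2], [0, 1]]  # complete graph K3
print(pebbling_number(path3))     # 4
print(pebbling_number(triangle))  # 3
```

On the path a - b - c, three pebbles on a cannot reach c (after one move, b holds a single pebble), which is why the search stops at 4.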

    Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16

    Recently, gene set analysis (GSA) has been extended from gene expression data to single-nucleotide polymorphism (SNP) data in genome-wide association studies (GWAS). Where GSA has been demonstrated on SNP data, two statistics popular in gene expression analysis (gene set enrichment analysis [GSEA] and Fisher's exact test [FET]) have been used. However, GSEA and FET have shown a lack of power and robustness in the analysis of gene expression data. The purpose of this work is to investigate whether the same issues arise in the analysis of SNP data. Ultimately, we conclude that GSEA and FET are not optimal for the analysis of SNP data when compared with the SUMSTAT method. In an analysis of real SNP data from the Framingham Heart Study, SUMSTAT finds many more gene sets to be significant than the other methods do. In an analysis of simulated data, SUMSTAT demonstrates high power and better control of the type I error rate. GSA is a promising approach to the analysis of SNP data in GWAS, and using the SUMSTAT statistic instead of GSEA or FET may increase power and robustness.
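As a rough illustration of a sum-based gene set statistic, the sketch below scores a set by summing per-SNP association statistics and compares that sum with random SNP sets of the same size. This is an assumption-laden sketch, not the paper's procedure: the abstract does not say how significance was assessed, the random-set null here ignores linkage disequilibrium (a phenotype-permutation null would respect it), and the statistics are made-up values.

```python
import random


def sumstat(stats, gene_set):
    """SUMSTAT-style score: the sum of per-SNP association statistics in the set."""
    return sum(stats[i] for i in gene_set)


def sumstat_pvalue(stats, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with random SNP sets of the same size.

    Simple competitive null chosen for illustration only.
    """
    rng = random.Random(seed)
    observed = sumstat(stats, gene_set)
    k = len(gene_set)
    exceed = 0
    for _ in range(n_perm):
        null_set = rng.sample(range(len(stats)), k)
        if sumstat(stats, null_set) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical per-SNP chi-square statistics: 95 null-looking SNPs and 5 strong ones.
stats = [1.0] * 95 + [20.0] * 5
print(sumstat(stats, [95, 96, 97, 98, 99]))  # 100.0
print(sumstat_pvalue(stats, [95, 96, 97, 98, 99]))
```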

    Cost-Effectiveness of Reclassification Sampling for Prevalence Estimation

    Background: Typically, a two-phase (double) sampling strategy is employed when classifications are subject to error and a gold-standard (perfect) classifier is available. Two-phase sampling involves classifying the entire sample with an imperfect classifier and a subset of the sample with the gold standard. Methodology/Principal Findings: In this paper we consider an alternative strategy, termed reclassification sampling, which involves classifying individuals with the imperfect classifier more than once. Estimates of sensitivity, specificity and prevalence are provided for reclassification sampling when either one or two binary classifications of each individual by the imperfect classifier are available. The robustness of the estimates and of design decisions to model assumptions is also considered. Software is provided to compute the estimates and to advise on the optimal sampling strategy. Conclusions/Significance: Reclassification sampling is shown to be cost-effective (lower standard error of estimates for the same cost) for estimating prevalence compared with two-phase sampling in many practical situations.
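Repeated classifications are informative because their joint outcome depends on prevalence, sensitivity and specificity together. The sketch below shows one common probability model for two classifications per person, under the assumption (ours, not stated in the abstract) that the two classifications are independent given true disease status. The parameter values are made up, and the paper's estimators, cost model and software are not reproduced. Whether all three parameters can be estimated from such counts alone depends on the design and constraints, which the sketch does not address.

```python
import math


def cell_probabilities(prev, se, sp):
    """Probabilities of the four outcomes when each person is classified twice.

    Assumes the two classifications are independent given true status.
    """
    pos_pos = prev * se**2 + (1 - prev) * (1 - sp) ** 2
    pos_neg = prev * se * (1 - se) + (1 - prev) * (1 - sp) * sp  # same for (-, +)
    neg_neg = prev * (1 - se) ** 2 + (1 - prev) * sp**2
    return {"++": pos_pos, "+-": pos_neg, "-+": pos_neg, "--": neg_neg}


def log_likelihood(counts, prev, se, sp):
    """Multinomial log-likelihood (up to a constant) of observed outcome counts."""
    probs = cell_probabilities(prev, se, sp)
    return sum(n * math.log(probs[cell]) for cell, n in counts.items() if n > 0)


probs = cell_probabilities(prev=0.1, se=0.9, sp=0.95)
print(f"{probs['++']:.5f}")  # 0.08325
```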

    Evaluating methods for combining rare variant data in pathway-based tests of genetic association

    Analyzing sets of genes in genome-wide association studies is a relatively new approach that aims to capitalize on biological knowledge about the interactions of genes in biological pathways. This approach, called pathway analysis or gene set analysis, has not yet been applied to the analysis of rare variants. Applying pathway analysis to rare variants offers two competing approaches. In the first approach, rare variant statistics (e.g., combined multivariate and collapsing [CMC] or weighted sum [WS]) are used to generate a p-value for each gene, and the gene-level p-values are combined using standard pathway analysis methods (e.g., gene set enrichment analysis or Fisher’s combined probability method). In the second approach, rare variant methods (e.g., CMC and WS) are applied directly to sets of single-nucleotide polymorphisms (SNPs) representing all SNPs within the genes in a pathway. In this paper we use simulated phenotype and real next-generation sequencing data from Genetic Analysis Workshop 17 to analyze sets of rare variants with these two competing approaches. The initial results suggest substantial differences among the methods, with Fisher’s combined probability method and the direct application of the WS method yielding the best power. The evidence suggests that the WS method works well in most situations, although Fisher’s method was more likely to be optimal when the number of causal SNPs in the set was low but the risk conferred by the causal SNPs was high.
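For the first approach, Fisher's combined probability method turns k gene-level p-values into a single pathway-level test: X = −2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom when the null holds and the p-values are independent. The sketch below assumes the gene-level p-values (for example from CMC or WS) are already computed and independent; the input values are hypothetical. For even degrees of freedom the chi-square upper tail has a closed form, so no statistics library is needed.

```python
import math


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i), chi-square with 2k df under H0.

    Assumes the k p-values are independent and lie in (0, 1].
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # Upper tail for 2k degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term = 1.0
    tail_sum = 1.0
    for j in range(1, k):
        term *= half / j
        tail_sum += term
    return x, math.exp(-half) * tail_sum


# Hypothetical gene-level p-values for the genes in one pathway.
stat, p = fisher_combined([0.04, 0.20, 0.01, 0.60])
print(stat, p)
```

With a single p-value the method returns that p-value unchanged, which is a quick sanity check on the tail formula.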

    Evaluating methods for the analysis of rare variants in sequence data

    A number of rare variant statistical methods have been proposed for the analysis of the impending wave of next-generation sequencing data. To date, there have been few direct comparisons of these methods on real sequence data, and there is a strong need for practical advice on proper analytic strategies for rare variant analysis. We compare four recently proposed rare variant methods (combined multivariate and collapsing, weighted sum, proportion regression, and cumulative minor allele test) on simulated phenotype and next-generation sequencing data as part of Genetic Analysis Workshop 17. Overall, we find that all analyzed methods have serious practical limitations in identifying causal genes. Specifically, no method has more than a 5% true discovery rate (percentage of truly causal genes among all those identified as significantly associated with the phenotype). Further exploration shows that all methods suffer from inflated false-positive error rates (chance that a noncausal gene will be identified as associated with the phenotype) because of population stratification and gametic phase disequilibrium between noncausal SNPs and causal SNPs. Furthermore, the observed true-positive rates (chance that a truly causal gene will be identified as significantly associated with the phenotype) for each of the four methods were very low (<19%). The combination of larger-than-anticipated false-positive rates, low true-positive rates, and only about 1% of all genes being causal yields poor discriminatory ability for all four methods. Gametic phase disequilibrium and population stratification are important areas for further research in the analysis of rare variant data.
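The link between low true-positive rates, a small fraction of causal genes, and a low true discovery rate is Bayes' rule arithmetic. The sketch below uses the abstract's figures of about 1% causal genes and a true-positive rate of at most 19%; the 5% false-positive rate is a hypothetical nominal value chosen for illustration, not a figure reported in the paper.

```python
def true_discovery_rate(prop_causal, tpr, fpr):
    """Expected share of significant genes that are truly causal (positive predictive value)."""
    true_hits = prop_causal * tpr
    false_hits = (1 - prop_causal) * fpr
    return true_hits / (true_hits + false_hits)


# ~1% causal genes and TPR = 0.19 (upper bound from the abstract);
# FPR = 0.05 is a hypothetical nominal rate, not a result from the paper.
rate = true_discovery_rate(prop_causal=0.01, tpr=0.19, fpr=0.05)
print(f"{rate:.3f}")  # 0.037
```

Even at a nominal 5% false-positive rate the expected true discovery rate falls below 4%; the inflated false-positive rates the abstract describes would push it lower still.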

    Two-pebbling and Odd-two-pebbling are Not Equivalent

    Let G be a connected graph. A configuration of pebbles assigns a nonnegative integer number of pebbles to each vertex of G. A move consists of removing two pebbles from one vertex and placing one pebble on an adjacent vertex. A configuration is solvable if every vertex can receive at least one pebble through some sequence of moves. The pebbling number of G, denoted π(G), is the smallest integer such that every configuration of π(G) pebbles on G is solvable. A graph has the two-pebbling property if, whenever more than 2π(G) − q pebbles are placed on G, where q is the number of vertices with pebbles, there is a sequence of moves that places at least two pebbles on any chosen vertex. A graph has the odd-two-pebbling property if, whenever more than 2π(G) − r pebbles are placed on G, where r is the number of vertices with an odd number of pebbles, there is a sequence of moves that places at least two pebbles on any chosen vertex. In this paper, we prove that the two-pebbling and odd-two-pebbling properties are not equivalent.
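The two hypotheses differ only in the quantity subtracted from 2π(G). Every vertex with an odd number of pebbles also holds pebbles, so r ≤ q and therefore 2π(G) − r ≥ 2π(G) − q: any configuration satisfying the odd-two-pebbling hypothesis also satisfies the two-pebbling hypothesis. Hence the two-pebbling property implies the odd-two-pebbling property, and the result above is that the converse fails. The sketch below (function name hypothetical) computes q and r and checks only the pebble-count hypotheses; it does not test whether two pebbles can actually be moved to each vertex, and π(G) must be supplied by the caller.

```python
def two_pebbling_hypotheses(config, pi_g):
    """Report q, r and which pebble-count hypothesis a configuration satisfies.

    config: pebble counts per vertex; pi_g: the pebbling number pi(G).
    """
    size = sum(config)
    q = sum(1 for c in config if c > 0)       # vertices holding pebbles
    r = sum(1 for c in config if c % 2 == 1)  # vertices holding an odd number
    return {
        "q": q,
        "r": r,
        "two_pebbling": size > 2 * pi_g - q,
        "odd_two_pebbling": size > 2 * pi_g - r,
    }


# Path on three vertices (pi = 4) with two pebbles on each vertex.
print(two_pebbling_hypotheses([2, 2, 2], pi_g=4))
# {'q': 3, 'r': 0, 'two_pebbling': True, 'odd_two_pebbling': False}
```

Here six pebbles exceed 2π(G) − q = 5 but not 2π(G) − r = 8, so this configuration falls under the two-pebbling hypothesis only.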