
Cost-Effectiveness of Reclassification Sampling for Prevalence Estimation

Author: Bekmetjev Airat
DeWinkle Benjamin
Lunderberg Eric
McLellan Brian
Tintle Nathan L.
VanBruggen Dirk
Publication venue: Digital Collections @ Dordt
Publication date: 13/02/2012

Background: Typically, a two-phase (double) sampling strategy is employed when classifications are subject to error and there is a gold standard (perfect) classifier available. Two-phase sampling involves classifying the entire sample with an imperfect classifier, and a subset of the sample with the gold standard. Methodology/Principal Findings: In this paper we consider an alternative strategy termed reclassification sampling, which involves classifying individuals using the imperfect classifier more than one time. Estimates of sensitivity, specificity and prevalence are provided for reclassification sampling, when either one or two binary classifications of each individual using the imperfect classifier are available. Robustness of estimates and design decisions to model assumptions is considered. Software is provided to compute estimates and provide advice on the optimal sampling strategy. Conclusions/Significance: Reclassification sampling is shown to be cost-effective (lower standard error of estimates for the same cost) for estimating prevalence as compared to two-phase sampling in many practical situations.

Carolina Digital Repository

Evaluating methods for combining rare variant data in pathway-based tests of genetic association

Author: Bekmetjev Airat
Luedtke Alexander
Petersen Ashley
Powers Scott
Sitarik Alexandra
Tintle Nathan L
Publication venue: BioMed Central Ltd
Publication date: 29/11/2011

Analyzing sets of genes in genome-wide association studies is a relatively new approach that aims to capitalize on biological knowledge about the interactions of genes in biological pathways. This approach, called pathway analysis or gene set analysis, has not yet been applied to the analysis of rare variants. Applying pathway analysis to rare variants offers two competing approaches. In the first approach, rare variant statistics (e.g., combined multivariate and collapsing [CMC] or weighted-sum [WS]) are used to generate p-values for each gene, and the gene-level p-values are combined using standard pathway analysis methods (e.g., gene set enrichment analysis or Fisher’s combined probability method). In the second approach, rare variant methods (e.g., CMC and WS) are applied directly to sets of single-nucleotide polymorphisms (SNPs) representing all SNPs within genes in a pathway. In this paper we use simulated phenotype and real next-generation sequencing data from Genetic Analysis Workshop 17 to analyze sets of rare variants using these two competing approaches. The initial results suggest substantial differences in the methods, with Fisher’s combined probability method and the direct application of the WS method yielding the best power. Evidence suggests that the WS method works well in most situations, although Fisher’s method was more likely to be optimal when the number of causal SNPs in the set was low but the risk of the causal SNPs was high.
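
The first approach in the abstract above combines gene-level p-values with Fisher’s combined probability method. As a minimal sketch of that standard technique (not the authors' code), the statistic is -2 times the sum of the log p-values, which under the null hypothesis and assuming independent p-values follows a chi-square distribution with 2k degrees of freedom. The function name and the example p-values are hypothetical.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine k independent p-values with Fisher's method.

    Returns (statistic, combined_p). Assumes the p-values are independent,
    which may not hold for genes in the same pathway.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical gene-level p-values for one pathway (illustration only).
stat, p = fisher_combined_pvalue([0.04, 0.20, 0.01])
print(f"statistic = {stat:.3f}, combined p = {p:.4f}")
```

The closed-form survival function avoids a dependency on a statistics library; with a single p-value the combined result reduces to that p-value, which is a quick sanity check.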

Evaluating methods for the analysis of rare variants in sequence data

Author: A Morris
Airat Bekmetjev
Alexander Luedtke
Alexandra Sitarik
Ashley Petersen
B Li
BE Madsen
C Dering
LA Almasy
M Zawistowski
Nathan L Tintle
RC Lewontin
Scott Powers
Publication venue: BioMed Central
Publication date: 01/01/2011

A number of rare variant statistical methods have been proposed for analysis of the impending wave of next-generation sequencing data. To date, there are few direct comparisons of these methods on real sequence data. Furthermore, there is a strong need for practical advice on the proper analytic strategies for rare variant analysis. We compare four recently proposed rare variant methods (combined multivariate and collapsing, weighted sum, proportion regression, and cumulative minor allele test) on simulated phenotype and next-generation sequencing data as part of Genetic Analysis Workshop 17. Overall, we find that all analyzed methods have serious practical limitations on identifying causal genes. Specifically, no method has more than a 5% true discovery rate (percentage of truly causal genes among all those identified as significantly associated with the phenotype). Further exploration shows that all methods suffer from inflated false-positive error rates (chance that a noncausal gene will be identified as associated with the phenotype) because of population stratification and gametic phase disequilibrium between noncausal SNPs and causal SNPs. Furthermore, observed true-positive rates (chance that a truly causal gene will be identified as significantly associated with the phenotype) for each of the four methods were very low (<19%). The combination of larger than anticipated false-positive rates, low true-positive rates, and only about 1% of all genes being causal yields poor discriminatory ability for all four methods. Gametic phase disequilibrium and population stratification are important areas for further research in the analysis of rare variant data.
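
The abstract's closing argument, that a low true-positive rate, an inflated false-positive rate, and about 1% causal genes together yield a low true discovery rate, can be checked with simple arithmetic. The sketch below applies the abstract's own definitions; the function name is hypothetical, and the false-positive rate of 0.05 is an assumed illustrative value, not a figure reported in the abstract.

```python
def true_discovery_rate(causal_fraction, true_positive_rate, false_positive_rate):
    """Fraction of genes flagged as significant that are truly causal.

    causal_fraction:      share of all genes that are causal
    true_positive_rate:   chance a causal gene is flagged
    false_positive_rate:  chance a noncausal gene is flagged
    """
    true_hits = causal_fraction * true_positive_rate
    false_hits = (1.0 - causal_fraction) * false_positive_rate
    if true_hits + false_hits == 0.0:
        return float("nan")  # nothing is flagged, so the rate is undefined
    return true_hits / (true_hits + false_hits)


# About 1% causal genes and a 19% true-positive rate come from the abstract;
# the 5% false-positive rate is an assumption made for illustration.
tdr = true_discovery_rate(0.01, 0.19, 0.05)
print(f"true discovery rate = {tdr:.3f}")  # roughly 0.037
```

Even at this assumed false-positive rate the true discovery rate stays below 5%, and any larger false-positive rate pushes it lower still.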

Springer - Publisher Connector

Crossref

Directory of Open Access Journals

Carolina Digital Repository

Evaluating methods for combining rare variant data in pathway-based tests of genetic association

Author: A Luedtke
A Morris
A Subramanian
Airat Bekmetjev
Alexander Luedtke
Alexandra Sitarik
Ashley Petersen
B Efron
B Li
BE Madsen
D Chasman
K Wang
M Wu
M Zawistowski
N Tintle
Nathan L Tintle
NL Tintle
Scott Powers
Publication venue: BioMed Central
Publication date: 01/01/2011

Analyzing sets of genes in genome-wide association studies is a relatively new approach that aims to capitalize on biological knowledge about the interactions of genes in biological pathways. This approach, called pathway analysis or gene set analysis, has not yet been applied to the analysis of rare variants. Applying pathway analysis to rare variants offers two competing approaches. In the first approach, rare variant statistics (e.g., combined multivariate and collapsing [CMC] or weighted-sum [WS]) are used to generate p-values for each gene, and the gene-level p-values are combined using standard pathway analysis methods (e.g., gene set enrichment analysis or Fisher’s combined probability method). In the second approach, rare variant methods (e.g., CMC and WS) are applied directly to sets of single-nucleotide polymorphisms (SNPs) representing all SNPs within genes in a pathway. In this paper we use simulated phenotype and real next-generation sequencing data from Genetic Analysis Workshop 17 to analyze sets of rare variants using these two competing approaches. The initial results suggest substantial differences in the methods, with Fisher’s combined probability method and the direct application of the WS method yielding the best power. Evidence suggests that the WS method works well in most situations, although Fisher’s method was more likely to be optimal when the number of causal SNPs in the set was low but the risk of the causal SNPs was high.

Springer - Publisher Connector

Crossref

Carolina Digital Repository

The Cost-Effectiveness of Reclassification Sampling for Prevalence Estimation

Author: A Tenenbein
Airat Bekmetjev
B Borchers
Benjamin DeWinkle
Brian McLellan
Dirk VanBruggen
Eric Lunderberg
G Casella
G Koch
H Fujisawa
JAR Nofuentes
JP Sutcliffe
LM Wruck
Nathan Tintle
NL Tintle
R
R McNamee
W Schill
WG Cochran
Zheng Su
Publication venue: Public Library of Science
Publication date: 13/02/2012

Typically, a two-phase (double) sampling strategy is employed when classifications are subject to error and there is a gold standard (perfect) classifier available. Two-phase sampling involves classifying the entire sample with an imperfect classifier, and a subset of the sample with the gold standard. In this paper we consider an alternative strategy termed reclassification sampling, which involves classifying individuals using the imperfect classifier more than one time. Estimates of sensitivity, specificity and prevalence are provided for reclassification sampling, when either one or two binary classifications of each individual using the imperfect classifier are available. Robustness of estimates and design decisions to model assumptions is considered. Software is provided to compute estimates and provide advice on the optimal sampling strategy. Reclassification sampling is shown to be cost-effective (lower standard error of estimates for the same cost) for estimating prevalence as compared to two-phase sampling in many practical situations.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

FigShare

Two-pebbling and Odd-two-pebbling are Not Equivalent

Author: Bekmetjev Airat
Cusack Charles A.
Powers Mark
Publication venue: Hope College Digital Commons
Publication date: 01/03/2019

Let G be a connected graph. A configuration of pebbles assigns a nonnegative integer number of pebbles to each vertex of G. A move consists of removing two pebbles from one vertex and placing one pebble on an adjacent vertex. A configuration is solvable if any vertex can get at least one pebble through a sequence of moves. The pebbling number of G, denoted π(G), is the smallest integer such that any configuration of π(G) pebbles on G is solvable. A graph has the two-pebbling property if after placing more than 2π(G) − q pebbles on G, where q is the number of vertices with pebbles, there is a sequence of moves so that at least two pebbles can be placed on any vertex. A graph has the odd-two-pebbling property if after placing more than 2π(G) − r pebbles on G, where r is the number of vertices with an odd number of pebbles, there is a sequence of moves so that at least two pebbles can be placed on any vertex. In this paper, we prove that the two-pebbling and odd-two-pebbling properties are not equivalent.
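
The definitions of a move and of solvability in the abstract above can be made concrete with a brute-force search. This is a minimal sketch for very small graphs only, not the method of the paper; the function names and the adjacency-list representation are assumptions made for illustration.

```python
def pebbling_moves(config, adjacency):
    """Yield every configuration reachable from `config` in one move.

    A move removes two pebbles from a vertex and places one pebble
    on an adjacent vertex.
    """
    for u, count in enumerate(config):
        if count >= 2:
            for v in adjacency[u]:
                nxt = list(config)
                nxt[u] -= 2
                nxt[v] += 1
                yield tuple(nxt)


def can_reach(config, adjacency, target, needed=1):
    """True if some sequence of moves puts `needed` pebbles on `target`."""
    start = tuple(config)
    seen = {start}
    stack = [start]
    # Each move lowers the total pebble count by one, so the search terminates.
    while stack:
        current = stack.pop()
        if current[target] >= needed:
            return True
        for nxt in pebbling_moves(current, adjacency):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def is_solvable(config, adjacency):
    """True if every vertex can receive at least one pebble."""
    return all(can_reach(config, adjacency, t) for t in range(len(config)))


# Path on three vertices, 0 - 1 - 2, as an adjacency list.
path3 = [[1], [0, 2], [1]]
print(is_solvable((4, 0, 0), path3))  # True: two moves 0->1, then one move 1->2
print(is_solvable((3, 0, 0), path3))  # False: vertex 2 cannot be reached
```

Passing `needed=2` to `can_reach` tests whether two pebbles can be placed on a vertex, which is the condition used in the two-pebbling and odd-two-pebbling properties. The search is exponential in the number of pebbles, so it is only suitable for toy examples.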