20 research outputs found

    The Cost-Effectiveness of Reclassification Sampling for Prevalence Estimation

    Typically, a two-phase (double) sampling strategy is employed when classifications are subject to error and a gold-standard (perfect) classifier is available. Two-phase sampling involves classifying the entire sample with an imperfect classifier, and a subset of the sample with the gold standard. In this paper we consider an alternative strategy termed reclassification sampling, which involves classifying individuals with the imperfect classifier more than once. Estimates of sensitivity, specificity and prevalence are provided for reclassification sampling, when either one or two binary classifications of each individual using the imperfect classifier are available. Robustness of estimates and design decisions to model assumptions is considered. Software is provided to compute estimates and provide advice on the optimal sampling strategy. Reclassification sampling is shown to be cost-effective (lower standard error of estimates for the same cost) for estimating prevalence as compared to two-phase sampling in many practical situations.
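
To illustrate why sensitivity and specificity matter for prevalence estimation, the sketch below applies the standard Rogan-Gladen correction to an apparent prevalence. This is not the estimator from the paper above: it assumes sensitivity and specificity are already known, whereas the paper estimates them from repeated classifications. The function name and the example values are hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an apparent prevalence for classifier error.

    Assumes sensitivity and specificity are known constants
    (an assumption of this sketch, not of the paper).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        # The correction is undefined when the classifier is no better than chance.
        raise ValueError("requires sensitivity + specificity > 1")
    estimate = (apparent_prevalence + specificity - 1.0) / denom
    # Clamp to the valid range, since sampling noise can push the raw value outside [0, 1].
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical inputs: 25% test positive, Se = 0.90, Sp = 0.85.
    print(rogan_gladen(0.25, 0.90, 0.85))
```

With an imperfect classifier the corrected value differs from the apparent one; with sensitivity and specificity both equal to 1 the two coincide.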

    SEAS: A System for SEED-Based Pathway Enrichment Analysis

    Pathway enrichment analysis represents a key technique for analyzing high-throughput omic data, and it can help to link individual genes or proteins found to be differentially expressed under specific conditions to well-understood biological pathways. We present here a computational tool, SEAS, for pathway enrichment analysis over a given set of genes in a specified organism against the pathways (or subsystems) in the SEED database, a popular pathway database for bacteria. SEAS maps a given set of genes of a bacterium to pathway genes covered by SEED through gene ID and/or orthology mapping, and then calculates the statistical significance of the enrichment of each relevant SEED pathway by the mapped genes. Our evaluation of SEAS indicates that the program provides highly reliable pathway mapping results and identifies more organism-specific pathways than similar existing programs. SEAS is publicly released under the GPL license agreement and freely available at http://csbl.bmb.uga.edu/~xizeng/research/seas/
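
The abstract above does not say which test SEAS uses to score enrichment. As a minimal sketch of one common choice, the code below computes a one-sided hypergeometric p-value for the overlap between a gene list and a pathway. The function name, parameter names, and example counts are all hypothetical and are not taken from SEAS.

```python
from math import comb


def enrichment_p_value(population, pathway_size, sample_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    population   -- number of genes in the background set
    pathway_size -- number of background genes annotated to the pathway
    sample_size  -- number of genes in the submitted list
    overlap      -- number of submitted genes that fall in the pathway
    """
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 10 background genes, 4 in the pathway,
    # 3 genes submitted, 2 of which hit the pathway.
    print(enrichment_p_value(10, 4, 3, 2))
```

In practice the p-values for all tested pathways would also need a multiple-testing correction, which this sketch leaves out.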

    Comparison of Pathway Analysis Approaches Using Lung Cancer GWAS Data Sets

    Pathway analysis has been proposed as a complement to single SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO); the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and had available software for performing analysis. We selected the programs EASE, which uses a modified Fisher's exact calculation to test for pathway associations, GenGen (a version of Gene Set Enrichment Analysis (GSEA)), which uses a Kolmogorov-Smirnov-like running sum statistic as the test statistic, and SLAT, which uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ² statistics from genotype association tests. There were nearly 18,000 genes available for analysis, following mapping of more than 300,000 SNPs from each data set. These were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), the mSUMSTAT approach identified the most significant pathways (8 in CETO and 1 in GRMD). This included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR ≤ 0.001) and GRMD (FDR = 0.009), although two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found using any of these methods. Difficulty in replicating associations hindered our comparison, but results suggest mSUMSTAT has advantages over the other approaches, and may be a useful pathway analysis tool to use alongside other methods such as the commonly used GSEA (GenGen) approach.

    Implications of Training in Incremental Theories of Intelligence for Undergraduate Statistics Students

    This chapter documents the effects of training in incremental theories of intelligence on students in introductory statistics courses at a liberal arts university in the US. Incremental theories of intelligence examine the beliefs individuals hold of knowledge and how it is attained. An individual with an incremental theory of intelligence believes that intelligence can be developed. The research examined differences by gender in mastery of statistics and attitudes toward statistics for students who received growth mind-set training. A pre-test, post-test design utilised the Students’ Attitudes Toward Statistics© instrument and the Comprehensive Assessment of Outcomes in a first Statistics course. An ANCOVA revealed that females gained more than males on their value of statistics (F(1, 63) = 9.40, MSE = 3.79, p = .003, ηp² = 0.134) and decreased less for effort expended to learn statistics (F(1, 63) = 4.41, MSE = 4.07, p = .040, ηp² = 0.067). Females also gained mastery of statistical concepts at a greater rate (F(1, 63) = 5.30, MSE = 0.06, p = .025, ηp² = 0.080), indicating a possible path to alleviate the under-representation of females in STEM.

    Evaluating the consistency of gene sets used in the analysis of bacterial gene expression data

    Background: Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. Results: We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix® gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. Conclusions: Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.