
    Step-Down Multiple Comparison Procedures Using Medians and Permutation Tests

    Richter and McCann (2007) presented a median-based multiple comparison procedure for assessing evidence of group location differences. The reference distribution is the permutation distribution of the maximum median difference among all pairs, and the procedure provides strong control of the familywise error rate (FWE). This idea is extended to develop a step-down procedure for comparing group locations. The new step-down procedure exploits logical dependencies between pairwise hypotheses and provides greater power than the single-step procedure, while still maintaining strong FWE control. The new procedure can also be a more powerful alternative to existing methods based on means, especially for heavy-tailed distributions.
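    The single-step building block that the step-down procedure extends can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; all function and variable names are assumptions. It permutes the pooled observations, recomputes the maximum absolute pairwise median difference, and reports a permutation p-value for the observed maximum, which is what gives FWE control.

        import numpy as np

        rng = np.random.default_rng(0)

        def max_median_diff(groups):
            # Largest absolute pairwise difference of group medians.
            meds = [np.median(g) for g in groups]
            return max(abs(a - b) for i, a in enumerate(meds) for b in meds[i + 1:])

        def single_step_pvalue(groups, n_perm=2000):
            # Permutation p-value for the observed maximum median difference:
            # permuting the pooled observations approximates the null
            # distribution of the maximum over all pairwise comparisons.
            cuts = np.cumsum([len(g) for g in groups])[:-1]
            pooled = np.concatenate(groups)
            observed = max_median_diff(groups)
            hits = 0
            for _ in range(n_perm):
                parts = np.split(rng.permutation(pooled), cuts)
                hits += max_median_diff(parts) >= observed
            return (hits + 1) / (n_perm + 1)

        # Example: three groups of 15, the third shifted upward.
        groups = [rng.normal(loc, 1.0, size=15) for loc in (0.0, 0.0, 1.0)]
        print(single_step_pvalue(groups))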

    Simultaneous multiple comparisons with a control using median differences and permutation tests

    Permutation methods using median differences for simultaneous pairwise comparisons with a control are investigated. Simulation results suggest that the permutation methods are generally more powerful than the Dunnett procedure when data are from nonnormal distributions. A new procedure is shown to provide strong control of the familywise error rate and to have the highest power for detecting the treatment that differs most from the control, for certain nonnormal distributions. Step-down permutation procedures, which have greater power to detect treatment differences from the control, are also proposed and examined. The procedures are illustrated using an example from the applied literature.
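    A step-down scheme of this kind can be sketched as follows: the treatment-versus-control statistics are ordered, and at each step the largest remaining statistic is referred to the permutation distribution of the maximum over the hypotheses still in play. The Python snippet below is an illustrative simplification under those assumptions, not the exact algorithm from the paper; all names are hypothetical.

        import numpy as np

        rng = np.random.default_rng(1)

        def stepdown_vs_control(control, treatments, n_perm=2000, alpha=0.05):
            # Absolute median difference of each treatment from the control.
            stats_ = np.array([abs(np.median(t) - np.median(control))
                               for t in treatments])
            remaining = list(np.argsort(stats_)[::-1])  # largest first
            rejected = []
            while remaining:
                # Null distribution of the maximum statistic over the
                # hypotheses still in play, by re-randomising group labels.
                groups = [control] + [treatments[i] for i in remaining]
                cuts = np.cumsum([len(g) for g in groups])[:-1]
                pooled = np.concatenate(groups)
                null_max = np.empty(n_perm)
                for b in range(n_perm):
                    parts = np.split(rng.permutation(pooled), cuts)
                    c = np.median(parts[0])
                    null_max[b] = max(abs(np.median(p) - c) for p in parts[1:])
                p_val = (np.sum(null_max >= stats_[remaining[0]]) + 1) / (n_perm + 1)
                if p_val > alpha:
                    break          # stop: retain all remaining hypotheses
                rejected.append(remaining.pop(0))
            return rejected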

    Multiple testing with persistent homology

    Multiple hypothesis testing requires a control procedure. Simply increasing simulations or permutations to meet a Bonferroni-style threshold is prohibitively expensive. In this paper we propose a null-model-based approach to testing for acyclicity, coupled with a Family-Wise Error Rate (FWER) control method that does not suffer from these computational costs. We adapt a False Discovery Rate (FDR) control approach to the topological setting, and show it to be compatible both with our null model approach and with previous approaches to hypothesis testing in persistent homology. By extending a limit theorem for persistent homology on samples from point processes, we provide theoretical validation for our FWER and FDR control methods.
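    The FDR component can be illustrated with the standard Benjamini-Hochberg step-up procedure, shown below as a generic sketch. The topology-specific ingredient (p-values obtained from a null model for persistence diagrams) is assumed to be computed upstream and is not reproduced here.

        import numpy as np

        def benjamini_hochberg(pvals, q=0.05):
            # Standard Benjamini-Hochberg step-up procedure: returns a
            # boolean mask of hypotheses rejected at FDR level q.
            p = np.asarray(pvals)
            m = p.size
            order = np.argsort(p)
            below = p[order] <= q * np.arange(1, m + 1) / m
            reject = np.zeros(m, dtype=bool)
            if below.any():
                k = np.nonzero(below)[0].max()   # largest passing rank
                reject[order[:k + 1]] = True
            return reject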

    Nonparametric relevance-shifted multiple testing procedures for the analysis of high-dimensional multivariate data with small sample sizes

    Background: In many research areas it is necessary to find differences between treatment groups with respect to several variables. For example, studies of microarray data seek, for each variable, a significant difference of the location parameters from zero (or from one, for ratios thereof). However, in some studies a significant deviation of the difference in locations from zero (or from 1 in terms of the ratio) is biologically meaningless; a relevant difference or ratio is sought in such cases. Results: This article addresses the use of relevance-shifted tests on ratios for a multivariate parallel two-sample group design. Two empirical procedures are proposed which embed the relevance-shifted test on ratios. As both procedures test a hypothesis for each variable, the resulting multiple testing problem has to be considered, so the procedures include a multiplicity correction. Both procedures are extensions of available procedures for point null hypotheses achieving exact control of the familywise error rate. Whereas the shift of the null hypothesis alone would give straightforward solutions, the problems motivating the empirical considerations discussed here arise from the fact that the shift is applied in both directions, and the whole parameter space between these two limits has to be accepted as the null hypothesis. Conclusion: The first algorithm discussed uses a permutation approach and is appropriate for designs with a moderately large number of observations. However, many experiments have limited sample sizes; for those, the second procedure may be more appropriate, in which multiplicity is corrected according to a concept of data-driven ordering of hypotheses.
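    As a rough illustration of the per-variable building block, the sketch below tests the interval null 1/delta <= mu_x/mu_y <= delta on the log scale with two one-sided t-tests, rejecting when either relevance boundary is exceeded. This is a simplification under stated assumptions (positive data, plain t-tests instead of the article's permutation and data-driven procedures, no multiplicity correction); the threshold delta and all names are illustrative.

        import numpy as np
        from scipy import stats

        def relevance_shifted_ratio_test(x, y, delta=1.5, alpha=0.05):
            # Interval null H0: 1/delta <= mu_x/mu_y <= delta, handled on
            # the log scale so the ratio becomes a difference.  Requires
            # positive data.  Reject if either relevance boundary is
            # exceeded by a one-sided two-sample t-test at level alpha.
            lx, ly, shift = np.log(x), np.log(y), np.log(delta)
            _, p_up = stats.ttest_ind(lx - shift, ly, alternative="greater")
            _, p_lo = stats.ttest_ind(lx + shift, ly, alternative="less")
            return min(p_up, p_lo) <= alpha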

    Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies

    BACKGROUND: There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources. RESULTS: Running more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power. CONCLUSIONS: Our results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s40168-016-0208-8) contains supplementary material, which is available to authorized users.
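    The permuted-label part of such a benchmark can be sketched as follows: case/control labels are shuffled so that no OTU is truly differential, a test is run per OTU, and the fraction of nominally significant results estimates the false positive rate. In this Python sketch a Mann-Whitney test stands in for the many methods actually benchmarked; it is an illustration of the idea, not the paper's framework, and all names are hypothetical.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)

        def permuted_label_fpr(counts, labels, n_shuffles=20, alpha=0.05):
            # Shuffle case/control labels so that no OTU is truly
            # differential, test every OTU (rows of `counts`), and record
            # the fraction of nominally significant p-values.
            fprs = []
            for _ in range(n_shuffles):
                perm = rng.permutation(labels)
                pvals = [stats.mannwhitneyu(row[perm == 0], row[perm == 1]).pvalue
                         for row in counts]
                fprs.append(np.mean(np.array(pvals) < alpha))
            return float(np.mean(fprs))

        # Example: 50 OTUs x 40 samples of synthetic counts, balanced labels.
        counts = rng.poisson(5.0, size=(50, 40))
        labels = np.repeat([0, 1], 20)
        print(permuted_label_fpr(counts, labels))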

    A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection

    Background: Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA), which decomposes the variability in the trait amongst the different factors. In this study, we assess, through simulations, the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative MB-MDR to detect 2-locus epistasis signals in the absence of main effects. Methodology: Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student's t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student's t-test for association, as well as a novel MB-MDR implementation based on Welch's t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling. Results: Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. Empirically-based MB-MDR power estimates for MB-MDR with Welch's t-tests are generally lower than those for MB-MDR with Student's t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other considered data transformations. Conclusions: When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend first rank-transforming traits to normality and then applying MB-MDR modeling with Student's t-tests as internal tests for association.
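    The recommended pipeline (rank-transform the trait to normality, then use Student's t-test as the internal association test) might be sketched as below. This shows only the transformation and a single cell-versus-rest test, not the full MB-MDR algorithm; the Blom-type transformation constant and all names are assumptions.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)

        def rank_inverse_normal(y, c=3.0 / 8):
            # Blom-type rank-based inverse normal transformation: replace
            # each trait value by the corresponding normal quantile.
            ranks = stats.rankdata(y)
            return stats.norm.ppf((ranks - c) / (len(y) - 2 * c + 1))

        def cell_association(trait, in_cell):
            # One internal test: Student's t comparing the transformed trait
            # inside a multilocus genotype cell with the rest of the sample.
            z = rank_inverse_normal(trait)
            return stats.ttest_ind(z[in_cell], z[~in_cell], equal_var=True)

        # Example: skewed (chi-square) trait, ~10% of samples in the cell.
        trait = rng.chisquare(3, size=200)
        in_cell = rng.random(200) < 0.1
        print(cell_association(trait, in_cell))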