Multiple Testing for Exploratory Research

Authors: Goeman Jelle J.; Solari Aldo
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2011

Motivated by the practice of exploratory research, we formulate an approach to multiple testing that reverses the conventional roles of the user and the multiple testing procedure. Traditionally, the user chooses the error criterion, and the procedure the resulting rejected set. Instead, we propose to let the user choose the rejected set freely, and to let the multiple testing procedure return a confidence statement on the number of false rejections incurred. In our approach, such confidence statements are simultaneous for all choices of the rejected set, so that post hoc selection of the rejected set does not compromise their validity. The proposed reversal of roles requires nothing more than a review of the familiar closed testing procedure, but with a focus on the non-consonant rejections that this procedure makes. We suggest several shortcuts to avoid the computational problems associated with closed testing.

Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/11-STS356

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Leiden University Scholarly Publications
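
The abstract of "Multiple Testing for Exploratory Research" describes bounding the number of false rejections in a freely chosen set via closed testing. The following is a minimal brute-force sketch of that idea, not the authors' implementation. It assumes Simes local tests (one possible choice; the abstract does not name a local test), uses hypothetical function names, and enumerates all subsets, so it is only usable for a handful of hypotheses and does not include the shortcuts the abstract mentions.

```python
from itertools import combinations


def simes_rejects(pvals, alpha):
    """Local Simes test (assumed choice): reject if p_(i) <= i * alpha / n for some i."""
    n = len(pvals)
    return any(p <= i * alpha / n for i, p in enumerate(sorted(pvals), start=1))


def closed_testing_rejections(pvalues, alpha=0.05):
    """Map every non-empty index set to True if closed testing rejects its intersection hypothesis."""
    m = len(pvalues)
    all_sets = [frozenset(c) for k in range(1, m + 1)
                for c in combinations(range(m), k)]
    local = {s: simes_rejects([pvalues[i] for i in s], alpha) for s in all_sets}
    # Closed testing rejects a set only if every superset is rejected by its local test.
    return {s: all(local[t] for t in all_sets if s <= t) for s in all_sets}


def max_false_rejections(selected, rejected):
    """Upper confidence bound on the number of true nulls in `selected`:
    the size of the largest subset that closed testing does not reject."""
    selected = list(selected)
    for k in range(len(selected), 0, -1):
        if any(not rejected[frozenset(c)] for c in combinations(selected, k)):
            return k
    return 0


pvalues = [0.03, 0.03, 0.2]  # illustrative values, not taken from the paper
rejected = closed_testing_rejections(pvalues, alpha=0.05)
bound_pair = max_false_rejections({0, 1}, rejected)
bound_all = max_false_rejections({0, 1, 2}, rejected)
```

With these illustrative p-values the pair {0, 1} is rejected although neither hypothesis is rejected individually, which is an instance of a non-consonant rejection: the bound states that at most one of the two selected hypotheses is a false rejection. Because the bounds come from a single closed testing run, they can be queried for any selected set after looking at the data.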

Analyzing gene expression data in terms of gene sets: methodological issues

Authors: Bühlmann Peter; Goeman Jelle J.
Publication date: 02/08/2017

Motivation: Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferential methodology of gene set testing.

Results: We identify some crucial assumptions which are needed by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models are based on a crucial and unrealistic independence assumption between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single gene testing and gene set testing.

RERO DOC Digital Library

Rejoinder to "Multiple Testing for Exploratory Research"

Authors: Goeman Jelle J.; Solari Aldo
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 30/11/2011

Rejoinder to "Multiple Testing for Exploratory Research" by J. J. Goeman and A. Solari [arXiv:1208.2841].

Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/11-STS356REJ

arXiv.org e-Print Archive

Crossref

Leiden University Scholarly Publications

Flexible control of the median of the false discovery proportion

Authors: Goeman Jelle J; Hemerik Jesse; Solari Aldo
Publication date: 13/03/2024

We introduce a multiple testing procedure that controls the median of the proportion of false discoveries (FDP) in a flexible way. The procedure only requires a vector of p-values as input and is comparable to the Benjamini-Hochberg method, which controls the mean of the FDP. Our method allows freely choosing one or several values of alpha after seeing the data -- unlike Benjamini-Hochberg, which can be very liberal when alpha is chosen post hoc. We prove these claims and illustrate them with simulations. Our procedure is inspired by a popular estimator of the total number of true hypotheses. We adapt this estimator to provide simultaneously median unbiased estimators of the FDP, valid for finite samples. This simultaneity allows for the claimed flexibility. Our approach does not assume independence. The time complexity of our method is linear in the number of hypotheses, after sorting the p-values.

arXiv.org e-Print Archive
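
For orientation, the Benjamini-Hochberg method that the abstract above uses as its point of comparison can be sketched as follows. This is the standard step-up procedure only, not the median-FDP procedure of the paper, whose details the abstract does not give. The function name and the p-values are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Standard Benjamini-Hochberg step-up procedure.

    Returns the sorted indices of the rejected hypotheses. Like the method in
    the abstract, it needs only a vector of p-values as input.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the step-up line.
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]  # illustrative values
rejected_05 = benjamini_hochberg(pvalues, alpha=0.05)
rejected_10 = benjamini_hochberg(pvalues, alpha=0.10)
```

On these illustrative values the rejected set grows from two to four hypotheses when alpha moves from 0.05 to 0.10. The rejected set therefore depends directly on alpha, which is why the abstract stresses that Benjamini-Hochberg assumes alpha is fixed before seeing the data.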

A Cochran-Armitage-type and a score-free global test for multivariate ordinal data

Authors: Goeman Jelle J.; Jelizarow Monika; Mansmann Ulrich
Publication date: 14/08/2014

We propose a Cochran-Armitage-type and a score-free global test that can be used to assess the presence of an association between a set of ordinally scaled covariates and an outcome variable within the range of generalized linear models. Both tests are developed within the framework of the well-established 'global test' methodology and as such are feasible in high-dimensional data situations under any correlation and enable adjustment for covariates. The Cochran-Armitage-type test, for which an intimate connection with the traditional score-based Cochran-Armitage test is shown, rests upon explicit assumptions on the distances between the covariates' ordered categories. In contrast, the score-free test parametrizes these distances and thus keeps them flexible, rendering it ideally suited for covariates measured on an ordinal scale. As confirmed by means of simulations, the Cochran-Armitage-type test focuses its power on set-outcome relationships where the distances between the covariates' categories are equal or close to those assumed, whereas the score-free test spreads its power over the full range of possible set-outcome relationships, putting more emphasis on monotonic than on non-monotonic ones. Based on the tests' power properties, it is discussed when to favour one or the other, and the practical merits of both of them are illustrated by an application in the field of rehabilitation medicine. Our proposed tests are implemented in the R package globaltest.