Exact testing with random permutations
When permutation methods are used in practice, often a limited number of
random permutations are used to decrease the computational burden. However,
most theoretical literature assumes that the whole permutation group is used,
and methods based on random permutations tend to be seen as approximate. There
exists a very limited amount of literature on exact testing with random
permutations and only recently a thorough proof of exactness was given. In this
paper we provide an alternative proof, viewing the test as a "conditional Monte
Carlo test" as it has been called in the literature. We also provide extensions
of the result. Importantly, our results can be used to prove properties of
various multiple testing procedures based on random permutations.
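A minimal sketch of a test based on random permutations follows. It assumes a one-sided two-sample comparison and uses the difference in group means as the test statistic; both choices, and the function names, are illustrative and not taken from the paper. The essential ingredient is that the observed labelling (the identity permutation) is counted among the w random permutations, giving p = (1 + #{T_i >= T_obs}) / (w + 1). With this convention, rejecting when p <= alpha keeps the type I error at most alpha under the null of exchangeable labels, rather than only approximately.

```python
import random
from statistics import mean


def diff_in_means(x, y):
    """Illustrative test statistic: mean of group x minus mean of group y."""
    return mean(x) - mean(y)


def random_permutation_pvalue(x, y, n_perm=999, seed=None):
    """One-sided Monte Carlo permutation p-value for 'x tends to exceed y'.

    The observed labelling (the identity permutation) is counted together
    with n_perm random relabellings, so the p-value is
    (1 + #{random T >= T_obs}) / (n_perm + 1). Counting the identity is what
    keeps P(p <= alpha) <= alpha under the null of exchangeable labels.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    t_obs = diff_in_means(x, y)
    count = 1  # the identity permutation always satisfies T >= T_obs
    for _ in range(n_perm):
        rng.shuffle(pooled)  # uniformly random relabelling of all observations
        if diff_in_means(pooled[:n_x], pooled[n_x:]) >= t_obs:
            count += 1
    return count / (n_perm + 1)


# Usage with made-up data: two clearly separated groups.
treated = [5.1, 6.2, 5.8, 6.5, 7.0]
control = [1.0, 1.5, 0.8, 1.2, 2.0]
print(random_permutation_pvalue(treated, control, n_perm=999, seed=1))
```

With this construction the smallest attainable p-value is 1/(n_perm + 1), so the number of random permutations bounds how small an alpha can meaningfully be used.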
Multiple Testing for Exploratory Research
Motivated by the practice of exploratory research, we formulate an approach
to multiple testing that reverses the conventional roles of the user and the
multiple testing procedure. Traditionally, the user chooses the error
criterion, and the procedure chooses the resulting rejected set. Instead, we propose to
let the user choose the rejected set freely, and to let the multiple testing
procedure return a confidence statement on the number of false rejections
incurred. In our approach, such confidence statements are simultaneous for all
choices of the rejected set, so that post hoc selection of the rejected set
does not compromise their validity. The proposed reversal of roles requires
nothing more than a review of the familiar closed testing procedure, but with a
focus on the non-consonant rejections that this procedure makes. We suggest
several shortcuts to avoid the computational problems associated with closed
testing.
Published at http://dx.doi.org/10.1214/11-STS356 in Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
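To make the closed-testing idea concrete, here is a brute-force sketch for a handful of hypotheses. It assumes Simes local tests (valid for independent or positively dependent p-values); that choice and the function names are illustrative assumptions. Closed testing rejects an intersection hypothesis H_I when the local test rejects every H_J with J a superset of I. For any rejected set R the user picks, the largest |I| with I a subset of R and H_I not rejected is then a (1 - alpha)-confidence upper bound on the number of false rejections in R, simultaneously for all R. Because it enumerates all 2^m - 1 intersections, this sketch only works for small m; the shortcuts mentioned in the abstract exist to avoid exactly this cost.

```python
from itertools import combinations


def simes_rejects(pvals, alpha):
    """Simes local test: reject the intersection hypothesis if the i-th
    smallest of k p-values is <= i * alpha / k for some i."""
    k = len(pvals)
    return any(p <= (i + 1) * alpha / k for i, p in enumerate(sorted(pvals)))


def closed_testing(pvals, alpha=0.05):
    """Brute-force closed testing over all 2^m - 1 index sets.

    H_I is rejected iff the Simes local test rejects H_J for every J that is
    a superset of I. Returns the rejected index sets as frozensets.
    """
    m = len(pvals)
    all_sets = [frozenset(c) for k in range(1, m + 1)
                for c in combinations(range(m), k)]
    locally_rejected = {s for s in all_sets
                        if simes_rejects([pvals[i] for i in s], alpha)}
    return {s for s in all_sets
            if all(t in locally_rejected for t in all_sets if s <= t)}


def false_rejection_bound(selected, rejected_sets):
    """(1 - alpha)-confidence upper bound on the number of true null
    hypotheses in `selected`: the size of the largest subset whose
    intersection hypothesis was not rejected by closed testing."""
    selected = sorted(selected)
    for k in range(len(selected), 0, -1):
        if any(frozenset(c) not in rejected_sets
               for c in combinations(selected, k)):
            return k
    return 0


# Usage with made-up p-values: the rejected set is chosen freely, post hoc.
p = [0.001, 0.002, 0.003, 0.8, 0.9]
rejected = closed_testing(p, alpha=0.05)
for chosen in ([0, 1, 2], [0, 1, 2, 3], [3, 4]):
    print(chosen, "at most", false_rejection_bound(chosen, rejected), "false rejections")
```

Subtracting the bound from |R| gives the matching lower bound on the number of true discoveries in R.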
Analysing multiple types of molecular profiles simultaneously: connecting the needles in the haystack
It has been shown that a random-effects framework can be used to test the
association between a gene's expression level and the number of DNA copies of a
set of genes. This gene-set modelling framework was later applied to find
associations between mRNA expression and microRNA expression, by defining the
gene sets using target prediction information.
Here, we extend the model introduced by Menezes et al. (2009) to consider the
effect of not just copy number, but also of other molecular profiles such as
methylation changes and loss-of-heterozygosity (LOH), on gene expression
levels. Again we consider sets of measurements, to improve the robustness of
results and increase the power to find associations. Our approach can be used
genome-wide to find associations, yields a test to help separate true
associations from noise and can include confounders.
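The model of Menezes et al. (2009) and its SIM implementation are not reproduced here. As a rough, hypothetical illustration of testing one gene's expression against a whole set of measurements at once, the sketch below uses a score-type statistic in the spirit of the global test, Q = sum_j (x_j . y_c)^2, where y_c is the centered expression of one gene over n samples and each x_j is one covariate in the set (for example, the copy number of one gene in the set). Significance is assessed by permuting the samples of y. The statistic, the permutation scheme and all names are assumptions made for illustration.

```python
import random


def center(values):
    """Subtract the mean from every entry."""
    mu = sum(values) / len(values)
    return [v - mu for v in values]


def set_score(y, columns):
    """Q = sum_j (x_j . y_c)^2 over the covariate columns x_j.

    Centering y is enough: x . y_c equals center(x) . y_c because y_c sums
    to zero. Q is large when the set of covariates jointly tracks y, even
    if each single column has only a weak effect.
    """
    yc = center(y)
    return sum(sum(a * b for a, b in zip(col, yc)) ** 2 for col in columns)


def set_association_pvalue(y, columns, n_perm=999, seed=None):
    """Permutation p-value for association between y (one gene's expression
    across n samples) and a set of covariate columns, each of length n."""
    rng = random.Random(seed)
    q_obs = set_score(y, columns)
    y_perm = list(y)
    count = 1  # the observed ordering counts as one of the permutations
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between y and the covariates
        if set_score(y_perm, columns) >= q_obs:
            count += 1
    return count / (n_perm + 1)


# Usage with made-up data: y follows the first covariate, the second is unrelated.
x1 = [float(i) for i in range(12)]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, 0.0, -0.15, 0.1, 0.05]
y = [2 * a + e for a, e in zip(x1, noise)]
print(set_association_pvalue(y, [x1, x2], n_perm=999, seed=0))
```

Confounders, which the abstract says the method can include, are not handled in this simplified sketch.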
We apply our method to colon and to breast cancer samples, for which
genome-wide copy number, methylation and gene expression profiles are
available. Our findings include interesting gene expression-regulating
mechanisms, which may involve only one of copy number or methylation, or both
for the same samples. We are even able to find effects due to different
molecular mechanisms in different samples.
Our method can equally well be applied to cases where other types of
molecular (high-dimensional) data are collected, such as LOH, SNP genotype and
microRNA expression data. Computationally efficient, it represents a flexible
and powerful tool to study associations between high-dimensional datasets. The
method is freely available via the SIM Bioconductor package.
Analyzing gene expression data in terms of gene sets: methodological issues
Motivation: Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferred methodology of gene set testing.
Results: We identify some crucial assumptions which are needed by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models are based on a crucial and unrealistic independence assumption between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single gene testing and gene set testing.
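The argument above favours tests that treat the subjects, not the genes, as sampling units and that test each gene set on its own. A minimal sketch of such a self-contained test follows, assuming a two-group design; the statistic (sum over the genes in the set of squared differences in group means) and the function names are illustrative choices, not the article's own recommendation. Because whole subjects are relabelled, the same permutation applies to every gene, so the correlation between genes is preserved instead of being assumed away.

```python
import random
from statistics import mean


def gene_set_statistic(expr, labels):
    """Sum over the genes in the set of the squared difference in group means.

    expr: one list per gene, holding that gene's value for every subject.
    labels: one 0/1 group label per subject.
    """
    total = 0.0
    for gene in expr:
        group1 = [v for v, g in zip(gene, labels) if g == 1]
        group0 = [v for v, g in zip(gene, labels) if g == 0]
        total += (mean(group1) - mean(group0)) ** 2
    return total


def self_contained_pvalue(expr, labels, n_perm=999, seed=None):
    """Self-contained gene set test with subjects as the sampling units.

    The subject labels are permuted, and the same permutation is applied to
    every gene, so the correlation between genes is left intact.
    """
    rng = random.Random(seed)
    t_obs = gene_set_statistic(expr, labels)
    perm = list(labels)
    count = 1  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(perm)
        if gene_set_statistic(expr, perm) >= t_obs:
            count += 1
    return count / (n_perm + 1)


# Usage with made-up data: 3 genes, 12 subjects, group 1 shifted upwards.
labels = [0] * 6 + [1] * 6
expr = [[0.1 * i + (5.0 if i >= 6 else 0.0) + k for i in range(12)] for k in range(3)]
print(self_contained_pvalue(expr, labels, n_perm=999, seed=2))
```

A gene-sampling test would instead treat the genes in the set as independent draws, which is the assumption the article identifies as the source of anti-conservative P-values.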
Rejoinder to "Multiple Testing for Exploratory Research"
Rejoinder to "Multiple Testing for Exploratory Research" by J. J. Goeman, A.
Solari [arXiv:1208.2841].
Published at http://dx.doi.org/10.1214/11-STS356REJ in Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).