A Statistical Method for the Conservative Adjustment of False Discovery Rate (q-value).
BACKGROUND: The q-value is a widely used statistical method for estimating the false discovery rate (FDR), which is a conventional significance measure in the analysis of genome-wide expression data. The q-value is a random variable and may underestimate the FDR in practice. An underestimated FDR can lead to unexpected false discoveries in follow-up validation experiments. This issue has not been well addressed in the literature, especially when a permutation procedure is necessary for p-value calculation.
RESULTS: We proposed a statistical method for the conservative adjustment of the q-value. In practice, it is usually necessary to calculate p-values by a permutation procedure, and our adjustment method takes this into account. We used simulation data as well as experimental microarray and sequencing data to illustrate the usefulness of our method.
CONCLUSIONS: The conservativeness of our approach has been mathematically confirmed in this study. We have demonstrated the importance of conservative adjustment of the q-value, particularly when the proportion of differentially expressed genes is small or the overall differential expression signal is weak.
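For orientation, the standard pipeline that this abstract takes as its starting point (permutation p-values followed by an FDR estimate) might be sketched as follows. This is not the conservative adjustment proposed in the paper, whose details the abstract does not give. The function names, the difference-in-means statistic, and the use of Benjamini-Hochberg adjusted p-values (q-values with the null proportion fixed at 1) are all assumptions made for illustration.

```python
import numpy as np

def permutation_pvalues(x, y, n_perm=200, seed=0):
    """Two-sided permutation p-values for a difference in group means.

    x, y: arrays of shape (genes, samples) for the two conditions (hypothetical layout).
    """
    rng = np.random.default_rng(seed)
    data = np.hstack([x, y])
    n_x = x.shape[1]
    observed = np.abs(x.mean(axis=1) - y.mean(axis=1))
    exceed = np.zeros(data.shape[0])
    for _ in range(n_perm):
        # Shuffle sample labels by permuting the columns.
        shuffled = data[:, rng.permutation(data.shape[1])]
        stat = np.abs(shuffled[:, :n_x].mean(axis=1) - shuffled[:, n_x:].mean(axis=1))
        exceed += stat >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)

def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p-values (q-values with pi0 fixed at 1)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q
```

Because the p-values come from a finite number of permutations, the resulting q-values are themselves noisy estimates, which is the situation in which the abstract argues a conservative adjustment matters.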
Multiple testing procedures under confounding
While multiple testing procedures have been the focus of much statistical
research, an important facet of the problem is how to deal with possible
confounding. Procedures have been developed by authors in genetics and
statistics. In this chapter, we relate these proposals. We propose two new
multiple testing approaches within this framework. The first combines
sensitivity analysis methods with false discovery rate estimation procedures.
The second involves construction of shrinkage estimators that utilize the
mixture model for multiple testing. The procedures are illustrated with
applications to a gene expression profiling experiment in prostate cancer.
Comment: Published at http://dx.doi.org/10.1214/193940307000000176 in the IMS
Collections (http://www.imstat.org/publications/imscollections.htm) by the
Institute of Mathematical Statistics (http://www.imstat.org)
An adaptive significance threshold criterion for massive multiple hypotheses testing
This research deals with massive multiple hypothesis testing. First, regarding
multiple tests as an estimation problem under a proper population model, an
error measurement called Erroneous Rejection Ratio (ERR) is introduced and
related to the False Discovery Rate (FDR). ERR is an error measurement similar
in spirit to FDR, and it greatly simplifies the analytical study of error
properties of multiple test procedures. Next, an improved estimator of the
proportion of true null hypotheses and a data adaptive significance threshold
criterion are developed. Some asymptotic error properties of the significance
threshold criterion are established in terms of ERR under distributional
assumptions widely satisfied in recent applications. A simulation study
provides clear evidence that the proposed estimator of the proportion of true
null hypotheses outperforms the existing estimators of this important parameter
in massive multiple tests. Both analytical and simulation studies indicate that
the proposed significance threshold criterion can provide a reasonable balance
between the amounts of false positive and false negative errors, thereby
complementing and extending the various FDR control procedures. S-plus/R code
is available from the author upon request.
Comment: Published at http://dx.doi.org/10.1214/074921706000000392 in the IMS
Lecture Notes--Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org)
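As a point of reference for the "existing estimators" of the proportion of true null hypotheses mentioned above, one widely used baseline is the Storey-type estimator sketched below. It is not the improved estimator proposed in this work, which the abstract does not specify. The function name and the default tuning value `lam=0.5` are illustrative assumptions.

```python
import numpy as np

def storey_pi0(p, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Null p-values are roughly uniform on [0, 1], so the fraction of p-values
    above `lam`, rescaled by 1 / (1 - lam), approximates the null proportion.
    """
    p = np.asarray(p, dtype=float)
    return min(1.0, float(np.mean(p > lam)) / (1.0 - lam))
```

For example, `storey_pi0([0.1, 0.2, 0.3, 0.9])` gives 0.5, since one of the four p-values exceeds 0.5.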
Multiple testing for SNP-SNP interactions
Most genetic diseases are complex, i.e., associated with combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction patterns given as expressions linked by logical operators. Methods for multiple testing in high-dimensional settings can be applied when many SNPs are considered simultaneously. However, another less well-known multiple testing problem arises within a fixed subset of SNPs when the logic expression is chosen optimally. In this article, we propose a general asymptotic approach for deriving the distribution of the maximally selected chi-square statistic in various situations. We show how this result can be used for testing logic expressions, in particular SNP-SNP interaction patterns, while controlling for multiple comparisons. Simulations show that our method provides multiple testing adjustment when the logic expression is chosen so as to maximize the statistic. Its benefit is demonstrated through an application to a real dataset from a large population-based study considering allergy and asthma in KORA. An implementation of our method is available from the Comprehensive R Archive Network (CRAN) as R package 'SNPmaxsel'.
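The selection problem described in this abstract can be illustrated with a small sketch: the chi-square statistic is maximized over a few candidate logic expressions of two binary SNP indicators, and the maximum is then calibrated against a null distribution. The article derives that distribution asymptotically. The sketch below substitutes a plain permutation null instead, and it does not use the 'SNPmaxsel' package. The candidate expressions and all function names are hypothetical.

```python
import numpy as np

def chi2_2x2(z, y):
    """Pearson chi-square statistic for two binary vectors (0.0 if a margin is empty)."""
    z, y = np.asarray(z), np.asarray(y)
    a = float(np.sum((z == 1) & (y == 1)))
    b = float(np.sum((z == 1) & (y == 0)))
    c = float(np.sum((z == 0) & (y == 1)))
    d = float(np.sum((z == 0) & (y == 0)))
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / denom

# Hypothetical candidate logic expressions on two 0/1 SNP indicators.
EXPRESSIONS = {
    "A and B": lambda a, b: a & b,
    "A or B": lambda a, b: a | b,
    "A xor B": lambda a, b: a ^ b,
}

def max_selected_chi2(snp_a, snp_b, y):
    """Return (statistic, name) for the expression with the largest chi-square."""
    a = np.asarray(snp_a, dtype=int)
    b = np.asarray(snp_b, dtype=int)
    scores = {name: chi2_2x2(f(a, b), y) for name, f in EXPRESSIONS.items()}
    best = max(scores, key=scores.get)
    return scores[best], best

def permutation_pvalue(snp_a, snp_b, y, n_perm=500, seed=0):
    """P-value for the maximally selected statistic under a permutation null."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed, _ = max_selected_chi2(snp_a, snp_b, y)
    # Re-maximize over expressions for every permuted outcome vector,
    # so the null distribution accounts for the selection step.
    hits = sum(
        max_selected_chi2(snp_a, snp_b, rng.permutation(y))[0] >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

Comparing the observed maximum with an ordinary chi-square reference distribution would ignore the fact that the expression was picked to maximize the statistic. Repeating the maximization within each permutation is one simple way to account for it.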