
    False discovery rate: setting the probability of false claim of detection

    When testing multiple hypotheses in a survey (e.g., many different source locations, template waveforms, and so on), the final result consists of a set of confidence intervals, each at a desired confidence level. But the probability that at least one of these intervals does not cover the true value increases with the number of trials. With a sufficiently large array of confidence intervals, one can be practically certain that at least one is missing the true value. In particular, the probability of a false claim of detection becomes non-negligible. To compensate for this, one should increase the confidence level, at the price of reduced detection power. False discovery rate control is a relatively new statistical procedure that bounds the number of mistakes made when performing multiple hypothesis tests. We review this method, discussing example applications to the field of gravitational wave surveys.
    Comment: 7 pages, 3 tables, 3 figures. Prepared for the Proceedings of GWDAW 9 (http://lappc-in39.in2p3.fr/GWDAW9). A new section was added with a numerical example, along with two tables and a figure related to the new section. Many smaller revisions to improve readability.
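
The Benjamini-Hochberg step-up procedure is the standard way to bound the false discovery rate across many such tests; a minimal sketch of it (illustrative only, not the paper's own implementation; function and variable names are assumptions):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k with
    p_(k) <= k * q / m, and reject the k smallest p-values.
    Returns a boolean rejection mask in the original order,
    controlling the FDR at level q (independent tests)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank whose sorted p-value clears its step-up threshold.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

For example, with p-values (0.01, 0.02, 0.03, 0.5) and q = 0.05, the thresholds are (0.0125, 0.025, 0.0375, 0.05), so the first three hypotheses are rejected while a single Bonferroni cut at 0.0125 would reject only two.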

    Pruning of genetic programming trees using permutation tests

    We present a novel approach based on statistical permutation tests for pruning redundant subtrees from genetic programming (GP) trees, which allows us to explore the extent of effective redundancy. We observe that over a range of regression problems, median tree sizes are reduced by around 20%, largely independent of test function, and that while some large subtrees are removed, the median pruned subtree comprises just three nodes; most take the form of an exact algebraic simplification. Our statistically based pruning technique has allowed us to explore the hypothesis that a given subtree can be replaced with a constant if this substitution results in no statistical change to the behavior of the parent tree, what we term approximate simplification. In the event, we infer that more than 95% of the accepted pruning proposals are the result of algebraic simplifications, which provides some practical insight into the scope for removing redundancies in GP trees.
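
The pruning criterion rests on a standard permutation test; a minimal two-sample sketch of the underlying idea (illustrative only: the statistic, sample layout, and names are assumptions, and the paper's exact test on tree outputs may differ):

```python
import random

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sample permutation test on the difference in means.

    Repeatedly shuffle the pooled observations into two groups of
    the original sizes and count how often the permuted statistic
    is at least as extreme as the observed one. Returns the
    (add-one smoothed) two-sided p-value estimate."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    n = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n]) / n
                   - sum(pooled[n:]) / (len(pooled) - n))
        if stat >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

In the pruning setting, x and y would be the parent tree's outputs before and after replacing a candidate subtree with a constant; a large p-value means the replacement produced no detectable behavioral change, so the prune is accepted.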

    Evidence for gene–gene epistatic interactions among susceptibility loci for systemic lupus erythematosus

    Objective: Several confirmed genetic susceptibility loci for lupus have been described. To date, no clear evidence for genetic epistasis in lupus has been established. The aim of this study was to test for gene–gene interactions in a number of known lupus susceptibility loci. Methods: Eighteen single-nucleotide polymorphisms tagging independent and confirmed lupus susceptibility loci were genotyped in a set of 4,248 patients with lupus and 3,818 normal healthy control subjects of European descent. Epistasis was tested by a 2-step approach using both parametric and nonparametric methods. The false discovery rate (FDR) method was used to correct for multiple testing. Results: We detected and confirmed gene–gene interactions between the HLA region and CTLA4, IRF5, and ITGAM, and between PDCD1 and IL21, in patients with lupus. The most significant interaction detected by parametric analysis was between rs3131379 in the HLA region and rs231775 in CTLA4 (interaction odds ratio 1.19, Z = 3.95, P = 7.8 × 10^−5 [FDR ≤ 0.05], P for multifactor dimensionality reduction = 5.9 × 10^−45). Importantly, our data suggest that in patients with lupus, the presence of the HLA lupus risk alleles in rs1270942 and rs3131379 increases the odds of also carrying the lupus risk allele in IRF5 (rs2070197) by 17% and 16%, respectively (P = 0.0028 and P = 0.0047, respectively). Conclusion: We provide evidence for gene–gene epistasis in systemic lupus erythematosus. These findings support a role for genetic interaction contributing to the complexity of lupus heritability.

    Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?

    Most vision papers have to include some evaluation work in order to demonstrate that the proposed algorithm is an improvement on existing ones. Generally, these evaluation results are presented in tabular or graphical form. Neither of these is ideal because there is no indication as to whether any performance differences are statistically significant. Moreover, the size and nature of the dataset used for evaluation will obviously have a bearing on the results, and neither of these factors is usually discussed. This paper evaluates the effectiveness of commonly used performance characterization metrics for image feature detection and description for matching problems, and explores the use of statistical tests such as McNemar's test and ANOVA as better alternatives.
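
McNemar's test compares two algorithms evaluated on the same cases using only the discordant pairs; a minimal exact-binomial sketch (illustrative, not the paper's code; the cell labels b and c follow the usual 2×2 convention):

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact (binomial) McNemar test for paired classifiers.

    b: cases where algorithm A succeeds and B fails;
    c: cases where B succeeds and A fails.
    Under the null hypothesis of equal performance,
    b ~ Binomial(b + c, 0.5). Returns the two-sided p-value
    (doubled smaller tail, capped at 1)."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(2 * tail, 1.0)
```

For instance, if two matchers disagree on 11 images, with one winning 10 of those disagreements, the exact two-sided p-value is 24/2048 ≈ 0.0117, evidence of a real difference that a bare accuracy table would not convey.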

    A constrained polynomial regression procedure for estimating the local False Discovery Rate

    Background: In the context of genomic association studies, for which a large number of statistical tests are performed simultaneously, the local False Discovery Rate (lFDR), which quantifies the evidence of a specific gene association with a clinical or biological variable of interest, is a relevant criterion for taking into account the multiple testing problem. The lFDR not only allows an inference to be made for each gene through its specific value, but also an estimate of Benjamini-Hochberg's False Discovery Rate (FDR) for subsets of genes. Results: In the framework of estimating procedures without any distributional assumption under the alternative hypothesis, a new and efficient procedure for estimating the lFDR is described. The results of a simulation study indicated good performance of the proposed estimator in comparison to four published ones. The five different procedures were applied to real datasets. Conclusion: A novel and efficient procedure for estimating the lFDR was developed and evaluated.

    Computation of significance scores of unweighted Gene Set Enrichment Analyses

    Background: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated at the top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time-consuming procedure that yields only estimates of the p-values. Results: We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, such as varying findings in different runs caused by the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets, but additionally evaluated expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.
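
The unweighted statistic whose null distribution such methods evaluate is the classic running-sum enrichment score; a minimal sketch of the score itself (illustrative only, not the authors' dynamic programming code; names are assumptions):

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA running-sum enrichment score.

    Walk down the ranked gene list, stepping up by 1/n_hit for
    each gene in the set and down by 1/n_miss otherwise, and
    return the maximum deviation of the running sum from zero.
    A score near +1 (-1) means set members pile up at the top
    (bottom) of the list."""
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    up, down = 1.0 / n_hit, 1.0 / n_miss
    running, best = 0.0, 0.0
    for hit in in_set:
        running += up if hit else -down
        if abs(running) > abs(best):
            best = running
    return best
```

Because each step size depends only on the hit/miss counts, the score takes finitely many values under permutations of the list, which is what makes an exact (rather than sampled) null distribution tractable.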

    Parallel multiplicity and error discovery rate (EDR) in microarray experiments

    Background: In microarray gene expression profiling experiments, differentially expressed genes (DEGs) are detected from among tens of thousands of genes on an array using statistical tests. It is important to control the number of false positives, or errors, present in the resultant DEG list. To date, more than 20 different multiple test methods have been reported that compute overall Type I error rates in microarray experiments. However, these methods share the following dilemma: they have low power in cases where only a small number of DEGs exist among a large number of total genes on the array. Results: This study contrasts the parallel multiplicity of objectively related tests against the traditional simultaneousness of subjectively related tests and proposes a new assessment called the Error Discovery Rate (EDR) for evaluating multiple test comparisons in microarray experiments. Parallel multiple tests use only the negative genes that parallel the positive genes to control the error rate, while simultaneous multiple tests use the total unchanged gene number for error estimates. Here, we demonstrate that the EDR method exhibits improved specificity and sensitivity over other methods in testing expression data sets with sequence digital expression confirmation, in examining simulation data, as well as for three experimental data sets that vary in the proportion of DEGs. The EDR method overcomes a common problem of previous multiple test procedures, namely that the Type I error detection power is low when the total gene number used is large but the DEG number is small. Conclusions: Microarrays are extensively used to address many research questions. However, there is potential to improve the sensitivity and specificity of microarray data analysis by developing improved multiple test comparisons. This study proposes a new view of multiplicity in microarray experiments, and the EDR provides an alternative multiple test method for Type I error control in microarray experiments.