
    A Bayesian approach to efficient differential allocation for resampling-based significance testing

    Background: Large-scale statistical analyses have become hallmarks of post-genomic era biological research due to advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption on the null distribution, such as normality, may be unreasonable, and resampling-based p-values are the preferred procedure for establishing statistical significance. Using resampling-based procedures for multiple testing is computationally intensive and typically requires large numbers of resamples. Results: We present a new approach to more efficiently assign resamples (such as bootstrap samples or permutations) within a nonparametric multiple testing framework. We formulated a Bayesian-inspired approach to this problem and devised an algorithm that adapts the assignment of resamples iteratively with negligible space and running-time overhead. In two experimental studies, a breast cancer microarray dataset and a genome-wide association study dataset for Parkinson's disease, we demonstrated that our differential allocation procedure is substantially more accurate than the traditional uniform resample allocation. Conclusion: Our experiments demonstrate that using a more sophisticated allocation strategy can improve inference for hypothesis testing without a drastic increase in the amount of computation on randomized data. Moreover, the gain in efficiency grows as the number of tests increases. R code for our algorithm and the shortcut method is available at http://people.pcbi.upenn.edu/~lswang/pub/bmc2009/.
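    The abstract contrasts uniform with differential allocation of resamples across many hypothesis tests. The Python sketch below is a minimal illustration of that contrast, not the authors' Bayesian algorithm: the adaptive variant simply spends each new batch of permutations on the test whose p-value estimate currently has the largest binomial standard error. The names `resample_stat`, `total_budget`, and `batch` are assumptions introduced for the example.

    ```python
    import numpy as np

    def permutation_pvalues_uniform(obs_stats, resample_stat, n_resamples=1000, rng=None):
        """Uniform allocation: every test gets the same number of resamples.

        obs_stats     : 1-D array of observed test statistics, one per hypothesis.
        resample_stat : callable(test_index, rng) -> one statistic computed on
                        permuted (null) data for that hypothesis (assumed interface).
        """
        rng = np.random.default_rng(rng)
        m = len(obs_stats)
        exceed = np.zeros(m)
        for i in range(m):
            null = np.array([resample_stat(i, rng) for _ in range(n_resamples)])
            exceed[i] = np.sum(null >= obs_stats[i])
        # add-one correction keeps estimated p-values strictly positive
        return (exceed + 1) / (n_resamples + 1)

    def permutation_pvalues_adaptive(obs_stats, resample_stat, total_budget, batch=50, rng=None):
        """Illustrative differential allocation: spend each new batch of resamples
        on the hypothesis whose p-value estimate is still most uncertain.
        This is only a heuristic stand-in for the paper's Bayesian allocation rule."""
        rng = np.random.default_rng(rng)
        m = len(obs_stats)
        exceed = np.zeros(m)             # count of null stats >= observed
        n_used = np.zeros(m, dtype=int)

        # warm start: a small uniform round so every test has an initial estimate
        warm = max(batch, total_budget // (10 * m))
        for i in range(m):
            null = np.array([resample_stat(i, rng) for _ in range(warm)])
            exceed[i] += np.sum(null >= obs_stats[i])
            n_used[i] += warm

        spent = warm * m
        while spent + batch <= total_budget:
            p_hat = (exceed + 1) / (n_used + 1)
            # crude "uncertainty" score: binomial standard error of each estimate
            score = np.sqrt(p_hat * (1 - p_hat) / n_used)
            i = int(np.argmax(score))
            null = np.array([resample_stat(i, rng) for _ in range(batch)])
            exceed[i] += np.sum(null >= obs_stats[i])
            n_used[i] += batch
            spent += batch

        return (exceed + 1) / (n_used + 1)
    ```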

    Portfolio choice and estimation risk: a comparison of Bayesian approaches to resampled efficiency

    Estimation risk is known to have a huge impact on mean-variance (MV) optimized portfolios and is one of the primary reasons standard Markowitz optimization is infeasible in practice. Several approaches to incorporating estimation risk into portfolio selection have been suggested in the earlier literature. These papers typically discuss heuristic approaches (e.g., placing restrictions on portfolio weights) and Bayesian estimators. Within the Bayesian class of estimators, this paper focuses on the Bayes/Stein estimator developed by Jorion (1985, 1986), which is probably the most popular estimator. We show that optimal portfolios based on the Bayes/Stein estimator correspond to portfolios on the original mean-variance efficient frontier with a higher risk aversion, and we quantify this increase in risk aversion. Furthermore, we review a relatively new approach introduced by Michaud (1998), resampled efficiency. Michaud argues that the limitations of MV efficiency in practice generally derive from a lack of statistical understanding of MV optimization, and he advocates a statistical view of MV optimization that leads to new procedures that can reduce estimation risk. To date, resampled efficiency has been compared only to standard Markowitz portfolios, not to other approaches that explicitly incorporate estimation risk. This paper attempts to fill this gap. Optimal portfolios based on the Bayes/Stein estimator and resampled efficiency are compared in an empirical out-of-sample study in terms of their Sharpe ratios and in terms of stochastic dominance.
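    As a point of reference for the comparison described above, the following minimal Python sketch contrasts plug-in mean-variance weights with a Michaud-style resampled average of re-optimized weights. It uses an unconstrained mean-variance rule and a plain bootstrap of the return history; the risk-aversion value, the simulated asset parameters, and the function names are illustrative assumptions, not the setup of the paper.

    ```python
    import numpy as np

    def mv_weights(mu, sigma, gamma=5.0):
        """Unconstrained mean-variance weights w = (1/gamma) * Sigma^{-1} mu
        (no budget or short-sale constraints, to keep the sketch minimal)."""
        return np.linalg.solve(sigma, mu) / gamma

    def resampled_efficiency_weights(returns, gamma=5.0, n_boot=500, rng=None):
        """Michaud-style resampled weights, sketched: bootstrap the return history,
        re-estimate (mu, Sigma), re-optimize, and average the resulting weights."""
        rng = np.random.default_rng(rng)
        T, N = returns.shape
        weights = np.zeros((n_boot, N))
        for b in range(n_boot):
            idx = rng.integers(0, T, size=T)       # bootstrap sample of periods
            sample = returns[idx]
            mu = sample.mean(axis=0)
            sigma = np.cov(sample, rowvar=False)
            weights[b] = mv_weights(mu, sigma, gamma)
        return weights.mean(axis=0)

    # usage: simulated monthly returns for 4 hypothetical assets
    rng = np.random.default_rng(0)
    R = rng.multivariate_normal(mean=[0.01, 0.008, 0.012, 0.006],
                                cov=np.diag([0.03, 0.02, 0.04, 0.015]) ** 2,
                                size=120)
    w_plugin = mv_weights(R.mean(axis=0), np.cov(R, rowvar=False))
    w_resampled = resampled_efficiency_weights(R)
    ```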

    FastPval: A fast and memory efficient program to calculate very low P-values from empirical distribution

    Motivation: Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, to obtain a very low P-value, a large number of resamples is required, and computing speed, memory and storage consumption become bottlenecks, sometimes making the computation impossible even on a computer cluster. Results: We have developed a multiple-stage P-value calculating program called FastPval that can efficiently calculate very low (down to 10^-9) P-values from a large number of resampled measurements. With only two input files and a few parameter settings from the user, the program can compute P-values from an empirical distribution very efficiently, even on a personal computer. When tested on the order of 10^9 resampled data points, our method uses only 52.94% of the time used by the conventional method, implemented with standard quicksort and binary search algorithms, and consumes only 0.11% of the memory and storage. Furthermore, our method can be applied to extra-large datasets that the conventional method fails to handle. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from the exact ranking approach. © The Author(s) 2010. Published by Oxford University Press.
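    For context on what the "conventional method" of quicksort plus binary search does, here is a minimal Python sketch of computing empirical p-values by sorting the resampled null statistics once and binary-searching each observed value. It does not reproduce FastPval's multi-stage scheme; the add-one estimator and all names are illustrative choices.

    ```python
    import numpy as np

    def empirical_pvalues(observed, null_stats):
        """Conventional empirical p-values from a resampled null distribution:
        sort the null statistics once, then binary-search each observed value.

        Uses the add-one estimator (# null >= observed + 1) / (B + 1), which
        avoids reporting an exact zero."""
        null_sorted = np.sort(null_stats)
        B = null_sorted.size
        # index of the first null statistic >= each observed value
        first_ge = np.searchsorted(null_sorted, observed, side="left")
        n_ge = B - first_ge
        return (n_ge + 1) / (B + 1)

    # usage: 10 million simulated null draws, three observed statistics
    rng = np.random.default_rng(1)
    null = rng.standard_normal(10_000_000)
    print(empirical_pvalues(np.array([2.0, 4.0, 5.5]), null))
    ```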

    Essays in quantitative finance


    Adapting the Number of Particles in Sequential Monte Carlo Methods through an Online Scheme for Convergence Assessment

    Particle filters are broadly used to approximate posterior distributions of hidden states in state-space models by means of sets of weighted particles. While the convergence of the filter is guaranteed when the number of particles tends to infinity, the quality of the approximation is usually unknown but strongly dependent on the number of particles. In this paper, we propose a novel method for assessing the convergence of particle filters in an online manner, as well as a simple scheme for the online adaptation of the number of particles based on the convergence assessment. The method is based on a sequential comparison between the actual observations and their predictive probability distributions approximated by the filter. We provide a rigorous theoretical analysis of the proposed methodology and, as an example of its practical use, we present simulations of a simple algorithm for the dynamic, online adaptation of the number of particles during the operation of a particle filter on a stochastic version of the Lorenz system.
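    The abstract describes comparing incoming observations with their predictive distributions to decide whether the particle set is large enough. The Python sketch below is a toy scalar bootstrap particle filter in which a deliberately crude threshold rule on the per-step predictive likelihood stands in for the paper's convergence assessment; the model functions `f` and `h`, the noise scales, and the thresholds are all assumptions for illustration.

    ```python
    import numpy as np

    def bootstrap_pf_adaptive(y, f, h, q_std, r_std, n0=100, n_min=50, n_max=5000, rng=None):
        """Bootstrap particle filter for x_t = f(x_{t-1}) + q,  y_t = h(x_t) + r,
        with a crude adaptation rule: if the predictive likelihood of the new
        observation under the particle cloud is very low, double the number of
        particles; if it is comfortably high, halve it. (Scalar state; f and h
        are assumed to be vectorized callables.)"""
        rng = np.random.default_rng(rng)
        n = n0
        x = rng.standard_normal(n)              # vague initial particle cloud
        means = []
        for y_t in y:
            # propagate particles through the state transition
            x = f(x) + q_std * rng.standard_normal(n)
            # Gaussian observation likelihood of y_t for each particle
            w = np.exp(-0.5 * ((y_t - h(x)) / r_std) ** 2) / (np.sqrt(2 * np.pi) * r_std)
            pred_lik = w.mean()                 # predictive likelihood estimate
            w_norm = w / w.sum() if w.sum() > 0 else np.full(n, 1.0 / n)
            means.append(np.sum(w_norm * x))    # filtered mean estimate
            x = rng.choice(x, size=n, replace=True, p=w_norm)   # resample
            # crude particle-count adaptation based on the predictive fit
            if pred_lik < 1e-3 and n < n_max:
                n = min(2 * n, n_max)
            elif pred_lik > 1e-1 and n > n_min:
                n = max(n // 2, n_min)
            x = rng.choice(x, size=n, replace=True)             # resize the cloud
        return np.array(means)
    ```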

    Power-enhanced multiple decision functions controlling family-wise error and false discovery rates

    Improved procedures, in terms of smaller missed discovery rates (MDR), for performing multiple hypotheses testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR) are developed and studied. The improvement over existing procedures, such as the Šidák procedure for FWER control and the Benjamini-Hochberg (BH) procedure for FDR control, is achieved by exploiting possible differences in the powers of the individual tests. Results signal the need to take into account the powers of the individual tests and to have multiple hypotheses decision functions which are not limited to simply using the individual p-values, as is the case, for example, with the Šidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of decision processes, in the search for better multiple hypotheses testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures can be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing p-value based procedures whose theoretical validity is contingent on each of the p-value statistics being stochastically equal to or greater than a standard uniform variable under the null hypothesis. The proposed procedures are relevant in the analysis of high-dimensional "large M, small n" data sets arising in the natural, physical, medical, economic and social sciences, whose generation is accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology. Comment: Published at http://dx.doi.org/10.1214/10-AOS844 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
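    As background for the p-value based procedures the abstract contrasts against, here is a minimal Python sketch of the Šidák and Benjamini-Hochberg rules it names. This is the classical textbook form of those procedures, not the power-weighted decision functions proposed in the paper.

    ```python
    import numpy as np

    def sidak_reject(pvals, alpha=0.05):
        """Sidak correction for FWER control: reject H_i if p_i <= 1 - (1 - alpha)^(1/m)."""
        m = len(pvals)
        return np.asarray(pvals) <= 1.0 - (1.0 - alpha) ** (1.0 / m)

    def benjamini_hochberg_reject(pvals, alpha=0.05):
        """Benjamini-Hochberg step-up procedure for FDR control: find the largest k
        with p_(k) <= k * alpha / m and reject the k smallest p-values."""
        p = np.asarray(pvals)
        m = p.size
        order = np.argsort(p)
        thresholds = alpha * np.arange(1, m + 1) / m
        below = p[order] <= thresholds
        reject = np.zeros(m, dtype=bool)
        if below.any():
            k = np.max(np.nonzero(below)[0])    # last sorted index meeting the bound
            reject[order[: k + 1]] = True
        return reject

    # usage on a small set of p-values
    pvals = [0.001, 0.008, 0.039, 0.041, 0.2, 0.6]
    print(sidak_reject(pvals), benjamini_hochberg_reject(pvals))
    ```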

    Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL) optimization framework

    The simplicity and flexibility of meta-heuristic optimization algorithms have attracted considerable attention in the field of optimization. Different optimization methods, however, have algorithm-specific strengths and limitations, and selecting the best-performing algorithm for a specific problem is a tedious task. We introduce a new hybrid optimization framework, entitled Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL), which combines the strengths of different evolutionary algorithms (EAs) in a parallel computing scheme. SC-SAHEL explores the performance of different EAs, such as their capability to escape local attractions, speed, and convergence, during population evolution, as each individual EA is suited to different response surfaces. The SC-SAHEL algorithm is benchmarked on 29 conceptual test functions and a real-world hydropower reservoir model case study. Results show that the hybrid SC-SAHEL algorithm is rigorous and effective in finding the global optimum for a majority of test cases, and that it is computationally efficient in comparison to the individual EAs.
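    To make the shuffled-complex idea concrete, the following toy Python sketch partitions a population into complexes, evolves each complex with a randomly selected operator (a DE-style move or Gaussian mutation), and shuffles the pooled population between rounds. It is a loose illustration of the general scheme only: SC-SAHEL's self-adaptive selection of EAs, its specific operators, and its parallel implementation are not reproduced, and all function names and parameters here are assumptions.

    ```python
    import numpy as np

    def de_step(pop, fit, f_obj, rng, F=0.5):
        """One differential-evolution-style update of each member of a complex."""
        n, d = pop.shape
        new = pop.copy()
        for i in range(n):
            a, b, c = rng.choice(n, size=3, replace=False)
            trial = pop[a] + F * (pop[b] - pop[c])
            ft = f_obj(trial)
            if ft < fit[i]:
                new[i], fit[i] = trial, ft
        return new, fit

    def gauss_step(pop, fit, f_obj, rng, sigma=0.1):
        """One Gaussian-mutation update of each member of a complex."""
        n, d = pop.shape
        new = pop.copy()
        for i in range(n):
            trial = pop[i] + sigma * rng.standard_normal(d)
            ft = f_obj(trial)
            if ft < fit[i]:
                new[i], fit[i] = trial, ft
        return new, fit

    def shuffled_hybrid_minimize(f_obj, dim, n_complexes=4, complex_size=10,
                                 n_shuffles=30, inner_steps=5, rng=None):
        """Toy shuffled-complex hybrid: each complex evolves with a randomly chosen
        operator, then all members are pooled, sorted, and redistributed across
        complexes before the next round."""
        rng = np.random.default_rng(rng)
        pop = rng.uniform(-5, 5, size=(n_complexes * complex_size, dim))
        fit = np.array([f_obj(x) for x in pop])
        operators = [de_step, gauss_step]
        for _ in range(n_shuffles):
            order = np.argsort(fit)
            pop, fit = pop[order], fit[order]
            for c in range(n_complexes):
                idx = np.arange(c, len(pop), n_complexes)    # interleaved split
                op = operators[rng.integers(len(operators))]
                sub, subfit = pop[idx].copy(), fit[idx].copy()
                for _ in range(inner_steps):
                    sub, subfit = op(sub, subfit, f_obj, rng)
                pop[idx], fit[idx] = sub, subfit
        best = int(np.argmin(fit))
        return pop[best], fit[best]

    # usage: minimize the sphere function in 5 dimensions
    x_best, f_best = shuffled_hybrid_minimize(lambda x: float(np.sum(x ** 2)), dim=5)
    ```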

    Expression QTLs Mapping and Analysis: A Bayesian Perspective.

    The aim of expression Quantitative Trait Locus (eQTL) mapping is the identification of DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association studies (GWAS), eQTL mapping has become a useful tool for understanding the functional context in which these variants operate and eventually narrowing down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate data modeling strategies, i.e., testing for association of all transcript-marker pairs. However, these "one-at-a-time" strategies are (1) unable to control the number of false positives when an intricate linkage disequilibrium structure is present and (2) often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances in eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to detection of polygenic effects, accuracy, and interpretability of the results.
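    To illustrate the univariate "one-at-a-time" strategy the abstract critiques (not the Bayesian or multivariate approaches it reviews), here is a minimal Python sketch that regresses each transcript on each marker and collects the slope p-values; the data shapes, the simulated example, and the function name are assumptions.

    ```python
    import numpy as np
    from scipy import stats

    def univariate_eqtl_scan(expression, genotypes):
        """One-at-a-time eQTL scan: for every transcript-marker pair, fit a simple
        linear regression of expression on genotype dosage and keep the p-value
        of the slope.

        expression : (n_samples, n_transcripts) array
        genotypes  : (n_samples, n_markers) array of allele dosages (0/1/2)
        Returns a (n_transcripts, n_markers) matrix of p-values."""
        n_t = expression.shape[1]
        n_m = genotypes.shape[1]
        pvals = np.ones((n_t, n_m))
        for t in range(n_t):
            for m in range(n_m):
                res = stats.linregress(genotypes[:, m], expression[:, t])
                pvals[t, m] = res.pvalue
        return pvals

    # usage on small simulated data: 100 samples, 3 transcripts, 5 markers
    rng = np.random.default_rng(2)
    G = rng.integers(0, 3, size=(100, 5)).astype(float)
    E = rng.standard_normal((100, 3))
    E[:, 0] += 0.8 * G[:, 2]               # plant one true marker-transcript effect
    print(univariate_eqtl_scan(E, G).round(4))
    ```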