A Bayesian approach to efficient differential allocation for resampling-based significance testing
<p>Abstract</p> <p>Background</p> <p>Large-scale statistical analyses have become hallmarks of post-genomic era biological research due to advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption in the null distribution such as normality may be unreasonable, and resampling-based p-values are the preferred procedure for establishing statistical significance. Using resampling-based procedures for multiple testing is computationally intensive and typically requires large numbers of resamples.</p> <p>Results</p> <p>We present a new approach to more efficiently assign resamples (such as bootstrap samples or permutations) within a nonparametric multiple testing framework. We formulated a Bayesian-inspired approach to this problem, and devised an algorithm that adapts the assignment of resamples iteratively with negligible space and running time overhead. In two experimental studies, a breast cancer microarray dataset and a genome-wide association study dataset for Parkinson's disease, we demonstrated that our differential allocation procedure is substantially more accurate than the traditional uniform resample allocation.</p> <p>Conclusion</p> <p>Our experiments demonstrate that using a more sophisticated allocation strategy can improve our inference for hypothesis testing without a drastic increase in the amount of computation on randomized data. Moreover, the efficiency gain grows with the number of tests. R code for our algorithm and the shortcut method are available at <url>http://people.pcbi.upenn.edu/~lswang/pub/bmc2009/</url>.</p>
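The general idea of differential resample allocation can be illustrated with a toy sketch (this is not the authors' Bayesian algorithm): permute in batches and stop spending resamples on hypotheses whose running p-value estimate is already far from significance, so the budget concentrates on the borderline tests. The function name, batch size, and retirement rule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def perm_stat(x, y):
    """Test statistic: absolute difference in group means."""
    return abs(x.mean() - y.mean())

def adaptive_perm_pvalues(groups, total_budget=20_000, batch=200, threshold=0.05):
    """Crude differential allocation: permute in batches and retire
    hypotheses whose running p-value estimate is clearly above the
    significance threshold, freeing resamples for the borderline ones."""
    m = len(groups)
    exceed = np.zeros(m)            # permutations with stat >= observed
    n_used = np.zeros(m)
    obs = [perm_stat(x, y) for x, y in groups]
    active = list(range(m))
    spent = 0
    while active and spent < total_budget:
        for i in list(active):
            x, y = groups[i]
            pooled = np.concatenate([x, y])
            for _ in range(batch):
                rng.shuffle(pooled)
                if perm_stat(pooled[:len(x)], pooled[len(x):]) >= obs[i]:
                    exceed[i] += 1
            n_used[i] += batch
            spent += batch
            p_hat = (exceed[i] + 1) / (n_used[i] + 1)
            # retire clearly non-significant hypotheses (heuristic rule)
            if p_hat > 5 * threshold and n_used[i] >= 2 * batch:
                active.remove(i)
    return (exceed + 1) / (n_used + 1)
```

The retirement rule here is a fixed heuristic; the paper's contribution is precisely a principled (Bayesian) version of this decision.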
Portfolio choice and estimation risk: a comparison of Bayesian approaches to resampled efficiency
Estimation risk is known to have a huge impact on mean/variance (MV) optimized portfolios and is one of the primary reasons standard Markowitz optimization is infeasible in practice. Several approaches to incorporating estimation risk into portfolio selection have been suggested in the earlier literature. These papers regularly discuss heuristic approaches (e.g., placing restrictions on portfolio weights) and Bayesian estimators. Among the Bayesian class of estimators, we focus in this paper on the Bayes/Stein estimator developed by Jorion (1985, 1986), which is probably the most popular. We show that optimal portfolios based on the Bayes/Stein estimator correspond to portfolios on the original mean-variance efficient frontier with a higher risk aversion, and we quantify this increase in risk aversion. Furthermore, we review a relatively new approach introduced by Michaud (1998), resampled efficiency. Michaud argues that the limitations of MV efficiency in practice generally derive from a lack of statistical understanding of MV optimization, and he advocates a statistical view of MV optimization that leads to new procedures which can reduce estimation risk. Resampled efficiency has so far been contrasted with standard Markowitz portfolios, but not with other approaches that explicitly incorporate estimation risk. This paper attempts to fill this gap. Optimal portfolios based on the Bayes/Stein estimator and resampled efficiency are compared in an empirical out-of-sample study in terms of their Sharpe ratios and in terms of stochastic dominance.
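The two approaches under comparison can be sketched in a deliberately simplified form. Jorion's estimator derives its shrinkage weight from the data and Michaud's procedure resamples whole efficient frontiers; here the shrinkage intensity, risk-aversion parameter, and bootstrap count are fixed illustrative assumptions, and only a single unconstrained MV portfolio is computed per resample.

```python
import numpy as np

rng = np.random.default_rng(1)

def mv_weights(mu, cov, risk_aversion=5.0):
    """Unconstrained mean-variance weights: w = (1/gamma) * Sigma^{-1} mu."""
    return np.linalg.solve(cov, mu) / risk_aversion

def bayes_stein_mu(returns, shrink=0.5):
    """Toy Bayes/Stein-style estimator: shrink each asset's sample mean
    toward the common (grand) mean.  Jorion derives the shrinkage weight
    from the data; here it is a fixed assumption."""
    mu = returns.mean(axis=0)
    return shrink * mu.mean() + (1 - shrink) * mu

def resampled_weights(returns, n_boot=200, risk_aversion=5.0):
    """Michaud-style resampled efficiency, reduced to one portfolio:
    bootstrap the return history, re-optimize on each resample, and
    average the resulting weight vectors."""
    n = len(returns)
    ws = []
    for _ in range(n_boot):
        r = returns[rng.integers(0, n, size=n)]
        ws.append(mv_weights(r.mean(axis=0), np.cov(r, rowvar=False),
                             risk_aversion))
    return np.mean(ws, axis=0)
```

The averaging step is what distinguishes the resampled portfolio: estimation noise that flips individual weights tends to cancel across resamples, while the Bayes/Stein route attacks the same noise by shrinking the inputs instead.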
FastPval: A fast and memory efficient program to calculate very low P-values from empirical distribution
Motivation: Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, to obtain a very low P-value, a large number of resamples is required, where computing speed, memory and storage consumption become bottlenecks and sometimes make the computation impossible, even on a computer cluster. Results: We have developed a multiple-stage P-value calculating program called FastPval that can efficiently calculate very low (down to 10^-9) P-values from a large number of resampled measurements. With only two input files and a few parameter settings from the user, the program can compute P-values from an empirical distribution very efficiently, even on a personal computer. When tested on the order of 10^9 resampled data points, our method uses only 52.94% of the time used by the conventional method, implemented with standard quicksort and binary search algorithms, and consumes only 0.11% of the memory and storage. Furthermore, our method can be applied to extra-large datasets that the conventional method fails to handle. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from the exact ranking approach. © The Author(s) 2010. Published by Oxford University Press.
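For reference, the conventional baseline FastPval is benchmarked against can be sketched in a few lines: sort the resampled null statistics once, then locate each observed value by binary search (FastPval's multi-stage filtering itself is not reproduced here).

```python
import numpy as np

def empirical_pvalue(observed, null_stats):
    """Conventional sort + binary-search approach: sort the resampled
    null statistics once, then count how many are >= the observed value.
    Uses the standard estimator p = (#{null >= observed} + 1) / (N + 1)."""
    null_sorted = np.sort(null_stats)
    n = len(null_sorted)
    # first index whose value is >= observed, found by binary search
    idx = np.searchsorted(null_sorted, observed, side="left")
    return (n - idx + 1) / (n + 1)
```

The "+1" in numerator and denominator keeps the estimate strictly positive, which is why a p-value of 10^-9 requires on the order of 10^9 resamples in the first place, the storage bottleneck the abstract describes.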
Adapting the Number of Particles in Sequential Monte Carlo Methods through an Online Scheme for Convergence Assessment
Particle filters are broadly used to approximate posterior distributions of hidden states in state-space models by means of sets of weighted particles. While convergence of the filter is guaranteed when the number of particles tends to infinity, the quality of the approximation is usually unknown but strongly dependent on the number of particles. In this paper, we propose a novel method for assessing the convergence of particle filters in an online manner, as well as a simple scheme for the online adaptation of the number of particles based on the convergence assessment. The method is based on a sequential comparison between the actual observations and their predictive probability distributions approximated by the filter. We provide a rigorous theoretical analysis of the proposed methodology and, as an example of its practical use, we present simulations of a simple algorithm for the dynamic and online adaptation of the number of particles during the operation of a particle filter on a stochastic version of the Lorenz system.
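A minimal bootstrap particle filter makes the ingredients concrete. The key quantity for the abstract's convergence assessment is the predictive density of each observation, which the filter approximates as a side product; the 1-D linear-Gaussian model, parameter values, and the use of the raw predictive density (rather than the paper's actual assessment statistic) are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def particle_filter(ys, n_particles=500, a=0.9, q=1.0, r=1.0):
    """Bootstrap particle filter for x_t = a*x_{t-1} + N(0,q),
    y_t = x_t + N(0,r).  Returns the filtered state means and an
    approximation of the predictive density p(y_t | y_{1:t-1}),
    the quantity an online convergence check can be built on."""
    x = rng.normal(0.0, 1.0, n_particles)
    means, pred_dens = [], []
    for y in ys:
        # propagate particles through the state transition
        x = a * x + rng.normal(0.0, np.sqrt(q), n_particles)
        # predictive density of the new observation: mean particle likelihood
        lik = np.exp(-0.5 * (y - x) ** 2 / r) / np.sqrt(2 * np.pi * r)
        pred_dens.append(lik.mean())
        # reweight and resample (multinomial resampling)
        w = lik / lik.sum()
        x = x[rng.choice(n_particles, n_particles, p=w)]
        means.append(x.mean())
    return np.array(means), np.array(pred_dens)
```

With the predictive densities in hand, an adaptation scheme in the spirit of the paper would monitor whether the observations look like plausible draws from these predictive distributions and grow or shrink `n_particles` accordingly.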
Power-enhanced multiple decision functions controlling family-wise error and false discovery rates
Improved procedures, in terms of smaller missed discovery rates (MDR), for performing multiple hypothesis testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR) are developed and studied. The improvement over existing procedures such as the Šidák procedure for FWER control and the Benjamini-Hochberg (BH) procedure for FDR control is achieved by exploiting possible differences in the powers of the individual tests. The results signal the need to take into account the powers of the individual tests and to have multiple hypothesis decision functions that are not limited to simply using the individual p-values, as is the case, for example, with the Šidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of the decision processes, in the search for better multiple hypothesis testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures can be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing p-value based procedures whose theoretical validity is contingent on each of the p-value statistics being stochastically equal to or greater than a standard uniform variable under the null hypothesis. The proposed procedures are relevant in the analysis of high-dimensional "large p, small n" data sets arising in the natural, physical, medical, economic and social sciences, whose generation and creation is accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/10-AOS844
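The classical p-value based procedures this paper improves upon are short enough to state directly (these are the standard Šidák and BH formulas, not the proposed power-aware decision functions):

```python
import numpy as np

def sidak(pvals, alpha=0.05):
    """Šidák FWER control: reject H_i iff p_i <= 1 - (1-alpha)^(1/m)."""
    m = len(pvals)
    return np.asarray(pvals) <= 1 - (1 - alpha) ** (1 / m)

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up FDR control: reject the k smallest p-values, where k
    is the largest i such that p_(i) <= i * alpha / m."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # last sorted index meeting its threshold
        reject[order[:k + 1]] = True
    return reject
```

Both procedures consume nothing but the p-values, which is exactly the limitation the abstract targets: two tests with identical p-values but very different power are treated identically.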
Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL) optimization framework
The simplicity and flexibility of meta-heuristic optimization algorithms have attracted considerable attention in the field of optimization. Different optimization methods, however, have algorithm-specific strengths and limitations, and selecting the best-performing algorithm for a specific problem is a tedious task. We introduce a new hybrid optimization framework, entitled Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL), which combines the strengths of different evolutionary algorithms (EAs) in a parallel computing scheme. SC-SAHEL evaluates the performance of the different EAs during population evolution, such as their capability to escape local attractors, speed, and convergence, since each EA is suited differently to various response surfaces. The SC-SAHEL algorithm is benchmarked on 29 conceptual test functions and a real-world hydropower reservoir model case study. Results show that the hybrid SC-SAHEL algorithm is rigorous and effective in finding the global optimum for a majority of test cases, and that it is computationally efficient in comparison to individual EAs.
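The core adaptive idea, scoring competing search operators by the progress they produce and shifting population share toward the winner, can be illustrated with a toy sketch. This is not the SC-SAHEL algorithm (no shuffled complexes, no full EAs); the two operators, the credit rule, and all parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def sphere(x):
    """Benchmark function with global minimum 0 at the origin."""
    return float(np.sum(x ** 2))

def adaptive_hybrid_search(f, dim=5, pop=40, gens=200):
    """Toy adaptive hybrid: two competing operators (fine vs. coarse
    Gaussian mutation) are credited by the improvement their offspring
    achieve, and the better-scoring operator is given a larger share of
    the population in the next generation."""
    X = rng.uniform(-5, 5, (pop, dim))
    fit = np.array([f(x) for x in X])
    share = 0.5                                   # offspring share of operator A
    for _ in range(gens):
        nA = min(max(1, int(share * pop)), pop - 1)
        steps = np.empty((pop, dim))
        steps[:nA] = rng.normal(0, 0.1, (nA, dim))        # A: fine mutation
        steps[nA:] = rng.normal(0, 1.0, (pop - nA, dim))  # B: coarse mutation
        kids = X[rng.integers(0, pop, pop)] + steps
        kfit = np.array([f(k) for k in kids])
        # credit each operator with the mean improvement over the current best
        best = fit.min()
        impA = np.maximum(best - kfit[:nA], 0).mean()
        impB = np.maximum(best - kfit[nA:], 0).mean()
        if impA + impB > 0:
            share = float(np.clip(0.9 * share + 0.1 * impA / (impA + impB),
                                  0.1, 0.9))
        # elitist survivor selection over parents + offspring
        allX = np.vstack([X, kids])
        allfit = np.concatenate([fit, kfit])
        keep = np.argsort(allfit)[:pop]
        X, fit = allX[keep], allfit[keep]
    return X[0], fit[0]
```

On a smooth unimodal surface the fine operator wins the credit contest as the population converges; on a rugged surface the coarse operator keeps earning share, which is the behavior the hybrid framework exploits.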
Expression QTLs Mapping and Analysis: A Bayesian Perspective.
The aim of expression Quantitative Trait Locus (eQTL) mapping is the identification of DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association studies (GWAS), eQTL mapping has become a useful tool to understand the functional context in which these variants operate and eventually to narrow down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate data modeling strategies, i.e., testing for association of all transcript-marker pairs. However, these "one-at-a-time" strategies are (1) unable to control the number of false positives when an intricate Linkage Disequilibrium structure is present and (2) often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances in eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to detection of polygenic effects, accuracy, and interpretability of the results.
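The univariate "one-at-a-time" strategy the abstract critiques can be sketched as a per-pair association scan. Squared correlation with a permutation p-value stands in here for whatever univariate test a given study actually uses; the function name, data shapes, and permutation count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def univariate_eqtl_scan(genotypes, expression, n_perm=999):
    """Test every transcript-marker pair separately (the univariate
    strategy).  genotypes: (n_samples, n_markers) allele counts 0/1/2;
    expression: (n_samples, n_transcripts).  Returns an
    (n_markers, n_transcripts) matrix of permutation p-values."""
    n = genotypes.shape[0]

    def r2(G, E):
        """Squared Pearson correlation for all marker-transcript pairs."""
        Gc = G - G.mean(axis=0)
        Ec = E - E.mean(axis=0)
        num = Gc.T @ Ec                              # cross-covariances
        den = (np.sqrt((Gc ** 2).sum(0))[:, None]
               * np.sqrt((Ec ** 2).sum(0))[None, :])
        return (num / den) ** 2

    obs = r2(genotypes, expression)
    exceed = np.zeros_like(obs)
    for _ in range(n_perm):
        perm = rng.permutation(n)                    # break genotype-expression link
        exceed += r2(genotypes, expression[perm]) >= obs
    return (exceed + 1) / (n_perm + 1)
```

Each pair is judged in isolation, which is exactly why such scans struggle with correlated markers (Linkage Disequilibrium) and with polygenic trans effects spread over many weak signals, the two shortcomings the Bayesian multivariate approaches address.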