4,006 research outputs found

    Computable exponential bounds for screened estimation and simulation

    Suppose the expectation E(F(X)) is to be estimated by the empirical averages of the values of F on independent and identically distributed samples {X_i}. A sampling rule called the "screened" estimator is introduced, and its performance is studied. When the mean E(U(X)) of a different function U is known, the estimates are "screened," in that we only consider those which correspond to times when the empirical average of the {U(X_i)} is sufficiently close to its known mean. As long as U dominates F appropriately, the screened estimates admit exponential error bounds, even when F(X) is heavy-tailed. The main results are several nonasymptotic, explicit exponential bounds for the screened estimates. A geometric interpretation, in the spirit of Sanov's theorem, is given for the fact that the screened estimates always admit exponential error bounds, even if the standard estimates do not. And when they do, the screened estimates' error probability has a significantly better exponent. This implies that screening can be interpreted as a variance reduction technique. Our main mathematical tools come from large deviations techniques. The results are illustrated by a detailed simulation example.
    Comment: Published at http://dx.doi.org/10.1214/00-AAP492 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
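    A minimal NumPy sketch of the screening rule described above, under illustrative assumptions: the functions F and U, the tolerance eps, and the toy exponential example are not taken from the paper. The running average of F is reported only at sample sizes where the running average of U lies within eps of its known mean.

```python
import numpy as np

def screened_estimate(samples, F, U, mean_U, eps=0.05):
    """Screened estimate of E[F(X)] (illustrative sketch, not the paper's exact rule).

    The empirical average of F is accepted only at sample sizes n where the
    empirical average of U lies within eps of its known mean E[U(X)].
    """
    f_vals = np.array([F(x) for x in samples], dtype=float)
    u_vals = np.array([U(x) for x in samples], dtype=float)
    n = np.arange(1, len(samples) + 1)
    f_avg = np.cumsum(f_vals) / n                 # running estimates of E[F(X)]
    u_avg = np.cumsum(u_vals) / n                 # running estimates of E[U(X)]
    accepted = np.abs(u_avg - mean_U) <= eps      # the screening condition
    idx = np.flatnonzero(accepted)
    return f_avg[idx[-1]] if idx.size else float("nan")

# toy example: X ~ Exp(1), F(X) = X^2 (true mean 2), screened by U(X) = X with E[U(X)] = 1
rng = np.random.default_rng(0)
xs = rng.exponential(1.0, size=10_000)
print(screened_estimate(xs, F=lambda x: x**2, U=lambda x: x, mean_U=1.0))
```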

    Efficient posterior sampling for high-dimensional imbalanced logistic regression

    High-dimensional data are routinely collected in many areas. We are particularly interested in Bayesian classification models in which one or more variables are imbalanced. Current Markov chain Monte Carlo algorithms for posterior computation are inefficient as n and/or p increase, due to worsening time per step and mixing rates. One strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce per-step computational complexity. However, usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize piece-wise deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch sub-sampling. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in simulated and real data applications.
    Comment: 4 figures
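    The paper's samplers are piece-wise deterministic processes; the sketch below only illustrates the importance-weighted sub-sampling ingredient for logistic regression. The 256-observation mini-batch and the 50x over-sampling weight for the minority class are illustrative assumptions, not values from the paper.

```python
import numpy as np

def subsampled_grad(beta, X, y, batch_size, probs, rng):
    """Importance-weighted mini-batch estimate of the logistic log-likelihood
    gradient (illustrative sketch, not the paper's PDMP sampler).

    probs[i] is the probability of drawing observation i; giving rare
    (minority-class) rows larger probabilities keeps them represented even in
    very small sub-samples, and the reweighting keeps the estimate unbiased.
    """
    idx = rng.choice(len(y), size=batch_size, replace=True, p=probs)
    p_hat = 1.0 / (1.0 + np.exp(-X[idx] @ beta))       # fitted probabilities
    # Horvitz-Thompson reweighting: unbiased for the full-data gradient sum_i (y_i - p_i) x_i
    return ((y[idx] - p_hat) / probs[idx]) @ X[idx] / batch_size

# toy imbalanced data: ~1% positives, minority rows drawn 50x more often
rng = np.random.default_rng(1)
n, d = 100_000, 5
X = rng.normal(size=(n, d))
y = (rng.random(n) < 0.01).astype(float)
probs = np.where(y == 1, 50.0, 1.0)
probs /= probs.sum()
print(subsampled_grad(np.zeros(d), X, y, batch_size=256, probs=probs, rng=rng))
```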

    A Framework for Robust Assessment of Power Grid Stability and Resiliency

    Security assessment of large-scale, strongly nonlinear power grids containing thousands to millions of interacting components is a computationally expensive task. To reduce this computational cost, this paper introduces a framework for constructing a robust assessment toolbox that can provide mathematically rigorous certificates for a grid's stability in the presence of variations in power injections, and for its ability to withstand a broad set of fault sources. With this toolbox we can screen a wide range of contingencies or power injection profiles off-line, without reassessing the system's stability on a regular basis. In particular, we formulate and solve two novel robust stability and resiliency assessment problems for power grids subject to uncertainty in equilibrium points and uncertainty in fault-on dynamics. Furthermore, we bring the quadratic Lyapunov function approach to transient stability assessment, offering real-time construction of stability/resiliency certificates and real-time stability assessment. The effectiveness of the proposed techniques is numerically illustrated on a number of IEEE test cases.
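    A minimal sketch of a quadratic Lyapunov certificate, assuming a linearised post-fault model dx/dt = A x; the toy matrices, the level-set value, and the SciPy-based construction are illustrative assumptions, not the paper's method for the full nonlinear grid.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Toy 2-state, swing-like linearised post-fault dynamics dx/dt = A x (A is Hurwitz).
A = np.array([[0.0, 1.0],
              [-1.2, -0.4]])
Q = np.eye(2)
P = solve_continuous_lyapunov(A.T, -Q)   # solves A^T P + P A = -Q, giving P > 0

def certified_stable(x_cleared, level=1.0):
    """True if the fault-cleared state lies inside the certified invariant
    sublevel set {x : x^T P x <= level} of V(x) = x^T P x."""
    return float(x_cleared @ P @ x_cleared) <= level

print(certified_stable(np.array([0.3, -0.2])))
```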

    Speeding Up MCMC by Delayed Acceptance and Data Subsampling

    The complexity of the Metropolis-Hastings (MH) algorithm arises from the requirement of a likelihood evaluation for the full data set in each iteration. Payne and Mallick (2015) propose to speed up the algorithm by a delayed acceptance approach where the acceptance decision proceeds in two stages. In the first stage, an estimate of the likelihood based on a random subsample determines if it is likely that the draw will be accepted and, if so, the second stage uses the full data likelihood to decide upon final acceptance. Evaluating the full data likelihood is thus avoided for draws that are unlikely to be accepted. We propose a more precise likelihood estimator which incorporates auxiliary information about the full data likelihood while only operating on a sparse set of the data. We prove that the resulting delayed acceptance MH is more efficient than that of Payne and Mallick (2015). The caveat of this approach is that the full data set still needs to be evaluated in the second stage. We therefore propose to substitute this evaluation by an estimate, and construct a state-dependent approximation thereof to use in the first stage. This results in an algorithm that (i) can use a smaller subsample m by leveraging recent advances in Pseudo-Marginal MH (PMMH) and (ii) is provably within O(m^{-2}) of the true posterior.
    Comment: Accepted for publication in the Journal of Computational and Graphical Statistics
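    A minimal sketch of the two-stage (delayed acceptance) mechanism with a symmetric random-walk proposal: the cheap surrogate below stands in for the paper's subsample-plus-auxiliary-information estimator, and the step size and function names are illustrative assumptions.

```python
import numpy as np

def delayed_acceptance_mh(logpost_full, logpost_cheap, theta0, n_iter, step=0.1, rng=None):
    """Two-stage Metropolis-Hastings (illustrative sketch).

    Stage 1 screens proposals with a cheap surrogate log-posterior (e.g. built
    from a data sub-sample); the full log-posterior is evaluated only for
    proposals that survive stage 1, and the stage-2 correction keeps the exact
    target as the stationary distribution.
    """
    rng = rng or np.random.default_rng()
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp_full, lp_cheap = logpost_full(theta), logpost_cheap(theta)
    draws = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal(size=theta.shape)
        lp_cheap_prop = logpost_cheap(prop)
        # stage 1: cheap screening step using only the surrogate
        if np.log(rng.random()) < lp_cheap_prop - lp_cheap:
            lp_full_prop = logpost_full(prop)
            # stage 2: correct for the surrogate so detailed balance holds
            if np.log(rng.random()) < (lp_full_prop - lp_full) - (lp_cheap_prop - lp_cheap):
                theta, lp_full, lp_cheap = prop, lp_full_prop, lp_cheap_prop
        draws.append(theta.copy())
    return np.array(draws)
```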

    Experimental Design for Sensitivity Analysis, Optimization and Validation of Simulation Models

    This chapter gives a survey on the use of statistical designs for what-if analysis in simulation, including sensitivity analysis, optimization, and validation/verification. Sensitivity analysis is divided into two phases. The first phase is a pilot stage, which consists of screening or searching for the important factors among (say) hundreds of potentially important factors. A novel screening technique is presented, namely sequential bifurcation. The second phase uses regression analysis to approximate the input/output transformation that is implied by the simulation model; the resulting regression model is also known as a metamodel or a response surface. Regression analysis gives better results when the simulation experiment is well designed, using either classical statistical designs (such as fractional factorials) or optimal designs (such as those pioneered by Fedorov, Kiefer, and Wolfowitz). To optimize the simulated system, the analysts may apply Response Surface Methodology (RSM); RSM combines regression analysis, statistical designs, and steepest-ascent hill-climbing. To validate a simulation model, again regression analysis and statistical designs may be applied. Several numerical examples and case studies illustrate how statistical techniques can reduce the ad hoc character of simulation; that is, these statistical techniques can make simulation studies give more general results, in less time. Appendix 1 summarizes confidence intervals for expected values, proportions, and quantiles, in terminating and steady-state simulations. Appendix 2 gives details on four variance reduction techniques, namely common pseudorandom numbers, antithetic numbers, control variates or regression sampling, and importance sampling. Appendix 3 describes jackknifing, which may give robust confidence intervals.
    Keywords: least squares; distribution-free; non-parametric; stopping rule; run-length; Von Neumann; median; seed; likelihood ratio
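    A minimal sketch of the sequential bifurcation idea mentioned above, under the simplest assumptions (known effect signs and negligible interactions); the group_effect callback stands in for the simulation runs the chapter would use to estimate a group's combined effect, and all names and the threshold are illustrative.

```python
def sequential_bifurcation(factors, group_effect, threshold):
    """Screen for important factors by recursively splitting factor groups.

    factors      : iterable of factor identifiers to screen
    group_effect : returns the estimated combined effect of switching every
                   factor in the group from its low to its high level
    threshold    : groups with a smaller estimated effect are discarded whole
    """
    important, stack = [], [list(factors)]
    while stack:
        group = stack.pop()
        if group_effect(group) < threshold:
            continue                        # whole group screened out at once
        if len(group) == 1:
            important.append(group[0])      # isolated an important factor
        else:
            mid = len(group) // 2
            stack.extend([group[:mid], group[mid:]])   # bifurcate and recurse
    return important

# toy example: only factors 3 and 17 matter; the "run" just sums the true effects
true_effects = {3: 5.0, 17: 2.5}
effect = lambda group: sum(true_effects.get(f, 0.0) for f in group)
print(sequential_bifurcation(range(20), effect, threshold=1.0))
```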

    Supercomputers, Monte Carlo simulation and regression analysis

    Keywords: Monte Carlo Technique; Supercomputer; computer science