23,513 research outputs found


    Full text link
    Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in stability of statistical results relative to "reasonable" perturbations to data and to the model used. Jacknife, bootstrap, and cross-validation are based on perturbations to data, while robust statistics methods deal with perturbations to models. In this article, a case is made for the importance of stability in statistics. Firstly, we motivate the necessity of stability for interpretable and reliable encoding models from brain fMRI signals. Secondly, we find strong evidence in the literature to demonstrate the central role of stability in statistical inference, such as sensitivity analysis and effect detection. Thirdly, a smoothing parameter selector based on estimation stability (ES), ES-CV, is proposed for Lasso, in order to bring stability to bear on cross-validation (CV). ES-CV is then utilized in the encoding models to reduce the number of predictors by 60% with almost no loss (1.3%) of prediction performance across over 2,000 voxels. Last, a novel "stability" argument is seen to drive new results that shed light on the intriguing interactions between sample to sample variability and heavier tail error distribution (e.g., double-exponential) in high-dimensional regression models with pp predictors and nn independent samples. In particular, when p/nκ(0.3,1)p/n\rightarrow\kappa\in(0.3,1) and the error distribution is double-exponential, the Ordinary Least Squares (OLS) is a better estimator than the Least Absolute Deviation (LAD) estimator.Comment: Published in at http://dx.doi.org/10.3150/13-BEJSP14 the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm

    Comment: Monitoring Networked Applications With Incremental Quantile Estimation

    Full text link
    Comment: Monitoring Networked Applications With Incremental Quantile Estimation [arXiv:0708.0302]Comment: Published at http://dx.doi.org/10.1214/088342306000000628 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    The shuffle estimator for explainable variance in fMRI experiments

    Full text link
    In computational neuroscience, it is important to estimate well the proportion of signal variance in the total variance of neural activity measurements. This explainable variance measure helps neuroscientists assess the adequacy of predictive models that describe how images are encoded in the brain. Complicating the estimation problem are strong noise correlations, which may confound the neural responses corresponding to the stimuli. If not properly taken into account, the correlations could inflate the explainable variance estimates and suggest false possible prediction accuracies. We propose a novel method to estimate the explainable variance in functional MRI (fMRI) brain activity measurements when there are strong correlations in the noise. Our shuffle estimator is nonparametric, unbiased, and built upon the random effect model reflecting the randomization in the fMRI data collection process. Leveraging symmetries in the measurements, our estimator is obtained by appropriately permuting the measurement vector in such a way that the noise covariance structure is intact but the explainable variance is changed after the permutation. This difference is then used to estimate the explainable variance. We validate the properties of the proposed method in simulation experiments. For the image-fMRI data, we show that the shuffle estimates can explain the variation in prediction accuracy for voxels within the primary visual cortex (V1) better than alternative parametric methods.Comment: Published in at http://dx.doi.org/10.1214/13-AOAS681 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Number of paths versus number of basis functions in American option pricing

    Full text link
    An American option grants the holder the right to select the time at which to exercise the option, so pricing an American option entails solving an optimal stopping problem. Difficulties in applying standard numerical methods to complex pricing problems have motivated the development of techniques that combine Monte Carlo simulation with dynamic programming. One class of methods approximates the option value at each time using a linear combination of basis functions, and combines Monte Carlo with backward induction to estimate optimal coefficients in each approximation. We analyze the convergence of such a method as both the number of basis functions and the number of simulated paths increase. We get explicit results when the basis functions are polynomials and the underlying process is either Brownian motion or geometric Brownian motion. We show that the number of paths required for worst-case convergence grows exponentially in the degree of the approximating polynomials in the case of Brownian motion and faster in the case of geometric Brownian motion.Comment: Published at http://dx.doi.org/10.1214/105051604000000846 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org