
    Towards Machine Wald

    The past century has seen a steady increase in the need to estimate and predict complex systems and to make (possibly critical) decisions with limited information. Although computers have made the numerical evaluation of sophisticated statistical models possible, these models are still designed by humans because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed, enabling computers to think as humans do when faced with uncertainty is challenging in several major ways: (1) finding optimal statistical models has yet to be formulated as a well-posed problem when information on the system of interest is incomplete and comes as a complex combination of sample data, partial knowledge of constitutive relations, and a limited description of the distribution of input random variables; (2) the space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite-dimensional, whereas calculus on a computer is necessarily discrete and finite. To this end, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews its connections with Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty Quantification, and Information-Based Complexity. (37 pages)

    A One-Sample Test for Normality with Kernel Methods

    We propose a new one-sample test for normality in a reproducing kernel Hilbert space (RKHS). Namely, we test the null hypothesis that the data belong to a given family of Gaussian distributions. Hence our procedure may be applied either to test data for normality or to test parameters (mean and covariance) if the data are assumed Gaussian. Our test is based on the same principle as the MMD (Maximum Mean Discrepancy), which is usually used for two-sample tests such as homogeneity or independence testing. Our method makes use of a special kind of parametric bootstrap (typical of goodness-of-fit tests) which is computationally more efficient than the standard parametric bootstrap. Moreover, an upper bound on the Type-II error makes explicit its dependence on the influential quantities. Experiments illustrate the practical improvement brought by our test in high-dimensional settings, where common normality tests are known to fail. We also consider an application to covariance rank selection through a sequential procedure.
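
    The MMD principle the abstract refers to admits a compact Monte Carlo illustration. The sketch below is a naive stand-in, not the paper's procedure: all names are hypothetical, the bandwidth heuristic is crude, and the bootstrap redraws from the fitted Gaussian without refitting parameters, whereas the paper describes a computationally more efficient bootstrap.

    ```python
    import numpy as np

    def mmd2(x, y, bandwidth):
        """Biased MMD^2 estimate between samples x and y under a Gaussian kernel."""
        def gram(a, b):
            d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * bandwidth ** 2))
        return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

    def normality_test(x, n_boot=200, seed=0):
        """Monte Carlo MMD test of H0: x ~ N(mu, Sigma), with (mu, Sigma) fitted."""
        rng = np.random.default_rng(seed)
        n, d = x.shape
        mu = x.mean(0)
        sigma = np.atleast_2d(np.cov(x, rowvar=False))
        bw = np.median(np.abs(x - mu)) + 1e-12   # crude bandwidth heuristic
        gauss = lambda m: rng.multivariate_normal(mu, sigma, size=m)
        stat = mmd2(x, gauss(n), bw)
        # Naive parametric bootstrap: null distribution of the statistic when
        # both samples come from the fitted Gaussian (refitting effects ignored).
        null = [mmd2(gauss(n), gauss(n), bw) for _ in range(n_boot)]
        pval = (1 + sum(s >= stat for s in null)) / (1 + n_boot)
        return stat, pval
    ```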

    Nonparametric inference on Lévy measures and copulas

    In this paper, nonparametric methods to assess the multivariate Lévy measure are introduced. Starting from high-frequency observations of a Lévy process $\mathbf{X}$, we construct estimators for its tail integrals and the Pareto-Lévy copula and prove weak convergence of these estimators in certain function spaces. Given $n$ observations of increments over intervals of length $\Delta_n$, the rate of convergence is $k_n^{-1/2}$ with $k_n = n\Delta_n$, which is natural for inference on the Lévy measure. Besides extensions to nonequidistant sampling schemes, we also provide analytic properties of the Pareto-Lévy copula which, to the best of our knowledge, have not been mentioned before in the literature. We conclude with a short simulation study on the performance of our estimators and apply them to real data. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/13-AOS1116.
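
    In the univariate case, the positive tail integral $U(x) = \nu([x, \infty))$ has a natural empirical counterpart: count increments exceeding $x$ and rescale by the observation horizon $n\Delta_n$. The sketch below shows this textbook-style estimator under the same high-frequency asymptotics ($\Delta_n \to 0$, $n\Delta_n \to \infty$) on a simulated compound Poisson process; the names are hypothetical and this is not the authors' multivariate construction.

    ```python
    import numpy as np

    def tail_integral(increments, delta_n, x):
        """Estimate U(x) = nu([x, inf)) by #{i : increment_i >= x} / (n * delta_n)."""
        n = len(increments)
        return np.sum(increments >= x) / (n * delta_n)

    # Simulate increments of a compound Poisson process: rate lam, Exp(1) jumps.
    rng = np.random.default_rng(0)
    n, delta_n, lam = 100_000, 0.01, 2.0
    counts = rng.poisson(lam * delta_n, size=n)
    increments = np.array([rng.exponential(1.0, k).sum() for k in counts])

    x = 1.0
    print(tail_integral(increments, delta_n, x))  # estimate of nu([1, inf))
    print(lam * np.exp(-x))                       # true value: lam * P(jump >= x)
    ```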

    Large-Scale Kernel Methods for Independence Testing

    Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with a computational cost that is at least quadratic in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable trade-off between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström, and random Fourier feature approaches. Through a variety of synthetic-data experiments, we demonstrate that these large-scale methods give performance comparable to existing methods whilst using significantly less computation time and memory. (29 pages, 6 figures)
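
    The random Fourier feature route can be sketched concretely. The code below is a simplified finite-feature cross-covariance statistic with a permutation null, standing in for the tests studied in the paper (all names and the bandwidth heuristic are assumptions of this sketch): each variable is mapped through D random features, and dependence is measured by the squared Frobenius norm of the empirical cross-covariance, at O(nD) cost per permutation rather than O(n^2).

    ```python
    import numpy as np

    def rff(z, D, bandwidth, rng):
        """Random Fourier features approximating a Gaussian kernel (Rahimi & Recht)."""
        w = rng.normal(0.0, 1.0 / bandwidth, size=(z.shape[1], D))
        b = rng.uniform(0.0, 2 * np.pi, size=D)
        return np.sqrt(2.0 / D) * np.cos(z @ w + b)

    def rff_independence_test(x, y, D=100, n_perm=200, seed=0):
        """Permutation test of independence via ||Cov(phi(x), psi(y))||_F^2."""
        rng = np.random.default_rng(seed)
        fx = rff(x, D, np.median(np.abs(x - x.mean(0))) + 1e-12, rng)
        fy = rff(y, D, np.median(np.abs(y - y.mean(0))) + 1e-12, rng)
        fx -= fx.mean(0)
        fy -= fy.mean(0)
        stat = np.sum((fx.T @ fy / len(x)) ** 2)
        # Permuting y breaks any dependence, giving draws from the null.
        null = [np.sum((fx.T @ fy[rng.permutation(len(y))] / len(x)) ** 2)
                for _ in range(n_perm)]
        pval = (1 + sum(s >= stat for s in null)) / (1 + n_perm)
        return stat, pval
    ```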

    Minimax and Adaptive Inference in Nonparametric Function Estimation

    Since Stein's 1956 seminal paper, shrinkage has played a fundamental role in both parametric and nonparametric inference. This article discusses minimaxity and adaptive minimaxity in nonparametric function estimation. Three interrelated problems are considered: function estimation under global integrated squared error, estimation under pointwise squared error, and nonparametric confidence intervals. Shrinkage is pivotal in the development of both the minimax theory and the adaptation theory. While the three problems are closely connected and the minimax theories bear some similarities, the adaptation theories are strikingly different. For example, in sharp contrast to adaptive point estimation, in many common settings there do not exist nonparametric confidence intervals that adapt to the unknown smoothness of the underlying function. A concise account of these theories is given, and the connections as well as differences among the problems are discussed and illustrated through examples. Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/11-STS355.
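
    To make the role of shrinkage concrete, recall Stein's classical example, a standard background fact for this survey rather than one of its results: for $Y \sim N_p(\theta, \sigma^2 I_p)$ with $p \ge 3$, the James-Stein estimator pulls the observation toward the origin and strictly dominates the maximum likelihood estimator $Y$ under squared error loss.

    ```latex
    \hat{\theta}_{\mathrm{JS}}
      = \left(1 - \frac{(p-2)\,\sigma^2}{\|Y\|^2}\right) Y,
    \qquad
    \mathbb{E}\,\|\hat{\theta}_{\mathrm{JS}} - \theta\|^2
      < \mathbb{E}\,\|Y - \theta\|^2 = p\,\sigma^2
    \quad \text{for all } \theta \in \mathbb{R}^p,\ p \ge 3.
    ```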