
    Maximum Fidelity

    The most fundamental problem in statistics is the inference of an unknown probability distribution from a finite number of samples. For a specific observed data set, answers to the following questions would be desirable: (1) Estimation: Which candidate distribution provides the best fit to the observed data? (2) Goodness-of-fit: How concordant is this distribution with the observed data? (3) Uncertainty: How concordant are other candidate distributions with the observed data? A simple unified approach for univariate data that addresses these traditionally distinct statistical notions, called "maximum fidelity", is presented. Maximum fidelity is a strict frequentist approach that is fundamentally based on model concordance with the observed data. The fidelity statistic is a general information measure based on the coordinate-independent cumulative distribution and critical yet previously neglected symmetry considerations. An approximation for the null distribution of the fidelity allows its direct conversion to absolute model concordance (p value). Fidelity maximization identifies the most concordant model distribution, yielding a method for parameter estimation, with neighboring, less concordant distributions providing the "uncertainty" in this estimate. Maximum fidelity provides an optimal approach for parameter estimation (superior to maximum likelihood) and a generally optimal approach for goodness-of-fit assessment of arbitrary models applied to univariate data. Extensions to binary data, binned data, multidimensional data, and classical parametric and nonparametric statistical tests are described. Maximum fidelity provides a philosophically consistent, robust, and seemingly optimal foundation for statistical inference. All findings are presented in an elementary way to be immediately accessible to all researchers utilizing statistical analysis. Comment: 66 pages, 32 figures, 7 tables, submitted
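
    The abstract does not reproduce the fidelity statistic itself, so the following is only a minimal sketch of the general recipe it describes: score candidate parameters by the concordance of the model CDF with the ordered sample, then pick the most concordant model. Moran's log-spacing statistic stands in for the paper's fidelity statistic, and the data and normal family are hypothetical.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        x = np.sort(rng.normal(loc=2.0, scale=1.5, size=200))  # toy sample
        n = len(x)

        def moran_statistic(u):
            # Moran's log-spacing statistic on CDF-transformed data: a
            # stand-in concordance measure, NOT the paper's fidelity statistic.
            u = np.concatenate(([0.0], np.sort(u), [1.0]))
            spacings = np.clip(np.diff(u), 1e-300, None)
            return -np.sum(np.log((n + 1) * spacings))

        # Grid search over (mu, sigma); a lower statistic means a more
        # concordant model, and neighboring values map out the uncertainty.
        candidates = ((moran_statistic(stats.norm.cdf(x, m, s)), m, s)
                      for m in np.linspace(1.0, 3.0, 41)
                      for s in np.linspace(0.5, 3.0, 51))
        best_stat, best_mu, best_sigma = min(candidates, key=lambda t: t[0])
        print(f"most concordant model: mu={best_mu:.2f}, sigma={best_sigma:.2f}")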

    Chi-squared tests of interval and density forecasts and the Bank of England's fan charts

    This paper reviews recently proposed likelihood ratio tests of goodness-of-fit and independence of interval forecasts. It recasts them in the framework of Pearson chi-squared statistics, and considers their extension to density forecasts and their exact small-sample distributions. The use of the familiar framework of contingency tables will increase the accessibility of these methods. The tests are applied to two series of density forecasts of inflation, namely the US Survey of Professional Forecasters and the Bank of England fan charts. This first evaluation of the fan chart forecasts finds that whereas the current-quarter forecasts are well-calibrated, this is less true of the one-year-ahead forecasts. The fan charts fan out too quickly, and the excessive concern with the upside risks was not justified over the period considered. JEL Classification: C53, E37. Keywords: interval and density forecasts.
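
    As a minimal sketch of the basic device the abstract describes for density forecasts: probability integral transforms (PITs) of the outcomes are binned into a small contingency table and tested for uniformity with a Pearson chi-squared statistic. The PIT values below are entirely hypothetical, and the paper's actual tests also cover independence of the transformed series.

        import numpy as np
        from scipy import stats

        # Hypothetical PIT values z_t = F_t(y_t): the forecast CDF evaluated
        # at the realized outcome. Well-calibrated forecasts give z_t ~ U(0,1).
        rng = np.random.default_rng(1)
        z = rng.beta(1.3, 1.3, size=80)          # stand-in, mildly miscalibrated

        k = 4                                    # bins of the contingency table
        observed, _ = np.histogram(z, bins=np.linspace(0.0, 1.0, k + 1))
        expected = np.full(k, len(z) / k)

        chi2 = np.sum((observed - expected) ** 2 / expected)
        print(f"chi2 = {chi2:.2f}, asymptotic p = {stats.chi2.sf(chi2, df=k - 1):.3f}")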

    Validation Test of Geant4 Simulation of Electron Backscattering

    Backscattering is a sensitive probe of the accuracy of electron scattering algorithms implemented in Monte Carlo codes. The capability of the Geant4 toolkit to describe realistically the fraction of electrons backscattered from a target volume is extensively and quantitatively evaluated in comparison with experimental data retrieved from the literature. The validation test covers the energy range between approximately 100 eV and 20 MeV, and concerns a wide set of target elements. Multiple and single electron scattering models implemented in Geant4, as well as preassembled selections of physics models distributed within Geant4, are analyzed with statistical methods. The evaluations concern Geant4 versions from 9.1 to 10.1. Significant evolutions are observed over the range of Geant4 versions, not always in the direction of better compatibility with experiment. Goodness-of-fit tests complemented by categorical analysis tests identify a configuration based on the Geant4 Urban multiple scattering model in Geant4 version 9.1 and a configuration based on single Coulomb scattering in Geant4 10.0 as the physics options that best reproduce experimental data above a few tens of keV. At lower energies, only single scattering demonstrates some capability to reproduce data down to a few keV. Recommended preassembled physics configurations appear incapable of describing electron backscattering in a manner compatible with experiment. With the support of statistical methods, a correlation is established between the validation of Geant4-based simulation of backscattering and of energy deposition.
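
    The abstract does not spell out the statistical methods, so the following is only an illustrative sketch of one standard ingredient of such a validation: a chi-squared comparison of simulated backscattering fractions against measured values with experimental uncertainties. All numbers are made up.

        import numpy as np
        from scipy import stats

        # Hypothetical measured backscatter fractions with 1-sigma errors,
        # and simulated values at the same beam energies.
        energy_keV = np.array([50, 100, 500, 1000, 5000])
        measured   = np.array([0.42, 0.40, 0.35, 0.31, 0.22])
        sigma      = np.array([0.02, 0.02, 0.015, 0.015, 0.01])
        simulated  = np.array([0.44, 0.41, 0.34, 0.30, 0.24])

        # No fitted parameters: the simulation is fixed, so ndf = number of points.
        chi2 = np.sum(((measured - simulated) / sigma) ** 2)
        p = stats.chi2.sf(chi2, df=len(measured))
        print(f"chi2/ndf = {chi2:.2f}/{len(measured)}, p = {p:.3f}")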

    Adaptive goodness-of-fit tests in a density model

    Given an i.i.d. sample drawn from a density f, we propose to test that f equals some prescribed density f_0 or that f belongs to some translation/scale family. We introduce a multiple testing procedure based on an estimation of the L_2-distance between f and f_0, or between f and the parametric family that we consider. For each sample size n, our test has level of significance α. In the case of simple hypotheses, we prove that our test is adaptive: it achieves the optimal rates of testing established by Ingster [J. Math. Sci. 99 (2000) 1110--1119] over various classes of smooth functions simultaneously. As for composite hypotheses, we obtain similar results up to a logarithmic factor. We carry out a simulation study to compare our procedures with the Kolmogorov--Smirnov tests, or with goodness-of-fit tests proposed by Bickel and Ritov [in Nonparametric Statistics and Related Topics (1992) 51--57] and by Kallenberg and Ledwina [Ann. Statist. 23 (1995) 1594--1608]. Comment: Published at http://dx.doi.org/10.1214/009053606000000119 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
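
    A minimal sketch of the quantity at the heart of the procedure, the squared L_2-distance ∫(f − f_0)². Here ∫f² is estimated with a simple leave-one-out kernel U-statistic rather than the authors' estimator, and the sample and bandwidth are hypothetical.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        x = rng.normal(0.3, 1.0, size=500)       # sample from the unknown f
        f0 = stats.norm(0.0, 1.0)                # prescribed density f_0

        # ||f - f_0||_2^2 = int f^2 - 2 int f*f_0 + int f_0^2.
        # int f*f_0 is estimated by the sample mean of f_0(X_i); int f^2 by a
        # leave-one-out Gaussian-kernel U-statistic with bandwidth h.
        n, h = len(x), 0.3
        K = stats.norm.pdf((x[:, None] - x[None, :]) / h)
        int_f2 = (K.sum() - np.trace(K)) / (n * (n - 1) * h)
        int_ff0 = f0.pdf(x).mean()
        int_f02 = 1.0 / (2.0 * np.sqrt(np.pi))   # closed form for N(0,1)

        print(f"estimated squared L_2 distance: {int_f2 - 2*int_ff0 + int_f02:.4f}")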

    A common goodness-of-fit framework for neural population models using marked point process time-rescaling

    A critical component of any statistical modeling procedure is the ability to assess the goodness-of-fit between a model and observed data. For spike train models of individual neurons, many goodness-of-fit measures rely on the time-rescaling theorem and assess model quality using rescaled spike times. Recently, there has been increasing interest in statistical models that describe the simultaneous spiking activity of neuron populations, either in a single brain region or across brain regions. Classically, such models have used spike sorted data to describe relationships between the identified neurons, but more recently clusterless modeling methods have been used to describe population activity using a single model. Here we develop a generalization of the time-rescaling theorem that enables comprehensive goodness-of-fit analysis for either of these classes of population models. We use the theory of marked point processes to model population spiking activity, and show that under the correct model, each spike can be rescaled individually to generate a uniformly distributed set of events in time and the space of spike marks. After rescaling, multiple well-established goodness-of-fit procedures and statistical tests are available. We demonstrate the application of these methods both to simulated data and real population spiking in rat hippocampus. We have made the MATLAB and Python code used for the analyses in this paper publicly available through our Github repository at https://github.com/Eden-Kramer-Lab/popTRT. This work was supported by grants from the NIH (MH105174, NS094288) and the Simons Foundation (542971). Published version.
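
    The paper's contribution is the marked (clusterless) generalization; the following is only a minimal sketch of the classical, unmarked time-rescaling check it builds on, with a made-up intensity function: under the true conditional intensity, the rescaled inter-spike intervals are Exp(1), so z = 1 − exp(−τ) should be uniform.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        lam = lambda t: 5.0 + 3.0 * np.sin(2 * np.pi * t)   # toy rate (spikes/s)

        # Simulate an inhomogeneous Poisson spike train on [0, 20] s by thinning.
        T, lam_max = 20.0, 8.0
        cand = np.sort(rng.uniform(0, T, rng.poisson(lam_max * T)))
        spikes = cand[rng.uniform(0, lam_max, len(cand)) < lam(cand)]

        # Rescale: integrate lam numerically between consecutive spikes.
        grid = np.linspace(0, T, 20001)
        Lam = np.concatenate(([0.0], np.cumsum(lam(grid[:-1]) * np.diff(grid))))
        tau = np.diff(np.interp(spikes, grid, Lam))         # ~ Exp(1) if model true
        z = 1.0 - np.exp(-tau)                              # ~ U(0,1) if model true

        print(stats.kstest(z, "uniform"))                   # KS test of uniformity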

    Chi-squared tests of interval and density forecasts, and the Bank of England's fan charts

    This paper reviews recently proposed likelihood ratio tests of goodness-of-fit and independence of interval forecasts. It recasts them in the framework of Pearson chi-squared statistics, and extends them to density forecasts. Two further recent developments are also incorporated, namely a more informative decomposition of the goodness-of-fit statistic, and the calculation of exact P-values. Examples considered are the US Survey of Professional Forecasters density forecasts of inflation and the Bank of England fan charts. This first evaluation of the Bank forecasts finds that the fan charts fan out too quickly, and the excessive concern with the upside risks was not justified.
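
    The abstract highlights exact P-values for small samples; as a minimal sketch of the simplest route to a small-sample p-value for a Pearson statistic, here via Monte Carlo sampling of the multinomial null rather than the paper's exact calculation, with hypothetical counts:

        import numpy as np

        rng = np.random.default_rng(4)
        observed = np.array([9, 3, 2, 6])        # hypothetical small-sample counts
        n, k = observed.sum(), len(observed)
        expected = np.full(k, n / k)             # equiprobable null

        stat = np.sum((observed - expected) ** 2 / expected)

        # Simulate the null distribution instead of relying on chi2(k-1)
        # asymptotics, which can be unreliable at such small n.
        sims = rng.multinomial(n, np.full(k, 1 / k), size=100_000)
        sim_stats = np.sum((sims - expected) ** 2 / expected, axis=1)
        print(f"chi2 = {stat:.2f}, small-sample p = {np.mean(sim_stats >= stat):.4f}")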

    Fitting Effective Diffusion Models to Data Associated with a "Glassy Potential": Estimation, Classical Inference Procedures and Some Heuristics

    A variety of researchers have successfully obtained the parameters of low dimensional diffusion models using the data that comes out of atomistic simulations. This naturally raises a variety of questions about efficient estimation, goodness-of-fit tests, and confidence interval estimation. The first part of this article uses maximum likelihood estimation to obtain the parameters of a diffusion model from a scalar time series. I address numerical issues associated with attempting to realize asymptotic statistics results with moderate sample sizes in the presence of exact and approximated transition densities. Approximate transition densities are used because the analytic solution of a transition density associated with a parametric diffusion model is often unknown. I am primarily interested in how well the deterministic transition density expansions of Ait-Sahalia capture the curvature of the transition density in (idealized) situations that occur when one carries out simulations in the presence of a "glassy" interaction potential. Accurate approximation of the curvature of the transition density is desirable because it can be used to quantify the goodness-of-fit of the model and to calculate asymptotic confidence intervals of the estimated parameters. The second part of this paper contributes a heuristic estimation technique for approximating a nonlinear diffusion model. A "global" nonlinear model is obtained by taking a batch of time series and applying simple local models to portions of the data. I demonstrate the technique on a diffusion model with a known transition density and on data generated by the Stochastic Simulation Algorithm. Comment: 30 pages, 10 figures. Submitted to SIAM MMS (typos removed and slightly shortened).
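
    A minimal sketch of the first part's program (maximum likelihood for a diffusion from a scalar time series via its transition density), using an Ornstein-Uhlenbeck model whose transition density is known exactly rather than the Ait-Sahalia expansions or a glassy potential; all parameters below are hypothetical.

        import numpy as np
        from scipy import stats, optimize

        # Ornstein-Uhlenbeck: dX = -theta*(X - mu) dt + sigma dW has an exact
        # Gaussian transition density, making it a convenient test case.
        rng = np.random.default_rng(5)
        theta, mu, sigma, dt, n = 1.5, 0.0, 0.8, 0.05, 2000
        x = np.empty(n); x[0] = 0.5
        for i in range(1, n):                    # Euler simulation of the path
            x[i] = x[i-1] - theta*(x[i-1]-mu)*dt + sigma*np.sqrt(dt)*rng.normal()

        def nll(params):
            # Negative log-likelihood from the exact OU transition density.
            th, m, s = params
            if th <= 0 or s <= 0:
                return np.inf
            mean = m + (x[:-1] - m) * np.exp(-th * dt)
            var = s**2 * (1 - np.exp(-2 * th * dt)) / (2 * th)
            return -stats.norm.logpdf(x[1:], mean, np.sqrt(var)).sum()

        fit = optimize.minimize(nll, x0=[1.0, 0.1, 1.0], method="Nelder-Mead")
        print("MLE (theta, mu, sigma):", fit.x)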