
    New developments of the goodness-of-fit Statistical Toolkit

    The Statistical Toolkit is a project for the development of open-source software tools for statistical data analysis in experimental particle and nuclear physics. The second development cycle encompassed an extension of the software functionality and new tools to facilitate its usage in experimental environments. The new developments include additional goodness-of-fit tests, new implementations of existing tests to improve their statistical precision or computational performance, a new component to extend the usability of the toolkit with other data analysis systems, and new tools for easier configuration and building of the system in the user's computing environment. The computational performance of all the implemented algorithms has been studied.

    Testing marginal homogeneity in Hilbert spaces with applications to stock market returns

    This paper considers a paired data framework and discusses the question of marginal homogeneity of bivariate high-dimensional or functional data. The related testing problem can be embedded in a more general setting for paired random variables taking values in a general Hilbert space. To address this problem, a Cramér–von Mises type test statistic is applied, and a bootstrap procedure is suggested to obtain critical values and, finally, a consistent test. The desired properties of the bootstrap test, namely asymptotic exactness under the null hypothesis and consistency under alternatives, are derived. Simulations show the quality of the test in the finite sample case. A possible application is the comparison of two possibly dependent stock market returns based on functional data. The approach is demonstrated on historical data for different stock market indices.
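    The bootstrap-calibrated Cramér–von Mises idea can be illustrated in a scalar setting. The sketch below is a simplified univariate analogue, not the paper's Hilbert-space procedure: it tests equality of the two marginals of paired data and calibrates the statistic by randomly swapping components within pairs, which is valid under the extra assumption that the pairs are exchangeable. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def cvm_paired(x, y):
    """Cramér–von Mises-type distance between the marginal
    empirical CDFs of x and y, evaluated on the pooled sample."""
    pooled = np.concatenate([x, y])
    Fx = np.searchsorted(np.sort(x), pooled, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), pooled, side="right") / len(y)
    return len(x) * np.mean((Fx - Fy) ** 2)

def paired_marginal_test(x, y, n_resamples=999):
    """Approximate p-value by randomly swapping the two components
    within pairs; this preserves the dependence structure and is
    valid under H0 only if the pairs are exchangeable."""
    t_obs = cvm_paired(x, y)
    count = 0
    for _ in range(n_resamples):
        swap = rng.random(len(x)) < 0.5
        count += cvm_paired(np.where(swap, y, x),
                            np.where(swap, x, y)) >= t_obs
    return t_obs, (count + 1) / (n_resamples + 1)

# Dependent pairs with identical marginals (H0 holds):
z = rng.normal(size=200)
x = z + rng.normal(scale=0.5, size=200)
y = z + rng.normal(scale=0.5, size=200)
t_obs, p_value = paired_marginal_test(x, y)
```

    A resampling scheme of this within-pair kind keeps the dependence between the two components intact, which is the point of the paired framework.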

    Maximum Fidelity

    The most fundamental problem in statistics is the inference of an unknown probability distribution from a finite number of samples. For a specific observed data set, answers to the following questions would be desirable: (1) Estimation: which candidate distribution provides the best fit to the observed data? (2) Goodness-of-fit: how concordant is this distribution with the observed data? (3) Uncertainty: how concordant are other candidate distributions with the observed data? A simple unified approach for univariate data, called "maximum fidelity", is presented that addresses these traditionally distinct statistical notions. Maximum fidelity is a strictly frequentist approach that is fundamentally based on model concordance with the observed data. The fidelity statistic is a general information measure based on the coordinate-independent cumulative distribution and on critical yet previously neglected symmetry considerations. An approximation for the null distribution of the fidelity allows its direct conversion to absolute model concordance (p value). Fidelity maximization identifies the most concordant model distribution, yielding a method for parameter estimation, with neighboring, less concordant distributions providing the "uncertainty" in this estimate. Maximum fidelity provides an optimal approach for parameter estimation (superior to maximum likelihood) and a generally optimal approach for goodness-of-fit assessment of arbitrary models applied to univariate data. Extensions to binary data, binned data, multidimensional data, and classical parametric and nonparametric statistical tests are described. Maximum fidelity provides a philosophically consistent, robust, and seemingly optimal foundation for statistical inference. All findings are presented in an elementary way so as to be immediately accessible to all researchers utilizing statistical analysis.
    Comment: 66 pages, 32 figures, 7 tables, submitted

    Modeling diameter distributions with six probability density functions in Pinus halepensis Mill. plantations using low-density airborne laser scanning data in Aragón (northeast Spain)

    The diameter distributions of trees in 50 temporary sample plots (TSPs) established in Pinus halepensis Mill. stands were recovered from LiDAR metrics by using six probability density functions (PDFs): the Weibull (2P and 3P), Johnson's SB, beta, generalized beta and gamma-2P functions. The parameters were recovered from the first and second moments of the distributions (mean and variance, respectively) by using parameter recovery models (PRMs). Linear models were used to predict both moments from LiDAR data. In recovering the functions, the location parameters of the distributions were predetermined as the minimum diameter inventoried, and the scale parameters were established as the maximum diameters predicted from LiDAR metrics. The Kolmogorov–Smirnov (KS) statistic (Dn), the number of acceptances by the KS test, the Cramér–von Mises statistic (W²), bias and mean square error (MSE) were used to evaluate the goodness of fit. The fits for the six recovered functions were compared with the fits to all measured data from 58 TSPs (LiDAR metrics could only be extracted from 50 of the plots). In the fitting phase, the location parameters were fixed at a suitable value determined according to the forestry literature (0.75·dmin). The linear models used to recover the two moments of the distributions and the maximum diameters determined from LiDAR data were accurate, with R² values of 0.750, 0.724 and 0.873 for dg, dmed and dmax, respectively. Reasonable results were obtained with all six recovered functions. The goodness-of-fit statistics indicated that the beta function was the most accurate, followed by the generalized beta function. The Weibull-3P function provided the poorest fits, and the Weibull-2P and Johnson's SB functions also yielded poor fits to the data.
    Funding: Ministerio de Economía, Industria y Competitividad, Ayudas Torres Quevedo (grant PTQ-16-08445); Fondo Europeo Agrario de Desarrollo Rural (FEADER), Programa de Desarrollo Rural de Aragón 2014-2020 (project RF-64079)
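    The moment-based recovery step can be made concrete for the Weibull-2P case. The sketch below is a hypothetical illustration rather than the authors' code: it solves for the shape parameter from the squared coefficient of variation by bisection (using only the standard library) and then recovers the scale from the mean; in the paper the mean and variance themselves would come from the LiDAR-based linear models.

```python
import math

def recover_weibull_2p(mean, var, tol=1e-10):
    """Method-of-moments recovery of the Weibull shape k and scale
    lam from a predicted mean and variance (the PRM idea).
    Uses mean = lam*Gamma(1+1/k) and
    var = lam**2 * (Gamma(1+2/k) - Gamma(1+1/k)**2)."""
    cv2 = var / mean ** 2  # squared coefficient of variation

    def f(k):
        # cv^2 implied by shape k, minus the target; decreasing in k
        g1 = math.gamma(1 + 1 / k)
        g2 = math.gamma(1 + 2 / k)
        return g2 / g1 ** 2 - 1 - cv2

    lo, hi = 0.1, 50.0  # bracket for realistic diameter data
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    lam = mean / math.gamma(1 + 1 / k)
    return k, lam

# Round trip on the exact moments of a Weibull(k=2, lam=10):
k, lam = recover_weibull_2p(10 * math.gamma(1.5),
                            100 * (math.gamma(2.0) - math.gamma(1.5) ** 2))
# recovers k ≈ 2.0, lam ≈ 10.0
```

    The same two-moment recovery pattern applies to the other PDFs in the study, each with its own moment equations.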

    Jackknife empirical likelihood tests for error distributions in regression models

    Regression models are commonly used to model the relationship between responses and covariates. For testing the error distribution, classical test statistics such as the Kolmogorov–Smirnov and Cramér–von Mises tests suffer from complicated limiting distributions due to the plug-in estimates of the unknown parameters. Hence some ad hoc procedure, such as the bootstrap, is needed to obtain critical points. Recently, Khmaladze and Koul (2004) [7] proposed an asymptotically distribution-free test via martingale transforms. However, the calculation of such a test is quite involved, usually requiring numerical integration when the Cramér–von Mises type of test is employed. In this paper we propose a novel jackknife empirical likelihood method which is easy to compute and has a chi-square limit, so that critical values are ready at hand. A simulation study confirms that the new test has an accurate size and is also powerful.
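    A generic sketch of the jackknife empirical likelihood recipe (not the paper's error-distribution test): form jackknife pseudo-values of a statistic, then apply standard empirical likelihood for a mean to those pseudo-values, so the resulting −2 log ratio can be referred to a chi-square distribution. Function names are illustrative.

```python
import numpy as np

def jackknife_pseudo_values(x, stat):
    """V_i = n*T_n - (n-1)*T_{n-1,-i}: jackknife pseudo-values of
    the statistic T (for the sample mean they equal the data)."""
    n = len(x)
    t_full = stat(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return n * t_full - (n - 1) * loo

def neg2_log_el_ratio(v, mu0, iters=50):
    """-2 log empirical-likelihood ratio for the mean of v at mu0,
    using Newton's method for the Lagrange multiplier lambda that
    solves sum(z_i / (1 + lambda*z_i)) = 0 with z_i = v_i - mu0."""
    z = v - mu0
    lam = 0.0
    for _ in range(iters):
        denom = 1.0 + lam * z
        score = np.sum(z / denom)
        hess = -np.sum(z ** 2 / denom ** 2)
        lam -= score / hess
    return 2.0 * np.sum(np.log1p(lam * z))

# JEL-style test of H0: population variance equals 1
rng = np.random.default_rng(1)
x = rng.normal(size=200)
pseudo = jackknife_pseudo_values(x, lambda a: a.var(ddof=1))
r = neg2_log_el_ratio(pseudo, 1.0)  # compare to chi2(1) quantiles
```

    The appeal noted in the abstract is visible here: no resampling loop and no limiting-distribution derivation are needed, since the calibration is the chi-square limit.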

    The two-sample problem with regression errors: an empirical process approach

    We describe how to test the null hypothesis that errors from two parametrically specified regression models have the same distribution versus a general alternative. First we obtain the asymptotic properties of test statistics derived from the difference between the two residual-based empirical distribution functions. Under the null hypothesis they are not asymptotically distribution-free, and hence a consistent bootstrap procedure is proposed to compute critical values. As an alternative, we describe how to perform the test with statistics based on martingale-transformed empirical processes, which are asymptotically distribution-free. Some Monte Carlo experiments are performed to compare the behaviour of all statistics with moderate sample sizes.
    Keywords: two-sample problem; residual-based empirical process; smooth bootstrap; martingale transform
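    The residual-based recipe can be sketched for a simple special case. The code below is an illustrative simplification, not the paper's procedure: it fits two linear models by OLS, compares the residual empirical CDFs with a KS-type sup statistic, and calibrates it with a smooth bootstrap from the pooled residuals; a faithful version would also re-estimate the regression parameters in each bootstrap draw.

```python
import numpy as np

rng = np.random.default_rng(2)

def residuals(X, y):
    """OLS residuals, standing in for the parametrically
    specified regression fits in the abstract."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def ks_two_sample(a, b):
    """Sup-distance between the two residual-based empirical CDFs."""
    grid = np.concatenate([a, b])
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

def smooth_bootstrap_pvalue(e1, e2, n_boot=499, h=0.1):
    """Smooth bootstrap: draw from the pooled residuals plus a small
    normal perturbation, imposing the null of one common error law."""
    t_obs = ks_two_sample(e1, e2)
    pooled = np.concatenate([e1, e2])
    count = 0
    for _ in range(n_boot):
        b1 = rng.choice(pooled, len(e1)) + h * rng.normal(size=len(e1))
        b2 = rng.choice(pooled, len(e2)) + h * rng.normal(size=len(e2))
        count += ks_two_sample(b1, b2) >= t_obs
    return t_obs, (count + 1) / (n_boot + 1)

# Two regressions with different coefficients but the same N(0,1) errors:
n = 150
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
e1 = residuals(X1, X1 @ np.array([1.0, 2.0]) + rng.normal(size=n))
e2 = residuals(X2, X2 @ np.array([0.5, -1.0]) + rng.normal(size=n))
t_stat, p_val = smooth_bootstrap_pvalue(e1, e2)
```

    The smoothing bandwidth h is a tuning choice; the bootstrap is needed precisely because, as the abstract notes, the residual-based statistic is not asymptotically distribution-free.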

    Asymptotic properties of a goodness-of-fit test based on maximum correlations

    We study the efficiency properties of the goodness-of-fit test based on the Qn statistic introduced in Fortiana and Grané (2003), using the concepts of Bahadur asymptotic relative efficiency and Bahadur asymptotic optimality. We compare the test based on this statistic with those based on the Kolmogorov–Smirnov, Cramér–von Mises and Anderson–Darling statistics. We also describe the distribution families for which the test based on Qn is asymptotically optimal in the Bahadur sense and, as an application, we use this test to detect the presence of hidden periodicities in a stationary time series.
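    For reference, the three classical one-sample statistics compared above can be computed directly from the probability-integral-transformed sample (the Qn statistic itself is not reproduced here, to avoid misstating its definition):

```python
import numpy as np

def classical_gof(u):
    """Kolmogorov–Smirnov, Cramér–von Mises and Anderson–Darling
    statistics for a sample u hypothesised to be Uniform(0, 1);
    for other null distributions, apply the model CDF first."""
    u = np.sort(u)
    n = len(u)
    i = np.arange(1, n + 1)
    # KS: largest one-sided deviation of the empirical CDF
    ks = max(np.max(i / n - u), np.max(u - (i - 1) / n))
    # CvM: quadratic distance via the usual computing formula
    cvm = 1 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)
    # AD: tail-weighted quadratic distance
    ad = -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))
    return ks, cvm, ad

# A perfectly uniform sample of plotting positions:
u = (np.arange(1, 101) - 0.5) / 100
ks, cvm, ad = classical_gof(u)
```

    The Anderson–Darling statistic up-weights the tails via the 1/(F(1-F)) factor in its defining integral, which is one reason the three tests rank distribution families differently in Bahadur efficiency comparisons.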