New developments of the goodness-of-fit Statistical Toolkit
The Statistical Toolkit is a project for the development of open-source software tools for statistical data analysis in experimental particle and nuclear physics. The second development cycle encompassed an extension of the software functionality and new tools to facilitate its usage in experimental environments. The new developments include additional goodness-of-fit tests, new implementations of existing tests to improve their statistical precision or computational performance, a new component that extends the usability of the toolkit with other data analysis systems, and new tools for easier configuration and building of the system in the user's computing environment. The computational performance of all the implemented algorithms has been studied.
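As a rough illustration of the kind of two-sample goodness-of-fit comparison the toolkit provides (the toolkit itself is a separate library; SciPy is used here only as a stand-in), one can compare an observed sample against a reference sample with the Kolmogorov-Smirnov and Cramér-von Mises tests:

```python
# Hedged sketch: two-sample goodness-of-fit tests of the kind described,
# using SciPy as a stand-in for the Statistical Toolkit. The data below
# are synthetic placeholders, not from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 500)  # e.g. a simulated reference spectrum
observed = rng.normal(0.1, 1.0, 500)   # e.g. a measured distribution

ks = stats.ks_2samp(observed, reference)
cvm = stats.cramervonmises_2samp(observed, reference)

print(f"KS:  D  = {ks.statistic:.3f}, p = {ks.pvalue:.3f}")
print(f"CvM: W2 = {cvm.statistic:.3f}, p = {cvm.pvalue:.3f}")
```

Both tests return a statistic and a p-value; a small p-value indicates that the observed and reference distributions are unlikely to coincide.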
Testing marginal homogeneity in Hilbert spaces with applications to stock market returns
This paper considers a paired data framework and discusses the question of marginal homogeneity of bivariate high-dimensional or functional data. The related testing problem can be embedded into a more general setting for paired random variables taking values in a general Hilbert space. To address this problem, a Cramér–von-Mises type test statistic is applied and a bootstrap procedure is suggested to obtain critical values and, finally, a consistent test. The desired properties of a bootstrap test are derived: asymptotic exactness under the null hypothesis and consistency under alternatives. Simulations show the quality of the test in the finite-sample case. A possible application is the comparison of two possibly dependent stock market returns based on functional data. The approach is demonstrated on historical data for different stock market indices.
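For real-valued paired data, the flavour of such a test can be sketched as follows. This is not the paper's bootstrap procedure: it uses a simpler within-pair permutation null (randomly swapping the two coordinates of each pair, which is valid under the stronger assumption of exchangeability) and ordinary ECDFs instead of Hilbert-space-valued data; all names and data are illustrative.

```python
# Hedged sketch: a Cramér-von Mises-type statistic for marginal homogeneity
# of paired samples, calibrated by a within-pair permutation null rather
# than the paper's bootstrap. Synthetic data only.
import numpy as np

def cvm_marginal_stat(x, y):
    # CvM-type distance between the two marginal ECDFs, evaluated on the
    # pooled sample points.
    grid = np.sort(np.concatenate([x, y]))
    fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return len(x) * np.mean((fx - fy) ** 2)

def paired_permutation_pvalue(x, y, n_perm=999, seed=0):
    # Under exchangeability of (X, Y), swapping coordinates within a pair
    # leaves the joint law unchanged, so swaps generate the null.
    rng = np.random.default_rng(seed)
    t_obs = cvm_marginal_stat(x, y)
    count = 0
    for _ in range(n_perm):
        swap = rng.random(len(x)) < 0.5
        xp = np.where(swap, y, x)
        yp = np.where(swap, x, y)
        if cvm_marginal_stat(xp, yp) >= t_obs:
            count += 1
    return (count + 1) / (n_perm + 1)

# Dependent pairs with identical N(0, 1) marginals: the test should not
# systematically reject.
rng = np.random.default_rng(1)
x = rng.normal(size=150)
y = 0.6 * x + rng.normal(scale=0.8, size=150)
p = paired_permutation_pvalue(x, y)
print(f"permutation p-value: {p:.3f}")
```

The permutation calibration automatically respects the dependence within pairs, which is the same difficulty the paper's bootstrap is designed to handle.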
Maximum Fidelity
The most fundamental problem in statistics is the inference of an unknown probability distribution from a finite number of samples. For a specific observed data set, answers to the following questions would be desirable: (1) Estimation: which candidate distribution provides the best fit to the observed data? (2) Goodness-of-fit: how concordant is this distribution with the observed data? (3) Uncertainty: how concordant are other candidate distributions with the observed data? A simple unified approach for univariate data, called "maximum fidelity", is presented that addresses these traditionally distinct statistical notions. Maximum fidelity is a strict frequentist approach that is fundamentally based on model concordance with the observed data. The fidelity statistic is a general information measure based on the coordinate-independent cumulative distribution and on critical yet previously neglected symmetry considerations. An approximation for the null distribution of the fidelity allows its direct conversion to absolute model concordance (p value). Fidelity maximization identifies the most concordant model distribution, yielding a method for parameter estimation, with neighboring, less concordant distributions providing the "uncertainty" in this estimate. Maximum fidelity provides an optimal approach for parameter estimation (superior to maximum likelihood) and a generally optimal approach for goodness-of-fit assessment of arbitrary models applied to univariate data. Extensions to binary data, binned data, multidimensional data, and classical parametric and nonparametric statistical tests are described. Maximum fidelity provides a philosophically consistent, robust, and seemingly optimal foundation for statistical inference. All findings are presented in an elementary way so as to be immediately accessible to all researchers utilizing statistical analysis.
Modeling diameter distributions with six probability density functions in Pinus halepensis Mill. Plantations using low-density airborne laser scanning data in Aragón (northeast Spain)
The diameter distributions of trees in 50 temporary sample plots (TSPs) established in Pinus halepensis Mill. stands were recovered from LiDAR metrics by using six probability density functions (PDFs): the Weibull (2P and 3P), Johnson's SB, beta, generalized beta and gamma-2P functions. The parameters were recovered from the first and second moments of the distributions (mean and variance, respectively) by using parameter recovery models (PRM). Linear models were used to predict both moments from LiDAR data. In recovering the functions, the location parameters of the distributions were predetermined as the minimum diameter inventoried, and scale parameters were established as the maximum diameters predicted from LiDAR metrics. The Kolmogorov–Smirnov (KS) statistic (Dn), the number of acceptances by the KS test, the Cramér–von Mises (W2) statistic, bias and mean square error (MSE) were used to evaluate the goodness of fit. The fits for the six recovered functions were compared with the fits to all measured data from 58 TSPs (LiDAR metrics could only be extracted from 50 of the plots). In the fitting phase, the location parameters were fixed at a suitable value determined according to the forestry literature (0.75·dmin). The linear models used to recover the two moments of the distributions and the maximum diameters determined from LiDAR data were accurate, with R2 values of 0.750, 0.724 and 0.873 for dg, dmed and dmax, respectively. Reasonable results were obtained with all six recovered functions. The goodness-of-fit statistics indicated that the beta function was the most accurate, followed by the generalized beta function. The Weibull-3P function provided the poorest fits, and the Weibull-2P and Johnson's SB functions also yielded poor fits to the data. Funding: Ministerio de Economía, Industria y Competitividad, Ayudas Torres Quevedo (grant PTQ-16-08445); Fondo Europeo Agrario de Desarrollo Rural (FEADER), Programa de Desarrollo Rural de Aragón 2014-2020 (project RF-64079).
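Moment-based parameter recovery of the kind used here can be sketched for the Weibull-2P case: given a predicted mean and variance, the shape parameter is obtained by solving the coefficient-of-variation equation, and the scale parameter then follows in closed form. This is a generic moment-matching sketch, not the paper's fitted models.

```python
# Hedged sketch: Weibull-2P parameter recovery from the first two moments
# (mean and variance), as in generic parameter recovery models (PRM).
# For Weibull(k, lam): mean = lam*G(1+1/k),
# var = lam^2 * (G(1+2/k) - G(1+1/k)^2), with G the gamma function.
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def recover_weibull_2p(mean, var):
    # The squared coefficient of variation depends only on the shape k,
    # so solve for k first, then back out the scale.
    cv2 = var / mean**2
    def f(k):
        g1 = gamma(1 + 1 / k)
        return gamma(1 + 2 / k) / g1**2 - 1 - cv2
    k = brentq(f, 0.1, 50.0)      # bracket covers realistic diameter shapes
    scale = mean / gamma(1 + 1 / k)
    return k, scale

# Round-trip check with known parameters (k = 2, scale = 10 cm).
true_k, true_scale = 2.0, 10.0
m = true_scale * gamma(1 + 1 / true_k)
v = true_scale**2 * (gamma(1 + 2 / true_k) - gamma(1 + 1 / true_k)**2)
k, scale = recover_weibull_2p(m, v)
print(f"recovered shape = {k:.4f}, scale = {scale:.4f}")
```

In the paper the two moments come from linear models fitted to LiDAR metrics rather than from field data directly; the recovery step itself is the same.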
Jackknife empirical likelihood tests for error distributions in regression models
Regression models are commonly used to model the relationship between responses and covariates. For testing the error distribution, classical test statistics such as the Kolmogorov–Smirnov and Cramér–von Mises tests suffer from complicated limiting distributions due to the plug-in estimates of the unknown parameters, so an ad hoc procedure such as the bootstrap is needed to obtain critical points. Recently, Khmaladze and Koul (2004) [7] proposed an asymptotically distribution-free test via martingale transforms. However, the calculation of such a test is quite involved and usually requires numerical integration when a Cramér–von Mises type test is employed. In this paper we propose a novel jackknife empirical likelihood method which is easy to compute and has a chi-square limit, so that critical values are ready at hand. A simulation study confirms that the new test has accurate size and is also powerful.
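The jackknife step underlying such methods can be sketched as follows: leave-one-out pseudo-values of a statistic are formed, and empirical likelihood is then applied to those pseudo-values as if they were i.i.d. observations. This shows only the pseudo-value construction, not the paper's full test.

```python
# Hedged sketch: leave-one-out jackknife pseudo-values
# V_i = n*T(full sample) - (n-1)*T(sample without i),
# the building block to which jackknife empirical likelihood is applied.
import numpy as np

def jackknife_pseudovalues(data, stat):
    n = len(data)
    t_full = stat(data)
    return np.array(
        [n * t_full - (n - 1) * stat(np.delete(data, i)) for i in range(n)]
    )

# Sanity check: for the sample mean, the pseudo-values reduce to the
# observations themselves.
x = np.array([1.0, 2.0, 3.0, 4.0])
pv = jackknife_pseudovalues(x, np.mean)
print(pv)
```

For nonlinear statistics the pseudo-values are only approximately independent, which is exactly what makes the chi-square limit of the resulting empirical likelihood ratio useful.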
THE TWO-SAMPLE PROBLEM WITH REGRESSION ERRORS: AN EMPIRICAL PROCESS APPROACH
We describe how to test the null hypothesis that the errors from two parametrically specified regression models have the same distribution versus a general alternative. First we obtain the asymptotic properties of test statistics derived from the difference between the two residual-based empirical distribution functions. Under the null hypothesis they are not asymptotically distribution free and, hence, a consistent bootstrap procedure is proposed to compute critical values. As an alternative, we describe how to perform the test with statistics based on martingale-transformed empirical processes, which are asymptotically distribution free. Some Monte Carlo experiments are performed to compare the behaviour of all the statistics with moderate sample sizes. Keywords: two-sample problem; residual-based empirical process; smooth bootstrap; martingale transform.
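The setting can be illustrated with a naive version of the comparison: fit each regression by least squares and apply a two-sample Kolmogorov-Smirnov test to the residuals. This is a sketch only; as the abstract notes, treating residuals as if they were the true errors is not asymptotically valid, which is why the paper calibrates the test with a bootstrap or a martingale transform.

```python
# Hedged sketch: comparing the error distributions of two regressions via
# their residual ECDFs. Synthetic data; the naive p-value below ignores
# the estimation effect the paper corrects for.
import numpy as np
from scipy import stats

def residuals(x, y):
    # OLS fit of a simple linear model y = b0 + b1*x; return residuals.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 200)
y1 = 1.0 + 2.0 * x1 + rng.normal(0, 1, 200)      # Gaussian errors
x2 = rng.uniform(0, 1, 200)
y2 = -1.0 + 0.5 * x2 + rng.laplace(0, 1, 200)    # Laplace errors

res = stats.ks_2samp(residuals(x1, y1), residuals(x2, y2))
print(f"D = {res.statistic:.3f}, naive p = {res.pvalue:.3f}")
```

A faithful implementation would recompute the statistic on bootstrap resamples (refitting both regressions each time) to obtain valid critical values.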
Asymptotic properties of a goodness-of-fit test based on maximum correlations
We study the efficiency properties of the goodness-of-fit test based on the Qn statistic introduced in Fortiana and Grané (2003), using the concepts of Bahadur asymptotic relative efficiency and Bahadur asymptotic optimality. We compare the test based on this statistic with those based on the Kolmogorov-Smirnov, the Cramér-von Mises and the Anderson-Darling statistics. We also describe the distribution families for which the test based on Qn is asymptotically optimal in the Bahadur sense and, as an application, we use this test to detect the presence of hidden periodicities in a stationary time series.
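The three benchmark tests named in the comparison are all available in standard software; a minimal sketch of running them on one sample (against a normal null, as an example — the Qn statistic itself is not in SciPy) looks like this:

```python
# Hedged sketch: the three classical goodness-of-fit tests used as
# benchmarks, applied to a synthetic sample under a normal null.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)

ks = stats.kstest(x, "norm")            # Kolmogorov-Smirnov
cvm = stats.cramervonmises(x, "norm")   # Cramér-von Mises
ad = stats.anderson(x, "norm")          # Anderson-Darling: statistic plus
                                        # critical values, no p-value

print(f"KS:  D  = {ks.statistic:.3f}, p = {ks.pvalue:.3f}")
print(f"CvM: W2 = {cvm.statistic:.3f}, p = {cvm.pvalue:.3f}")
print(f"AD:  A2 = {ad.statistic:.3f}")
```

Bahadur efficiency compares how fast the p-values of such tests tend to zero under a fixed alternative, which is the sense in which the paper ranks Qn against these three.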