Robustness of the One-Sample Kolmogorov Test to Sampling from a Finite Discrete Population
One of the most useful and best-known goodness-of-fit tests is the Kolmogorov one-sample test. The assumptions of the test are: (1) a random sample; (2) a continuous random variable; (3) F(x) is a completely specified hypothesized cumulative distribution function. The Kolmogorov one-sample test has a wide range of applications, so knowing the effect of using the test when an assumption is not met is of practical importance. The purpose of this research is to analyze the robustness of the Kolmogorov one-sample test to sampling from a finite discrete distribution. The standard tables for the Kolmogorov test are derived by assuming sampling from a theoretical continuous distribution, so the underlying population is infinite. The standard tables do not include a method or adjustment factor for estimating the effect on table values in statistical experiments where the sample is drawn without replacement from a finite discrete population. This research provides an extension of the Kolmogorov test to the case where the hypothesized distribution function is finite and discrete and the sampling distribution is based on sampling without replacement. An investigative study was conducted to explore possible tendencies and relationships in the distribution of Dn when sampling with and without replacement for various parameter settings. In all, 96 sampling distributions were derived. Results show that the standard Kolmogorov table values are conservative, particularly when the sample sizes are small or the sample represents 10% or more of the population.
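The comparison described in this abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' procedure: the population (the integers 1..N with a uniform hypothesized CDF), the function names, and the parameter values are all assumptions chosen for illustration, and the abstract does not specify the 96 configurations actually studied.

```python
import random


def ks_statistic_discrete(sample, population_size):
    """Dn = max over k of |Fn(k) - F(k)|, assuming F is uniform on 1..N.

    Both step functions jump only at the integers, so checking each
    integer k is enough to find the supremum.
    """
    n = len(sample)
    counts = [0] * (population_size + 1)
    for x in sample:
        counts[x] += 1
    d = 0.0
    cumulative = 0
    for k in range(1, population_size + 1):
        cumulative += counts[k]
        d = max(d, abs(cumulative / n - k / population_size))
    return d


def simulate_dn(population_size, n, replace, trials, seed=0):
    """Return sorted simulated Dn values, with or without replacement."""
    rng = random.Random(seed)
    population = list(range(1, population_size + 1))
    stats = []
    for _ in range(trials):
        if replace:
            sample = [rng.choice(population) for _ in range(n)]
        else:
            sample = rng.sample(population, n)
        stats.append(ks_statistic_discrete(sample, population_size))
    return sorted(stats)


def empirical_quantile(sorted_values, q):
    """Simple order-statistic quantile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


if __name__ == "__main__":
    # Hypothetical setting: sample of 10 from a population of 20 (50%).
    for replace in (True, False):
        values = simulate_dn(20, 10, replace, trials=2000)
        print("replace =", replace, "95% point =", empirical_quantile(values, 0.95))
```

Comparing the simulated upper quantiles for the two sampling schemes against the standard table value for the same n gives a rough sense of how conservative the table is for a given sampling fraction.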
Power-law distributions in empirical data
Power-law distributions occur in many situations of scientific interest and
have significant consequences for our understanding of natural and man-made
phenomena. Unfortunately, the detection and characterization of power laws is
complicated by the large fluctuations that occur in the tail of the
distribution -- the part of the distribution representing large but rare events
-- and by the difficulty of identifying the range over which power-law behavior
holds. Commonly used methods for analyzing power-law data, such as
least-squares fitting, can produce substantially inaccurate estimates of
parameters for power-law distributions, and even in cases where such methods
return accurate answers they are still unsatisfactory because they give no
indication of whether the data obey a power law at all. Here we present a
principled statistical framework for discerning and quantifying power-law
behavior in empirical data. Our approach combines maximum-likelihood fitting
methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic
and likelihood ratios. We evaluate the effectiveness of the approach with tests
on synthetic data and give critical comparisons to previous approaches. We also
apply the proposed methods to twenty-four real-world data sets from a range of
different disciplines, each of which has been conjectured to follow a power-law
distribution. In some cases we find these conjectures to be consistent with the
data while in others the power law is ruled out.
Comment: 43 pages, 11 figures, 7 tables, 4 appendices; code available at
http://www.santafe.edu/~aaronc/powerlaws
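As a rough illustration of the two ingredients named in this abstract, the sketch below fits the exponent of a continuous power law by maximum likelihood and then computes the Kolmogorov-Smirnov distance between the data and the fitted model. It assumes the lower cutoff xmin is already known and uses the standard continuous-case estimator alpha = 1 + n / sum(ln(x_i / xmin)); the function names are hypothetical, and this is not the code distributed at the URL above.

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent for p(x) ~ x^(-alpha), x >= xmin (xmin assumed known)."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Check the empirical step just below and just above each data point.
        d = max(d, abs(model_cdf - i / n), abs((i + 1) / n - model_cdf))
    return d


def sample_power_law(n, xmin, alpha, rng):
    """Draw synthetic power-law data by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    synthetic = sample_power_law(5000, xmin=1.0, alpha=2.5, rng=rng)
    alpha_hat = fit_alpha(synthetic, xmin=1.0)
    print("estimated alpha:", alpha_hat)
    print("KS distance:", ks_distance(synthetic, 1.0, alpha_hat))
```

A small KS distance alone does not establish a power law; turning it into a goodness-of-fit judgment requires comparing it with the distances obtained from synthetic data sets drawn from the fitted model, which this sketch leaves out.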
Connecting the Dots: Towards Continuous Time Hamiltonian Monte Carlo
Continuous time Hamiltonian Monte Carlo is introduced, as a powerful
alternative to Markov chain Monte Carlo methods for continuous target
distributions. The method is constructed in two steps. First, Hamiltonian
dynamics are chosen as the deterministic dynamics in a continuous time
piecewise deterministic Markov process. Under very mild restrictions, such a
process will have the desired target distribution as an invariant distribution.
Second, the numerical implementation of such processes, based on adaptive
numerical integration of second-order ordinary differential equations, is
considered. The numerical implementation yields an approximate, yet highly
robust algorithm that, unlike conventional Hamiltonian Monte Carlo, enables the
exploitation of the complete Hamiltonian trajectories (hence the title). The
proposed algorithm may yield large speedups and improvements in stability
relative to relevant benchmarks, while incurring numerical errors that are
negligible relative to the overall Monte Carlo errors.
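The idea of a piecewise deterministic process driven by Hamiltonian dynamics can be sketched in a toy setting. The example below is an illustration under strong assumptions, not the paper's algorithm: the target is a one-dimensional standard normal, for which the Hamiltonian flow is an exact rotation in (q, p) and no numerical integrator is needed; the momentum is fully refreshed at the events of a constant-rate Poisson process; and the trajectory is recorded on a fixed time grid. All names and parameter values are hypothetical.

```python
import math
import random


def continuous_time_hmc_gaussian(total_time, refresh_rate, sample_dt, seed=0):
    """Toy piecewise deterministic sampler for a standard normal target.

    Between events the state follows the exact Hamiltonian flow of
    H(q, p) = q^2 / 2 + p^2 / 2. At each event (assumed constant rate)
    the momentum is redrawn from N(0, 1). Positions are recorded along
    the whole trajectory, every sample_dt time units.
    """
    rng = random.Random(seed)
    q, p = 0.0, rng.gauss(0.0, 1.0)
    samples = []
    t = 0.0
    next_sample = sample_dt
    while t < total_time:
        # Waiting time until the next momentum refresh.
        tau = rng.expovariate(refresh_rate)
        t_end = min(t + tau, total_time)
        # Record the position at every grid time inside this segment.
        while next_sample <= t_end:
            s = next_sample - t
            samples.append(q * math.cos(s) + p * math.sin(s))
            next_sample += sample_dt
        # Advance the state to the end of the segment, then refresh p.
        s = t_end - t
        q, p = (q * math.cos(s) + p * math.sin(s),
                -q * math.sin(s) + p * math.cos(s))
        t = t_end
        p = rng.gauss(0.0, 1.0)
    return samples


if __name__ == "__main__":
    draws = continuous_time_hmc_gaussian(2000.0, refresh_rate=1.0, sample_dt=0.1)
    mean = sum(draws) / len(draws)
    second_moment = sum(x * x for x in draws) / len(draws)
    print("mean:", mean, "second moment:", second_moment)
```

For a general target the rotation would have to be replaced by numerical integration of the equations of motion, which is the part the abstract addresses with adaptive ODE solvers.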