Smooth tail index estimation
Both parametric distribution functions appearing in extreme value theory -
the generalized extreme value distribution and the generalized Pareto
distribution - have log-concave densities if the extreme value index $\gamma$ lies
in $[-1,0]$. Replacing the order statistics in tail index estimators by their
corresponding quantiles from the distribution function that is based on the
estimated log-concave density leads to novel smooth quantile and tail index
estimators. These new estimators are aimed particularly at estimating the tail index in
small samples. Acting as a smoother of the empirical distribution function, the
log-concave distribution function estimator reduces estimation variability to a
much greater extent than it introduces bias. As a consequence, Monte Carlo
simulations demonstrate that the smoothed versions of the estimators are clearly
superior to their non-smoothed counterparts in terms of mean squared error.
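As a minimal sketch of this idea (not the authors' implementation), the following R code contrasts the classical Pickands estimator based on order statistics with a smoothed variant that plugs in quantiles of the distribution function based on the log-concave MLE, as computed by the logcondens package; the plotting positions i/(n + 1) and the handling of the quantilesLogConDens() output are assumptions on our part.

library(logcondens)

## Classical Pickands estimator based on ascending order statistics
pickands_classic <- function(x, k) {
  xs <- sort(x); n <- length(x)
  log((xs[n - k + 1] - xs[n - 2 * k + 1]) /
      (xs[n - 2 * k + 1] - xs[n - 4 * k + 1])) / log(2)
}

## Smoothed variant: replace the order statistics by quantiles of the
## distribution function built from the estimated log-concave density
pickands_smooth <- function(x, k) {
  n   <- length(x)
  res <- logConDens(x, smoothed = FALSE, print = FALSE)   # log-concave MLE
  ps  <- c(n - 4 * k + 1, n - 2 * k + 1, n - k + 1) / (n + 1)
  q   <- quantilesLogConDens(ps, res)
  if (is.matrix(q)) q <- q[, ncol(q)]   # quantiles assumed to sit in the last column
  log((q[3] - q[2]) / (q[2] - q[1])) / log(2)
}

## Example: both estimators target gamma = 0 for exponential data
set.seed(1)
x <- rexp(50)
c(classic = pickands_classic(x, k = 5), smooth = pickands_smooth(x, k = 5))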
Non-Gaussian component analysis: testing the dimension of the signal subspace
Dimension reduction is a common strategy in multivariate data analysis that seeks a
subspace containing all the interesting features needed for the subsequent analysis.
For this purpose, non-Gaussian component analysis attempts to divide the data into a
non-Gaussian part, the signal, and a Gaussian part, the noise. We show that the
simultaneous use of two scatter functionals achieves this and suggest a bootstrap
test for the dimension of the non-Gaussian subspace. Sequential application of the
test can then, for example, be used to estimate the signal dimension.
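The ingredients of this approach can be sketched in a few lines of base R. The sketch below is ours, not the authors' code: it uses one concrete pair of scatter functionals, the covariance matrix and the fourth-moment (FOBI) scatter, for which standardized Gaussian components have the known eigenvalue p + 2, and the null-resampling scheme is one simple variant chosen for illustration.

## Whiten the data and eigen-decompose the fourth-moment (FOBI) scatter of the
## whitened data; Gaussian components correspond to eigenvalues near p + 2.
fobi <- function(X) {
  X <- scale(X, center = TRUE, scale = FALSE)
  Z <- X %*% solve(chol(cov(X)))                  # whitened data
  S <- crossprod(Z * rowSums(Z^2), Z) / nrow(Z)   # FOBI scatter
  E <- eigen(S, symmetric = TRUE)
  list(Z = Z, values = E$values, vectors = E$vectors)
}

## Statistic for "the non-Gaussian subspace has dimension k": squared distances
## of the p - k eigenvalues closest to p + 2 from that Gaussian reference value.
noise_stat <- function(values, k) {
  p <- length(values)
  d <- sort(abs(values - (p + 2)))
  sum(d[seq_len(p - k)]^2)
}

## Bootstrap calibration under the null (a simple illustrative variant): keep the
## k most non-Gaussian estimated components, replace the remaining p - k by
## independent Gaussian noise, and recompute the statistic.
dimension_test <- function(X, k, B = 200) {
  n <- nrow(X); p <- ncol(X)
  fit    <- fobi(X)
  t_obs  <- noise_stat(fit$values, k)
  comps  <- fit$Z %*% fit$vectors
  ord    <- order(abs(fit$values - (p + 2)), decreasing = TRUE)
  signal <- comps[, ord[seq_len(k)], drop = FALSE]
  t_bs <- replicate(B, {
    Xb <- cbind(signal[sample(n, replace = TRUE), , drop = FALSE],
                matrix(rnorm(n * (p - k)), n, p - k))
    noise_stat(fobi(Xb)$values, k)
  })
  mean(t_bs >= t_obs)                             # bootstrap p-value
}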
Concentration Inequalities and Confidence Bands for Needlet Density Estimators on Compact Homogeneous Manifolds
Let $X_1,\ldots,X_n$ be a random sample from some unknown probability density $f$
defined on a compact homogeneous manifold $\mathbf{M}$ of dimension $d$. Consider a
'needlet frame' describing a localised projection onto the space of eigenfunctions of
the Laplace operator on $\mathbf{M}$ with corresponding eigenvalues less than
$2^{2j}$, as constructed in \cite{GP10}. We prove non-asymptotic concentration
inequalities for the uniform deviations of the linear needlet density estimator
$f_n(j)$ obtained from an empirical estimate of the needlet projection of $f$. We
apply these results to construct risk-adaptive estimators and non-asymptotic
confidence bands for the unknown density $f$. The confidence bands are adaptive over
classes of differentiable and H\"{o}lder-continuous functions on $\mathbf{M}$ that
attain their H\"{o}lder exponents.
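For orientation, the linear estimator referred to above has the generic form of a linear frame (wavelet/needlet) density estimator; the notation below ($\psi_{l\eta}$ for the frame elements, $\hat{\beta}_{l\eta}$ for the empirical coefficients) is chosen here for illustration and is not taken from the paper.

\[
  f_n(j)(x) \;=\; \sum_{l \le j} \sum_{\eta} \hat{\beta}_{l\eta}\, \psi_{l\eta}(x),
  \qquad
  \hat{\beta}_{l\eta} \;=\; \frac{1}{n} \sum_{i=1}^{n} \psi_{l\eta}(X_i),
\]

that is, the unknown frame coefficients $\beta_{l\eta} = \int_{\mathbf{M}} f\, \psi_{l\eta}$ are replaced by their empirical counterparts.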
The Newcomb-Benford Law in Its Relation to Some Common Distributions
An often reported, but nevertheless persistently striking, observation, formalized as the Newcomb-Benford law (NBL), is that the frequencies with which the leading digits of numbers occur in a large variety of data are far from uniform. Most spectacular seems to be the fact that in many data sets the leading digit 1 occurs in nearly one third of all cases. Explanations offered for this uneven distribution of the leading digits include scale- and base-invariance. Little attention, however, has been paid to the interrelation between the distribution of the significant digits and the distribution of the observed variable. It is shown here by simulation that long right-tailed distributions of a random variable are compatible with the NBL, and that for distributions of the ratio of two random variables the fit generally improves. Distributions that do not put most of their mass on small values of the random variable (e.g. symmetric distributions) fail to fit. Hence, the validity of the NBL requires a predominance of small values and, when thinking of real-world data, a majority of small entities. Analyses of data on stock prices, the areas and numbers of inhabitants of countries, and the starting page numbers of papers from a bibliography support this conclusion. In all, these findings may help to understand the mechanisms behind the NBL and the conditions needed for its validity. That this law is not only of scientific interest per se but also has substantial practical implications can be seen from the fields in which it has been suggested for use. These fields range from the detection of irregularities in data (e.g. economic fraud) to optimizing the architecture of computers with regard to number representation, storage, and round-off errors.
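The kind of simulation described here is easy to reproduce in outline. The following R sketch (our own; the distributions and the sample size are chosen purely for illustration) compares the empirical leading-digit frequencies of a long right-tailed variable, and of a ratio of two such variables, with the NBL probabilities log10(1 + 1/d) for d = 1, ..., 9.

## First significant digit of each (nonzero) value
leading_digit <- function(x) {
  x <- abs(x[x != 0])
  as.integer(substr(formatC(x, format = "e", digits = 6), 1, 1))
}

## Empirical frequencies of the leading digits 1..9
digit_freq <- function(x) {
  d <- leading_digit(x)
  tabulate(d, nbins = 9) / length(d)
}

benford <- log10(1 + 1 / (1:9))          # NBL probabilities for digits 1..9

set.seed(1)
n <- 1e5
x <- rexp(n)                             # long right-tailed distribution
r <- rexp(n) / rexp(n)                   # ratio of two random variables

round(rbind(NBL         = benford,
            exponential = digit_freq(x),
            ratio       = digit_freq(r)), 3)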
logcondens: Computations related to univariate log-concave density estimation
Maximum likelihood estimation of a log-concave density has attracted considerable attention over the last few years. Several algorithms have been proposed to estimate such
a density. Two of those algorithms, an iterative convex minorant and an active set algorithm, are implemented in the R package logcondens. While these algorithms are discussed elsewhere, we describe in this paper the use of the logcondens package and discuss functions
and datasets related to log-concave density estimation contained in the package. In particular, we provide functions to (1) compute the maximum likelihood estimate (MLE) as well as a smoothed log-concave density estimator derived from the MLE, (2) evaluate the estimated density, distribution and quantile functions at arbitrary points, (3) compute the characterizing functions of the MLE, (4) sample from the estimated distribution, and finally (5) perform a two-sample permutation test using a modified Kolmogorov-Smirnov test statistic. In addition, logcondens makes two datasets available that have been used
to illustrate log-concave density estimation.
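A brief usage sketch of the workflow listed above, on simulated data; the function names follow our reading of the package documentation, and the exact argument lists and return formats should be checked against the package help pages.

library(logcondens)

set.seed(1)
x <- rgamma(200, shape = 2)          # gamma densities with shape >= 1 are log-concave

## (1) MLE and smoothed log-concave density estimator
res <- logConDens(x, smoothed = TRUE, print = FALSE)

## (2) evaluate density, distribution and quantile functions at arbitrary points
ev <- evaluateLogConDens(seq(0, 8, by = 0.5), res)
qs <- quantilesLogConDens(c(0.25, 0.5, 0.75), res)

## (3) the characterizing functions of the MLE are available through further
##     package functions (see the package help, e.g. intECDF() and intF())

## (4) sample from the estimated distribution
xnew <- rlogcon(500, x)

## (5) two-sample permutation test with the modified Kolmogorov-Smirnov statistic
##     (default settings assumed; may take a moment to run)
y <- rgamma(150, shape = 3)
test <- logconTwoSample(x, y)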