Nonparametric Regression, Confidence Regions and Regularization
In this paper we offer a unified approach to the problem of nonparametric
regression on the unit interval. It is based on a universal, honest and
non-asymptotic confidence region which is defined by a set of linear
inequalities involving the values of the functions at the design points.
Interest will typically centre on the simplest functions in that region,
where simplicity can be defined in terms of shape (number of local extremes,
intervals of convexity/concavity), smoothness (bounds on derivatives), or a
combination of both. Once some form of regularization has been decided upon, the
confidence region can be used to provide honest, non-asymptotic confidence
bounds which are less informative but conceptually much simpler.
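The membership test behind such a region can be illustrated with a multiresolution-style criterion: a candidate function lies in the region if its residuals look like pure noise on every block of a dyadic partition of the design points. The constant `tau`, the dyadic block system, and the simulated data below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated regression data on the unit interval: y_i = f(t_i) + noise.
n = 256
t = np.arange(1, n + 1) / n
f_true = np.sin(2 * np.pi * t)
sigma = 0.3
y = f_true + sigma * rng.normal(size=n)

def in_confidence_region(f, y, sigma, tau=2.5):
    """Linear-inequality membership test (illustrative): f is accepted if,
    on every block of every dyadic partition of the design points, the
    summed residuals satisfy |sum (y_i - f_i)| <= sigma*sqrt(tau*|I|*log n)."""
    n = len(y)
    r = y - f
    size = 1
    while size <= n:
        for start in range(0, n, size):
            block = r[start:start + size]
            if abs(block.sum()) > sigma * np.sqrt(tau * len(block) * np.log(n)):
                return False
        size *= 2
    return True

print(in_confidence_region(f_true, y, sigma))       # True with high probability
print(in_confidence_region(np.zeros(n), y, sigma))  # False: residuals carry signal
```

Simpler candidates (fewer local extremes, smoother shape) can then be sought inside this region, which is the regularization step the abstract describes.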
Extreme Value Theory and the Solar Cycle
We investigate the statistical properties of the extreme events of the solar
cycle as measured by the sunspot number. Recent advances in the methodology of
extreme value theory are applied to the maximal extremes of the sunspot time
series. We focus on the extreme events that exceed a carefully chosen
threshold, and a generalized Pareto distribution is fitted to the tail of
the empirical cumulative distribution. A maximum likelihood method is used to
estimate the parameters of the generalized Pareto distribution, and confidence
intervals for the parameters are also given. Due to the lack of an automatic
procedure for selecting the threshold, we analyze the sensitivity of the fitted
generalized Pareto distribution to the exact value of the threshold. According
to the available data, which span only the previous ~250 years, the cumulative
distribution of the time series is bounded, yielding an upper limit of 324 for
the sunspot number. We also estimate that the return value for each solar cycle
is ~188, while the return value for a century increases to ~228. Finally, the
results also indicate that the most probable return time for a large event like
the maximum at solar cycle 19 is ~700 years, and that the
probability of finding such a large event with a frequency smaller than ~50
years is very small. In spite of the essentially extrapolative character of
these results, their statistical significance is very large.
Comment: 6 pages, 4 figures, accepted for publication in A&
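The peaks-over-threshold fit described above can be sketched with SciPy's generalized Pareto implementation. The gamma-distributed stand-in series, the 95% threshold, and the return-level formula are illustrative assumptions, not the sunspot data or the paper's exact choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Stand-in for the observed series (hypothetical; not sunspot data).
x = rng.gamma(shape=2.0, scale=40.0, size=3000)

# Peaks over threshold: keep the excesses above a chosen threshold u.
u = np.quantile(x, 0.95)
excess = x[x > u] - u

# Maximum-likelihood fit of the generalized Pareto distribution to the
# excesses (floc=0 because excesses start at zero by construction).
shape, loc, scale = stats.genpareto.fit(excess, floc=0)

def return_level(m):
    """Level exceeded on average once every m observations (standard
    GPD return-level formula, valid for shape != 0)."""
    p_exceed = len(excess) / len(x)  # empirical P(X > u)
    return u + (scale / shape) * ((m * p_exceed) ** shape - 1)

print(shape, scale, return_level(1000))
```

The sensitivity analysis mentioned in the abstract corresponds to repeating this fit over a range of thresholds `u` and checking the stability of the fitted parameters.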
Limits and Confidence Intervals in the Presence of Nuisance Parameters
We study the frequentist properties of confidence intervals computed by the
method known to statisticians as the Profile Likelihood. It is seen that the
coverage of these intervals is surprisingly good over a wide range of possible
parameter values for important classes of problems, in particular whenever
there are additional nuisance parameters with statistical or systematic errors.
Programs are available for calculating these intervals.
Comment: 6 figures
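A profile-likelihood interval can be sketched for a hypothetical counting experiment with a Gaussian-constrained nuisance background. The model, the observed counts, and the 95% level are assumptions for illustration; the programs the abstract refers to are not reproduced here.

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical counting experiment: n_obs events with expectation s + b,
# where s is the signal of interest and b is a nuisance background
# constrained by an auxiliary measurement b_hat with uncertainty sigma_b.
n_obs, b_hat, sigma_b = 12, 4.0, 1.0

def nll(s, b):
    # Poisson term (up to a constant) plus a Gaussian constraint on b.
    return (s + b) - n_obs * np.log(s + b) + 0.5 * ((b - b_hat) / sigma_b) ** 2

def profile_nll(s):
    # Profile likelihood: minimize the NLL over the nuisance parameter b.
    res = optimize.minimize_scalar(lambda b: nll(s, b),
                                   bounds=(1e-6, 50), method="bounded")
    return res.fun

s_grid = np.linspace(0.01, 25, 500)
pnll = np.array([profile_nll(s) for s in s_grid])
pnll -= pnll.min()

# 95% interval: points where twice the profile-NLL rise stays below chi2_{1,0.95}.
cut = stats.chi2.ppf(0.95, df=1) / 2
inside = s_grid[pnll <= cut]
lo, hi = inside.min(), inside.max()
print(lo, hi)
```

The coverage study in the abstract amounts to repeating this construction over many simulated datasets and recording how often the interval contains the true signal.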
Model misspecification in peaks over threshold analysis
Classical peaks over threshold analysis is widely used for statistical
modeling of sample extremes, and can be supplemented by a model for the sizes
of clusters of exceedances. Under mild conditions a compound Poisson process
model allows the estimation of the marginal distribution of threshold
exceedances and of the mean cluster size, but requires the choice of a
threshold and of a run parameter that determines how exceedances are
declustered. We extend a class of estimators of the reciprocal mean cluster
size, known as the extremal index, establish consistency and asymptotic
normality, and use the compound Poisson process to derive misspecification
tests of model validity and of the choice of run parameter and threshold.
Simulated examples and real data on temperatures and rainfall illustrate the
ideas, both for estimating the extremal index in nonstandard situations and for
assessing the validity of extremal models.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS292 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
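Run-length declustering and the extremal index can be illustrated with the classical runs estimator on a simulated ARMAX process, whose extremal index is known in closed form. The process, threshold, and run parameter below are illustrative assumptions; the extended estimator class and misspecification tests of the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

# ARMAX process X_t = max(phi*X_{t-1}, (1-phi)*Z_t) with unit-Frechet
# innovations Z_t; its extremal index is 1 - phi.
phi = 0.5
n = 50000
z = -1.0 / np.log(rng.uniform(size=n))  # unit-Frechet draws
x = np.empty(n)
x[0] = z[0]
for t in range(1, n):
    x[t] = max(phi * x[t - 1], (1 - phi) * z[t])

def runs_estimator(x, u, r):
    """Runs estimator of the extremal index: the fraction of exceedances of u
    that are separated from the previous exceedance by more than r
    observations, i.e. that start a new cluster under run-length declustering."""
    idx = np.flatnonzero(x > u)
    if len(idx) == 0:
        return np.nan
    gaps = np.diff(idx)
    n_clusters = 1 + np.count_nonzero(gaps > r)
    return n_clusters / len(idx)

u = np.quantile(x, 0.98)
print(runs_estimator(x, u, r=5))  # expected to be close to 1 - phi = 0.5
```

The sensitivity to the choice of threshold `u` and run parameter `r`, which the misspecification tests in the paper address formally, can be probed informally by tabulating the estimate over a grid of both.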
Quantifying statistical uncertainty in the attribution of human influence on severe weather
Event attribution in the context of climate change seeks to understand the
role of anthropogenic greenhouse gas emissions in extreme weather events,
either specific events or classes of events. A common approach to event
attribution uses climate model output under factual (real-world) and
counterfactual (world that might have been without anthropogenic greenhouse gas
emissions) scenarios to estimate the probabilities of the event of interest
under the two scenarios. Event attribution is then quantified by the ratio of
the two probabilities. While this approach has been applied many times in the
last 15 years, the statistical techniques used to estimate the risk ratio based
on climate model ensembles have not drawn on the full set of methods available
in the statistical literature and have in some cases used and interpreted the
bootstrap method in non-standard ways. We present a precise frequentist
statistical framework for quantifying the effect of sampling uncertainty on
estimation of the risk ratio, propose the use of statistical methods that are
new to event attribution, and evaluate a variety of methods using statistical
simulations. We conclude that existing statistical methods not yet in use for
event attribution have several advantages over the widely-used bootstrap,
including better statistical performance in repeated samples and robustness to
small estimated probabilities. Software for using the methods is available
through the climextRemes package available for R or Python. While we focus on
frequentist statistical methods, Bayesian methods are likely to be particularly
useful when considering sources of uncertainty beyond sampling uncertainty.Comment: 41 pages, 11 figures, 1 tabl
Extreme Value Statistics of the Total Energy in an Intermediate Complexity Model of the Mid-latitude Atmospheric Jet. Part I: Stationary case
A baroclinic model for the atmospheric jet at middle-latitudes is used as a
stochastic generator of time series of the total energy of the system.
Statistical inference of extreme values is applied to yearly maxima sequences
of the time series, in the rigorous setting provided by extreme value theory.
In particular, the Generalized Extreme Value (GEV) family of distributions is
used here. Several physically realistic values of the forcing parameter
descriptive of the forced equator-to-pole temperature gradient and responsible
for setting the average baroclinicity in the atmospheric model are examined.
The location and scale GEV parameters are found to have a piecewise smooth,
monotonically increasing dependence on this forcing parameter, in agreement with
the similar dependence observed in the same system when other dynamically
and physically relevant observables are considered. The GEV shape parameter
also increases with the forcing but is always negative, as a priori required
by the boundedness of the total energy of the system. The sensitivity of the
statistical inference process is studied with respect to the selection
procedure of the maxima: the roles of both the length of maxima sequences and
of the length of data blocks over which the maxima are computed are critically
analyzed. Issues related to model sensitivity are also explored by varying the
resolution of the system.
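The GEV analysis of block maxima can be sketched with SciPy. The bounded stand-in series and the block length are illustrative assumptions, not the model's total-energy output; note SciPy's sign convention for the shape parameter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Stand-in series bounded above (like the total energy), so the GEV shape
# parameter of its block maxima should come out negative.
series = rng.beta(2, 5, size=100 * 365)  # 100 "years" of daily values in (0, 1)

# Block maxima: one maximum per year-long block.
maxima = series.reshape(100, 365).max(axis=1)

# Maximum-likelihood GEV fit. SciPy's convention is c = -xi, so a bounded
# upper tail (xi < 0) corresponds to a positive SciPy shape parameter c.
c, loc, scale = stats.genextreme.fit(maxima)
xi = -c
print(xi, loc, scale)
```

The block-length sensitivity discussed in the abstract corresponds to redoing this fit with different reshape block sizes and comparing the inferred parameters.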