Characterization of the frequency of extreme events by the Generalized Pareto Distribution
Based on recent results in extreme value theory, we use a new technique for
the statistical estimation of distribution tails. Specifically, we use the
Gnedenko-Pickands-Balkema-de Haan theorem, which gives a natural limit law for
peak-over-threshold values in the form of the Generalized Pareto Distribution
(GPD). While the GPD is widely used in finance, insurance, and hydrology, we
investigate here the earthquake energy distribution described by the
Gutenberg-Richter seismic moment-frequency law, analyzing shallow earthquakes
(depth h < 70 km) in the Harvard catalog over the period 1977-2000 in 18
seismic zones. The GPD is found to approximate the tails of the seismic moment
distributions quite well for moment-magnitudes larger than mW=5.3, and no
statistically significant regional difference is found between subduction and
transform seismic zones. We
confirm that the b-value is very different in mid-ocean ridges compared to
other zones (b=1.50±0.09 versus b=1.00±0.05 corresponding to a power law
exponent close to 1 versus 2/3) with a very high statistical confidence. We
propose a physical mechanism for this, contrasting slow healing ruptures in
mid-ocean ridges with fast healing ruptures in other zones. Deviations from the
GPD at the very end of the tail are detected in the sample containing
earthquakes from all major subduction zones (sample size of 4985 events). We
propose a new statistical test of significance of such deviations based on the
bootstrap method. The number of events deviating from the tail of the GPD in
the studied data sets (15-20 at most) is not sufficient for determining the
functional form of those deviations. Thus, it is practically impossible to give
preference to any one of the previously suggested parametric families
describing the ends of the tails of seismic moment distributions.
Comment: pdf document of 21 pages + 2 tables + 20 figures (ps format) + one
file giving the regionalization
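The peaks-over-threshold fit at the core of this analysis can be sketched with SciPy's `genpareto`; the sample, threshold choice, and parameter values below are synthetic assumptions for illustration, not the Harvard catalog data:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
# Synthetic heavy-tailed sample standing in for seismic moments
data = rng.pareto(1.5, size=20000) + 1.0

# Peaks over threshold: keep exceedances above the 95th percentile
u = np.quantile(data, 0.95)
exceedances = data[data > u] - u

# Fit the GPD to the exceedances, with location fixed at zero
xi, _, sigma = genpareto.fit(exceedances, floc=0)
print(f"GPD shape xi = {xi:.3f}, scale sigma = {sigma:.3f}")
```

For an exact Pareto tail with exponent 1.5 the GPD shape parameter should come out near 1/1.5.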
Applications of threshold models and the weighted bootstrap for Hungarian precipitation data
This paper presents applications of the peaks-over-threshold methodology for
both the univariate and the recently introduced bivariate case, combined with a
novel bootstrap approach. We compare the proposed bootstrap methods to the more
traditional profile likelihood.
We have investigated 63 years of European Climate Assessment daily
precipitation data for five Hungarian grid points, first separately for the
summer and winter months, and then, aiming at the detection of possible
changes, using 20-year moving windows. We show that significant changes can be
observed in both the univariate and the bivariate cases, the most recent period
being the most dangerous, as its return levels are the highest. We illustrate
these effects by bivariate coverage regions.
Comment: 10 pages, 7 figures, 5 tables
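A minimal sketch of combining a peaks-over-threshold fit with a bootstrap, assuming synthetic exceedances and a plain percentile bootstrap (the paper's weighted bootstrap and profile likelihood are not reproduced):

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
# Synthetic daily precipitation excesses over a high threshold (mm)
excess = genpareto.rvs(0.1, scale=8.0, size=400, random_state=rng)

def q99(sample):
    # Refit the GPD and return its 99th percentile
    xi, _, sigma = genpareto.fit(sample, floc=0)
    return genpareto.ppf(0.99, xi, scale=sigma)

point = q99(excess)
# Nonparametric bootstrap: resample the exceedances with replacement
boots = np.array([q99(rng.choice(excess, size=excess.size, replace=True))
                  for _ in range(100)])
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"99% excess quantile {point:.1f} mm, bootstrap 95% CI ({lo:.1f}, {hi:.1f})")
```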
Extreme value analysis of actuarial risks: estimation and model validation
We give an overview of several aspects arising in the statistical analysis of
extreme risks with actuarial applications in view. In particular it is
demonstrated that empirical process theory is a very powerful tool, both for
the asymptotic analysis of extreme value estimators and to devise tools for the
validation of the underlying model assumptions. While the focus of the paper is
on univariate tail risk analysis, the basic ideas of the analysis of the
extremal dependence between different risks are also outlined. Here we
emphasize some of the limitations of classical multivariate extreme value
theory and sketch how a different model proposed by Ledford and Tawn can help
to avoid pitfalls. Finally, these theoretical results are used to analyze a
data set of large claim sizes from health insurance.
Comment: to appear in Advances in Statistical Analysis
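One standard tool in univariate tail risk analysis of this kind is the Hill estimator of the extreme value index; the claim-size sample below is synthetic, not the health insurance data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic large claim sizes with a Pareto tail (extreme value index 0.5)
claims = (rng.pareto(2.0, size=5000) + 1.0) * 1000.0

def hill(sample, k):
    # Hill estimator based on the k largest order statistics
    x = np.sort(sample)[::-1]
    logs = np.log(x[:k + 1])
    return float(np.mean(logs[:k]) - logs[k])

gamma = hill(claims, k=200)
print(f"Hill estimate of the extreme value index: {gamma:.3f}")
```

The choice of k trades bias against variance; asymptotic analyses of exactly this estimator are a typical application of the empirical process theory emphasized above.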
Closed-form mathematical expressions for the exponentiated Cauchy-Rayleigh distribution
The Cauchy-Rayleigh (CR) distribution has been successfully used to describe
asymmetric and heavy-tailed events in radar imagery. Employing this model to
describe lifetime data may then seem attractive, but some drawbacks arise: its
probability density function does not cover non-modal behavior, and its hazard
rate function (hrf) assumes only one form. To overcome these difficulties, we
introduce an extended CR model, called the exponentiated Cauchy-Rayleigh (ECR)
distribution. This model has two parameters and an hrf with
decreasing, decreasing-increasing-decreasing and upside-down bathtub forms. In
this paper, several closed-form mathematical expressions for the ECR model are
proposed: the median, mode, probability-weighted, log-, incomplete, and
order-statistic moments, and the Fisher information matrix. We propose three
estimation procedures for the ECR parameters: maximum likelihood (ML),
bias-corrected ML, and percentile-based methods. A simulation study is done to
assess the performance of the estimators. An application to the survival times
of heart-problem patients illustrates the usefulness of the ECR model. Results
point out that the ECR distribution may outperform classical lifetime models,
such as the gamma, Birnbaum-Saunders, Weibull and log-normal laws, in the
presence of heavy-tailed data.
Comment: 30 pages
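The exponentiation construction behind the ECR model, G(x) = F(x)**theta for a base CDF F, can be illustrated generically. Since the CR cdf is not reproduced in the abstract, a Rayleigh base distribution stands in below purely as an assumption; only the construction itself, not the ECR law, is shown:

```python
import numpy as np
from scipy.stats import rayleigh

theta = 0.5  # exponentiation parameter (illustrative value)

def cdf_exp(x):
    # Exponentiated family: G(x) = F(x)**theta
    return rayleigh.cdf(x) ** theta

def pdf_exp(x):
    # d/dx F(x)**theta = theta * F(x)**(theta - 1) * f(x)
    return theta * rayleigh.cdf(x) ** (theta - 1) * rayleigh.pdf(x)

def hrf(x):
    # Hazard rate function g(x) / (1 - G(x))
    return pdf_exp(x) / (1.0 - cdf_exp(x))

# Numerical sanity check: the exponentiated density integrates to one
xs = np.linspace(1e-6, 12.0, 240001)
ys = pdf_exp(xs)
area = float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))
print(f"integral of the exponentiated density: {area:.4f}")
```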
Modeling catastrophic deaths using EVT with a microsimulation approach to reinsurance pricing
Recently, a marked Poisson process (MPP) model for life catastrophe risk was
proposed in [6]. We provide a justification and further support for the model
by considering more general Poisson point processes in the context of extreme
value theory (EVT), and basing the choice of model on statistical tests and
model comparisons. A case study examining accidental deaths in the Finnish
population is provided.
We further extend the applicability of the catastrophe risk model by
considering small and big accidents separately; the resulting combined MPP
model can flexibly capture the whole range of accidental death counts. Using
the proposed model, we present a simulation framework for pricing (life)
catastrophe reinsurance, based on modeling the underlying policies at
individual contract level. The accidents are first simulated at population
level, and their effect on a specific insurance company is then determined by
explicitly simulating the resulting insured deaths. The proposed
microsimulation approach can potentially lead to more accurate results than the
traditional methods, and to a better view of risk, as it can make use of all
the information available to the re/insurer and can explicitly accommodate even
complex re/insurance terms and product features. As an example we price several
excess reinsurance contracts. The proposed simulation model is also suitable
for solvency assessment.
Comment: 32 pages, 9 figures
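A toy version of the microsimulation pricing loop might look as follows; all rates, severities, and contract terms below are invented for illustration and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
# All parameters below are illustrative assumptions
lam = 4.0                          # expected catastrophic accidents per year
years = 10000                      # simulated years
share = 0.1                        # insurer's share of the population
benefit = 100.0                    # payout per insured death (in 1000s)
retention, limit = 500.0, 2000.0   # per-event excess-of-loss layer

layer_loss = np.zeros(years)
for y in range(years):
    n_events = rng.poisson(lam)
    # Marked Poisson process: deaths per event from a heavy-tailed mark law
    deaths = np.ceil(rng.pareto(1.8, size=n_events) * 5 + 5).astype(int)
    # Microsimulation step: thin population deaths to insured deaths
    insured = rng.binomial(deaths, share)
    gross = insured * benefit
    # Apply the excess-of-loss layer to each event separately
    layer_loss[y] = np.clip(gross - retention, 0.0, limit).sum()

print(f"expected annual layer loss: {layer_loss.mean():.2f}")
```

The mean of `layer_loss` is a Monte Carlo estimate of the pure premium of the layer; quantiles of the same array serve solvency-style assessments.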
Inference on the Parameters of the Weibull Distribution Using Records
The Weibull distribution is a widely applicable model for lifetime data. In
this paper, we investigate inference on the parameters of the Weibull
distribution based on record values. We first propose a simple and exact test
and a confidence interval for the shape parameter. Then, in addition to a
generalized confidence interval, a generalized test variable is derived for the
scale parameter when the shape parameter is unknown. The paper also presents a
simple and exact joint confidence region for the scale and shape parameters. In
all cases, simulation studies show that the proposed approaches are more
satisfactory and reliable than previous methods. All proposed approaches are
illustrated using a real example.
Comment: Accepted for publication in SOR
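Extracting upper record values from a sequence and fitting a Weibull model to them can be sketched as below; this is a plain ML illustration on synthetic data, not the paper's exact test or generalized test variable:

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(4)
# Synthetic lifetimes; shape and scale are illustrative assumptions
sample = weibull_min.rvs(2.0, scale=3.0, size=2000, random_state=rng)

# Extract the upper record values from the observation sequence
records, current_max = [], -np.inf
for x in sample:
    if x > current_max:
        records.append(float(x))
        current_max = x

# Plain ML fit from the records (illustration only: records are not
# i.i.d., and the paper's exact procedures are not reproduced here)
shape, _, scale = weibull_min.fit(records, floc=0)
print(f"{len(records)} records, fitted shape {shape:.2f}, scale {scale:.2f}")
```

A sequence of n i.i.d. observations yields only about log(n) records, which is why exact small-sample procedures matter in this setting.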
LiSBOA: LiDAR Statistical Barnes Objective Analysis for optimal design of LiDAR scans and retrieval of wind statistics. Part II: Applications to synthetic and real LiDAR data of wind turbine wakes
The LiDAR Statistical Barnes Objective Analysis (LiSBOA), presented in
Letizia et al., is a procedure for the optimal design of LiDAR scans and
calculation over a Cartesian grid of the statistical moments of the velocity
field. The LiSBOA is applied to LiDAR data collected in the wake of wind
turbines to reconstruct mean and turbulence intensity of the wind velocity
field. The proposed procedure is first tested on a numerical dataset obtained
by means of the virtual LiDAR technique applied to data from a large eddy
simulation (LES). The optimal sampling parameters for a scanning Doppler pulsed
wind LiDAR are retrieved from the LiSBOA, and the estimated statistics are then
calculated, showing a maximum error of about 4% for both the normalized mean
velocity and the turbulence intensity. Subsequently,
LiDAR data collected during a field campaign conducted at a wind farm in
complex terrain are analyzed through the LiSBOA for two different
configurations. In the first case, the wake velocity fields of four
utility-scale turbines are reconstructed on a 3D grid, showing the capability
of the LiSBOA to capture complex flow features, such as the high-speed jet
around the nacelle and the wake turbulent shear layers. For the second case, the
statistics of the wakes generated by four interacting turbines are calculated
over a 2D Cartesian grid and compared to the measurements provided by the
nacelle-mounted anemometers. Maximum discrepancies as low as 3% for the
normalized mean velocity and turbulence intensity endorse the application of
the LiSBOA for LiDAR-based wind resource assessment and diagnostic surveys for
wind farms.
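The core of a Barnes-type objective analysis, a Gaussian-weighted average of scattered samples onto a Cartesian grid, can be sketched on synthetic data; the smoothing scale and the field below are assumptions, not LiSBOA's actual settings:

```python
import numpy as np

rng = np.random.default_rng(5)
# Scattered samples of a smooth synthetic "velocity" field on a 10x10 domain
pts = rng.uniform(0.0, 10.0, size=(3000, 2))
vel = np.sin(0.5 * pts[:, 0]) + 0.1 * rng.normal(size=3000)

sigma = 0.7   # Gaussian smoothing length scale (assumed)
gx, gy = np.meshgrid(np.linspace(0, 10, 21), np.linspace(0, 10, 21))
grid = np.column_stack([gx.ravel(), gy.ravel()])

# One Barnes pass: Gaussian-weighted mean of the samples at each grid node
d2 = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
w = np.exp(-d2 / (2.0 * sigma ** 2))
field = (w * vel).sum(axis=1) / w.sum(axis=1)

err = float(np.abs(field - np.sin(0.5 * grid[:, 0])).mean())
print(f"mean absolute reconstruction error: {err:.3f}")
```

Higher-order statistics (e.g. turbulence intensity) follow the same pattern with `vel` replaced by squared residuals about the gridded mean.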
Optimal Trade-offs in Multi-Processor Approximate Message Passing
We consider large-scale linear inverse problems in Bayesian settings. We
follow a recent line of work that applies the approximate message passing (AMP)
framework to multi-processor (MP) computational systems, where each processor
node stores and processes a subset of rows of the measurement matrix along with
corresponding measurements. In each MP-AMP iteration, nodes of the MP system
and its fusion center exchange lossily compressed messages pertaining to their
estimates of the input. In this setup, we derive the optimal per-iteration
coding rates using dynamic programming. We analyze the excess mean squared
error (EMSE) beyond the minimum mean squared error (MMSE), and prove that, in
the limit of low EMSE, the optimal coding rates increase approximately linearly
per iteration. Additionally, we show how the combined cost of computation and
communication scales with the desired estimation quality. Finally, we study
trade-offs between the physical
costs of the estimation process including computation time, communication
loads, and the estimation quality as a multi-objective optimization problem,
and characterize the properties of the Pareto optimal surfaces.
Comment: 14 pages, 8 figures
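A single-processor AMP iteration with soft thresholding, the building block that MP-AMP distributes across nodes, can be sketched as follows; the dimensions and the simple threshold policy are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n, N, k = 250, 500, 25              # measurements, signal length, sparsity
A = rng.normal(size=(n, N)) / np.sqrt(n)
x = np.zeros(N)
x[rng.choice(N, size=k, replace=False)] = rng.normal(size=k)
y = A @ x                           # noiseless linear measurements

def soft(u, t):
    # Soft-thresholding denoiser
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

xt, z = np.zeros(N), y.copy()
for _ in range(30):
    tau = np.linalg.norm(z) / np.sqrt(n)   # empirical residual level
    xt_new = soft(xt + A.T @ z, tau)
    # Onsager correction: average denoiser derivative times N/n
    onsager = (np.abs(xt_new) > 0).mean() * (N / n) * z
    z = y - A @ xt_new + onsager
    xt = xt_new

mse = float(np.mean((xt - x) ** 2))
print(f"final MSE after 30 AMP iterations: {mse:.2e}")
```

In the MP setting, the matrix-vector products are computed per row-block at each node and the messages exchanged with the fusion center are the lossily compressed estimates.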
Multivariate Generalized Linear-statistics of short range dependent data
Generalized linear (GL-) statistics are defined as functionals of a U-quantile
process and unify different classes of statistics, such as U-statistics and
L-statistics. We derive a central limit theorem for GL-statistics of strongly
mixing sequences and arbitrary dimension of the
underlying kernel. For this purpose we establish a limit theorem for
U-statistics and an invariance principle for U-processes together with a
convergence rate for the remaining term of the Bahadur representation. An
application is given by the generalized median estimator for the tail-parameter
of the Pareto distribution, which is commonly used to model exceedances of high
thresholds. We use subsampling to calculate confidence intervals and
investigate their behaviour under independence and strong mixing in
simulations.
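The median-based estimator for the Pareto tail parameter mentioned above can be sketched directly: for exceedance ratios with survival function y**(-alpha), the median of log(X/u) equals log(2)/alpha. The sample below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(7)
alpha = 2.0
# Exceedance ratios X/u over a high threshold: survival function y**(-alpha)
ratios = (1.0 - rng.uniform(size=2000)) ** (-1.0 / alpha)

# Median-based estimator: median(log(X/u)) = log(2) / alpha
alpha_hat = float(np.log(2.0) / np.median(np.log(ratios)))
print(f"median-based estimate of alpha: {alpha_hat:.2f}")
```

Being a quantile-based (L-statistic-like) functional, this estimator is robust to isolated outliers in the exceedances, unlike the mean-based Hill estimator.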
Confidence regions for high quantiles of a heavy tailed distribution
Estimating high quantiles plays an important role in the context of risk
management. This involves extrapolation of an unknown distribution function. In
this paper we propose three methods, namely, the normal approximation method,
the likelihood ratio method and the data tilting method, to construct
confidence regions for high quantiles of a heavy-tailed distribution. A
simulation study favors the data tilting method.
Comment: Published at http://dx.doi.org/10.1214/009053606000000416 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
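The normal approximation method, one of the three compared, can be sketched with a Hill/Weissman-style extrapolation; the asymptotic-variance form used here is a textbook simplification on synthetic Pareto data, not necessarily the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, n, k, p = 2.0, 10000, 200, 0.001
x = (1.0 - rng.uniform(size=n)) ** (-1.0 / alpha)   # Pareto(alpha) sample

xs = np.sort(x)
# Hill estimate of the extreme value index from the top k order statistics
gamma = float(np.mean(np.log(xs[n - k:])) - np.log(xs[n - k - 1]))
# Weissman extrapolation to the upper p-th tail quantile
q_hat = float(xs[n - k - 1] * (k / (n * p)) ** gamma)
# Normal approximation on the log scale, asymptotic variance gamma**2 / k
half = 1.96 * gamma * np.log(k / (n * p)) / np.sqrt(k)
ci = (q_hat * np.exp(-half), q_hat * np.exp(half))
print(f"quantile estimate {q_hat:.1f}, 95% CI ({ci[0]:.1f}, {ci[1]:.1f})")
```

Since p is far below k/n, the interval reflects the extrapolation uncertainty that likelihood ratio and data tilting methods handle with asymmetric regions.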