A flexible regression model for count data
Poisson regression is a popular tool for modeling count data and is applied
in a vast array of applications from the social to the physical sciences and
beyond. Real data, however, are often over- or under-dispersed and, thus, not
conducive to Poisson regression. We propose a regression model based on the
Conway--Maxwell-Poisson (COM-Poisson) distribution to address this problem. The
COM-Poisson regression generalizes the well-known Poisson and logistic
regression models, and is suitable for fitting count data with a wide range of
dispersion levels. With a GLM approach that takes advantage of exponential
family properties, we discuss model estimation, inference, diagnostics, and
interpretation, and present a test for determining the need for a COM-Poisson
regression over a standard Poisson regression. We compare the COM-Poisson to
several alternatives and illustrate its advantages and usefulness using three
data sets with varying dispersion. Comment: Published in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) at
http://dx.doi.org/10.1214/09-AOAS306 by the Institute of
Mathematical Statistics (http://www.imstat.org)
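The dispersion behaviour described in this abstract can be illustrated with a minimal sketch. It assumes the standard COM-Poisson probability mass function, P(Y = y) proportional to lambda^y / (y!)^nu, and is not the authors' code. The function names and the truncation of the normalizing sum at 200 terms are illustrative assumptions.

```python
import math


def com_poisson_probs(lam, nu, terms=200):
    """Return P(Y = 0), ..., P(Y = terms - 1) for a COM-Poisson(lam, nu).

    The infinite normalizing sum is truncated at `terms` (an assumption
    that is adequate only when the tail beyond `terms` is negligible).
    """
    # Work on the log scale to avoid overflow in lam**j and (j!)**nu.
    log_w = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    shift = max(log_w)
    weights = [math.exp(w - shift) for w in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(probs):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    # nu = 1 is the ordinary Poisson; nu > 1 gives under-dispersion
    # (variance below the mean), nu < 1 gives over-dispersion.
    for nu in (0.5, 1.0, 2.0):
        mean, var = mean_and_variance(com_poisson_probs(3.0, nu))
        print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}")
```

With nu = 1 the mean and variance coincide, as for a Poisson distribution. As nu grows large the mass concentrates on 0 and 1, which is the limiting case that connects the model to logistic regression.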
Spatial modelling of claim frequency and claim size in insurance
In this paper, models for claim frequency and claim size in non-life insurance are considered. Both covariates and spatial random effects are included, allowing the modelling of a spatial dependency pattern. We assume a Poisson model for the number of claims, while claim size is modelled using a Gamma distribution. However, in contrast to the usual compound Poisson model going back to Lundberg (1903), we allow for dependencies between claim size and claim frequency. Both models for the individual and average claim sizes of a policyholder are considered. A fully Bayesian approach is followed; parameters are estimated using Markov Chain Monte Carlo (MCMC). The issue of model comparison is thoroughly addressed. Besides the deviance information criterion suggested by Spiegelhalter et al. (2002), the predictive model choice criterion (Gelfand and Ghosh (1998)) and proper scoring rules (Gneiting and Raftery (2005)) based on the posterior predictive distribution are investigated. We give an application to a comprehensive data set from a German car insurance company. The inclusion of spatial effects significantly improves the models for both claim frequency and claim size and also leads to more accurate predictions of the total claim sizes. Further, we quantify the significant number of claims effects on claim size
Modelling count data with overdispersion and spatial effects
In this paper we consider regression models for count data allowing for overdispersion in a Bayesian framework. We account for unobserved heterogeneity in the data in two ways. On the one hand, we consider more flexible models than a common Poisson model allowing for overdispersion in different ways. In particular, the negative binomial and the generalized Poisson distribution are addressed where overdispersion is modelled by an additional model parameter. Further, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed. On the other hand, extra spatial variability in the data is taken into account by adding spatial random effects to the models. This approach allows for an underlying spatial dependency structure which is modelled using a conditional autoregressive prior based on Pettitt et al. (2002). In an application the presented models are used to analyse the number of invasive meningococcal disease cases in Germany in the year 2004. Models are compared according to the deviance information criterion (DIC) suggested by Spiegelhalter et al. (2002) and using proper scoring rules, see for example Gneiting and Raftery (2004). We observe a rather high degree of overdispersion in the data which is captured best by the GP model when spatial effects are neglected. While the addition of spatial effects to the models allowing for overdispersion gives no or only little improvement, a spatial Poisson model is to be preferred over all other models according to the considered criteria
Histogram comparison as a powerful tool for the search of new physics at LHC. Application to CMSSM
We propose a rigorous and effective way to compare experimental and
theoretical histograms, incorporating the different sources of statistical and
systematic uncertainties. This is a useful tool to extract as much information
as possible from the comparison of experimental data with theoretical
simulations, optimizing the chances of identifying New Physics at the LHC. We
illustrate this by showing how a search in the CMSSM parameter space, using
Bayesian techniques, can effectively find the correct values of the CMSSM
parameters by comparing histograms of events with multijets + missing
transverse momentum displayed in the effective-mass variable. The procedure is
in fact very efficient to identify the true supersymmetric model, in the case
supersymmetry is really there and accessible to the LHC
Image Coaddition with Temporally Varying Kernels
Large, multi-frequency imaging surveys, such as the Large Synoptic Survey
Telescope (LSST), need to do near-real time analysis of very large datasets.
This raises a host of statistical and computational problems where standard
methods do not work. In this paper, we study a proposed method for combining
stacks of images into a single summary image, sometimes referred to as a
template. This task is commonly referred to as image coaddition. In part, we
focus on a method proposed in previous work, which outlines a procedure for
combining stacks of images in an online fashion in the Fourier domain. We
evaluate this method by comparing it to two straightforward methods through the
use of various criteria and simulations. Note that the goal is not to propose
these comparison methods for use in their own right, but to ensure that
additional complexity also provides substantially improved performance
Bayesian nonparametric models for spatially indexed data of mixed type
We develop Bayesian nonparametric models for spatially indexed data of mixed
type. Our work is motivated by challenges that occur in environmental
epidemiology, where the usual presence of several confounding variables that
exhibit complex interactions and high correlations makes it difficult to
estimate and understand the effects of risk factors on health outcomes of
interest. The modeling approach we adopt assumes that responses and confounding
variables are manifestations of continuous latent variables, and uses
multivariate Gaussians to jointly model these. Responses and confounding
variables are not treated equally as relevant parameters of the distributions
of the responses only are modeled in terms of explanatory variables or risk
factors. Spatial dependence is introduced by allowing the weights of the
nonparametric process priors to be location specific, obtained as probit
transformations of Gaussian Markov random fields. Confounding variables and
spatial configuration have a similar role in the model, in that they only
influence, along with the responses, the allocation probabilities of the areas
into the mixture components, thereby allowing for flexible adjustment of the
effects of observed confounders, while allowing for the possibility of residual
spatial structure, possibly occurring due to unmeasured or undiscovered
spatially varying factors. Aspects of the model are illustrated in simulation
studies and an application to a real data set
Chaotic scattering with direct processes: A generalization of Poisson's kernel for non-unitary scattering matrices
The problem of chaotic scattering in presence of direct processes or prompt
responses is mapped via a transformation to the case of scattering in absence
of such processes for non-unitary scattering matrices, \tilde S. In the absence
of prompt responses, \tilde S is uniformly distributed according to its
invariant measure in the space of \tilde S matrices with zero average, <\tilde
S> = 0. In the presence of direct processes, the distribution of \tilde S is
non-uniform and it is characterized by the average <\tilde S> (\neq 0). In
contrast to the case of unitary matrices S, where the invariant measures of S
for chaotic scattering with and without direct processes are related through
the well known Poisson kernel, here we show that for non-unitary scattering
matrices the invariant measures are related by the Poisson kernel squared. Our
results are relevant to situations where flux conservation is not satisfied.
For example, transport experiments in chaotic systems, where gains or losses
are present, like microwave chaotic cavities or graphs, and acoustic or elastic
resonators. Comment: Added two appendices and references. Corrected typo
Quantifying and containing the curse of high resolution coronal imaging
Future missions such as Solar Orbiter (SO), InterHelioprobe, or Solar Probe
aim at approaching the Sun closer than ever before, carrying on board high
resolution imagers (HRI) with a subsecond cadence and a pixel area of about
at the Sun during perihelion. In order to guarantee their scientific
success, it is necessary to evaluate whether the photon counts available at this
resolution and cadence will provide a sufficient signal-to-noise ratio (SNR).
We perform a first step in this direction by analyzing and characterizing the
spatial intermittency of Quiet Sun images thanks to a multifractal analysis.
We identify the parameters that specify the scale-invariance behavior. This
identification then allows us to select a family of multifractal processes, namely
the Compound Poisson Cascades, that can synthesize artificial images having
some of the scale-invariance properties observed on the recorded images.
The prevalence of self-similarity in Quiet Sun coronal images makes it
relevant to study the ratio between the SNR present in SoHO/EIT images and in
coarsened images. SoHO/EIT images thus play the role of 'high resolution'
images, whereas the 'low-resolution' coarsened images are rebinned so as to
simulate a smaller angular resolution and/or a larger distance to the Sun. For
a fixed difference in angular resolution and in Spacecraft-Sun distance, we
determine the proportion of pixels having a SNR preserved at high resolution
given a particular increase in effective area. If scale-invariance continues to
prevail at smaller scales, the conclusion reached with SoHO/EIT images can be
transposed to the situation where the resolution is increased from SoHO/EIT to
SO/HRI resolution at perihelion. Comment: 25 pages, 1 table, 7 figures