    A flexible regression model for count data

    Poisson regression is a popular tool for modeling count data and is applied in a vast array of applications from the social to the physical sciences and beyond. Real data, however, are often over- or under-dispersed and, thus, not conducive to Poisson regression. We propose a regression model based on the Conway-Maxwell-Poisson (COM-Poisson) distribution to address this problem. The COM-Poisson regression generalizes the well-known Poisson and logistic regression models, and is suitable for fitting count data with a wide range of dispersion levels. With a GLM approach that takes advantage of exponential family properties, we discuss model estimation, inference, diagnostics, and interpretation, and present a test for determining the need for a COM-Poisson regression over a standard Poisson regression. We compare the COM-Poisson to several alternatives and illustrate its advantages and usefulness using three data sets with varying dispersion. Comment: Published at http://dx.doi.org/10.1214/09-AOAS306 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
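
    To make the dispersion claim concrete, here is a minimal sketch of the COM-Poisson probability mass function, P(Y = y) proportional to lam^y / (y!)^nu. It is not the authors' estimation code. The function names and the truncation of the normalizing sum at `max_terms` are assumptions made for illustration. Setting nu = 1 recovers the Poisson distribution, nu < 1 gives over-dispersion, nu > 1 gives under-dispersion, and a very large nu approaches a Bernoulli variable (the logistic regression case).

```python
import math


def com_poisson_probs(lam, nu, max_terms=200):
    """Return [P(Y=0), ..., P(Y=max_terms-1)] for a COM-Poisson(lam, nu) variable.

    The unnormalized weight of y is lam**y / (y!)**nu. The infinite normalizing
    sum is truncated at max_terms (an assumption of this sketch); this needs
    lam > 0, and lam < 1 when nu == 0.
    """
    # Work in log space so that large factorials do not overflow.
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_terms)]
    peak = max(log_w)
    log_z = peak + math.log(sum(math.exp(w - peak) for w in log_w))
    return [math.exp(w - log_z) for w in log_w]


def mean_and_variance(probs):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        mean, var = mean_and_variance(com_poisson_probs(3.0, nu))
        print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}")
```

    Comparing the printed mean and variance for each nu shows how a single extra parameter moves the model between over- and under-dispersion.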

    Spatial modelling of claim frequency and claim size in insurance

    In this paper, models for claim frequency and claim size in non-life insurance are considered. Both covariates and spatial random effects are included, allowing the modelling of a spatial dependency pattern. We assume a Poisson model for the number of claims, while claim size is modelled using a Gamma distribution. However, in contrast to the usual compound Poisson model going back to Lundberg (1903), we allow for dependencies between claim size and claim frequency. Models for both the individual and the average claim sizes of a policyholder are considered. A fully Bayesian approach is followed; parameters are estimated using Markov Chain Monte Carlo (MCMC). The issue of model comparison is thoroughly addressed. Besides the deviance information criterion suggested by Spiegelhalter et al. (2002), the predictive model choice criterion (Gelfand and Ghosh (1998)) and proper scoring rules (Gneiting and Raftery (2005)) based on the posterior predictive distribution are investigated. We give an application to a comprehensive data set from a German car insurance company. The inclusion of spatial effects significantly improves the models for both claim frequency and claim size and also leads to more accurate predictions of the total claim sizes. Further, we quantify the significant effects of the number of claims on claim size.
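
    The Poisson-Gamma setup described above can be sketched as a small simulation. This is only an illustration, not the paper's Bayesian spatial model. The `dependence` parameter is a hypothetical log-linear effect of the claim count on the mean claim size, and all names and numeric values are assumptions. Setting `dependence=0.0` gives the classical compound Poisson model with independent frequency and size.

```python
import math
import random


def sample_poisson(rng, rate):
    """Draw one Poisson(rate) count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_total_claims(rng, rate, mean_size, shape, dependence=0.0):
    """Simulate one policyholder's total claim amount.

    rate: expected number of claims (Poisson).
    mean_size, shape: Gamma mean and shape of an individual claim size.
    dependence: hypothetical log-linear effect of the claim count on the mean
                claim size; 0.0 recovers the classical compound Poisson model.
    """
    n_claims = sample_poisson(rng, rate)
    adjusted_mean = mean_size * math.exp(dependence * n_claims)
    scale = adjusted_mean / shape  # Gamma mean = shape * scale
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))


if __name__ == "__main__":
    rng = random.Random(0)
    for dep in (0.0, -0.2):
        totals = [simulate_total_claims(rng, 2.0, 1000.0, 2.0, dep) for _ in range(20000)]
        print(f"dependence={dep}: mean total = {sum(totals) / len(totals):.1f}")
```

    With independence, the expected total is simply rate times mean claim size. A nonzero `dependence` breaks that product rule, which is why ignoring such a dependency can bias predictions of total claim sizes.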

    Modelling count data with overdispersion and spatial effects

    In this paper we consider regression models for count data allowing for overdispersion in a Bayesian framework. We account for unobserved heterogeneity in the data in two ways. On the one hand, we consider more flexible models than a common Poisson model, allowing for overdispersion in different ways. In particular, the negative binomial and the generalized Poisson (GP) distribution are addressed, where overdispersion is modelled by an additional model parameter. Further, zero-inflated models, in which overdispersion is assumed to be caused by an excessive number of zeros, are discussed. On the other hand, extra spatial variability in the data is taken into account by adding spatial random effects to the models. This approach allows for an underlying spatial dependency structure, which is modelled using a conditional autoregressive prior based on Pettitt et al. (2002). In an application, the presented models are used to analyse the number of invasive meningococcal disease cases in Germany in the year 2004. Models are compared according to the deviance information criterion (DIC) suggested by Spiegelhalter et al. (2002) and using proper scoring rules; see, for example, Gneiting and Raftery (2004). We observe a rather high degree of overdispersion in the data, which is captured best by the GP model when spatial effects are neglected. While the addition of spatial effects to the models allowing for overdispersion gives little or no improvement, a spatial Poisson model is to be preferred over all other models according to the considered criteria.
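
    As a rough illustration of how the three model families above allow the variance to exceed the mean, the sketch below writes out their standard variance formulas. The parameterizations (size `r` for the negative binomial, `lam` for Consul's generalized Poisson, `pi_zero` for the extra-zero probability) are common textbook choices assumed here and may differ from the ones used in the paper. The function names are illustrative.

```python
import statistics


def negative_binomial_variance(mu, r):
    """Variance of a negative binomial count with mean mu and size r: mu + mu^2 / r."""
    return mu + mu ** 2 / r


def generalized_poisson_variance(mu, lam):
    """Variance of a generalized Poisson count with mean mu and 0 <= lam < 1: mu / (1 - lam)^2."""
    return mu / (1.0 - lam) ** 2


def zero_inflated_poisson_moments(pi_zero, rate):
    """Mean and variance of a Poisson(rate) count with extra-zero probability pi_zero."""
    mean = (1.0 - pi_zero) * rate
    variance = mean * (1.0 + pi_zero * rate)
    return mean, variance


def dispersion_index(counts):
    """Sample variance-to-mean ratio; values well above 1 suggest overdispersion."""
    return statistics.pvariance(counts) / statistics.mean(counts)


if __name__ == "__main__":
    print(negative_binomial_variance(4.0, 2.0))
    print(generalized_poisson_variance(4.0, 0.5))
    print(zero_inflated_poisson_moments(0.2, 5.0))
    print(dispersion_index([0, 0, 1, 0, 7, 0, 2, 0]))
```

    In each family the Poisson case (variance equal to the mean) is recovered at the boundary: `r` going to infinity, `lam = 0`, or `pi_zero = 0`.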

    Histogram comparison as a powerful tool for the search of new physics at LHC. Application to CMSSM

    We propose a rigorous and effective way to compare experimental and theoretical histograms, incorporating the different sources of statistical and systematic uncertainties. This is a useful tool to extract as much information as possible from the comparison of experimental data with theoretical simulations, optimizing the chances of identifying New Physics at the LHC. We illustrate this by showing how a search in the CMSSM parameter space, using Bayesian techniques, can effectively find the correct values of the CMSSM parameters by comparing histograms of events with multijets + missing transverse momentum displayed in the effective-mass variable. The procedure is in fact very efficient at identifying the true supersymmetric model, in case supersymmetry is really there and accessible to the LHC.

    Bayesian nonparametric models for spatially indexed data of mixed type

    We develop Bayesian nonparametric models for spatially indexed data of mixed type. Our work is motivated by challenges that occur in environmental epidemiology, where the usual presence of several confounding variables that exhibit complex interactions and high correlations makes it difficult to estimate and understand the effects of risk factors on health outcomes of interest. The modeling approach we adopt assumes that responses and confounding variables are manifestations of continuous latent variables, and uses multivariate Gaussians to jointly model these. Responses and confounding variables are not treated equally, as only the relevant parameters of the distributions of the responses are modeled in terms of explanatory variables or risk factors. Spatial dependence is introduced by allowing the weights of the nonparametric process priors to be location specific, obtained as probit transformations of Gaussian Markov random fields. Confounding variables and spatial configuration have a similar role in the model, in that they only influence, along with the responses, the allocation probabilities of the areas into the mixture components, thereby allowing for flexible adjustment of the effects of observed confounders, while allowing for the possibility of residual spatial structure, possibly occurring due to unmeasured or undiscovered spatially varying factors. Aspects of the model are illustrated in simulation studies and an application to a real data set.

    Image Coaddition with Temporally Varying Kernels

    Large, multi-frequency imaging surveys, such as the Large Synoptic Survey Telescope (LSST), need to do near-real-time analysis of very large datasets. This raises a host of statistical and computational problems where standard methods do not work. In this paper, we study a proposed method for combining stacks of images into a single summary image, sometimes referred to as a template. This task is commonly referred to as image coaddition. In part, we focus on a method proposed in previous work, which outlines a procedure for combining stacks of images in an online fashion in the Fourier domain. We evaluate this method by comparing it to two straightforward methods through the use of various criteria and simulations. Note that the goal is not to propose these comparison methods for use in their own right, but to ensure that additional complexity also provides substantially improved performance.
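
    For readers unfamiliar with coaddition, the sketch below shows one simple pixel-space baseline: an inverse-variance weighted mean accumulated one image at a time. The abstract does not say which straightforward methods were compared, and this is not the Fourier-domain method it studies. The function name and the assumption of one constant noise variance per image are purely illustrative.

```python
def coadd_inverse_variance(images, variances):
    """Pixel-wise inverse-variance weighted mean of equally sized 2-D images.

    images: list of 2-D lists of pixel values, all with the same shape.
    variances: one noise variance per image (assumed constant across each image).
    """
    n_rows, n_cols = len(images[0]), len(images[0][0])
    weighted_sum = [[0.0] * n_cols for _ in range(n_rows)]
    total_weight = 0.0
    # Images are folded in one at a time, so only running sums are kept.
    for image, variance in zip(images, variances):
        weight = 1.0 / variance
        total_weight += weight
        for i in range(n_rows):
            for j in range(n_cols):
                weighted_sum[i][j] += weight * image[i][j]
    return [[value / total_weight for value in row] for row in weighted_sum]


if __name__ == "__main__":
    stack = [[[2.0, 0.0]], [[6.0, 4.0]]]
    print(coadd_inverse_variance(stack, [1.0, 3.0]))
```

    Noisier images get smaller weights, so a single poor exposure degrades the template less than it would under a plain average.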

    Chaotic scattering with direct processes: A generalization of Poisson's kernel for non-unitary scattering matrices

    The problem of chaotic scattering in the presence of direct processes or prompt responses is mapped via a transformation to the case of scattering in the absence of such processes for non-unitary scattering matrices, \tilde S. In the absence of prompt responses, \tilde S is uniformly distributed according to its invariant measure in the space of \tilde S matrices with zero average, <\tilde S> = 0. In the presence of direct processes, the distribution of \tilde S is non-uniform and is characterized by the average <\tilde S> (\neq 0). In contrast to the case of unitary matrices S, where the invariant measures of S for chaotic scattering with and without direct processes are related through the well known Poisson kernel, here we show that for non-unitary scattering matrices the invariant measures are related by the Poisson kernel squared. Our results are relevant to situations where flux conservation is not satisfied, for example, transport experiments in chaotic systems where gains or losses are present, such as microwave chaotic cavities or graphs, and acoustic or elastic resonators. Comment: Added two appendices and references. Corrected typo

    Quantifying and containing the curse of high resolution coronal imaging

    Future missions such as Solar Orbiter (SO), InterHelioprobe, or Solar Probe aim at approaching the Sun closer than ever before, carrying high resolution imagers (HRI) with a subsecond cadence and a pixel area of about (80 km)^2 at the Sun during perihelion. In order to guarantee their scientific success, it is necessary to evaluate whether the photon counts available at this resolution and cadence will provide a sufficient signal-to-noise ratio (SNR). We perform a first step in this direction by analyzing and characterizing the spatial intermittency of Quiet Sun images through a multifractal analysis. We identify the parameters that specify the scale-invariance behavior. This identification then allows us to select a family of multifractal processes, namely the Compound Poisson Cascades, that can synthesize artificial images having some of the scale-invariance properties observed on the recorded images. The prevalence of self-similarity in Quiet Sun coronal images makes it relevant to study the ratio between the SNR present in SoHO/EIT images and in coarsened images. SoHO/EIT images thus play the role of 'high resolution' images, whereas the 'low-resolution' coarsened images are rebinned so as to simulate a smaller angular resolution and/or a larger distance to the Sun. For a fixed difference in angular resolution and in Spacecraft-Sun distance, we determine the proportion of pixels having a SNR preserved at high resolution given a particular increase in effective area. If scale-invariance continues to prevail at smaller scales, the conclusion reached with SoHO/EIT images can be transposed to the situation where the resolution is increased from SoHO/EIT to SO/HRI resolution at perihelion. Comment: 25 pages, 1 table, 7 figures
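
    The link between resolution and SNR can be sketched under the assumption of pure Poisson photon noise, where a pixel collecting N photons has SNR = N / sqrt(N) = sqrt(N). The helper names below are illustrative, and this is not the multifractal analysis of the paper. It only shows how rebinning a high resolution image into coarser pixels changes the per-pixel SNR.

```python
import math


def rebin(counts, factor):
    """Sum photon counts over factor x factor blocks (image sides must be multiples of factor)."""
    n_rows, n_cols = len(counts), len(counts[0])
    return [
        [
            sum(counts[i + di][j + dj] for di in range(factor) for dj in range(factor))
            for j in range(0, n_cols, factor)
        ]
        for i in range(0, n_rows, factor)
    ]


def poisson_snr(counts):
    """Per-pixel SNR under pure Poisson photon noise: sqrt(N) for N counts."""
    return [[math.sqrt(n) for n in row] for row in counts]


if __name__ == "__main__":
    fine = [[4, 4, 1, 0], [4, 4, 0, 0]]
    coarse = rebin(fine, 2)
    print(poisson_snr(fine))
    print(coarse, poisson_snr(coarse))
```

    For a uniform patch, rebinning by a factor b multiplies the per-pixel SNR by b. For an intermittent image the gain varies from pixel to pixel, which is why the spatial statistics of the Quiet Sun matter for the comparison.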