
    A flexible regression model for count data

    Poisson regression is a popular tool for modeling count data and is applied in a vast array of applications from the social to the physical sciences and beyond. Real data, however, are often over- or under-dispersed and, thus, not conducive to Poisson regression. We propose a regression model based on the Conway-Maxwell-Poisson (COM-Poisson) distribution to address this problem. The COM-Poisson regression generalizes the well-known Poisson and logistic regression models, and is suitable for fitting count data with a wide range of dispersion levels. With a GLM approach that takes advantage of exponential family properties, we discuss model estimation, inference, diagnostics, and interpretation, and present a test for determining the need for a COM-Poisson regression over a standard Poisson regression. We compare the COM-Poisson to several alternatives and illustrate its advantages and usefulness using three data sets with varying dispersion.
    Comment: Published at http://dx.doi.org/10.1214/09-AOAS306 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
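
As a rough illustration of why the COM-Poisson family covers a range of dispersion levels, the sketch below evaluates the standard COM-Poisson probability mass function, P(Y = y) proportional to lam^y / (y!)^nu, on a truncated support. The function names, the truncation length `terms`, and the parameter values are assumptions made for this example and are not taken from the paper.

```python
import math

def com_poisson_probs(lam, nu, terms=200):
    """COM-Poisson probabilities for y = 0..terms-1 (truncated support).

    Weights are lam**y / (y!)**nu, normalized in log space for stability.
    """
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(terms)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def mean_var(probs):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# nu = 1 recovers the Poisson; nu < 1 is overdispersed; nu > 1 is underdispersed.
for nu in (0.5, 1.0, 2.0):
    m, v = mean_var(com_poisson_probs(3.0, nu))
    print(f"nu={nu}: mean={m:.3f}, variance={v:.3f}")
```

With nu = 1 the weights are exactly those of a Poisson distribution with rate lam, and for very large nu almost all mass falls on 0 and 1 with P(Y = 1) close to lam / (1 + lam), which is the link to logistic regression mentioned in the abstract.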

    A Poisson Ridge Regression Estimator

    The standard statistical method for analyzing count data is the Poisson regression model, which is usually estimated using maximum likelihood (ML). The ML method is very sensitive to multicollinearity. Therefore, we present a new Poisson ridge regression estimator (PRR) as a remedy to the problem of instability of the traditional ML method. To investigate the performance of the PRR and the traditional ML approaches for estimating the parameters of the Poisson regression model, we calculate the mean squared error (MSE) using Monte Carlo simulations. The result from the simulation study shows that the PRR method outperforms the traditional ML estimator in all of the different situations evaluated in this paper.
    Keywords: Poisson regression; maximum likelihood; ridge regression; MSE; Monte Carlo simulations; multicollinearity
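
To make the idea of ridge shrinkage in a Poisson model concrete, here is a minimal sketch that maximizes a ridge-penalized Poisson log-likelihood by Newton steps for two nearly collinear covariates. This is a generic penalized-likelihood formulation chosen for illustration and is not necessarily the PRR estimator defined in the paper; the function name, the toy data, and the penalty values `k` are all assumptions.

```python
import math

def fit_poisson_ridge(x1, x2, y, k, iters=50):
    """Maximize sum(y*eta - exp(eta)) - (k/2)*(b1**2 + b2**2),
    where eta = b1*x1 + b2*x2 (log link, no intercept)."""
    b1 = b2 = 0.0
    for _ in range(iters):
        mu = [math.exp(b1 * u + b2 * v) for u, v in zip(x1, x2)]
        # Gradient of the penalized log-likelihood
        g1 = sum(u * (t - m) for u, t, m in zip(x1, y, mu)) - k * b1
        g2 = sum(v * (t - m) for v, t, m in zip(x2, y, mu)) - k * b2
        # Negative Hessian: X'WX + k*I with W = diag(mu)
        h11 = sum(m * u * u for u, m in zip(x1, mu)) + k
        h22 = sum(m * v * v for v, m in zip(x2, mu)) + k
        h12 = sum(m * u * v for u, v, m in zip(x1, x2, mu))
        det = h11 * h22 - h12 * h12
        # Newton step using the closed-form inverse of the 2x2 matrix
        b1 += (h22 * g1 - h12 * g2) / det
        b2 += (h11 * g2 - h12 * g1) / det
    return b1, b2

# Toy data (hypothetical): x2 is x1 plus a small perturbation.
x1 = [0.1, 0.4, 0.7, 1.0, 1.3, 1.6]
x2 = [0.11, 0.38, 0.715, 0.99, 1.32, 1.585]
y = [1, 1, 2, 3, 4, 6]

for k in (0.1, 10.0):
    b1, b2 = fit_poisson_ridge(x1, x2, y, k)
    print(f"k={k}: b1={b1:.4f}, b2={b2:.4f}")
```

The penalty term adds `k` to the diagonal of X'WX, which keeps the Newton system well conditioned even when the covariates are almost linearly dependent; a larger `k` pulls the coefficient vector closer to zero.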

    Multiple Approaches to Absenteeism Analysis

    Absenteeism research has often been criticized for using inappropriate analysis. Characteristics of absence data, notably that it is usually truncated and skewed, violate assumptions of OLS regression; however, OLS and correlation analysis remain the dominant models of absenteeism research. This piece compares eight models that may be appropriate for analyzing absence data. Specifically, this piece discusses and uses OLS regression, OLS regression with a transformed dependent variable, the Tobit model, Poisson regression, Overdispersed Poisson regression, the Negative Binomial model, Ordinal Logistic regression, and the Ordinal Probit model. A simulation methodology is employed to determine the extent to which each model is likely to produce false positives. Simulations vary with respect to the shape of the dependent variable's distribution, sample size, and the shape of the independent variables' distributions. Actual data, based on a sample of 195 manufacturing employees, is used to illustrate how these models might be used to analyze a real data set. Results from the simulation suggest that, despite methodological expectations, OLS regression does not produce significantly more false positives than expected at various alpha levels. However, the Tobit and Poisson models are often shown to yield too many false positives. A number of other models yield fewer than the expected number of false positives, thus suggesting that they may serve well as conservative hypothesis tests.

    Mixture of bivariate Poisson regression models with an application to insurance

    In a recent paper Bermúdez [2009] used bivariate Poisson regression models for ratemaking in car insurance, and included zero-inflated models to account for the excess of zeros and the overdispersion in the data set. In the present paper, we revisit this model in order to consider alternatives. We propose a 2-finite mixture of bivariate Poisson regression models to demonstrate that the overdispersion in the data requires more structure if it is to be taken into account, and that a simple zero-inflated bivariate Poisson model does not suffice. At the same time, we show that a finite mixture of bivariate Poisson regression models embraces zero-inflated bivariate Poisson regression models as a special case. Additionally, we describe a model in which the mixing proportions are dependent on covariates when modelling the way in which each individual belongs to a separate cluster. Finally, an EM algorithm is provided in order to ensure the models’ ease-of-fit. These models are applied to the same automobile insurance claims data set as used in Bermúdez [2009] and it is shown that the modelling of the data set can be improved considerably.
    Keywords: Zero-inflation, Overdispersion, EM algorithm, Automobile insurance, A priori ratemaking.

    Mean-parametrized Conway-Maxwell-Poisson regression models for dispersed counts

    Conway-Maxwell-Poisson (CMP) distributions are flexible generalizations of the Poisson distribution for modelling overdispersed or underdispersed counts. The main hindrance to their wider use in practice seems to be the inability to directly model the mean of counts, making them neither compatible with nor comparable to competing count regression models, such as the log-linear Poisson, negative-binomial or generalized Poisson regression models. This note illustrates how CMP distributions can be parametrized via the mean, so that simpler and more easily interpretable mean-models can be used, such as a log-linear model. Other link functions are also available, of course. In addition to establishing attractive theoretical and asymptotic properties of the proposed model, its good finite-sample performance is exhibited through various examples and a simulation study based on real datasets. Moreover, the MATLAB routine to fit the model to data is demonstrated to be up to an order of magnitude faster than the current software to fit standard CMP models, and over two orders of magnitude faster than the recently proposed hyper-Poisson model.
    Comment: To appear in Statistical Modelling: An International Journa
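
One way to picture a mean parametrization is to treat the CMP rate as an implicit function of the mean: for a fixed dispersion nu, find the rate lam whose CMP distribution has the requested mean mu. The sketch below does this by bisection on log(lam) over a truncated support. It is an illustrative assumption about how such a mapping could be computed, not the authors' MATLAB routine; the function names, the truncation length, the search bounds, and the coefficients `beta0` and `beta1` are hypothetical.

```python
import math

def cmp_mean(lam, nu, terms=300):
    """Mean of a CMP(lam, nu) distribution truncated to y = 0..terms-1."""
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(terms)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    return sum(y * v for y, v in enumerate(w)) / sum(w)

def rate_for_mean(mu, nu, lo=-10.0, hi=10.0, iters=100):
    """Solve cmp_mean(lam, nu) = mu for lam by bisection on log(lam).

    The mean is increasing in lam, so bisection applies; mu should lie
    well inside the range reachable on the truncated support.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cmp_mean(math.exp(mid), nu) < mu:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

# Log-linear mean model: mu_i = exp(beta0 + beta1 * x_i), with nu held fixed.
beta0, beta1, nu = 0.5, 0.8, 2.0  # hypothetical values
for x in (0.0, 1.0, 2.0):
    mu = math.exp(beta0 + beta1 * x)
    lam = rate_for_mean(mu, nu)
    print(f"x={x}: mu={mu:.3f}, lam={lam:.3f}, check={cmp_mean(lam, nu):.3f}")
```

Under this arrangement the regression coefficients act on the mean directly, as in a log-linear Poisson model, while nu controls dispersion separately.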