    Using observation-level random effects to model overdispersion in count data in ecology and evolution

    Overdispersion is common in models of count data in ecology and evolutionary biology, and can occur due to missing covariates, non-independent (aggregated) data, or an excess frequency of zeroes (zero-inflation). Accounting for overdispersion in such models is vital, as failing to do so can lead to biased parameter estimates and false conclusions regarding hypotheses of interest. Observation-level random effects (OLRE), where each data point receives a unique level of a random effect that models the extra-Poisson variation present in the data, are commonly employed to cope with overdispersion in count data. However, studies investigating the efficacy of observation-level random effects as a means to deal with overdispersion are scarce. Here I use simulations to show that when overdispersion is caused by random extra-Poisson noise or by aggregation in the count data, observation-level random effects yield more accurate parameter estimates than when overdispersion is simply ignored. Conversely, OLRE fail to reduce bias in zero-inflated data, and in some cases increase bias at high levels of overdispersion. There was a positive relationship between the magnitude of overdispersion and the degree of bias in parameter estimates. Critically, the simulations reveal that failing to account for overdispersion in mixed models can erroneously inflate measures of explained variance (r²), which may lead researchers to overestimate the predictive power of variables of interest. This work suggests that observation-level random effects provide a simple and robust means to account for overdispersion in count data, but also that their ability to minimise bias is not uniform across all types of overdispersion, so they must be applied judiciously.
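
    The mechanism this abstract relies on can be illustrated with a short simulation: adding a normally distributed observation-level effect to the Poisson log-mean produces extra-Poisson variation. This is a minimal sketch in plain Python, assuming an intercept-only model; the function names are hypothetical, and the code only generates and summarises data. Fitting the OLRE mixed model itself (a GLMM with one random-effect level per observation) would need a dedicated package and is not shown.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method
    (simple and exact, but slow for very large lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_olre_counts(n: int, beta0: float, sigma: float, seed: int = 1) -> list[int]:
    """Intercept-only Poisson model with an observation-level random effect:
    y_i ~ Poisson(exp(beta0 + eps_i)), eps_i ~ Normal(0, sigma^2).
    sigma = 0 switches the OLRE off and gives plain Poisson counts."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)  # one random-effect level per observation
        counts.append(poisson_draw(math.exp(beta0 + eps), rng))
    return counts


def dispersion_ratio(y: list[int]) -> float:
    """Sample variance / sample mean: about 1 for Poisson data, > 1 if overdispersed."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return var / mean


plain = dispersion_ratio(simulate_olre_counts(20_000, math.log(5.0), sigma=0.0))
olre = dispersion_ratio(simulate_olre_counts(20_000, math.log(5.0), sigma=0.5))
print(f"sigma=0.0: {plain:.2f}   sigma=0.5: {olre:.2f}")
```

    With these settings the first ratio should be close to 1. For the second, the lognormal-Poisson variance μ + μ²(e^{σ²} − 1), with μ = 5e^{σ²/2}, predicts a variance-to-mean ratio of roughly 2.6.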

    Modelling count data with overdispersion and spatial effects

    In this paper we consider regression models for count data allowing for overdispersion in a Bayesian framework. We account for unobserved heterogeneity in the data in two ways. On the one hand, we consider models that are more flexible than the common Poisson model and allow for overdispersion in different ways. In particular, we address the negative binomial and the generalized Poisson (GP) distributions, in which overdispersion is modelled by an additional model parameter. Further, we discuss zero-inflated models, in which overdispersion is assumed to be caused by an excessive number of zeros. On the other hand, extra spatial variability in the data is taken into account by adding spatial random effects to the models. This approach allows for an underlying spatial dependency structure, which is modelled using a conditional autoregressive prior based on Pettitt et al. (2002). In an application, the presented models are used to analyse the number of invasive meningococcal disease cases in Germany in 2004. Models are compared using the deviance information criterion (DIC) suggested by Spiegelhalter et al. (2002) and proper scoring rules (see, for example, Gneiting and Raftery, 2004). We observe a rather high degree of overdispersion in the data, which is captured best by the GP model when spatial effects are neglected. While adding spatial effects to the models that allow for overdispersion gives little or no improvement, a spatial Poisson model is preferred over all other models according to the criteria considered.
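
    As a concrete reference point for the generalized Poisson model mentioned above, the sketch below evaluates the pmf in Consul's standard (θ, λ) parametrization, which may differ from the parametrization used in the paper, and checks its closed-form moments numerically. The variance-to-mean ratio is 1/(1 − λ)², so any λ > 0 gives overdispersion; by comparison, a negative binomial with mean μ and size r has variance μ + μ²/r. The function names are illustrative.

```python
import math


def gp_logpmf(y: int, theta: float, lam: float) -> float:
    """Log-pmf of Consul's generalized Poisson distribution,
    P(Y = y) = theta * (theta + lam*y)**(y - 1) * exp(-theta - lam*y) / y!,
    valid for theta > 0 and 0 <= lam < 1 (lam = 0 recovers the Poisson)."""
    return (math.log(theta) + (y - 1) * math.log(theta + lam * y)
            - theta - lam * y - math.lgamma(y + 1))


def gp_moments(theta: float, lam: float) -> tuple[float, float]:
    """Closed-form mean and variance of the generalized Poisson distribution."""
    return theta / (1.0 - lam), theta / (1.0 - lam) ** 3


theta, lam = 2.0, 0.3
probs = [math.exp(gp_logpmf(y, theta, lam)) for y in range(200)]  # tail beyond 199 is negligible
num_mean = sum(y * p for y, p in enumerate(probs))
num_var = sum((y - num_mean) ** 2 * p for y, p in enumerate(probs))
print(sum(probs), (num_mean, num_var), gp_moments(theta, lam))
```

    Working on the log scale with `math.lgamma` avoids overflow in y! and in the power term for larger counts.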

    Adjusting for overdispersion in piecewise exponential regression models to estimate excess mortality rate in population-based research.

    BACKGROUND: In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer within the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x_i are equal is strong and may not hold: variability in the rate parameter can produce overdispersion, in which the variance exceeds the mean. Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. METHODS: We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion, including quasi-likelihood, robust standard error estimation, negative binomial regression and flexible piecewise modelling. RESULTS: All piecewise exponential regression models showed significant inherent overdispersion (p-value < 0.001). However, the flexible piecewise exponential model had the smallest overdispersion parameter (3.2, versus 21.3 for the non-flexible piecewise exponential models). CONCLUSION: We found no major differences between methods. However, flexible piecewise regression modelling, with either quasi-likelihood or robust standard errors, was the best approach, as it deals with both overdispersion due to model misspecification and true, or inherent, overdispersion.
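
    Two of the diagnostics named in the METHODS can be sketched for an ordinary Poisson GLM, given observed counts and fitted means: the Pearson dispersion statistic (the scale factor a quasi-likelihood fit would use) and the Cameron–Trivedi auxiliary regression, one standard regression-based test for overdispersion. This plain-Python sketch assumes the fitted means are already available; it is not the relative-survival version used in the paper, and the toy numbers are made up.

```python
import math


def pearson_dispersion(y: list[int], mu: list[float], n_params: int) -> float:
    """Pearson chi-square divided by residual degrees of freedom.
    Values well above 1 suggest overdispersion; a quasi-Poisson fit
    would inflate standard errors by the square root of this value."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def ct_overdispersion_test(y: list[int], mu: list[float]) -> tuple[float, float]:
    """Cameron-Trivedi auxiliary OLS regression without intercept:
    z_i = ((y_i - mu_i)^2 - y_i) / mu_i  regressed on  mu_i.
    Under equidispersion the slope alpha is 0; alpha > 0 points to a
    variance of the form mu + alpha * mu^2. Returns (alpha, t statistic)."""
    z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]
    sxx = sum(mi * mi for mi in mu)
    alpha = sum(zi * mi for zi, mi in zip(z, mu)) / sxx
    resid = [zi - alpha * mi for zi, mi in zip(z, mu)]
    s2 = sum(r * r for r in resid) / (len(y) - 1)  # one estimated coefficient
    se = math.sqrt(s2 / sxx)
    return alpha, alpha / se


y = [0, 4, 1, 7]            # toy observed counts
mu = [2.0, 2.0, 3.0, 3.0]   # toy fitted Poisson means
phi = pearson_dispersion(y, mu, n_params=2)
alpha, t_stat = ct_overdispersion_test(y, mu)
print(f"Pearson dispersion {phi:.2f}, alpha {alpha:.3f}, t {t_stat:.2f}")
```

    The t statistic is compared with an upper standard-normal quantile in a one-sided test; with real data the sample would of course be far larger than this toy example.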

    Conditional Heteroskedasticity in Count Data Regression: Self-Feeding Activity in Fish

    The paper introduces a new approach to incorporating time-dependent overdispersion into Poisson-related regression models. To handle the added flexibility of conditional heteroskedasticity in time-series count data, some well-known estimators are adapted and a GMM-type estimator is suggested. The estimators are applied to a time series of self-feeding activity in Arctic charr. There is strong support for both a dynamic conditional mean function and a dynamic model for the overdispersion.
    Keywords: Poisson; Overdispersion; ARCH; Estimation; Self-Feeding; Arctic Charr
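
    To make "time-dependent overdispersion" concrete, the sketch below filters a count series through a hypothetical ARCH-type specification: the conditional mean depends on the previous count, and a dispersion factor φ_t reacts to the previous squared Pearson residual, so that Var(y_t | past) = φ_t μ_t. The functional forms, the parameter names b0, b1, omega and a, and the initialisation are all assumptions for illustration, not the paper's model; estimating the parameters (for example with a GMM-type estimator) is not shown.

```python
import math


def filter_dynamic_dispersion(y: list[int], b0: float, b1: float,
                              omega: float, a: float) -> tuple[list[float], list[float]]:
    """Hypothetical ARCH-type count model (illustration only):
      mu_t  = exp(b0 + b1 * log(1 + y_{t-1}))   dynamic conditional mean
      phi_t = omega + a * e_{t-1}^2              dynamic dispersion factor
      Var(y_t | past) = phi_t * mu_t,
    with Pearson residual e_t = (y_t - mu_t) / sqrt(phi_t * mu_t).
    Period 0 is initialised with the sample mean and phi = omega, so the
    series must not be all zeros; omega > 0 and a >= 0 keep phi positive."""
    mu = [sum(y) / len(y)]
    phi = [omega]
    for t in range(1, len(y)):
        e_prev = (y[t - 1] - mu[t - 1]) / math.sqrt(phi[t - 1] * mu[t - 1])
        mu.append(math.exp(b0 + b1 * math.log1p(y[t - 1])))
        phi.append(omega + a * e_prev ** 2)  # large surprises raise next-period dispersion
    return mu, phi


mu, phi = filter_dynamic_dispersion([0, 10, 4, 1], b0=0.0, b1=0.0, omega=1.0, a=0.5)
print(mu, phi)
```

    With b1 = 0 the mean is held constant, so the example isolates the dispersion dynamics: the unexpectedly large count of 10 sharply increases φ for the following period.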

    Recreation Demand Analysis under Truncation, Overdispersion, and Endogenous Stratification: An Application to Gros Morne National Park

    Using on-site survey data from Gros Morne National Park in Newfoundland, this paper estimates and compares several truncated count data models of recreation demand. The model that accounts not only for the truncated and overdispersed nature of the data but also for endogenous stratification due to the oversampling of avid users, while allowing for a flexible specification of the overdispersion parameter, dominates on the basis of goodness of fit. The results are used to estimate users' value of access to the park.
    Keywords: on-site sampling, endogenous stratification, consumer surplus, count data, overdispersion, recreation demand, travel cost method, truncation
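
    The two sampling problems handled in this paper can be written down explicitly for the simplest (Poisson) case: on-site samples contain only visitors, so zero trips are never observed (truncation), and a person who visits y times is y times as likely to be intercepted (endogenous stratification). Under these textbook assumptions the on-site pmf is y·f(y)/E[Y], which for the Poisson reduces to a Poisson distribution for y − 1. This is a minimal sketch with illustrative function names; the paper's preferred model adds overdispersion with a flexible dispersion specification, whose pmf is not reproduced here.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Ordinary Poisson pmf, computed on the log scale for stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zero_truncated_poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y | Y > 0): truncation only, since zero-trip users are never sampled."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, lam) / (1.0 - math.exp(-lam))


def onsite_poisson_pmf(y: int, lam: float) -> float:
    """Truncation plus endogenous stratification: sampling probability is
    proportional to y, giving y * f(y) / E[Y]; for the Poisson, E[Y] = lam."""
    if y < 1:
        return 0.0
    return y * poisson_pmf(y, lam) / lam


lam = 3.0
ys = range(1, 100)  # probability mass beyond 99 trips is negligible for lam = 3
zt_mean = sum(y * zero_truncated_poisson_pmf(y, lam) for y in ys)
onsite_mean = sum(y * onsite_poisson_pmf(y, lam) for y in ys)
print(zt_mean, lam / (1 - math.exp(-lam)), onsite_mean, lam + 1)
```

    The on-site mean exceeds the population mean λ by exactly one trip, which shows why ignoring endogenous stratification overstates typical demand.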

    Molecular Clock on a Neutral Network

    The number of fixed mutations accumulated in an evolving population often displays a variance that is significantly larger than the mean (the overdispersed molecular clock). By examining a generic evolutionary process on a neutral network of high-fitness genotypes, we establish a formalism for computing all cumulants of the full probability distribution of accumulated mutations in terms of graph properties of the neutral network, and we use the formalism to prove overdispersion of the molecular clock. We further show that significant overdispersion arises naturally in evolution when the neutral network is highly sparse and exhibits large global but small local fluctuations in neutrality. The results are also relevant for elucidating the topological structure of a neutral network from empirical measurements of the substitution process.
    Comment: 10 pages
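
    A small Monte Carlo experiment, rather than the paper's analytic cumulant formalism, can illustrate why fluctuations in neutrality overdisperse the clock. The sketch assumes "blind-ant" style dynamics: each neutral neighbour of the current genotype is substituted at the same rate, so a node's substitution rate is proportional to its degree. On a regular graph (constant neutrality) the substitution count is Poisson and the dispersion index R = Var/Mean is about 1; on a graph with uneven neutrality, R exceeds 1. The graphs and function names are hypothetical toy examples.

```python
import random


def count_substitutions(adj, start, total_time, rate, rng):
    """Continuous-time walk on a neutral network given as an adjacency dict.
    At node v each of the len(adj[v]) neutral neighbours is reached at `rate`,
    so the waiting time is exponential with rate rate * len(adj[v])."""
    node, t, count = start, 0.0, 0
    while True:
        t += rng.expovariate(rate * len(adj[node]))
        if t > total_time:
            return count
        node = rng.choice(adj[node])  # move to a uniformly chosen neutral neighbour
        count += 1


def dispersion_index(adj, total_time=50.0, reps=2000, seed=7):
    """R = Var(K) / E(K) for the substitution count K over replicate walks.
    Start nodes are drawn uniformly, the stationary distribution of this walk."""
    rng = random.Random(seed)
    nodes = list(adj)
    ks = [count_substitutions(adj, rng.choice(nodes), total_time, 1.0, rng)
          for _ in range(reps)]
    mean = sum(ks) / reps
    var = sum((k - mean) ** 2 for k in ks) / (reps - 1)
    return var / mean


cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}     # every node has 2 neutral neighbours
star = {0: [1, 2, 3, 4, 5], **{i: [0] for i in range(1, 6)}}  # hub has 5, leaves have 1
r_cycle, r_star = dispersion_index(cycle), dispersion_index(star)
print(f"cycle R = {r_cycle:.2f}, star R = {r_star:.2f}")
```

    For the star graph, the walk alternates between the hub (rate 5) and a leaf (rate 1), and renewal arguments give a long-run R of roughly 1.4 under these assumptions.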

    A stochastic model for multivariate surveillance of infectious diseases

    We describe a stochastic model based on a branching process for analyzing surveillance data of infectious diseases that can be used to forecast the future development of the epidemic. The model is based on a Poisson branching process with immigration, with an additional adjustment for possible overdispersion. An extension to a space-time model for the multivariate case is described. The model is estimated in a Bayesian context using Markov chain Monte Carlo (MCMC) techniques. We illustrate the applicability of the model through analyses of simulated and real data.
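
    The univariate core of such a model can be sketched as a simulation plus a one-step forecast, assuming a conditional mean μ_t = λ y_{t−1} + ν, where λ y_{t−1} is the branching (epidemic) part and ν the immigration part, with overdispersion introduced by drawing counts from a negative binomial instead of a Poisson. The constant ν, the parameter names and the gamma-Poisson sampler are illustrative assumptions; the space-time extension and the Bayesian MCMC estimation are not shown.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the moderate means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_bpi(n_steps: int, lam: float, nu: float, size: float,
                 y0: int = 0, seed: int = 3) -> list[int]:
    """Branching process with immigration and negative binomial noise:
    mu_t = lam * y_{t-1} + nu, and y_t ~ NB(mean mu_t, size), drawn as a
    gamma-Poisson mixture. lam < 1 keeps the mean stable at nu / (1 - lam)."""
    rng = random.Random(seed)
    y = [y0]
    for _ in range(n_steps):
        mu = lam * y[-1] + nu
        rate = rng.gammavariate(size, mu / size)  # gamma with mean mu, shape `size`
        y.append(poisson_draw(rate, rng))
    return y


def forecast_mean(y_last: int, lam: float, nu: float) -> float:
    """One-step-ahead predictive mean E[y_{t+1} | y_t]."""
    return lam * y_last + nu


series = simulate_bpi(5000, lam=0.5, nu=2.0, size=2.0)
print(sum(series[1:]) / 5000, 2.0 / (1 - 0.5))  # simulated vs stationary mean
print(forecast_mean(series[-1], lam=0.5, nu=2.0))
```

    Smaller values of `size` give heavier overdispersion; as `size` grows, the negative binomial approaches the plain Poisson branching process.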