
    Testing for zero-modification in count regression models

    Count data often exhibit overdispersion and/or require an adjustment for zero outcomes relative to a Poisson model. Zero-modified Poisson (ZMP) and zero-modified generalized Poisson (ZMGP) regression models are useful classes of models for such data. In the literature so far, only score tests have been used to test the necessity of this adjustment. Through a simulation study we show how poorly the corresponding score test can perform in comparison with Wald and likelihood ratio (LR) tests. In particular, the score test in the ZMP case results in a power loss of 47% compared to the Wald test in the worst case, while in the ZMGP case the worst loss is 87%. Therefore, regardless of the computational advantage of score tests, the loss in power compared to the Wald and LR tests should not be neglected, and these much more powerful alternatives should be used instead. We also prove consistency and asymptotic normality of the maximum likelihood estimators in the above-mentioned regression models to give a theoretical justification for Wald and likelihood ratio tests.
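
    For orientation, the three tests compared above can be written generically for a scalar zero-modification parameter ω with null value ω₀, log-likelihood ℓ, score U and Fisher information I (nuisance parameters suppressed for brevity; this is the textbook form, not the paper's exact notation):

        W  = \frac{(\hat{\omega} - \omega_0)^2}{\widehat{\operatorname{Var}}(\hat{\omega})}, \qquad
        LR = 2\,\{\ell(\hat{\omega}) - \ell(\omega_0)\}, \qquad
        S  = \frac{U(\omega_0)^2}{I(\omega_0)},

    each asymptotically χ²₁ under H₀: ω = ω₀. Only the score statistic avoids fitting the full zero-modified model, which is its computational advantage.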

    Consistency and asymptotic normality of the maximum likelihood estimator in a zero-inflated generalized Poisson regression

    Poisson regression models for count variables have been utilized in many applications. However, in many problems overdispersion and zero-inflation occur. In this paper we study regression models based on the generalized Poisson distribution (Consul, 1989). These regression models, which have been used for about 15 years, do not belong to the class of generalized linear models considered by McCullagh and Nelder (1989), for which an established asymptotic theory is available. We therefore prove consistency and asymptotic normality of a solution to the maximum likelihood equations for zero-inflated generalized Poisson regression models. Further, the accuracy of the asymptotic normality approximation is investigated through a simulation study. This allows us to construct asymptotic confidence intervals and likelihood ratio tests.
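
    As a reference point, one common parametrization of Consul's generalized Poisson distribution and its zero-inflated version is the following (a sketch; the paper's regression parametrization may differ):

        P(Y = y) = \frac{\theta(\theta + \lambda y)^{\,y-1} e^{-\theta - \lambda y}}{y!}, \quad y = 0, 1, 2, \dots, \; 0 \le \lambda < 1,
        \qquad E[Y] = \frac{\theta}{1-\lambda}, \quad \operatorname{Var}(Y) = \frac{\theta}{(1-\lambda)^3},

        P(Y = 0) = \omega + (1-\omega)\,P_{GP}(0), \qquad P(Y = y) = (1-\omega)\,P_{GP}(y), \; y \ge 1.

    Here λ > 0 produces overdispersion relative to the Poisson case (λ = 0), and ω is the zero-inflation probability.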

    Zero-inflated regression models for radiation-induced chromosome aberration data: A comparative study

    Within the field of cytogenetic biodosimetry, Poisson regression is the classical approach for modeling the number of chromosome aberrations as a function of radiation dose. However, it is common to find data that exhibit overdispersion. In practice, the assumption of equidispersion may be violated due to unobserved heterogeneity in the cell population, which renders the variance of observed aberration counts larger than their mean and/or the frequency of zero counts greater than expected for the Poisson distribution. This phenomenon is observable for both full- and partial-body exposure, but is more pronounced for the latter. In this work, different methodologies for analyzing cytogenetic chromosomal aberration datasets are compared, with special focus on zero-inflated Poisson and zero-inflated negative binomial models. A score test for zero inflation in Poisson regression models under the identity link is also developed.
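
    The identity-link test developed in the paper is not reproduced here; for intuition, the classical score statistic for zero inflation in Poisson regression (in the spirit of van den Broek, 1995, under the usual log link) compares observed and Poisson-expected zeros:

        S = \frac{\left[\sum_i \left(\mathbf{1}\{y_i = 0\} - \hat{p}_{0i}\right)/\hat{p}_{0i}\right]^2}{\sum_i (1 - \hat{p}_{0i})/\hat{p}_{0i} \; - \; n\bar{y}}, \qquad \hat{p}_{0i} = e^{-\hat{\lambda}_i},

    which is asymptotically χ²₁ under the Poisson null.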

    Multiple Approaches to Absenteeism Analysis

    Absenteeism research has often been criticized for using inappropriate analysis. Characteristics of absence data, notably that it is usually truncated and skewed, violate assumptions of OLS regression; however, OLS and correlation analysis remain the dominant models of absenteeism research. This piece compares eight models that may be appropriate for analyzing absence data: OLS regression, OLS regression with a transformed dependent variable, the Tobit model, Poisson regression, overdispersed Poisson regression, the negative binomial model, ordinal logistic regression, and the ordinal probit model. A simulation methodology is employed to determine the extent to which each model is likely to produce false positives. Simulations vary with respect to the shape of the dependent variable's distribution, sample size, and the shape of the independent variables' distributions. Actual data, based on a sample of 195 manufacturing employees, are used to illustrate how these models might be used to analyze a real data set. Results from the simulation suggest that, despite methodological expectations, OLS regression does not produce significantly more false positives than expected at various alpha levels. However, the Tobit and Poisson models are often shown to yield too many false positives. A number of other models yield fewer than the expected number of false positives, suggesting that they may serve well as conservative hypothesis tests.
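
    A minimal sketch of this kind of false-positive simulation, assuming a skewed count outcome that is truly unrelated to the predictor; the sample size mirrors the 195 employees mentioned above, but the distributional choices and variable names are illustrative, not those of the study:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n_sims, n, alpha = 1000, 195, 0.05
        rej_ols = rej_pois = 0

        for _ in range(n_sims):
            x = rng.normal(size=n)                     # predictor, unrelated to the outcome
            y = rng.negative_binomial(1, 0.3, size=n)  # skewed "absence" counts under the null
            X = sm.add_constant(x)
            rej_ols += sm.OLS(y, X).fit().pvalues[1] < alpha
            rej_pois += sm.GLM(y, X, family=sm.families.Poisson()).fit().pvalues[1] < alpha

        print("false-positive rate, OLS:    ", rej_ols / n_sims)
        print("false-positive rate, Poisson:", rej_pois / n_sims)

    Because the simulated counts are overdispersed, the Poisson rejection rate tends to exceed the nominal alpha, which is the kind of excess false-positive behaviour the abstract reports for the Poisson model.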

    Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: An example from a vertigo phase III study with longitudinal count data as primary endpoint

    Background: A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. Writing a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on, and justification of, many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex, not easily implemented, and have several limitations. It is therefore of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods: We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or the probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). Results: The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions: The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
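
    For reference, the two headline criteria have compact standard definitions (not specific to INLA):

        \mathrm{DIC} = \overline{D(\theta)} + p_D, \qquad p_D = \overline{D(\theta)} - D(\bar{\theta}), \qquad D(\theta) = -2\log p(y \mid \theta),

        \overline{\mathrm{LS}} = -\frac{1}{n}\sum_{i=1}^{n} \log p(y_i \mid y_{-i}),

    where p(y_i | y_{-i}) is the leave-one-out predictive density that INLA makes directly available; lower values are better for both criteria.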

    Statistical Methods for Modeling Count Data with Overdispersion and Missing Time Varying Categorical Covariates

    In studying the association between count outcomes and covariates using Poisson regression, the necessary requirement that the mean and variance of responses are equivalent for each covariate pattern is not always met in real datasets. This violation of equidispersion can lead to invalid inference unless proper alternative models are considered. There is currently no comprehensive and definitive assessment of the different methods of dealing with overdispersion, nor is there a standard approach for determining the threshold of overdispersion at which statistical intervention is necessary. The issue of overdispersion can be further complicated by the presence of missing covariate data in count outcome models. In this dissertation we have (1) compared the performance of different statistical models for dealing with overdispersion, (2) determined an appropriate threshold of σ_p, the ratio of the Pearson chi-squared goodness-of-fit statistic to its degrees of freedom, beyond which statistical intervention is necessary to address the overdispersion, (3) developed a latent transition multiple imputation (LTMI) approach for dealing with missing time-varying categorical covariates in count outcome models, and (4) compared the performance of LTMI with complete case analysis (CCA) and latent class multiple imputation (LCMI) in addressing missing time-varying categorical covariates in the presence of overdispersion. Latent class assignment was determined via both SAS software and random effect modeling, and missing observations were imputed using predictive mean matching multiple imputation methods. We utilized extensive simulation studies to assess the performance of the proposed methods on a variety of overdispersion and missingness scenarios. We further demonstrated the application of the proposed models and methods via real data examples. We conclude that the negative binomial generalized linear mixed model (NB-GLMM) is superior overall for modeling count data characterized by overdispersion. Furthermore, a general threshold for relying on the simple Poisson model for cross-sectional and longitudinal datasets is σ_p ≤ 1.2. LTMI methods outperform CCA and LCMI in many scenarios, particularly when there is a higher percentage of missingness and data are MAR. Lastly, the NB-GLMM is preferable for addressing overdispersion, while LTMI is preferable for imputing covariate observations when jointly considering both issues.
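
    A minimal sketch of the dispersion diagnostic σ_p referred to above, the Pearson chi-squared statistic divided by its residual degrees of freedom for a fitted Poisson model; the data here are simulated for illustration, and statsmodels exposes both quantities directly:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        x = rng.normal(size=500)
        y = rng.poisson(np.exp(0.5 + 0.3 * x))      # equidispersed; swap in overdispersed counts to see sigma_p grow
        X = sm.add_constant(x)

        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        sigma_p = fit.pearson_chi2 / fit.df_resid   # Pearson chi-squared / residual df
        print("sigma_p =", round(sigma_p, 2))       # values above ~1.2 would, per the threshold above, call for an alternative model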

    Modeling zero-inflated and overdispersed count data: application to in-hospital mortality data

    Hyperchloremia (high serum chloride level) is frequently observed in critically ill patients in the intensive care unit (ICU). Clinical evidence shows that hyperchloremia is associated with increased in-hospital mortality. Length of hospital stay (LOS) is often used as an indicator of hospital efficiency and a proxy of resource consumption, and is especially important in organizing hospital services. Such data often have a highly right-skewed distribution for non-zero values and possible excess zero counts. Our study aims to examine the association of serum chloride levels at different time points with hospital mortality and to model the length of hospital and ICU stays using models for zero-inflated and overdispersed count data. We consider several univariate and multivariate models to evaluate the effects of serum chloride as they pertain to patient mortality. The methods are applied to data on more than 1,700 critically ill patients from a local hospital.
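
    A sketch of fitting a zero-inflated negative binomial model to a length-of-stay-type outcome with statsmodels; the data are simulated, the covariate name is a placeholder rather than a study variable, and the optimizer settings may need tuning in practice:

        import numpy as np
        import statsmodels.api as sm
        from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

        rng = np.random.default_rng(2)
        n = 1000
        chloride = rng.normal(104, 4, size=n)            # hypothetical covariate
        mu = np.exp(0.2 + 0.03 * (chloride - 104))
        los = rng.negative_binomial(2, 2 / (2 + mu))     # overdispersed counts with mean mu
        los[rng.random(n) < 0.15] = 0                    # excess zeros

        X = sm.add_constant(chloride)
        zinb = ZeroInflatedNegativeBinomialP(
            los, X, exog_infl=np.ones((n, 1)), inflation='logit'
        ).fit(method='bfgs', maxiter=500, disp=False)
        print(zinb.summary())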

    A STUDY OF SMALL AREA ESTIMATION TO MEASURE MULTIDIMENSIONAL POVERTY WITH MIXED MODEL POISSON, ZIP, AND ZINB

    The research began by calculating multidimensional poverty at the district level in West Java Province from SUSENAS 2021. The calculation of multidimensional poverty was based on individuals in each district or city household. The dimensions were weighted equally, as were the indicators within each dimension. A simulation study then used the mixed Poisson, ZIP, and ZINB models to examine model performance on data with excess zeros and overdispersion. The simulation generated data both without and with overdispersion. Overdispersed data were generated with parameter ω set to 0.1, 0.3, 0.5, and 0.7, and the models were evaluated by their AIC values. The best method from the simulation study was then used to estimate multidimensional poverty in sub-districts of West Java Province using PODES 2021. On data without overdispersion, the simulations showed no difference in goodness of fit between the models. On overdispersed data, the mixed ZIP and ZINB models outperformed the mixed Poisson model. The percentage of the multidimensionally poor population at the sub-district level in West Java Province is quite diverse, ranging from 0.04% to 75.54%.
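
    A compact sketch of the simulation design described above, assuming ω denotes the excess-zero probability; the mixed (random-effect) structure used in the study is omitted for brevity, and the fixed-effect data-generating values are illustrative:

        import numpy as np
        import statsmodels.api as sm
        from statsmodels.discrete.count_model import ZeroInflatedPoisson

        rng = np.random.default_rng(3)
        n = 2000
        x = rng.normal(size=n)
        X = sm.add_constant(x)

        for omega in (0.1, 0.3, 0.5, 0.7):
            y = rng.poisson(np.exp(0.5 + 0.4 * x))
            y[rng.random(n) < omega] = 0             # inflate zeros with probability omega
            aic_pois = sm.GLM(y, X, family=sm.families.Poisson()).fit().aic
            aic_zip = ZeroInflatedPoisson(
                y, X, exog_infl=np.ones((n, 1)), inflation='logit'
            ).fit(method='bfgs', maxiter=500, disp=False).aic
            print(f"omega={omega}: AIC Poisson={aic_pois:.1f}, AIC ZIP={aic_zip:.1f}")

    As the zero-inflation probability grows, the ZIP model's AIC advantage over the plain Poisson model widens, mirroring the pattern reported above.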

    Estimating dose and exposure fraction from radiation biomarkers in the presence of overdispersion

    It is typically assumed that the total γ-H2AX foci produced in a sample of blood cells is Poisson distributed, with an expected yield that can be represented by a linear function of the absorbed dose. However, in practice, because of unobserved heterogeneity in the cell population, the standard Poisson assumption of equidispersion will most likely be contravened, causing the variance of the foci counts to be larger than their mean. In both whole- and partial-body exposure this phenomenon is perceptible, unlike in the context of the dicentric assay, in which overdispersion is usually considered only to be linked to partial exposure. For such situations, and as we will demonstrate, it is suitable to utilise a model that can handle overdispersion, such as quasi-Poisson or negative binomial regression. Most radiation accident scenarios result in partial-body exposure or a non-uniform dose distribution, leading to a differential exposure of lymphocytes in the body. Consequently, for the exposed individuals, their blood will contain a mixture of cells showing no radiation impact at all and cells featuring a distribution of counts according to the dose of exposure. For such exposure scenarios, there are as yet no established statistical procedures for the γ-H2AX assay. Part of this work focuses on updating the contaminated Poisson method, traditionally used in conjunction with cytogenetic biomarkers, to enable an estimate of the radiation dose and irradiated fraction to be found in the presence of both zero-inflation and overdispersion. As an extension, we discuss and compare how to measure the uncertainty associated with a given dose estimate via the delta, Merkle and ISO methods. We illustrate their applications first via simulated zero-inflated Poisson and NB1 data, with the non-inflated part being generated using an external γ-H2AX whole-body calibration curve, before applying the methodology to practical data.
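
    Schematically, the kind of model discussed above combines a linear calibration curve, extra zeros linked to the unexposed fraction of cells, and an overdispersed count distribution; one NB1-style formulation of that structure is (a sketch of the general idea, not the paper's exact method):

        Y_i \sim \pi\,\delta_0 + (1-\pi)\,F\!\big(\mu(D)\big), \qquad \mu(D) = C + \beta D,

    where δ₀ is a point mass at zero, π is related to the unexposed fraction of cells, and F is either Poisson (Var = μ) or an NB1-type distribution with Var = (1 + φ)μ, φ > 0 capturing the overdispersion.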