    A combined beta and normal random-effects model for repeated, overdispersed binary and binomial data

    Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Well-known members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of hierarchical structure in the data, stemming from clustering in the data which, in turn, may result from repeatedly measuring the outcome, for various members of the same family, etc. The first issue is dealt with through a variety of overdispersion models, such as, for example, the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects. Though not always, one conventionally assumes such random effects to be normally distributed. While both of these phenomena may occur simultaneously, models combining them are uncommon. This paper starts from the broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary and binomial cases are our focus. Apart from model formulation, we present an overview of estimation methods, and then settle for maximum likelihood estimation with analytic-numerical integration. The methodology is applied to two datasets of which the outcomes are binary and binomial, respectively.
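
As a rough illustration of the overdispersion mentioned in this abstract, the sketch below compares the variance of a plain binomial count with that of a beta-binomial count having the same mean. It uses the textbook beta-binomial variance formula and is not the combined model of the paper; the function names and the parameter values (`n_trials`, `alpha`, `beta`) are assumptions chosen for the example.

```python
def binomial_variance(n_trials, p):
    """Variance of a Binomial(n_trials, p) count."""
    return n_trials * p * (1.0 - p)


def beta_binomial_variance(n_trials, alpha, beta):
    """Variance of a count whose success probability is Beta(alpha, beta).

    The binomial variance is inflated by the factor 1 + (n_trials - 1) * rho,
    where rho = 1 / (alpha + beta + 1) is the intra-cluster correlation.
    """
    p = alpha / (alpha + beta)           # marginal success probability
    rho = 1.0 / (alpha + beta + 1.0)     # intra-cluster correlation
    inflation = 1.0 + (n_trials - 1) * rho
    return binomial_variance(n_trials, p) * inflation


# Hypothetical values: 10 trials per cluster, Beta(2, 2) success probability.
n_trials, alpha, beta = 10, 2.0, 2.0
var_binomial = binomial_variance(n_trials, alpha / (alpha + beta))
var_beta_binomial = beta_binomial_variance(n_trials, alpha, beta)
```

With these illustrative values the beta-binomial variance is larger than the binomial variance, which is the mean-variance mismatch that a plain binomial model cannot capture.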

    Variable Selection and Model Averaging in Semiparametric Overdispersed Generalized Linear Models

    We express the mean and variance terms in a double exponential regression model as additive functions of the predictors and use Bayesian variable selection to determine which predictors enter the model, and whether they enter linearly or flexibly. When the variance term is null we obtain a generalized additive model, which becomes a generalized linear model if the predictors enter the mean linearly. The model is estimated using Markov chain Monte Carlo simulation and the methodology is illustrated using real and simulated data sets.

    Hierarchical models with normal and conjugate random effects : a review

    Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs et al. (2010) proposed a general framework to model hierarchical data subject to within-unit correlation and/or overdispersion. The framework extends classical overdispersion models as well as generalized linear mixed models. Subsequent work has examined various aspects that lead to the formulation of several extensions. A unified treatment of the model framework and key extensions is provided. Particular extensions discussed are: explicit calculation of correlation and other moment-based functions, joint modelling of several hierarchical sequences, versions with direct marginally interpretable parameters, zero-inflation in the count case, and influence diagnostics. The basic models and several extensions are illustrated using a set of key examples, one per data type (count, binary, multinomial, ordinal, and time-to-event).

    Statistical Methods for Modeling Count Data with Overdispersion and Missing Time Varying Categorical Covariates

    In studying the association between count outcomes and covariates using Poisson regression, the necessary requirement that the mean and variance of responses are equivalent for each covariate pattern is not always met in real datasets. This violation of equidispersion can lead to invalid inference unless proper alternative models are considered. There is currently no comprehensive and definitive assessment of the different methods of dealing with overdispersion, nor is there a standard approach for determining the threshold of overdispersion such that statistical intervention is necessary. The issue of overdispersion can be further complicated by the presence of missing covariate data in count outcome models. In this dissertation we have (1) compared the performance of different statistical models for dealing with overdispersion, (2) determined an appropriate threshold of the ratio of the Pearson chi-squared goodness of fit statistic to degrees of freedom σp such that statistical intervention is necessary to address the overdispersion, (3) developed a latent transition multiple imputation (LTMI) approach for dealing with missing time varying categorical covariates in count outcome models, and (4) compared the performance of LTMI with complete case analysis (CCA) and latent class multiple imputation (LCMI) in addressing missing time varying categorical covariates in the presence of overdispersion. Latent class assignment was determined via both SAS software and random effect modeling, and missing observation imputation was performed using predictive mean matching multiple imputation methods. We utilized extensive simulation studies to assess the performance of the proposed methods on a variety of overdispersion and missingness scenarios. We further demonstrated the application of the proposed models and methods via real data examples. 
    We conclude that the negative binomial generalized linear mixed model (NB-GLMM) is superior overall for modeling count data characterized by overdispersion. Furthermore, a general threshold for relying on the simple Poisson model for cross-sectional and longitudinal datasets is in cases where σp ≤ 1.2. LTMI methods outperform CCA and LCMI in many scenarios, particularly when there is a higher percentage of missingness and data are MAR. Lastly, NB-GLMM is preferable to address overdispersion while LTMI is preferable for imputing covariate observations when jointly considering both issues.
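
The dispersion ratio described in this abstract (Pearson chi-squared goodness-of-fit statistic divided by degrees of freedom) can be sketched as follows. This is a minimal illustration assuming an intercept-only Poisson fit, whose fitted mean is the sample mean. The function name and the example counts are hypothetical. Only the 1.2 cut-off comes from the abstract.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-squared statistic over residual degrees of freedom (Poisson variance = mean)."""
    if len(counts) != len(fitted_means):
        raise ValueError("counts and fitted_means must have the same length")
    dof = len(counts) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi2 / dof


# Hypothetical counts; an intercept-only Poisson model fits the sample mean.
counts = [0, 1, 1, 2, 3, 5, 8, 12]
sample_mean = sum(counts) / len(counts)
ratio = pearson_dispersion(counts, [sample_mean] * len(counts), n_params=1)

# Threshold reported in the abstract: the simple Poisson model is relied on when the ratio is at most 1.2.
needs_overdispersion_model = ratio > 1.2
```

For a model with covariates, `fitted_means` would hold the fitted values from that regression and `n_params` the number of estimated coefficients.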

    Longitudinal Beta-Binomial Modeling using GEE for Over-Dispersed Binomial Data

    Longitudinal binomial data are frequently generated from multiple questionnaires and assessments in various scientific settings for which the binomial data are often overdispersed. The standard generalized linear mixed effects model may result in severe underestimation of standard errors of estimated regression parameters in such cases and hence potentially bias the statistical inference. In this paper, we propose a longitudinal beta-binomial model for overdispersed binomial data and estimate the regression parameters under a probit model using the generalized estimating equation method. A hybrid algorithm of the Fisher scoring and the method of moments is implemented for computing the estimates. Extensive simulation studies are conducted to justify the validity of the proposed method. Finally, the proposed method is applied to analyze functional impairment in subjects who are at risk of Huntington disease from a multisite observational study of prodromal Huntington disease.

    A Marginalized Model for Zero-Inflated, Overdispersed, and Correlated Count Data

    Iddi and Molenberghs (2012) merged the attractive features of the so-called combined model of Molenberghs et al. (2010) and the marginalized model of Heagerty (1999) for hierarchical non-Gaussian data with overdispersion. In this model, the fixed-effect parameters retain their marginal interpretation. Lee et al. (2011) also developed an extension of Heagerty (1999) to handle zero-inflation from count data, using the hurdle model. To bring together all of these features, a marginalized, zero-inflated, overdispersed model for correlated count data is proposed. Using two empirical sets of data, it is shown that the proposed model leads to important improvements in model fit.

    Generating Correlated and/or Overdispersed Count Data: A SAS Implementation

    Analysis of longitudinal count data has long been done using a generalized linear mixed model (GLMM), in its Poisson-normal version, to account for correlation by specifying normal random effects. Univariate counts are often handled with the negative-binomial (NEGBIN) model taking into account overdispersion by use of gamma random effects. Inherently though, longitudinal count data commonly exhibit both features of correlation and overdispersion simultaneously, necessitating analysis methodology that can account for both. The introduction of the combined model (CM) by Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs, Verbeke, Demétrio, and Vieira (2010) serves this purpose, not only for count data but for the general exponential family of distributions. Here, a Poisson model is specified as the parent distribution of the data with a normally distributed random effect at the subject or cluster level and/or a gamma distribution at observation level. The GLMM and NEGBIN model are special cases. Data can be simulated from (1) the general CM, with random effects, or, (2) its marginal version directly. This paper discusses an implementation of (1) in SAS software (SAS Inc. 2011). One needs to reflect on the mean of both the combined (hierarchical) and marginal models in order to generate correlated and/or overdispersed counts. A pre-specification of the desired marginal mean (in terms of covariates and marginal parameters), a marginal variance-covariance structure and the hierarchical mean (in terms of covariates and regression parameters) is required. The implied hierarchical parameters, the variance-covariance matrix of the random effects, and the variance-covariance matrix of the overdispersion part are then derived from which correlated Poisson data are generated. Sample calls of the SAS macro are presented as well as output.
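
The hierarchical structure described in this abstract (a Poisson parent distribution, a normal random effect per subject, and a gamma effect per observation) can be sketched in a few lines. This is not the SAS macro from the paper and does not derive hierarchical parameters from a target marginal specification. It is a minimal simulation under assumed values, with a hypothetical linear predictor `beta0 + beta1 * t`, a mean-one gamma effect, and a simple Poisson sampler suited to small means.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_combined_counts(n_subjects, n_times, beta0, beta1, sd_b, alpha, seed=0):
    """Simulate counts with a normal subject effect and a gamma observation-level effect."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b_i = rng.gauss(0.0, sd_b)  # normal random effect shared within the subject (correlation)
        row = []
        for t in range(n_times):
            theta = rng.gammavariate(alpha, 1.0 / alpha)  # gamma effect with mean 1 (overdispersion)
            lam = theta * math.exp(beta0 + beta1 * t + b_i)
            row.append(poisson_draw(rng, lam))
        data.append(row)
    return data


# Hypothetical settings: 5 subjects, 4 occasions each.
counts = simulate_combined_counts(5, 4, beta0=0.5, beta1=0.1, sd_b=0.5, alpha=2.0, seed=42)
```

Setting `sd_b` to zero leaves only the gamma effect, which corresponds to the negative-binomial special case named in the abstract; letting `alpha` grow large makes the gamma effect concentrate at 1, which approaches the Poisson-normal GLMM.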

    Detection and down-weighting of outliers in non-normal data: theory and application


    The mixed model for the analysis of a repeated‐measurement multivariate count data

    Clustered overdispersed multivariate count data are challenging to model due to the presence of correlation within and between samples. Typically, the first source of correlation needs to be addressed but its quantification is of less interest. Here, we focus on the correlation between time points. In addition, the effects of covariates on the multivariate counts distribution need to be assessed. To fulfill these requirements, a regression model based on the Dirichlet‐multinomial distribution for association between covariates and the categorical counts is extended by using random effects to deal with the additional clustering. This model is the Dirichlet‐multinomial mixed regression model. Alternatively, a negative binomial regression mixed model can be deployed where the corresponding likelihood is conditioned on the total count. It appears that these two approaches are equivalent when the total count is fixed and independent of the random effects. We consider both subject‐specific and categorical‐specific random effects. However, the latter has a larger computational burden when the number of categories increases. Our work is motivated by microbiome data sets obtained by sequencing of the amplicon of the bacterial 16S rRNA gene. These data have a compositional structure and are typically overdispersed. The microbiome data set is from an epidemiological study carried out in a helminth‐endemic area in Indonesia. The conclusions are as follows: time has no statistically significant effect on microbiome composition, the correlation between subjects is statistically significant, and treatment has a significant effect on the microbiome composition only in infected subjects who remained infected.