FIRST ORDER AUTOREGRESSIVE MIXED EFFECTS ZERO INFLATED POISSON MODEL FOR LONGITUDINAL DATA - A BAYESIAN APPROACH
The First Order Autoregressive (AR(1)) Mixed Effects Zero Inflated Poisson (ZIP) Model was developed to analyze longitudinal zero inflated Poisson data through the Bayesian Approach. The model describes the effect of covariates via regression and time varying correlations within subject. Subjects are classified into a "perfect" state with response equal to zero and a Poisson state with response following a Poisson regression model. The probability of belonging to the perfect state or Poisson state is governed by a logistic regression model. Both models include autocorrelated random effects, and there is correlation between random effects in the logistic and Poisson regressions.
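The two-state structure described above can be illustrated with a minimal sketch. It omits the autocorrelated random effects entirely, and the coefficient names (`gamma0`, `gamma1`, `beta0`, `beta1`), their values, and the covariate `x` are hypothetical choices for illustration only, not quantities from the thesis.

```python
import math


def logistic(eta: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def zip_pmf(y: int, p_perfect: float, lam: float) -> float:
    """P(Y = y) when Y is 0 with probability p_perfect, else Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can come from the perfect state or from the Poisson state.
        return p_perfect + (1.0 - p_perfect) * poisson
    return (1.0 - p_perfect) * poisson


# Hypothetical coefficients and covariate value (assumptions, not estimates).
gamma0, gamma1 = -0.5, 0.8   # logistic part: probability of the perfect state
beta0, beta1 = 0.3, 0.5      # Poisson part: log of the Poisson mean
x = 1.0

p_perfect = logistic(gamma0 + gamma1 * x)
lam = math.exp(beta0 + beta1 * x)
probs = [zip_pmf(y, p_perfect, lam) for y in range(60)]
```

In this sketch the probability of a zero always exceeds the plain Poisson value `exp(-lam)`, which is the excess-zero behavior the model is meant to capture.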
Parameter estimation is investigated using simulation studies and analyses (both frequentist and Bayesian) of simpler mixed effect models. In the large sample setting we investigate the Fisher information of the model. The Fisher information matrix is then used to determine an adequate sample size for the AR(1) ZIP model. Simulation studies demonstrate the capability of Bayesian methods to estimate the parameters of the AR(1) ZIP model for longitudinal zero inflated Poisson data. However, a tremendous computation time and a huge sample size are required by the full AR(1) ZIP model.
Simpler models were fitted to simulated AR(1) ZIP data to investigate whether simplifying the assumed random structure could permit accurate estimates of fixed effect parameters. However, simulations showed that the biases of two nested models, the ZIP model and the mixed effects ZIP model, are too large to be acceptable. The AR(1) ZIP model was fitted to data on numbers of cigarettes smoked, collected in the National Longitudinal Study of Youth. It was found that decisions on whether to smoke and on the number of cigarettes to smoke were significantly related to age, sex, race and smoking behavior by peers. The random effect variances, autocorrelation coefficients and correlation between logistic and Poisson random effects were all significant.
High-dimensional Bias-corrected Inference, with Applications to fMRI Studies
In neuroimaging studies, measures of neural structure and function are used to try to predict clinical outcomes in adults. Identifying biomarkers that reflect underlying neuropathological processes can provide promising neural targets for future therapeutic interventions. This identification is typically done using linear or generalized linear models (GLM) with many covariates and relatively few subjects. Thus, regularization is used to select the salient covariates in the model. In this thesis, we compare the performance of the least absolute shrinkage and selection operator (LASSO) regression, adaptive LASSO regression, debiased LASSO regression, and regularized zero-inflated Poisson (ZIP) regression model in two simulation settings. The performance of LASSO regression with Poisson and Gaussian models is similar, but across all these approaches the zero-inflated model outperforms the rest. We apply these approaches to the data from the Longitudinal Assessment of Manic Symptoms (LAMS) study. We then study the bias correction of GLM and its application to ZIP data. We apply a decorrelated score approach to address Poisson distributed data and introduce a Cornish-Fisher correction to the decorrelated score test. In high-dimensional settings, the Cornish-Fisher correction can improve the performance of the decorrelated score test for ZIP data.
A familial longitudinal count data study
In this report, I study familial longitudinal count data with a Poisson regression model. The data are collected from individuals who are nested in families. I focus on two main issues in fitting a model. The first is the large number of excess zeros and the second is multi-level random effects. My approach to these problems is to use either Zero Inflated Poisson (ZIP) or Negative Binomial (NB) models to control for the excess zeros, which allow for estimation of another parameter for overdispersion, while developing the model with individual and familial random effects. First, I use a Poisson regression model with only main effects. After that, I fit a ZIP model to control for the extra zeros. I provide information about the general form of the exponential families and a discussion of the dispersion parameter. I also fit a Negative Binomial model instead of the ZIP model. I build these models with only individual random effects and with both individual and familial random effects as well. I discuss the generalized estimating equation (GEE) approach to estimate the parameters of a generalized linear model with autoregressive correlation between outcomes.
Analysis of Longitudinal Data and Model Selection
An important issue in regression analysis of longitudinal data is model parsimony, that is, finding a model with as few regression variables as possible while retaining good properties of the parameter estimates. In this vein, joint modelling of mean and variance taking into account the intra-subject correlation has been standard in recent literature (Pourahmadi, 1999, 2000; Ye and Pan, 2006; and Leng, Zhang, and Pan, 2010). Zhang, Leng, and Tang (2015) propose joint parametric modelling of the means, variances and correlations by decomposing the correlation matrix via hyperspherical co-ordinates and show that this results in unconstrained parameterization, fast computation, easy interpretation of the parameters, and model parsimony. We investigate the properties of the estimates of the regression parameters through semiparametric modelling of the means and variances and study the impact of this on model parsimony. An extensive simulation study is conducted. Three datasets, namely, a biomedical dataset, an environmental dataset and a cattle dataset are analysed. In longitudinal studies, researchers frequently encounter covariates that vary over time (see for example Huang, Wu, and Zhou, 2002). We consider a generalized partially linear varying coefficient model for such data and propose a regression spline based approach to estimate the mean and covariance parameters jointly where the correlation matrix is decomposed via hyperspherical co-ordinates. A simulation study is conducted to investigate the properties of the estimates of the regression parameters in terms of bias and standard error, and a real data set taken from a multi-center AIDS cohort study is analysed. The problem of model selection in regression analysis through the use of forward selection, backward elimination and stepwise selection has been well developed in the literature.
The main assumption in this, of course, is that the data are normally distributed and the main tool used here is either a t test or an F test. However, properties of these model selection procedures in the framework of generalized linear models are not well-known. We study here the properties of these procedures in generalized linear models, of which the normal linear regression model is a special case. The main tools used are the score test, the F-test, and other large sample tests such as the likelihood ratio test and the Wald test; the AIC and the BIC are included in the comparison. A systematic study, through simulations, of the properties of these procedures is conducted, in terms of level and power, for normal, Poisson and binomial regression models. Extensions for over-dispersed Poisson and over-dispersed binomial regression models are also given and evaluated. The methods are applied to analyse three data sets. In practice, it often occurs that an abundance of zero counts arises in data where a discrete generalized linear model may fail to fit but a zero-inflated generalized linear model can be the ideal choice. Researchers often encounter a large number of covariates in such models and need to decide which are potentially important. To find a parsimonious model we develop a model selection procedure using the score test, the Wald test and the likelihood ratio test; the AIC and the BIC are also included in the comparison. Simulation studies are carried out to investigate the performance of these procedures, in terms of level and power, for zero-inflated Poisson and zero-inflated binomial regression models. The methodology is illustrated through two real examples.
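As a rough illustration of the AIC and BIC comparison mentioned above, the sketch below applies the standard definitions, AIC = 2k - 2 log L and BIC = k log(n) - 2 log L. The candidate model names, log-likelihoods, parameter counts and sample size are hypothetical values made up for the example, not results from the thesis.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Hypothetical fitted models: name -> (maximized log-likelihood, parameter count).
candidates = {"poisson": (-520.4, 3), "zip": (-498.7, 6)}
n_obs = 200  # hypothetical sample size

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs))
    for name, (ll, k) in candidates.items()
}
# Smaller values are preferred under both criteria.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

BIC penalizes each extra parameter by log(n) rather than 2, so with larger samples it tends to favor smaller models than AIC does.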
Statistical model for overdispersed count outcome with many zeros: an approach for direct marginal inference
Marginalized models are in great demand by most researchers in the life
sciences particularly in clinical trials, epidemiology, health-economics,
surveys and many others since they allow generalization of inference to the
entire population under study. For count data, standard procedures such as the
Poisson regression and negative binomial model provide population average
inference for model parameters. However, occurrence of excess zero counts and
lack of independence in empirical data have necessitated their extension to
accommodate these phenomena. These extensions, though useful, complicate
interpretation of effects. For example, the zero-inflated Poisson model
accounts for the presence of excess zeros, but its parameter estimates do not
support direct marginal inference as those of its base model, the Poisson
model, do. Marginalized models for data with excess zeros are underdeveloped
though demand for them is high. The aim of this paper is to
develop a marginalized model for zero-inflated univariate count outcome in the
presence of overdispersion. Emphasis is placed on methodological development,
efficient estimation of model parameters, implementation and application to two
empirical studies. A simulation study is performed to assess the performance of
the model. Results from the analysis of two case studies indicated that the
refined procedure performs significantly better than models which do not
simultaneously correct for overdispersion and presence of excess zero counts in
terms of likelihood comparisons and AIC values. The simulation studies also
supported these findings. In addition, the proposed technique yielded small
biases and mean square errors for model parameters. To ensure that the proposed
method enjoys widespread use, it is implemented using the SAS NLMIXED procedure
with minimal coding effort.
Comment: 28 pages
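The point about marginal interpretation can be seen in a small numeric sketch of a plain zero-inflated Poisson model (without the overdispersion component the paper adds). The marginal mean is (1 - pi) * lambda, so when the zero-inflation probability also depends on the covariate, the ratio of marginal means differs from exp(beta1). The coefficient tuples `gamma` and `beta` below are hypothetical values chosen for illustration.

```python
import math


def logistic(eta: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def marginal_mean(x: float, gamma: tuple, beta: tuple) -> float:
    """Overall mean of a ZIP outcome: (1 - pi(x)) * lambda(x)."""
    pi_zero = logistic(gamma[0] + gamma[1] * x)   # excess-zero probability
    lam = math.exp(beta[0] + beta[1] * x)         # Poisson-part mean
    return (1.0 - pi_zero) * lam


# Hypothetical coefficients (assumptions for illustration only).
gamma = (-0.5, 1.0)   # zero-inflation part
beta = (0.2, 0.4)     # Poisson part

# Effect of moving x from 0 to 1 on the Poisson-part mean.
latent_ratio = math.exp(beta[1])
# Effect of the same change on the overall (population-average) mean.
marginal_ratio = marginal_mean(1.0, gamma, beta) / marginal_mean(0.0, gamma, beta)
```

With these values the Poisson-part ratio is above 1 while the marginal ratio is below 1, which shows why a model parameterized directly in terms of the marginal mean can be easier to interpret at the population level.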
Crossing the hurdle: the determinants of individual scientific performance
An original cross-sectional dataset from a medium-sized Italian
university is used to analyze the determinants of scientific
research production at the individual level. The dataset includes 942 permanent
researchers from various scientific sectors over a three-year time span
(2008-2010). Three different indicators, based on the number of publications or
citations, are considered as response variables. The corresponding
distributions are highly skewed and display an excess of zero-valued
observations. In this setting, the goodness of fit of several Poisson mixture
regression models is explored by assuming an extensive set of explanatory
variables. As to the personal observable characteristics of the researchers,
the results emphasize the age effect and the gender productivity gap, as
previously documented by existing studies. Analogously, the analysis confirms
that productivity is strongly affected by the publication and citation
practices adopted in different scientific disciplines. The empirical evidence
on the connection between teaching and research activities suggests that no
univocal substitution or complementarity thesis can be claimed: a heavier
teaching load does not affect the odds of being a non-active researcher and does
not significantly reduce the number of publications for active researchers. In
addition, new evidence emerges on the effect of researchers' administrative
tasks, which seem to be negatively related to researchers' productivity, and
on the composition of departments. Researchers' productivity is apparently
enhanced by operating in departments with more administrative and
technical staff, and it is not significantly affected by the composition of the
department in terms of senior or junior researchers.
Comment: Revised version accepted for publication by Scientometrics
Estimating healthcare demand for an aging population: a flexible and robust bayesian joint model
In this paper, we analyse two frequently used measures of the demand for health care, namely hospital visits and out-of-pocket health care expenditure, which have been analysed separately in the existing literature. Given that these two measures of healthcare demand are highly likely to be closely correlated, we propose a framework to jointly model hospital visits and out-of-pocket medical expenditure. Furthermore, the joint framework allows for the presence of non-linear effects of covariates using splines to capture the effects of aging on healthcare demand. Sample heterogeneity is modelled robustly with the random effects following Dirichlet process priors with explicit cross-part correlation. The findings of our empirical analysis of the U.S. Health and Retirement Survey indicate that the demand for healthcare varies with age and gender and exhibits significant cross-part correlation that provides a rich understanding of how aging affects health care demand, which is of particular policy relevance in the context of an aging population.
The Relationship of Education and Acculturation with Vigorous Intensity Leisure Time Physical Activity by Gender in Latinos
Objectives: Latinos have poorer health outcomes for certain conditions (e.g. diabetes, obesity, mental health) compared to non-Latino Whites in the U.S., in part due to differences in the amount of physical activity, which are heavily influenced by sociocultural factors such as educational attainment and acculturation. Vigorous-intensity leisure time physical activity (VLTPA) may provide health benefits in a shorter amount of time than moderate-to-light physical activity. However, VLTPA has been significantly understudied compared to LTPA in general. The purpose of this study is to examine the associations between educational attainment, acculturation, and VLTPA by gender among Latino adults in the U.S. Design: Nationally representative samples of Latino adults aged 25 years and older (n = 4393) from the 2010 National Health Interview Survey were analyzed. VLTPA was measured as the number of hours per week of VLTPA consisting of heavy sweating or large increases in breathing and heart rate. Acculturation was measured as the degree to which the English language versus the Spanish language was spoken most often. The zero-inflated Poisson regression model was constructed using full information maximum likelihood estimation and controlling for a series of sociodemographic characteristics and relevant health behaviors. Results: Educational attainment was positively associated with VLTPA among Latino adults [exp(b) = 1.09, p < 0.05]. Similarly, greater acculturation was associated with greater hours/week of VLTPA [exp(b) = 1.10, p < 0.05]. Lastly, the effect of educational attainment on VLTPA significantly varied by gender. Conclusions: Education had a positive association and acculturation had negative association with the hours/week of VLTPA among Latinos. Also, the association between education and VLTPA was significantly stronger among women than men.
These findings inform culturally and socially sensitive approaches to improve the health of Latinos, in the hope of addressing health disparities by race/ethnicity in the U.S.
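For readers less familiar with the exp(b) notation in the Results, the sketch below shows the usual reading of an exponentiated count-model coefficient as a multiplicative change in the expected count per one-unit increase in the predictor. The helper `percent_change` is a hypothetical name, and how the predictors were scaled in the study is not stated in the abstract, so "one unit" is left abstract here.

```python
import math


def percent_change(rate_ratio: float) -> float:
    """Percent change in the expected count implied by a rate ratio exp(b)."""
    return (rate_ratio - 1.0) * 100.0


rr_education = 1.09        # exp(b) reported for educational attainment
rr_acculturation = 1.10    # exp(b) reported for acculturation

b_education = math.log(rr_education)          # coefficient on the log scale
two_unit_ratio = math.exp(2 * b_education)    # effects multiply across units

pct_education = percent_change(rr_education)
pct_acculturation = percent_change(rr_acculturation)
```

A rate ratio above 1 corresponds to a higher expected count, so both reported values indicate more expected hours per week of VLTPA per unit increase in the predictor, within the count part of the model.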