186 research outputs found

    General Design Bayesian Generalized Linear Mixed Models

    Linear mixed models are able to handle an extraordinary range of complications in regression-type analyses. Their most common use is to account for within-subject correlation in longitudinal data analysis. They are also the standard vehicle for smoothing spatial count data. However, when treated in full generality, mixed models can also handle spline-type smoothing and closely approximate kriging. This allows for nonparametric regression models (e.g., additive models and varying coefficient models) to be handled within the mixed model framework. The key is to allow the random effects design matrix to have general structure; hence our label general design. For continuous response data, particularly when Gaussianity of the response is reasonably assumed, computation is now quite mature and supported by the R, SAS and S-PLUS packages. Such is not the case for binary and count responses, where generalized linear mixed models (GLMMs) are required, but are hindered by the presence of intractable multivariate integrals. Software known to us supports special cases of the GLMM (e.g., PROC NLMIXED in SAS or glmmML in R) or relies on the sometimes crude Laplace-type approximation of integrals (e.g., the SAS macro glimmix or glmmPQL in R). This paper describes the fitting of general design generalized linear mixed models. A Bayesian approach is taken and Markov chain Monte Carlo (MCMC) is used for estimation and inference. In this generalized setting, MCMC requires sampling from nonstandard distributions. In this article, we demonstrate that the MCMC package WinBUGS facilitates sound fitting of general design Bayesian generalized linear mixed models in practice. Comment: Published at http://dx.doi.org/10.1214/088342306000000015 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Computational Techniques for Spatial Logistic Regression with Large Datasets

    In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches. The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models

    Bayesian Generalized Linear Mixed Effects Models Using Normal-Independent Distributions: Formulation and Applications

    A standard assumption is that the random effects of Generalized Linear Mixed Effects Models (GLMMs) follow the normal distribution. However, this assumption has been found to be unrealistic, and sometimes too restrictive, in many real-life situations. A common departure from normality is the presence of outliers, which leads to heavy-tailed random effects. This work therefore aims to develop a robust GLMM framework by replacing the normality assumption on the random effects with distributions belonging to the Normal-Independent (NI) class. The resulting models are called the Normal-Independent GLMM (NI-GLMM). The four special cases of the NI class considered in these models’ formulations are the normal, Student-t, slash and contaminated normal distributions. A full Bayesian technique was adopted for estimation and inference. A real-life data set on cotton bolls was used to demonstrate the performance of the proposed NI-GLMM methodology
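The four NI special cases named above can be illustrated with the usual scale-mixture-of-normals construction, b = sigma * Z / sqrt(U) with Z standard normal and U a positive mixing variable. The sketch below is an illustration only, not the authors' code: the function name `sample_ni_effect`, its parameters (`nu`, `gamma`, `eps`) and the particular mixing laws (Gamma(nu/2, rate nu/2) for Student-t, Beta(nu, 1) for slash, a two-point law for the contaminated normal) are assumptions based on the standard parameterisation of the class.

```python
import math
import random


def sample_ni_effect(family, sigma=1.0, nu=4.0, gamma=0.1, eps=0.1, rng=random):
    """Draw one random effect b = sigma * Z / sqrt(U), with Z ~ N(0, 1).

    The mixing variable U determines the member of the NI class
    (hypothetical parameter names, standard parameterisation assumed).
    """
    if family == "normal":
        u = 1.0  # no mixing: plain normal random effect
    elif family == "student_t":
        # U ~ Gamma(shape=nu/2, rate=nu/2); gammavariate takes a scale argument
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)
    elif family == "slash":
        u = rng.betavariate(nu, 1.0)  # U ~ Beta(nu, 1)
    elif family == "contaminated":
        # U = gamma (0 < gamma < 1) with probability eps, otherwise U = 1
        u = gamma if rng.random() < eps else 1.0
    else:
        raise ValueError(f"unknown family: {family}")
    z = rng.gauss(0.0, 1.0)
    return sigma * z / math.sqrt(u)
```

Small values of U inflate the draw, which is what gives the Student-t, slash and contaminated normal cases heavier tails than the normal case.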

    Likelihood Inference for Models with Unobservables: Another View

    There have been controversies among statisticians on (i) what to model and (ii) how to make inferences from models with unobservables. One such controversy concerns the difference between estimation methods for the marginal means not necessarily having a probabilistic basis and statistical models having unobservables with a probabilistic basis. Another concerns likelihood-based inference for statistical models with unobservables. This needs an extended-likelihood framework, and we show how one such extension, hierarchical likelihood, allows this to be done. Modeling of unobservables leads to rich classes of new probabilistic models from which likelihood-type inferences can be made naturally with hierarchical likelihood. Comment: This paper discussed in: [arXiv:1010.0804], [arXiv:1010.0807], [arXiv:1010.0810]. Rejoinder at [arXiv:1010.0814]. Published at http://dx.doi.org/10.1214/09-STS277 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Numerically Stable Approximate Bayesian Methods for Generalized Linear Mixed Models and Linear Model Selection

    Approximate Bayesian inference methods offer fast alternatives to Markov chain Monte Carlo methods for fitting Bayesian models, sometimes with only a slight loss of accuracy. In this thesis, we consider variable selection for linear models and zero-inflated mixed models. Variable selection for linear regression models is ubiquitous in applied statistics. We use the popular g-prior (Zellner, 1986) for model selection of linear models with normal priors, where g is a prior hyperparameter. We derive exact expressions for the model selection Bayes factors in terms of special functions depending on the sample size, number of covariates and R-squared of the model. We show that these expressions are accurate, fast to evaluate, and numerically stable. An R package, blma, for doing Bayesian linear model averaging using these exact expressions has been released on GitHub. We extend the Particle EM method of Rockova (2017) using Particle Variational Approximation and the exact posterior marginal likelihood expressions to derive a computationally efficient algorithm for model selection on data sets with many covariates. Our algorithm performs well relative to existing algorithms, completing in 8 seconds on a model selection problem with a sample size of 600 and 7200 covariates. We consider zero-inflated models, which have many applications in areas such as manufacturing and public health but pose numerical issues when fitted to data. We apply a variational approximation to zero-inflated Poisson mixed models with Gaussian distributed random effects using a combination of VB and the Gaussian Variational Approximation (GVA). We also incorporate a novel parameterisation of the covariance of the GVA using the Cholesky factor of the precision matrix, similar to Tan and Nott (2018), to resolve associated numerical difficulties
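To make concrete how a Bayes factor can depend only on the sample size, the number of covariates and R-squared, here is a minimal sketch of the well-known closed form for a fixed value of g under Zellner's g-prior, comparing a model with p covariates against the intercept-only model. This is the simpler fixed-g case, not the thesis's special-function expressions, and it is unrelated to the blma package's interface; the function name is hypothetical.

```python
import math


def log_bayes_factor_fixed_g(n, p, r2, g):
    """Log Bayes factor of a p-covariate linear model versus the null model.

    Fixed-g form under Zellner's g-prior (assumed setting):
        BF = (1 + g)^((n - 1 - p) / 2) / (1 + g * (1 - R^2))^((n - 1) / 2)
    Computed on the log scale with log1p for numerical stability.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    return (0.5 * (n - 1 - p) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))
```

Working on the log scale avoids overflow when n is large; when R-squared is zero the expression reduces to -(p/2) log(1 + g), so extra covariates that explain nothing are penalised.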

    Semi-Parametric Empirical Best Prediction for small area estimation of unemployment indicators

    The Italian National Institute for Statistics regularly provides estimates of unemployment indicators using data from the Labor Force Survey. However, direct estimates of unemployment incidence cannot be released for Local Labor Market Areas. These are unplanned domains defined as clusters of municipalities; many are out-of-sample areas and the majority are characterized by a small sample size, which renders direct estimates inadequate. The Empirical Best Predictor represents an appropriate, model-based alternative. However, for non-Gaussian responses, its computation and the computation of the analytic approximation to its Mean Squared Error require the solution of (possibly) multiple integrals that, generally, do not have a closed form. To solve the issue, Monte Carlo methods and parametric bootstrap are common choices, even though the computational burden is non-trivial. In this paper, we propose a Semi-Parametric Empirical Best Predictor for a (possibly) non-linear mixed effect model by leaving the distribution of the area-specific random effects unspecified and estimating it from the observed data. This approach is known to lead to a discrete mixing distribution, which helps avoid unverifiable parametric assumptions and heavy integral approximations. We also derive a second-order, bias-corrected, analytic approximation to the corresponding Mean Squared Error. Finite sample properties of the proposed approach are tested via a large-scale simulation study. Furthermore, the proposal is applied to unit-level data from the 2012 Italian Labor Force Survey to estimate unemployment incidence for 611 Local Labor Market Areas using auxiliary information from administrative registers and the 2011 Census

    Generalised Linear Mixed Model Specification, Analysis, Fitting, and Optimal Design in R with the glmmr Packages

    We describe the R package glmmrBase and an extension glmmrOptim. glmmrBase provides a flexible approach to specifying, fitting, and analysing generalised linear mixed models. We use an object-orientated class system within R to provide methods for a wide range of covariance and mean functions, including specification of non-linear functions of data and parameters, relevant to multiple applications including cluster randomised trials, cohort studies, spatial and spatio-temporal modelling, and split-plot designs. The class generates relevant matrices and statistics and a wide range of methods including full likelihood estimation of generalised linear mixed models using Markov Chain Monte Carlo Maximum Likelihood, Laplace approximation, power calculation, and access to relevant calculations. The class also includes Hamiltonian Monte Carlo simulation of random effects, sparse matrix methods, and other functionality to support efficient estimation. The glmmrOptim package implements a set of algorithms to identify c-optimal experimental designs where observations are correlated and can be specified using the generalised linear mixed model classes. Several examples and comparisons to existing packages are provided to illustrate use of the packages

    AN INTRODUCTION TO GENERALIZED LINEAR MIXED MODELS

    The generalized linear mixed model (GLMM) generalizes the standard linear model in three ways: accommodation of non-normally distributed responses, specification of a possibly non-linear link between the mean of the response and the predictors, and allowance for some forms of correlation in the data. As such, GLMMs have broad utility and are of great practical importance. Two special cases of the GLMM are the linear mixed model (LMM) and the generalized linear model (GLM). Despite the utility of such models, their use has been limited due to the lack of reliable, well-tested estimation and testing methods. I first describe and give examples of GLMMs and then discuss methods of estimation including maximum likelihood, generalized estimating equations, and penalized quasi-likelihood. Finally I briefly survey current research efforts in GLMMs
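The three ingredients listed above (a non-normal response, a non-linear link, and correlation induced by shared random effects) can be seen together in a small simulation. The following is a minimal sketch of a random-intercept logistic GLMM, assuming a single covariate and Gaussian group effects; the function name and all parameter names are hypothetical and not taken from the paper.

```python
import math
import random


def simulate_logistic_glmm(n_groups, n_per_group, beta0, beta1, sigma_u, seed=0):
    """Simulate (group, x, y) rows from a random-intercept logistic GLMM.

    Linear predictor: eta = beta0 + beta1 * x + u_group, u_group ~ N(0, sigma_u^2)
    Link:             logit(P(y = 1)) = eta
    Response:         y ~ Bernoulli(p)
    """
    rng = random.Random(seed)
    rows = []
    for group in range(n_groups):
        # One shared intercept per group induces within-group correlation.
        u = rng.gauss(0.0, sigma_u)
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)
            eta = beta0 + beta1 * x + u
            p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit link
            y = 1 if rng.random() < p else 0
            rows.append((group, x, y))
    return rows
```

Setting `sigma_u` to zero removes the random effect and recovers an ordinary GLM (logistic regression), one of the two special cases mentioned in the abstract.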

    Small area prediction of proportions and counts under a spatial Poisson mixed model

    This paper introduces an area-level Poisson mixed model with SAR(1) spatially correlated random effects. Small area predictors of proportions and counts are derived from the new model and the corresponding mean squared errors are estimated by parametric bootstrap. The behaviour of the introduced predictors is empirically investigated by running model-based simulation experiments. An application to real data from the Spanish living conditions survey of Galicia (Spain) is given. The target is the estimation of domain proportions of women under the poverty line. Supported by the Instituto Galego de Estatística, by MICINN Grants PID2020-113578RB-I00 and PGC2018-096840-B-I00, by the Generalitat Valenciana Grant PROMETEO/2021/063 and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C 2020/14), and by GAIN (Galician Innovation Agency) and the Regional Ministry of Economy, Employment and Industry Grant COV20/00604 and Centro de Investigación del Sistema Universitario de Galicia ED431G 2019/01, all of them through the ERDF.
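As an illustration of what SAR(1) spatially correlated random effects look like, the sketch below solves u = eps + rho * W u for the area effects u, given independent innovations eps and a proximity matrix W. It assumes a row-standardised W and |rho| < 1, so that a simple fixed-point iteration converges; the function name and the solver are illustrative choices, not the paper's estimation procedure.

```python
def sar1_effects(weights, eps, rho, n_iter=200):
    """Solve u = eps + rho * W u by fixed-point iteration.

    weights: row-standardised proximity matrix W as a list of lists (assumed)
    eps:     independent innovations, one per area
    rho:     spatial autocorrelation parameter, assumed |rho| < 1
    The limit equals (I - rho * W)^(-1) eps, the SAR(1) random effects.
    """
    n = len(eps)
    u = list(eps)
    for _ in range(n_iter):
        u = [
            eps[i] + rho * sum(weights[i][j] * u[j] for j in range(n))
            for i in range(n)
        ]
    return u


# Two neighbouring areas: a shock in area 0 spills over into area 1.
W = [[0.0, 1.0], [1.0, 0.0]]
effects = sar1_effects(W, eps=[1.0, 0.0], rho=0.5)
```

With rho set to zero the effects reduce to the independent innovations, which corresponds to a model without spatial correlation.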