General Design Bayesian Generalized Linear Mixed Models
Linear mixed models are able to handle an extraordinary range of
complications in regression-type analyses. Their most common use is to account
for within-subject correlation in longitudinal data analysis. They are also the
standard vehicle for smoothing spatial count data. However, when treated in
full generality, mixed models can also handle spline-type smoothing and closely
approximate kriging. This allows for nonparametric regression models (e.g.,
additive models and varying coefficient models) to be handled within the mixed
model framework. The key is to allow the random effects design matrix to have
general structure; hence our label general design. For continuous response
data, particularly when Gaussianity of the response is reasonably assumed,
computation is now quite mature and supported by the R, SAS and S-PLUS
packages. Such is not the case for binary and count responses, where
generalized linear mixed models (GLMMs) are required, but are hindered by the
presence of intractable multivariate integrals. Software known to us supports
special cases of the GLMM (e.g., PROC NLMIXED in SAS or glmmML in R) or relies
on the sometimes crude Laplace-type approximation of integrals (e.g., the SAS
macro glimmix or glmmPQL in R). This paper describes the fitting of general
design generalized linear mixed models. A Bayesian approach is taken and Markov
chain Monte Carlo (MCMC) is used for estimation and inference. In this
generalized setting, MCMC requires sampling from nonstandard distributions. In
this article, we demonstrate that the MCMC package WinBUGS facilitates sound
fitting of general design Bayesian generalized linear mixed models in practice.
Comment: Published at http://dx.doi.org/10.1214/088342306000000015 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Computational Techniques for Spatial Logistic Regression with Large Datasets
In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.
A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches.
The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models
Bayesian Generalized Linear Mixed Effects Models Using Normal-Independent Distributions: Formulation and Applications
A standard assumption is that the random effects of Generalized Linear Mixed Effects Models (GLMMs) follow the normal distribution. However, this assumption has been found to be quite unrealistic and sometimes too restrictive as revealed in many real-life situations. A common case of departures from normality includes the presence of outliers leading to heavy-tailed distributed random effects. This work, therefore, aims to develop a robust GLMM framework by replacing the normality assumption on the random effects by the distributions belonging to the Normal-Independent (NI) class. The resulting models are called the Normal-Independent GLMM (NI-GLMM). The four special cases of the NI class considered in these models’ formulations include the normal, Student-t, Slash and contaminated normal distributions. A full Bayesian technique was adopted for estimation and inference. A real-life data set on cotton bolls was used to demonstrate the performance of the proposed NI-GLMM methodology
Likelihood Inference for Models with Unobservables: Another View
There have been controversies among statisticians on (i) what to model and
(ii) how to make inferences from models with unobservables. One such
controversy concerns the difference between estimation methods for the marginal
means not necessarily having a probabilistic basis and statistical models
having unobservables with a probabilistic basis. Another concerns
likelihood-based inference for statistical models with unobservables. This
needs an extended-likelihood framework, and we show how one such extension,
hierarchical likelihood, allows this to be done. Modeling of unobservables
leads to rich classes of new probabilistic models from which likelihood-type
inferences can be made naturally with hierarchical likelihood.
Comment: This paper is discussed in [arXiv:1010.0804], [arXiv:1010.0807],
[arXiv:1010.0810]. Rejoinder at [arXiv:1010.0814]. Published at
http://dx.doi.org/10.1214/09-STS277 in the Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical Statistics
(http://www.imstat.org
Numerically Stable Approximate Bayesian Methods for Generalized Linear Mixed Models and Linear Model Selection
Approximate Bayesian inference methods offer fast alternatives to Markov chain Monte Carlo methods for fitting Bayesian models, sometimes with only a slight loss of accuracy. In this thesis, we consider variable selection for linear models and zero-inflated mixed models. Variable selection for linear regression models is ubiquitous in applied statistics. We use the popular g-prior (Zellner, 1986) for model selection of linear models with normal priors, where g is a prior hyperparameter. We derive exact expressions for the model selection Bayes factors in terms of special functions depending on the sample size, number of covariates and R-squared of the model. We show that these expressions are accurate, fast to evaluate, and numerically stable. An R package, blma, for doing Bayesian linear model averaging using these exact expressions has been released on GitHub. We extend the Particle EM method of Rockova (2017) using Particle Variational Approximation and the exact posterior marginal likelihood expressions to derive a computationally efficient algorithm for model selection on data sets with many covariates. Our algorithm performs well relative to existing algorithms, completing in 8 seconds on a model selection problem with a sample size of 600 and 7200 covariates. We also consider zero-inflated models, which have many applications in areas such as manufacturing and public health but pose numerical issues when fitted to data. We apply a variational approximation to zero-inflated Poisson mixed models with Gaussian distributed random effects using a combination of VB and the Gaussian Variational Approximation (GVA). We also incorporate a novel parameterisation of the covariance of the GVA using the Cholesky factor of the precision matrix, similar to Tan and Nott (2018), to resolve associated numerical difficulties
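As a rough illustration of why such Bayes factors depend only on the sample size, the number of covariates and R-squared, the sketch below evaluates the well-known closed form for Zellner's g-prior with a fixed g, comparing a model with p covariates against the intercept-only model. This is the simplest textbook case and an assumption on our part: it is not the thesis's own expressions (which involve special functions), and the function name log_bf_fixed_g is hypothetical. Working on the log scale avoids overflow for large n.

```python
import math


def log_bf_fixed_g(n: int, p: int, r2: float, g: float) -> float:
    """Log Bayes factor of a linear model with p covariates (plus intercept)
    against the intercept-only model, under Zellner's g-prior with fixed g.

    Closed form (fixed g only, an illustrative assumption):
        BF = (1 + g)^((n - 1 - p) / 2) * (1 + g * (1 - R^2))^(-(n - 1) / 2)
    """
    if n <= p + 1:
        raise ValueError("need n > p + 1 observations")
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R-squared must lie in [0, 1]")
    if g <= 0.0:
        raise ValueError("g must be positive")
    # log1p keeps the computation accurate when g or g * (1 - R^2) is small.
    return 0.5 * (n - 1 - p) * math.log1p(g) - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2))


if __name__ == "__main__":
    # Hypothetical inputs: 600 observations, 5 covariates, unit-information g = n.
    print(log_bf_fixed_g(n=600, p=5, r2=0.30, g=600.0))
```

With R-squared equal to zero the log Bayes factor is negative for any p of at least 1, which reflects the automatic penalty for extra covariates built into the g-prior.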
Semi-Parametric Empirical Best Prediction for small area estimation of unemployment indicators
The Italian National Institute for Statistics regularly provides estimates of
unemployment indicators using data from the Labor Force Survey. However, direct
estimates of unemployment incidence cannot be released for Local Labor Market
Areas. These are unplanned domains defined as clusters of municipalities; many
are out-of-sample areas and the majority are characterized by a small sample
size, which renders direct estimates inadequate. The Empirical Best Predictor
represents an appropriate, model-based, alternative. However, for non-Gaussian
responses, its computation and the computation of the analytic approximation to
its Mean Squared Error require the solution of (possibly) multiple integrals
that generally do not have a closed form. To address this, Monte Carlo
methods and the parametric bootstrap are common choices, even though their
computational burden is non-trivial. In this paper, we propose a
Semi-Parametric Empirical Best Predictor for a (possibly) non-linear mixed
effect model by leaving the distribution of the area-specific random effects
unspecified and estimating it from the observed data. This approach is known to
lead to a discrete mixing distribution which helps avoid unverifiable
parametric assumptions and heavy integral approximations. We also derive a
second-order, bias-corrected, analytic approximation to the corresponding Mean
Squared Error. Finite sample properties of the proposed approach are tested via
a large scale simulation study. Furthermore, the proposal is applied to
unit-level data from the 2012 Italian Labor Force Survey to estimate
unemployment incidence for 611 Local Labor Market Areas using auxiliary
information from administrative registers and the 2011 Census
Generalised Linear Mixed Model Specification, Analysis, Fitting, and Optimal Design in R with the glmmr Packages
We describe the R package glmmrBase and an extension
glmmrOptim. glmmrBase provides a flexible approach to specifying,
fitting, and analysing generalised linear mixed models. We use an
object-orientated class system within R to provide methods for a
wide range of covariance and mean functions, including specification of
non-linear functions of data and parameters, relevant to multiple applications
including cluster randomised trials, cohort studies, spatial and
spatio-temporal modelling, and split-plot designs. The class generates relevant
matrices and statistics and a wide range of methods including full likelihood
estimation of generalised linear mixed models using Markov Chain Monte Carlo
Maximum Likelihood, Laplace approximation, power calculation, and access to
relevant calculations. The class also includes Hamiltonian Monte Carlo
simulation of random effects, sparse matrix methods, and other functionality to
support efficient estimation. The glmmrOptim package implements a set of
algorithms to identify c-optimal experimental designs where observations are
correlated and can be specified using the generalised linear mixed model
classes. Several examples and comparisons to existing packages are provided to
illustrate use of the packages
AN INTRODUCTION TO GENERALIZED LINEAR MIXED MODELS
The generalized linear mixed model (GLMM) generalizes the standard linear model in three ways: accommodation of non-normally distributed responses, specification of a possibly non-linear link between the mean of the response and the predictors, and allowance for some forms of correlation in the data. As such, GLMMs have broad utility and are of great practical importance. Two special cases of the GLMM are the linear mixed model (LMM) and the generalized linear model (GLM). Despite the utility of such models, their use has been limited due to the lack of reliable, well-tested estimation and testing methods. I first describe and give examples of GLMMs and then discuss methods of estimation including maximum likelihood, generalized estimating equations, and penalized quasi-likelihood. Finally I briefly survey current research efforts in GLMMs
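For orientation, the three generalizations listed above correspond to the three ingredients of the usual textbook formulation of a GLMM, sketched below. The notation is an assumption chosen for illustration and is not taken from the paper: y is the response, u the random effects, X and Z the fixed and random effects design matrices, g the link function, and G the random effects covariance matrix.

```latex
\begin{align*}
  y_i \mid u &\sim \text{exponential family with mean } \mu_i
    && \text{(non-normal responses)} \\
  g(\mu_i) &= x_i^{\top}\beta + z_i^{\top}u
    && \text{(possibly non-linear link)} \\
  u &\sim N(0, G)
    && \text{(correlation induced by shared random effects)}
\end{align*}
```

Taking g to be the identity with a Gaussian response gives the LMM, and dropping the random effects term gives the GLM, matching the two special cases named in the abstract.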
Small area prediction of proportions and counts under a spatial Poisson mixed model
This paper introduces an area-level Poisson mixed model with SAR(1) spatially correlated random effects. Small area predictors of proportions and counts are derived from the new model and the corresponding mean squared errors are estimated by parametric bootstrap. The behaviour of the introduced predictors is empirically investigated by running model-based simulation experiments. An application to real data from the Spanish living conditions survey of Galicia (Spain) is given. The target is the estimation of domain proportions of women under the poverty line.
Supported by the Instituto Galego de Estatística, by MICINN Grants PID2020-113578RB-I00 and PGC2018-096840-B-I00, by the Generalitat Valenciana Grant PROMETEO/2021/063 and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C 2020/14), and by GAIN (Galician Innovation Agency) and the Regional Ministry of Economy, Employment and Industry Grant COV20/00604 and Centro de Investigación del Sistema Universitario de Galicia ED431G 2019/01, all of them through the ERDF.