
    The Lazy Bootstrap. A Fast Resampling Method for Evaluating Latent Class Model Fit

    The latent class model is a powerful unsupervised clustering algorithm for categorical data. Many statistics exist to test the fit of the latent class model. However, traditional methods to evaluate those fit statistics are not always useful. Asymptotic distributions are not always known, and empirical reference distributions can be very time-consuming to obtain. In this paper we propose a fast resampling scheme with which any type of model fit can be assessed. We illustrate it here on the latent class model, but the methodology can be applied in any situation. The principle behind the lazy bootstrap method is to specify a statistic which captures the characteristics of the data that a model should capture correctly. If those characteristics in the observed data and in model-generated data are very different, we can assume that the model could not have produced the observed data. With this method we achieve the flexibility of tests from the Bayesian framework, while only needing maximum likelihood estimates. We provide a step-wise algorithm with which the fit of a model can be assessed based on the characteristics we as researchers find important. In a Monte Carlo study we show that the method has very low type I error rates for all illustrated statistics. Power to reject a model depended largely on the type of statistic that was used and on sample size. We applied the method to an empirical data set on clinical subgroups with risk of myocardial infarction and compared the results directly to the parametric bootstrap. The results of our method were highly similar to those obtained by the parametric bootstrap, while the required computations differed by three orders of magnitude in favour of our method. Comment: This is an adaptation of a chapter of a PhD dissertation available at https://pure.uvt.nl/portal/files/19030880/Kollenburg_Computer_13_11_2017.pd
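
The principle stated in this abstract, comparing a chosen statistic on the observed data with the same statistic on model-generated data, can be illustrated with a small sketch. This is not the paper's step-wise algorithm, which the abstract does not spell out. The function names, the pair-agreement statistic, and the choice to simulate from fixed parameter estimates without refitting on each replicate are all assumptions made for illustration.

```python
import random

def simulate_lc(n, class_weights, item_probs, rng):
    """Draw n binary response vectors from a latent class model.

    class_weights[c] is P(class c); item_probs[c][j] is P(item j = 1 | class c).
    Both are assumed to be estimates obtained elsewhere (hypothetical inputs).
    """
    data = []
    classes = range(len(class_weights))
    for _ in range(n):
        c = rng.choices(classes, weights=class_weights)[0]
        data.append([1 if rng.random() < p else 0 for p in item_probs[c]])
    return data

def pair_agreement(data, j, k):
    """Illustrative statistic: share of rows in which items j and k agree."""
    return sum(1 for row in data if row[j] == row[k]) / len(data)

def resampling_p_value(observed, class_weights, item_probs, stat,
                       n_rep=500, seed=0):
    """Compare stat(observed) with its distribution over simulated data sets.

    Returns the observed statistic and a two-sided Monte Carlo p-value,
    measured as distance from the mean of the simulated statistics.
    """
    rng = random.Random(seed)
    t_obs = stat(observed)
    reps = [stat(simulate_lc(len(observed), class_weights, item_probs, rng))
            for _ in range(n_rep)]
    centre = sum(reps) / n_rep
    extreme = sum(1 for t in reps if abs(t - centre) >= abs(t_obs - centre))
    return t_obs, (extreme + 1) / (n_rep + 1)
```

A small p-value indicates that the chosen characteristic of the observed data is unlike what the fitted model generates. Any other statistic can be passed in as `stat`, which reflects the flexibility the abstract emphasises.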

    Structured additive regression for multicategorical space-time data: A mixed model approach

    In many practical situations, simple regression models suffer from the fact that the dependence of responses on covariates cannot be sufficiently described by a purely parametric predictor. For example, effects of continuous covariates may be nonlinear, or complex interactions between covariates may be present. A specific problem of space-time data is that observations are in general spatially and/or temporally correlated. Moreover, unobserved heterogeneity between individuals or units may be present. While, in recent years, there has been a lot of work in this area dealing with univariate response models, only limited attention has been given to models for multicategorical space-time data. We propose a general class of structured additive regression models (STAR) for multicategorical responses, allowing for a flexible semiparametric predictor. This class includes models for multinomial responses with unordered categories as well as models for ordinal responses. Nonlinear effects of continuous covariates, time trends and interactions between continuous covariates are modelled through Bayesian versions of penalized splines and flexible seasonal components. Spatial effects can be estimated based on Markov random fields, stationary Gaussian random fields or two-dimensional penalized splines. We present our approach from a Bayesian perspective, which allows all functions and effects to be treated within a unified general framework by assigning appropriate priors with different forms and degrees of smoothness. Inference is performed on the basis of a multicategorical linear mixed model representation. This can be viewed as posterior mode estimation and is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are then estimated by using restricted maximum likelihood. Numerically efficient algorithms allow computations even for fairly large data sets. As a typical example we present results on an analysis of data from a forest health survey.

    Penalized likelihood estimation and iterative Kalman smoothing for non-Gaussian dynamic regression models

    Dynamic regression or state space models provide a flexible framework for analyzing non-Gaussian time series and longitudinal data, covering for example models for discrete longitudinal observations. As for non-Gaussian random coefficient models, a direct Bayesian approach leads to numerical integration problems, often intractable for more complicated data sets. Recent Markov chain Monte Carlo methods avoid this by repeated sampling from approximate posterior distributions, but there are still open questions about sampling schemes and convergence. In this article we consider simpler methods of inference based on posterior modes or, equivalently, maximum penalized likelihood estimation. From the latter point of view, the approach can also be interpreted as a nonparametric method for smoothing time-varying coefficients. Efficient smoothing algorithms are obtained by iteration of common linear Kalman filtering and smoothing, in the same way as estimation in generalized linear models with fixed effects can be performed by iteratively weighted least squares estimation. The algorithm can be combined with an EM-type method or cross-validation to estimate unknown hyper- or smoothing parameters. The approach is illustrated by applications to a binary time series and a multicategorical longitudinal data set.
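
The analogy drawn in this abstract rests on iteratively weighted least squares for generalized linear models with fixed effects. The sketch below shows only that fixed-effects building block, for a logistic model with one covariate. It does not implement the iterated Kalman smoother itself, and the function name and data layout are assumptions made for illustration.

```python
import math

def irls_logistic(x, y, n_iter=25, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by iteratively weighted least squares.

    Each pass builds working weights w and a working response z at the
    current estimate, then solves the 2x2 weighted least squares problem.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                      # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))       # fitted probability
            w = max(mu * (1.0 - mu), 1e-12)         # working weight
            z = eta + (yi - mu) / w                 # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the weighted normal equations in closed form.
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1
```

Calling `irls_logistic(x, y)` on data that are not perfectly separable returns the maximum likelihood estimates. In the dynamic setting described above, the weighted least squares solve in each pass would, roughly speaking, be replaced by a pass of linear Kalman filtering and smoothing on the working observations.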

    Joint Regression and Association Models for Repeated Categorical Responses

    The focus of this study is on the statistical analysis of categorical responses, where the response values are dependent on each other. The most typical example of this kind of dependence is when repeated responses have been obtained from the same study unit. For example, in Paper I, the response of interest is pneumococcal nasopharyngeal carriage (yes/no) in 329 children. For each child, the carriage is measured nine times during the first 18 months of life, and thus repeated responses on each child cannot be assumed independent of each other. In the case of the above example, the interest typically lies in the carriage prevalence, and whether different risk factors affect the prevalence. Regression analysis is the established method for studying the effects of risk factors. In order to make correct inferences from the regression model, the associations between repeated responses need to be taken into account. The analysis of repeated categorical responses typically focuses on regression modelling. However, further insights can also be gained by investigating the structure of the association. The central theme in this study is the development of joint regression and association models. The analysis of repeated, or otherwise clustered, categorical responses is computationally difficult. Likelihood-based inference is often feasible only when the number of repeated responses for each study unit is small. In Paper IV, an algorithm is presented which substantially facilitates maximum likelihood fitting, especially when the number of repeated responses increases. In addition, a notable result arising from this work is the freely available software for likelihood-based estimation of clustered categorical responses.

    A review of R-packages for random-intercept probit regression in small clusters

    Generalized Linear Mixed Models (GLMMs) are widely used to model clustered categorical outcomes. To tackle the intractable integration over the random effects distributions, several approximation approaches have been developed for likelihood-based inference. As these seldom yield satisfactory results when analyzing binary outcomes from small clusters, estimation within the Structural Equation Modeling (SEM) framework is proposed as an alternative. We compare the performance of R-packages for random-intercept probit regression relying on: the Laplace approximation, adaptive Gaussian quadrature (AGQ), Penalized Quasi-Likelihood (PQL), an MCMC-implementation, and integrated nested Laplace approximation within the GLMM-framework, and a robust diagonally weighted least squares estimation within the SEM-framework. In terms of bias for the fixed and random effect estimators, SEM usually performs best for cluster size two, while AGQ prevails in terms of precision (mainly because of SEM's robust standard errors). As the cluster size increases, however, AGQ becomes the best choice for both bias and precision.
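
The integration over the random intercept that this abstract refers to can be made concrete with a small sketch. It approximates the marginal likelihood of one cluster in a random-intercept probit model using an ordinary (non-adaptive) three-point Gauss-Hermite rule, which is far coarser than the adaptive quadrature used by the reviewed packages. The function names and argument layout are assumptions made for illustration and do not correspond to any of those packages.

```python
import math

# Three-point Gauss-Hermite rule for integrals of f(t) * exp(-t^2).
GH_NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
GH_WEIGHTS = [math.sqrt(math.pi) / 6.0,
              2.0 * math.sqrt(math.pi) / 3.0,
              math.sqrt(math.pi) / 6.0]

def probit_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cluster_likelihood(y, eta, sigma):
    """Approximate marginal likelihood of one cluster's binary responses.

    y[j] is the 0/1 response, eta[j] the fixed-effects linear predictor,
    and sigma the standard deviation of the normal random intercept b.
    The integral over b uses the substitution b = sqrt(2) * sigma * t.
    """
    total = 0.0
    for t, w in zip(GH_NODES, GH_WEIGHTS):
        b = math.sqrt(2.0) * sigma * t
        prod = 1.0
        for yj, ej in zip(y, eta):
            sign = 1.0 if yj == 1 else -1.0
            prod *= probit_cdf(sign * (ej + b))
        total += w * prod
    return total / math.sqrt(math.pi)
```

With `sigma = 0` the function reduces to the product of ordinary probit probabilities. An adaptive rule would additionally recentre and rescale the nodes for each cluster, which this sketch does not attempt.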

    Binary Models for Marginal Independence

    Log-linear models are a classical tool for the analysis of contingency tables. In particular, the subclass of graphical log-linear models provides a general framework for modelling conditional independences. However, with the exception of special structures, marginal independence hypotheses cannot be accommodated by these traditional models. Focusing on binary variables, we present a model class that provides a framework for modelling marginal independences in contingency tables. The approach taken is graphical and draws on analogies to multivariate Gaussian models for marginal independence. For the graphical model representation we use bi-directed graphs, which are in the tradition of path diagrams. We show how the models can be parameterized in a simple fashion, and how maximum likelihood estimation can be performed using a version of the Iterated Conditional Fitting algorithm. Finally, we consider combining these models with symmetry restrictions.

    A Bayesian semiparametric latent variable model for mixed responses

    In this article we introduce a latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric predictor. We extend existing LVMs with simple linear covariate effects by including nonparametric components for nonlinear effects of continuous covariates and interactions with other covariates as well as spatial effects. Full Bayesian modelling is based on penalized spline and Markov random field priors and is performed by computationally efficient Markov chain Monte Carlo (MCMC) methods. We apply our approach to a large German social science survey which motivated our methodological development.