2,568 research outputs found

Multilevel selection models using gllamm

Author: Sophia Rabe-Hesketh

Models for handling sample selection or informative missingness have been developed for both cross-sectional and longitudinal or panel data. For cross-sectional data, Heckman (1979) suggested a joint model for the response and sample selection processes where the disturbances of the processes are correlated. For longitudinal data, Hausman and Wise (1979) and Diggle and Kenward (1994) developed a model in which the continuous response (observed or unobserved), and possibly the lagged response, is a predictor of attrition or dropout. The Heckman model can be estimated using the heckman command in Stata, and the Diggle-Kenward model is available in the Oswald package running in S-PLUS. Both models can also be estimated using gllamm, with the advantage that the following three generalisations are possible. First, the models can be extended to multilevel settings where there may be unobserved heterogeneity between the clusters at the different levels in both the substantive and selection processes and where selection may operate at several levels. Second, the Heckman model can be modified for non-normal response processes. Third, both the Heckman and Diggle-Kenward models can be extended to situations where the substantive response is a latent variable measured by a number of indicators. I will show how the standard Heckman and Diggle-Kenward models are estimated in gllamm and give examples of all three types of generalisation of these standard models. The research was carried out jointly with Anders Skrondal and Andrew Pickles.
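
For orientation, the standard cross-sectional Heckman selection model mentioned in this abstract is commonly written as below. The notation (x, z, beta, gamma, rho, sigma) is an assumption chosen for illustration and is not taken from the talk itself.

```latex
\begin{aligned}
y_i^{*} &= \mathbf{x}_i'\boldsymbol{\beta} + \epsilon_i
  && \text{(substantive response)} \\
s_i^{*} &= \mathbf{z}_i'\boldsymbol{\gamma} + u_i, \qquad
  s_i = \mathbb{1}\!\left(s_i^{*} > 0\right)
  && \text{(selection; } y_i = y_i^{*} \text{ is observed only if } s_i = 1\text{)} \\
\begin{pmatrix} \epsilon_i \\ u_i \end{pmatrix}
  &\sim N\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} \sigma^{2} & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix}
  \right)
  && \text{(correlated disturbances)} \\
E\!\left(y_i \mid s_i = 1\right)
  &= \mathbf{x}_i'\boldsymbol{\beta}
   + \rho\sigma\,
     \frac{\phi(\mathbf{z}_i'\boldsymbol{\gamma})}{\Phi(\mathbf{z}_i'\boldsymbol{\gamma})}
\end{aligned}
```

The last line shows why ignoring selection biases the regression of y on x whenever rho is non-zero: the selected-sample mean contains an extra term that depends on the selection covariates.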

Research Papers in Economics

Missing ordinal covariates with informative selection

Authors: Alfonso Miranda; Sophia Rabe-Hesketh

This paper considers the problem of parameter estimation in a model for a continuous response variable y when an important ordinal explanatory variable x is missing for a large proportion of the sample. Non-missingness of x, or sample selection, is correlated with the response variable and/or with the unobserved values the ordinal explanatory variable takes when missing. We suggest solving the endogenous selection, or 'not missing at random' (NMAR), problem by modelling the informative selection mechanism, the ordinal explanatory variable, and the response variable together. The use of the method is illustrated by re-examining the problem of the ethnic gap in school achievement at age 16 in England using linked data from the National Pupil Database (NPD), the Longitudinal Study of Young People in England (LSYPE), and the Census 2001.

Keywords: missing covariate, sample selection, latent class models, ordinal variables, NMAR

Research Papers in Economics

Bayesian comparison of latent variable models: Conditional vs marginal likelihoods

Authors: Furr D.; Merkle E. C.; Rabe-Hesketh S.
Publication venue: Springer Science and Business Media LLC
Publication date: 25/07/2019

Typical Bayesian methods for models with latent variables (or random effects) involve directly sampling the latent variables along with the model parameters. In high-level software code for model definitions (using, e.g., BUGS, JAGS, Stan), the likelihood is therefore specified as conditional on the latent variables. This can lead researchers to perform model comparisons via conditional likelihoods, where the latent variables are considered model parameters. In other settings, however, typical model comparisons involve marginal likelihoods where the latent variables are integrated out. This distinction is often overlooked despite the fact that it can have a large impact on the comparisons of interest. In this paper, we clarify and illustrate these issues, focusing on the comparison of conditional and marginal Deviance Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in psychometric modeling. The conditional/marginal distinction corresponds to whether the model should be predictive for the clusters that are in the data or for new clusters (where "clusters" typically correspond to higher-level units like people or schools). Correspondingly, we show that marginal WAIC corresponds to leave-one-cluster out (LOcO) cross-validation, whereas conditional WAIC corresponds to leave-one-unit out (LOuO). These results lead to recommendations on the general application of the criteria to models with latent variables.

Comment: Manuscript in press at Psychometrika; 31 pages, 8 figures
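
To make the conditional/marginal distinction in this abstract concrete, here is a minimal sketch of a WAIC computation from a matrix of pointwise log-likelihoods. It uses the widely used variance-based penalty on the deviance scale, which may differ in detail from the paper's own code. The function name `waic` and the input layout (posterior draws in rows, data points in columns) are assumptions for illustration.

```python
import numpy as np


def waic(log_lik):
    """WAIC on the deviance scale from an (n_draws, n_points) array.

    Assumed input: log_lik[s, i] is the log-likelihood of data point i
    evaluated at posterior draw s.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    # Log pointwise predictive density: a numerically stable
    # log-mean-exp over the posterior draws, one value per point.
    col_max = log_lik.max(axis=0)
    lppd = col_max + np.log(np.exp(log_lik - col_max).mean(axis=0))
    # Effective number of parameters: posterior variance of the
    # pointwise log-likelihood, summed over points.
    p_waic = log_lik.var(axis=0, ddof=1)
    return -2.0 * (lppd.sum() - p_waic.sum())
```

Under the distinction described in the abstract, the same function would yield a conditional WAIC when each column is a level-1 unit's log-likelihood given the sampled latent variables, and a marginal WAIC when each column is a whole cluster's log-likelihood with the latent variables integrated out. Computing the marginal columns requires a separate integration step that is not shown here.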

arXiv.org e-Print Archive

Measuring school value added with administrative data: the problem of missing variables

Authors: Alfonso Miranda; Lorraine Dearden; Sophia Rabe-Hesketh

The UK Department for Education (DfE) calculates contextualised value added (CVA) measures of school performance using administrative data that contain only a limited set of explanatory variables. Differences in schools’ intake regarding characteristics such as mother’s education are not accounted for due to the lack of background information in the data. In this paper we use linked survey and administrative data to assess the potential biases that missing control variables cause in the calculation of CVA measures of school performance. We find that ignoring the effect of mother’s education leads DfE to erroneously over-penalise low achieving schools that have a greater proportion of mothers with low qualifications and to over-reward high achieving schools that have a greater proportion of mothers with higher qualifications. This suggests that collecting a rich set of controls in administrative records is necessary for producing reliable CVA measures of school performance.

Keywords: contextualised value added, missing data, informative sample selection, administrative data, UK

Research Papers in Economics

Generalized latent class modeling using gllamm

Authors: Anders Skrondal; Andrew Pickles; Sophia Rabe-Hesketh

gllamm can estimate both conventional and unconventional latent class models. Models are specified using discrete latent variables whose values determine the conditional response distributions for the classes. A new feature of gllamm is that latent class probabilities can depend on covariates. We will first discuss the conventional exploratory latent class model. When a number of fallible diagnoses of some disease are available, this model can be used to estimate the prevalence of the disease as well as the sensitivities and specificities of the tests in the absence of a gold standard. After estimating the model in gllamm, gllapred can be used to diagnose individual subjects based on their posterior class probabilities. An advantage of using gllamm is that a wide range of response types can be accommodated. To illustrate this, we consider the analysis of rankings of political goals in the study of value orientations. We will also discuss confirmatory models such as latent class factor models and apply them to attitudes to abortion data, taking the survey design into account by using probability weighting and robust standard errors. Finally, we consider latent trajectory models for investigating distinct patterns of change in longitudinal data.
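
The diagnosis step described in this abstract, classifying a subject from posterior class probabilities, can be sketched for the simplest case: a two-class model (diseased or healthy) with binary tests that are independent given the class. This only illustrates the underlying Bayes-rule calculation and is not gllapred's implementation. The function name and argument layout are hypothetical.

```python
def posterior_disease_prob(results, prevalence, sens, spec):
    """Posterior probability of the 'diseased' class given test results.

    results:    sequence of 0/1 test outcomes (1 = positive)
    prevalence: prior probability of the diseased class
    sens, spec: per-test sensitivities and specificities
    Assumes tests are conditionally independent given the class.
    """
    joint_diseased = prevalence
    joint_healthy = 1.0 - prevalence
    for r, se, sp in zip(results, sens, spec):
        # P(result | diseased) uses the sensitivity,
        # P(result | healthy) uses the specificity.
        joint_diseased *= se if r == 1 else (1.0 - se)
        joint_healthy *= (1.0 - sp) if r == 1 else sp
    return joint_diseased / (joint_diseased + joint_healthy)


# Example call with made-up parameter values: two positive tests.
p = posterior_disease_prob([1, 1], prevalence=0.1, sens=[0.9, 0.8], spec=[0.95, 0.9])
```

In a fitted latent class model the prevalence, sensitivities, and specificities would be replaced by their estimates, and a subject would typically be assigned to the class with the larger posterior probability.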

Research Papers in Economics

Multilevel modeling of complex survey data

Author: Sophia Rabe-Hesketh

Survey data are often analyzed using multilevel or hierarchical models. For example, in education surveys, schools may be sampled at the first stage and students at the second stage and multilevel models used to model within-school and between-school variability. An important aspect of most surveys that is often ignored in multilevel modeling is that units at each stage are sampled with unequal probabilities. Standard maximum likelihood estimation can be modified to take the sampling probabilities into account, yielding pseudomaximum likelihood estimation, which is typically combined with robust standard errors based on the sandwich estimator. This approach is implemented in gllamm. I will introduce the ideas, discuss issues that arise such as the scaling of the weights, and illustrate the approach by applying it to data from the Program for International Student Assessment (PISA).
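
As a sketch of the pseudo-maximum-likelihood idea described above, the weighted log-likelihood for a two-level model (students i within schools j) is often written as follows. The notation is an assumption for illustration: w_j and w_{i|j} denote the inverse sampling probabilities of school j and of student i given that school j was sampled, u_j is the school-level random effect with density g, and f is the conditional response density.

```latex
\log L^{\mathrm{P}}(\boldsymbol{\theta})
  = \sum_{j} w_{j}\,
    \log \int
      \exp\!\Big\{ \sum_{i} w_{i \mid j}\,
        \log f\!\left(y_{ij} \mid u_{j}; \boldsymbol{\theta}\right) \Big\}\,
      g\!\left(u_{j}; \boldsymbol{\theta}\right)\, \mathrm{d}u_{j},
\qquad
w_{j} = \frac{1}{\pi_{j}}, \quad
w_{i \mid j} = \frac{1}{\pi_{i \mid j}}
```

Because the level-1 weights enter inside the integral, rescaling them changes the estimates, which is one way to see why the scaling of the weights is an issue.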

Research Papers in Economics

Diagnostics for generalised linear mixed models

Authors: Anders Skrondal; Sophia Rabe-Hesketh

Generalized linear mixed models are generalized linear models that include random effects varying between clusters or 'higher-level' units of hierarchically structured data. Such models can be estimated using gllamm. The prediction command gllapred can be used to obtain empirical Bayes predictions of the random effects, interpretable as higher-level residuals. Combined with approximate sampling standard deviations, these residuals can be used for identifying unusual higher-level units. However, since the distribution of these predictions is generally not known, we recommend simulating responses from the model using gllasim and comparing 'observed' and simulated residuals. We also discuss different types of level 1 residuals and influence diagnostics.
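
The idea of empirical Bayes predictions as higher-level residuals can be illustrated in the one case where they have a closed form: a linear random-intercept model with known variance components. This is a simplified sketch, not what gllapred computes for general models. The function name and arguments (`psi` for the random-intercept variance, `theta` for the level-1 residual variance) are assumptions.

```python
def eb_random_intercept(total_residuals, psi, theta):
    """Empirical Bayes prediction of one cluster's random intercept.

    total_residuals: y_ij minus the fixed-part prediction, for one cluster
    psi:   variance of the random intercept (assumed known or estimated)
    theta: level-1 residual variance (assumed known or estimated)
    """
    n = len(total_residuals)
    mean_residual = sum(total_residuals) / n
    # Shrinkage factor: the reliability of the cluster's mean residual.
    shrinkage = psi / (psi + theta / n)
    return shrinkage * mean_residual


# Example call with made-up values: a small cluster is shrunk towards zero.
eb = eb_random_intercept([2.0, 2.0], psi=1.0, theta=1.0)
```

Clusters with few units are shrunk more strongly towards zero, which is one reason the predictions are usually assessed together with their approximate sampling standard deviations rather than on their own.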

Research Papers in Economics

Wishes and Grumbles

Authors: Sophia Rabe-Hesketh; Stephen P. Jenkins

This report summarises the 'wishes and grumbles' session from the Ninth UK Stata Users Group meeting.

Research Papers in Economics

Missing Covariates with Informative Selection

Authors: Alfonso Miranda; Sophia Rabe-Hesketh
Publication date: 01/07/2010

National Centre for Research Methods: NCRM EPrints Repository

Missing ordinal covariates with informative selection

Authors: Alfonso Miranda; Sophia Rabe-Hesketh
Publication venue: Department of Quantitative Social Science, Institute of Education
Publication date: 25/01/2010

National Centre for Research Methods: NCRM EPrints Repository