Formal and Informal Model Selection with Incomplete Data
Model selection and assessment with incomplete data pose challenges in
addition to the ones encountered with complete data. There are two main reasons
for this. First, many models describe characteristics of the complete data, in
spite of the fact that only an incomplete subset is observed. Direct comparison
between model and data is then less than straightforward. Second, many commonly
used models are more sensitive to assumptions than in the complete-data
situation and some of their properties vanish when they are fitted to
incomplete, unbalanced data. These and other issues are brought forward using
two key examples, one of a continuous and one of a categorical nature. We argue
that model assessment ought to consist of two parts: (i) assessment of a
model's fit to the observed data and (ii) assessment of the sensitivity of
inferences to unverifiable assumptions, that is, to how a model describes the
unobserved data given the observed ones. Comment: Published at http://dx.doi.org/10.1214/07-STS253 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
On the asymptotic behavior of the contaminated sample mean
An observation of a cumulative distribution function $F$ with finite variance
is said to be contaminated according to the inflated variance model if it has a
large probability of coming from the original target distribution $F$, but a
small probability of coming from a contaminating distribution that has the same
mean and shape as $F$, though a larger variance. It is well known that in the
presence of data contamination, the ordinary sample mean loses many of its
good properties, making it preferable to use more robust estimators. From a
didactical point of view, it is insightful to see to what extent an intuitive
estimator such as the sample mean becomes less favorable in a contaminated
setting. In this paper, we investigate under which conditions the sample mean,
based on a finite number of independent observations of $F$ which are
contaminated according to the inflated variance model, is a valid estimator for
the mean of $F$. In particular, we examine to what extent this estimator is
weakly consistent for the mean of $F$ and asymptotically normal. As classical
central limit theory is generally inaccurate to cope with the asymptotic
normality in this setting, we invoke more general approximate central limit
theory as developed by Berckmoes, Lowen, and Van Casteren (2013). Our
theoretical results are illustrated by a specific example and a simulation
study. Comment: 14 pages, 1 figure
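The inflated variance model described in this abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a normal target distribution, a fixed contamination probability `eps`, and a fixed standard-deviation inflation factor `inflation` (all hypothetical choices), and it compares the spread of the sample mean with that of the sample median.

```python
import random
import statistics


def contaminated_sample(n, eps, mu=0.0, sigma=1.0, inflation=10.0, rng=None):
    """Draw n observations under a simple inflated variance model.

    Each observation comes from N(mu, sigma^2) with probability 1 - eps and
    from N(mu, (inflation * sigma)^2) with probability eps. The normal shape
    and the constant eps/inflation are assumptions made for illustration.
    """
    rng = rng or random.Random()
    return [
        rng.gauss(mu, sigma * inflation if rng.random() < eps else sigma)
        for _ in range(n)
    ]


def mean_and_median_spread(n, eps, reps=2000, seed=1):
    """Monte Carlo standard deviations of the sample mean and sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        x = contaminated_sample(n, eps, rng=rng)
        means.append(statistics.fmean(x))
        medians.append(statistics.median(x))
    return statistics.pstdev(means), statistics.pstdev(medians)


if __name__ == "__main__":
    for eps in (0.0, 0.05, 0.1):
        sd_mean, sd_median = mean_and_median_spread(n=50, eps=eps)
        print(f"eps={eps}: sd(mean)={sd_mean:.3f}, sd(median)={sd_median:.3f}")
```

Under these assumptions a single observation has variance (1 - eps) * sigma^2 + eps * (inflation * sigma)^2, so the sample mean stays centered on the target mean while its spread grows with the contamination probability.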
Discussion of Likelihood Inference for Models with Unobservables: Another View
Discussion of "Likelihood Inference for Models with Unobservables: Another
View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]. Comment: Published at http://dx.doi.org/10.1214/09-STS277A in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
The analysis of correlated non-Gaussian outcomes from clusters of size two: non-multilevel-based alternatives?
In this presentation we discuss the analysis of clustered binary or count data when the cluster size is two. For Gaussian outcomes, linear mixed models that take the within-cluster correlation into account are frequently used and well understood. Here we explore the potential of generalized linear mixed models (GLMMs) for the analysis of non-Gaussian outcomes that are possibly negatively correlated. Several approximation techniques (Gaussian quadrature, Laplace approximation, or linearization) that are available in standard software packages for these GLMMs are investigated. Despite the different modelling options related to these techniques, none of them performs satisfactorily in estimating fixed effects when the within-cluster correlation is negative and/or the number of clusters is relatively small. In contrast, a generalized estimating equations (GEE) approach for the analysis of non-Gaussian data turns out to have excellent overall performance. When using GEE, the robust score test and the Wald test are recommended for small and large samples, respectively.
On the sample mean after a group sequential trial
A popular setting in medical statistics is a group sequential trial with
independent and identically distributed normal outcomes, in which interim
analyses of the sum of the outcomes are performed. Based on a prescribed
stopping rule, one decides after each interim analysis whether the trial is
stopped or continued. Consequently, the actual length of the study is a random
variable. It is reported in the literature that the interim analyses may cause
bias if one uses the ordinary sample mean to estimate the location parameter.
For a generic stopping rule, which contains many classical stopping rules as a
special case, explicit formulas for the expected length of the trial, the bias,
and the mean squared error (MSE) are provided. It is deduced that, for a fixed
number of interim analyses, the bias and the MSE converge to zero if the first
interim analysis is performed not too early. In addition, optimal rates for
this convergence are provided. Furthermore, under a regularity condition,
asymptotic normality in total variation distance for the sample mean is
established. A conclusion for naive confidence intervals based on the sample
mean is derived. It is also shown how the developed theory naturally fits in
the broader framework of likelihood theory in a group sequential trial setting.
A simulation study underpins the theoretical findings. Comment: 52 pages (supplementary data file included)
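The bias of the sample mean after a group sequential trial can be made concrete with a short Monte Carlo sketch. It is not the paper's derivation: the stopping rule used here (stop as soon as the standardized sum exceeds a fixed `threshold` at an interim look) is a hypothetical instance chosen for illustration, as are the function names and the unit-variance normal outcomes.

```python
import random
import statistics


def run_trial(mu, looks, threshold, rng):
    """Simulate one trial with N(mu, 1) outcomes.

    `looks` lists the cumulative sample sizes at which the sum of the
    outcomes is inspected. Returns (sample_mean, sample_size_at_stop).
    """
    total, n = 0.0, 0
    for look in looks:
        while n < look:
            total += rng.gauss(mu, 1.0)
            n += 1
        # Hypothetical stopping rule: stop when sum / sqrt(n) exceeds threshold.
        if total / n ** 0.5 > threshold:
            break
    # The trial always ends at the last look if it was not stopped earlier.
    return total / n, n


def bias_and_expected_length(mu, looks, threshold, reps=5000, seed=7):
    """Monte Carlo estimates of the bias of the sample mean and of E[N]."""
    rng = random.Random(seed)
    results = [run_trial(mu, looks, threshold, rng) for _ in range(reps)]
    bias = statistics.fmean(m for m, _ in results) - mu
    expected_n = statistics.fmean(n for _, n in results)
    return bias, expected_n


if __name__ == "__main__":
    for looks in ([10, 100], [40, 100]):
        bias, expected_n = bias_and_expected_length(0.0, looks, threshold=1.0)
        print(f"looks={looks}: bias={bias:.4f}, expected length={expected_n:.1f}")
```

With this particular rule, early stopping happens only after unusually large sums, so the sample mean is biased upward, and moving the first look later reduces that bias in the simulation.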
Evaluating Mode Effects in Mixed-Mode Survey Data Using Covariate Adjustment Models
The confounding of selection and measurement effects between different modes is a disadvantage of mixed-mode surveys. Solutions to this problem have been suggested in several studies. Most use adjusting covariates to control selection effects. Unfortunately, these covariates must meet strong assumptions, which are generally ignored. This article discusses these assumptions in greater detail and also provides an alternative model for solving the problem. This alternative uses adjusting covariates that explain measurement effects instead of selection effects. The application of both models is illustrated by using data from a survey on opinions about surveys, which yields mode effects in line with expectations for the latter model, and mode effects contrary to expectations for the former model. However, the validity of these results depends entirely on the (ad hoc) covariates chosen. Research into better covariates might thus be a topic for future studies.
Generating Correlated and/or Overdispersed Count Data: A SAS Implementation
Analysis of longitudinal count data has long been done using a generalized linear mixed model (GLMM), in its Poisson-normal version, to account for correlation by specifying normal random effects. Univariate counts are often handled with the negative-binomial (NEGBIN) model, taking into account overdispersion by use of gamma random effects. Inherently though, longitudinal count data commonly exhibit both correlation and overdispersion simultaneously, necessitating analysis methodology that can account for both. The introduction of the combined model (CM) by Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs, Verbeke, Demétrio, and Vieira (2010) serves this purpose, not only for count data but for the general exponential family of distributions. Here, a Poisson model is specified as the parent distribution of the data with a normally distributed random effect at the subject or cluster level and/or a gamma distribution at observation level. The GLMM and NEGBIN model are special cases. Data can be simulated from (1) the general CM, with random effects, or (2) its marginal version directly. This paper discusses an implementation of (1) in SAS software (SAS Inc. 2011). One needs to reflect on the mean of both the combined (hierarchical) and marginal models in order to generate correlated and/or overdispersed counts. A pre-specification of the desired marginal mean (in terms of covariates and marginal parameters), a marginal variance-covariance structure, and the hierarchical mean (in terms of covariates and regression parameters) is required. The implied hierarchical parameters, the variance-covariance matrix of the random effects, and the variance-covariance matrix of the overdispersion part are then derived, from which correlated Poisson data are generated. Sample calls of the SAS macro are presented as well as output.
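The hierarchical form of the combined model described in this abstract can be sketched in a few lines. The example below is not the SAS macro from the paper and does not perform its derivation of hierarchical parameters from a pre-specified marginal structure. It simply draws counts from a Poisson model with a normal random intercept per subject and a mean-one gamma factor per observation. The function names, the single time covariate, and the parameter names (`beta0`, `beta1`, `d`, `alpha`) are assumptions made for illustration.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate means used here
    (exp(-lam) underflows for very large lam)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_combined_model(n_subjects, n_times, beta0, beta1, d, alpha, seed=0):
    """Return counts[i][j] for subject i at time j.

    b_i ~ N(0, d) is a subject-level random intercept (correlation);
    theta_ij ~ Gamma(shape=alpha, scale=1/alpha) has mean 1 and variance
    1/alpha (overdispersion); Y_ij ~ Poisson(theta_ij * exp(beta0 + beta1*j + b_i)).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, math.sqrt(d))
        row = []
        for t in range(n_times):
            theta = rng.gammavariate(alpha, 1.0 / alpha)
            row.append(poisson(theta * math.exp(beta0 + beta1 * t + b), rng))
        data.append(row)
    return data


if __name__ == "__main__":
    counts = simulate_combined_model(2000, 3, beta0=0.5, beta1=0.0, d=0.25, alpha=2.0)
    flat = [y for row in counts for y in row]
    mean = sum(flat) / len(flat)
    var = sum((y - mean) ** 2 for y in flat) / len(flat)
    print(f"mean={mean:.3f}, variance={var:.3f}")
```

Because the gamma factor has mean one, the marginal mean in this sketch is exp(beta0 + beta1*t + d/2), which is one way to see why the hierarchical and marginal means have to be considered together when targeting a given marginal mean.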
Comments on: Missing data methods in longitudinal studies: a review
Incomplete data are quite common in biomedical and other types of research, especially in longitudinal studies. During the last three decades, a vast amount of work has been done in the area. This has led, on the one hand, to a rich taxonomy of missing-data concepts, issues, and methods and, on the other hand, to a variety of data-analytic tools. Elements of taxonomy include: missing data patterns, mechanisms, and modeling frameworks; inferential paradigms; and sensitivity analysis frameworks. These are described in detail. A variety of concrete modeling devices is presented. To make matters concrete, two case studies are considered. The first one concerns quality of life among breast cancer patients, while the second one examines data from the Muscatine children’s obesity study.
A goodness-of-fit test for the random-effects distribution in mixed models
In this paper, we develop a simple diagnostic test for the random-effects distribution in mixed models. The test is based on the gradient function, a graphical tool proposed by Verbeke and Molenberghs to check the impact of assumptions about the random-effects distribution in mixed models on inferences. Inference is conducted through the bootstrap. The proposed test is easy to implement and applicable in a general class of mixed models. The operating characteristics of the test are evaluated in a simulation study, and the method is further illustrated using two real data analyses.