2,656 research outputs found
On the asymptotic behavior of the contaminated sample mean
An observation of a cumulative distribution function $F$ with finite variance
is said to be contaminated according to the inflated variance model if it has a
large probability of coming from the original target distribution $F$, but a
small probability of coming from a contaminating distribution that has the same
mean and shape as $F$, though a larger variance. It is well known that in the
presence of data contamination, the ordinary sample mean loses many of its
good properties, making it preferable to use more robust estimators. From a
didactical point of view, it is insightful to see to what extent an intuitive
estimator such as the sample mean becomes less favorable in a contaminated
setting. In this paper, we investigate under which conditions the sample mean,
based on a finite number of independent observations of $F$ which are
contaminated according to the inflated variance model, is a valid estimator for
the mean of $F$. In particular, we examine to what extent this estimator is
weakly consistent for the mean of $F$ and asymptotically normal. As classical
central limit theory is generally inadequate for handling the asymptotic
normality in this setting, we invoke more general approximate central limit
theory as developed by Berckmoes, Lowen, and Van Casteren (2013). Our
theoretical results are illustrated by a specific example and a simulation
study. Comment: 14 pages, 1 figure
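The inflated variance model described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation study. It assumes a normal target distribution, a single fixed contamination probability `eps`, and a hypothetical standard-deviation multiplier `inflation`; all function and parameter names are illustrative.

```python
import random
import statistics


def contaminated_sample(n, eps, mu=0.0, sigma=1.0, inflation=5.0, rng=None):
    """Draw n observations: each comes from N(mu, sigma^2) with probability
    1 - eps and from N(mu, (inflation * sigma)^2) with probability eps."""
    rng = rng or random.Random()
    sample = []
    for _ in range(n):
        # Assumption: same mean and shape (normal), only the scale is inflated.
        scale = sigma * inflation if rng.random() < eps else sigma
        sample.append(rng.gauss(mu, scale))
    return sample


def sample_mean_spread(n, eps, reps=2000, seed=0):
    """Monte Carlo estimate of the mean and standard deviation of the
    sample mean over `reps` replicated samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(contaminated_sample(n, eps, rng=rng))
        for _ in range(reps)
    ]
    return statistics.fmean(means), statistics.pstdev(means)


if __name__ == "__main__":
    for eps in (0.0, 0.1):
        centre, spread = sample_mean_spread(n=100, eps=eps)
        print(f"eps={eps}: mean of sample means={centre:.3f}, sd={spread:.3f}")
```

Under these assumptions the sample mean stays centred on the target mean, while its spread grows with the contamination rate, since each observation has variance (1 - eps) * sigma^2 + eps * (inflation * sigma)^2.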
The analysis of correlated non-Gaussian outcomes from clusters of size two: non-multilevel-based alternatives?
In this presentation we discuss the analysis of clustered binary or count data when the cluster size is two. For Gaussian outcomes, linear mixed models that take the within-cluster correlation into account are frequently used and well understood. Here we explore the potential of generalized linear mixed models (GLMMs) for the analysis of non-Gaussian outcomes that are possibly negatively correlated. Several approximation techniques (Gaussian quadrature, Laplace approximation, or linearization) that are available in standard software packages for these GLMMs are investigated. Despite the different modelling options related to these techniques, none of them performs satisfactorily in estimating fixed effects when the within-cluster correlation is negative and/or the number of clusters is relatively small. In contrast, a generalized estimating equations (GEE) approach for the analysis of non-Gaussian data turns out to have excellent overall performance. When using GEE, the robust score test and the Wald test are recommended for small and large samples, respectively.
Formal and Informal Model Selection with Incomplete Data
Model selection and assessment with incomplete data pose challenges in
addition to the ones encountered with complete data. There are two main reasons
for this. First, many models describe characteristics of the complete data, in
spite of the fact that only an incomplete subset is observed. Direct comparison
between model and data is then less than straightforward. Second, many commonly
used models are more sensitive to assumptions than in the complete-data
situation and some of their properties vanish when they are fitted to
incomplete, unbalanced data. These and other issues are brought forward using
two key examples, one of a continuous and one of a categorical nature. We argue
that model assessment ought to consist of two parts: (i) assessment of a
model's fit to the observed data and (ii) assessment of the sensitivity of
inferences to unverifiable assumptions, that is, to how a model describes the
unobserved data given the observed ones. Comment: Published at http://dx.doi.org/10.1214/07-STS253 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
On the sample mean after a group sequential trial
A popular setting in medical statistics is a group sequential trial with
independent and identically distributed normal outcomes, in which interim
analyses of the sum of the outcomes are performed. Based on a prescribed
stopping rule, one decides after each interim analysis whether the trial is
stopped or continued. Consequently, the actual length of the study is a random
variable. It is reported in the literature that the interim analyses may cause
bias if one uses the ordinary sample mean to estimate the location parameter.
For a generic stopping rule, which contains many classical stopping rules as a
special case, explicit formulas for the expected length of the trial, the bias,
and the mean squared error (MSE) are provided. It is deduced that, for a fixed
number of interim analyses, the bias and the MSE converge to zero if the first
interim analysis is performed not too early. In addition, optimal rates for
this convergence are provided. Furthermore, under a regularity condition,
asymptotic normality in total variation distance for the sample mean is
established. A conclusion for naive confidence intervals based on the sample
mean is derived. It is also shown how the developed theory naturally fits in
the broader framework of likelihood theory in a group sequential trial setting.
A simulation study underpins the theoretical findings. Comment: 52 pages (supplementary data file included)
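The bias of the sample mean after a group sequential trial can be checked numerically. The sketch below is not the paper's generic stopping rule or its explicit formulas. It assumes unit-variance normal outcomes and a hypothetical rule that stops at an interim look when the standardized sum exceeds `boundary`; all names and default values are illustrative.

```python
import random
import statistics


def run_trial(mu, looks, boundary, rng):
    """Simulate one trial with interim analyses at the sample sizes in `looks`.
    Returns (sample mean at stopping, number of observations used)."""
    total, n = 0.0, 0
    for look in looks:
        while n < look:
            total += rng.gauss(mu, 1.0)
            n += 1
        # Hypothetical stopping rule: stop once sum / sqrt(n) exceeds the boundary.
        if total / n ** 0.5 > boundary:
            break
    return total / n, n


def estimate_bias(mu=0.3, looks=(20, 40), boundary=2.0, reps=20000, seed=1):
    """Monte Carlo estimates of the bias of the sample mean and of the
    expected trial length under the stopping rule above."""
    rng = random.Random(seed)
    results = [run_trial(mu, looks, boundary, rng) for _ in range(reps)]
    bias = statistics.fmean(m for m, _ in results) - mu
    expected_length = statistics.fmean(n for _, n in results)
    return bias, expected_length


if __name__ == "__main__":
    bias, expected_length = estimate_bias()
    print(f"estimated bias={bias:.4f}, expected length={expected_length:.1f}")
```

With this rule, trials that stop early are those whose interim sum happened to be large, so the estimated bias is positive. Moving the first look later (for example `looks=(200, 400)`) should shrink it, in line with the abstract's statement that the bias vanishes when the first interim analysis is not performed too early.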
Discussion of Likelihood Inference for Models with Unobservables: Another View
Discussion of "Likelihood Inference for Models with Unobservables: Another
View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]. Comment: Published at http://dx.doi.org/10.1214/09-STS277A in
Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Estimating Stellar Parameters from Spectra using a Hierarchical Bayesian Approach
A method is developed for fitting theoretically predicted astronomical
spectra to an observed spectrum. Using a hierarchical Bayesian principle, the
method takes both systematic and statistical measurement errors into account,
which has not been done before in the astronomical literature. The goal is to
estimate fundamental stellar parameters and their associated uncertainties. The
non-availability of a convenient deterministic relation between stellar
parameters and the observed spectrum, combined with the computational
complexities this entails, necessitates the curtailment of the continuous
Bayesian model to a reduced model based on a grid of synthetic spectra. A
criterion for model selection based on the so-called predictive squared error
loss function is proposed, together with a measure for the goodness-of-fit
between observed and synthetic spectra. The proposed method is applied to the
infrared 2.38--2.60 $\mu$m ISO-SWS data (Infrared Space Observatory - Short
Wavelength Spectrometer) of the star $\alpha$ Bootis, yielding estimates for
the stellar parameters: effective temperature $T_{\mathrm{eff}} = 4230 \pm 83$ K, gravity
$\log g = 1.50 \pm 0.15$ dex, and metallicity [Fe/H] = dex. Comment: 15 pages, 8 figures, 5 tables. Accepted for publication in MNRAS
The neuroscience of intergroup threat and violence
The COVID-19 pandemic led to a global increase in hate crimes and xenophobia. In these uncertain times, real or imaginary threats can easily lead to intergroup conflict. Here, we integrate social neuroscience findings with classic social psychology theories into a framework to better understand how intergroup threat can lead to violence. The roles of moral disengagement, dehumanization, and intergroup schadenfreude in this process are discussed, together with their underlying neural mechanisms. We outline how this framework can inform social scientists and policy makers to help reduce the escalation of intergroup conflict and promote intergroup cooperation. The critical role of the media and public figures in these unprecedented times is highlighted as an important factor in achieving these goals.
Generating Correlated and/or Overdispersed Count Data: A SAS Implementation
Analysis of longitudinal count data has long been done using a generalized linear mixed model (GLMM), in its Poisson-normal version, to account for correlation by specifying normal random effects. Univariate counts are often handled with the negative binomial (NEGBIN) model, which takes overdispersion into account by use of gamma random effects. Inherently though, longitudinal count data commonly exhibit both correlation and overdispersion simultaneously, necessitating analysis methodology that can account for both. The introduction of the combined model (CM) by Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs, Verbeke, Demétrio, and Vieira (2010) serves this purpose, not only for count data but for the general exponential family of distributions. Here, a Poisson model is specified as the parent distribution of the data with a normally distributed random effect at the subject or cluster level and/or a gamma distribution at the observation level. The GLMM and NEGBIN model are special cases. Data can be simulated from (1) the general CM, with random effects, or (2) its marginal version directly. This paper discusses an implementation of (1) in SAS software (SAS Inc. 2011). One needs to reflect on the mean of both the combined (hierarchical) and marginal models in order to generate correlated and/or overdispersed counts. A pre-specification of the desired marginal mean (in terms of covariates and marginal parameters), a marginal variance-covariance structure, and the hierarchical mean (in terms of covariates and regression parameters) is required. The implied hierarchical parameters, the variance-covariance matrix of the random effects, and the variance-covariance matrix of the overdispersion part are then derived, from which correlated Poisson data are generated. Sample calls of the SAS macro are presented, along with output.
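The hierarchical form of the combined model can be sketched in a few lines. This is not the SAS macro from the paper, and it skips the step of deriving hierarchical parameters from a pre-specified marginal structure. It assumes an intercept-only log-linear mean `beta0`, a random intercept with variance `d`, and a gamma overdispersion effect with mean 1 and variance `1/alpha`; all names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_combined(n_subjects, n_times, beta0, d, alpha, seed=0):
    """Generate counts y_ij ~ Poisson(theta_ij * exp(beta0 + b_i)) with
    b_i ~ N(0, d) per subject and theta_ij ~ Gamma(mean 1, variance 1/alpha)
    per observation. Returns one list of counts per subject."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, math.sqrt(d))  # induces within-subject correlation
        row = []
        for _ in range(n_times):
            theta = rng.gammavariate(alpha, 1.0 / alpha)  # induces overdispersion
            row.append(poisson(theta * math.exp(beta0 + b), rng))
        data.append(row)
    return data


if __name__ == "__main__":
    counts = simulate_combined(5, 4, beta0=1.0, d=0.25, alpha=2.0, seed=42)
    for subject, row in enumerate(counts, start=1):
        print(subject, row)
```

Under these assumptions the marginal mean is exp(beta0 + d/2), which differs from the hierarchical mean exp(beta0). This is why the abstract stresses reflecting on both means before generating data.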
Comments on: Missing data methods in longitudinal studies: a review
Incomplete data are quite common in biomedical and other types of research, especially in longitudinal studies. During the last three decades, a vast amount of work has been done in the area. This has led, on the one hand, to a rich taxonomy of missing-data concepts, issues, and methods and, on the other hand, to a variety of data-analytic tools. Elements of taxonomy include: missing data patterns, mechanisms, and modeling frameworks; inferential paradigms; and sensitivity analysis frameworks. These are described in detail. A variety of concrete modeling devices is presented. To make matters concrete, two case studies are considered. The first one concerns quality of life among breast cancer patients, while the second one examines data from the Muscatine children's obesity study.