2,656 research outputs found

    On the asymptotic behavior of the contaminated sample mean

    Full text link
    An observation of a cumulative distribution function F with finite variance is said to be contaminated according to the inflated variance model if it has a large probability of coming from the original target distribution F, but a small probability of coming from a contaminating distribution that has the same mean and shape as F, though a larger variance. It is well known that in the presence of data contamination, the ordinary sample mean loses many of its good properties, making it preferable to use more robust estimators. From a didactical point of view, it is insightful to see to what extent an intuitive estimator such as the sample mean becomes less favorable in a contaminated setting. In this paper, we investigate under which conditions the sample mean, based on a finite number of independent observations of F which are contaminated according to the inflated variance model, is a valid estimator for the mean of F. In particular, we examine to what extent this estimator is weakly consistent for the mean of F and asymptotically normal. As classical central limit theory is generally inadequate for handling asymptotic normality in this setting, we invoke the more general approximate central limit theory developed by Berckmoes, Lowen, and Van Casteren (2013). Our theoretical results are illustrated by a specific example and a simulation study. Comment: 14 pages, 1 figure
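
    As a quick illustration of the inflated variance contamination model described above, the following Python sketch (not from the paper; the contamination probability and variance inflation factor are illustrative placeholders) draws contaminated samples and tracks how the sample mean behaves as the sample size grows.

```python
# Minimal simulation sketch (not the authors' code): sample means under the
# inflated variance contamination model, with illustrative parameters.
import numpy as np

rng = np.random.default_rng(42)

def contaminated_sample(n, mu=0.0, sigma=1.0, eps=0.05, inflation=10.0):
    """Draw n observations: with prob. 1-eps from N(mu, sigma^2),
    with prob. eps from N(mu, inflation * sigma^2) (same mean, larger variance)."""
    contaminated = rng.random(n) < eps
    scale = np.where(contaminated, np.sqrt(inflation) * sigma, sigma)
    return rng.normal(mu, scale)

for n in (20, 200, 2000):
    means = np.array([contaminated_sample(n).mean() for _ in range(5000)])
    # As n grows, the sample mean should still concentrate around mu = 0,
    # but its spread reflects the inflated variance component.
    print(f"n={n:5d}  mean of estimates={means.mean():+.4f}  sd={means.std():.4f}")
```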

    The analysis of correlated non-Gaussian outcomes from clusters of size two: non-multilevel-based alternatives?

    Get PDF
    In this presentation we discuss the analysis of clustered binary or count data when the cluster size is two. For Gaussian outcomes, linear mixed models, which take the correlation within clusters into account, are frequently used and well understood. Here we explore the potential of generalized linear mixed models (GLMMs) for the analysis of non-Gaussian outcomes that are possibly negatively correlated. Several approximation techniques (Gaussian quadrature, Laplace approximation, or linearization) that are available in standard software packages for these GLMMs are investigated. Despite the different modelling options associated with these techniques, none of them performs satisfactorily in estimating the fixed effects when the within-cluster correlation is negative and/or the number of clusters is relatively small. In contrast, a generalized estimating equations (GEE) approach for the analysis of non-Gaussian data turns out to perform excellently overall. When using GEE, the robust score and Wald tests are recommended for small and large samples, respectively.
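
    To make the GEE alternative concrete, here is a minimal Python sketch using statsmodels (assumed available); the data-generating mechanism, covariate, and parameter values are illustrative and do not reproduce the simulation design of the presentation.

```python
# Minimal GEE sketch for paired (cluster size two) count outcomes.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_clusters = 200

# A shared frailty term induces (positive) within-cluster correlation.
frailty = rng.normal(0.0, 0.3, n_clusters)
rows = []
for i in range(n_clusters):
    for member in (0, 1):
        x = rng.binomial(1, 0.5)                     # a binary covariate
        mu = np.exp(0.5 + 0.4 * x + frailty[i])      # illustrative fixed effects
        rows.append({"cluster": i, "x": x, "y": rng.poisson(mu)})
df = pd.DataFrame(rows)

# GEE with an exchangeable working correlation; robust (sandwich) SEs by default.
model = smf.gee("y ~ x", groups="cluster", data=df,
                family=sm.families.Poisson(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
```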

    Formal and Informal Model Selection with Incomplete Data

    Full text link
    Model selection and assessment with incomplete data pose challenges in addition to the ones encountered with complete data. There are two main reasons for this. First, many models describe characteristics of the complete data, in spite of the fact that only an incomplete subset is observed. Direct comparison between model and data is then less than straightforward. Second, many commonly used models are more sensitive to assumptions than in the complete-data situation, and some of their properties vanish when they are fitted to incomplete, unbalanced data. These and other issues are brought forward using two key examples, one of a continuous and one of a categorical nature. We argue that model assessment ought to consist of two parts: (i) assessment of a model's fit to the observed data and (ii) assessment of the sensitivity of inferences to unverifiable assumptions, that is, to how a model describes the unobserved data given the observed ones. Comment: Published at http://dx.doi.org/10.1214/07-STS253 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    On the sample mean after a group sequential trial

    Full text link
    A popular setting in medical statistics is a group sequential trial with independent and identically distributed normal outcomes, in which interim analyses of the sum of the outcomes are performed. Based on a prescribed stopping rule, one decides after each interim analysis whether the trial is stopped or continued. Consequently, the actual length of the study is a random variable. It is reported in the literature that the interim analyses may cause bias if one uses the ordinary sample mean to estimate the location parameter. For a generic stopping rule, which contains many classical stopping rules as special cases, explicit formulas for the expected length of the trial, the bias, and the mean squared error (MSE) are provided. It is deduced that, for a fixed number of interim analyses, the bias and the MSE converge to zero if the first interim analysis is performed not too early. In addition, optimal rates for this convergence are provided. Furthermore, under a regularity condition, asymptotic normality in total variation distance for the sample mean is established. A conclusion for naive confidence intervals based on the sample mean is derived. It is also shown how the developed theory naturally fits in the broader framework of likelihood theory in a group sequential trial setting. A simulation study underpins the theoretical findings. Comment: 52 pages (supplementary data file included)
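
    The bias of the sample mean induced by interim analyses can be seen in a quick Monte Carlo experiment. The sketch below uses a simple illustrative stopping rule (stop once the running sum exceeds a fixed threshold), not the generic rule analysed in the paper; all numerical settings are placeholders.

```python
# Monte Carlo sketch of the bias and MSE of the ordinary sample mean in a
# group sequential trial with a toy stopping rule (illustrative only).
import numpy as np

rng = np.random.default_rng(7)

def run_trial(mu=0.0, sigma=1.0, group_size=10, max_groups=5, threshold=5.0):
    """Return the sample mean at the (possibly early) stopping time."""
    outcomes = []
    for _ in range(max_groups):
        outcomes.extend(rng.normal(mu, sigma, group_size))
        if sum(outcomes) > threshold:      # interim analysis of the running sum
            break
    return np.mean(outcomes)

# True mean is 0, so the average estimate is the Monte Carlo bias.
estimates = np.array([run_trial() for _ in range(20000)])
print(f"Monte Carlo bias of the sample mean: {estimates.mean():+.4f}")
print(f"Monte Carlo MSE:                     {(estimates**2).mean():.4f}")
```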

    Discussion of Likelihood Inference for Models with Unobservables: Another View

    Full text link
    Discussion of "Likelihood Inference for Models with Unobservables: Another View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]Comment: Published in at http://dx.doi.org/10.1214/09-STS277A the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Estimating Stellar Parameters from Spectra using a Hierarchical Bayesian Approach

    Get PDF
    A method is developed for fitting theoretically predicted astronomical spectra to an observed spectrum. Using a hierarchical Bayesian principle, the method takes both systematic and statistical measurement errors into account, which has not been done before in the astronomical literature. The goal is to estimate fundamental stellar parameters and their associated uncertainties. The non-availability of a convenient deterministic relation between the stellar parameters and the observed spectrum, combined with the computational complexities this entails, necessitates the curtailment of the continuous Bayesian model to a reduced model based on a grid of synthetic spectra. A criterion for model selection based on the so-called predictive squared error loss function is proposed, together with a measure for the goodness of fit between observed and synthetic spectra. The proposed method is applied to the infrared 2.38–2.60 μm ISO-SWS data (Infrared Space Observatory - Short Wavelength Spectrometer) of the star α Bootis, yielding estimates for the stellar parameters: effective temperature T_eff = 4230 ± 83 K, gravity log g = 1.50 ± 0.15 dex, and metallicity [Fe/H] = −0.30 ± 0.21 dex. Comment: 15 pages, 8 figures, 5 tables. Accepted for publication in MNRAS
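
    A heavily simplified stand-in for the grid-based fitting step is sketched below: a plain grid search that minimises an error-weighted squared deviation between an observed and a synthetic spectrum. This is not the paper's hierarchical Bayesian method, and the toy spectra, grid values, and noise level are invented for illustration only.

```python
# Toy grid search over synthetic spectra (illustrative stand-in, not the
# paper's hierarchical Bayesian approach). All model forms and values are made up.
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.linspace(2.38, 2.60, 200)          # micron, as in the ISO-SWS range

def synthetic_spectrum(teff, logg):
    """Toy stand-in for a model atmosphere spectrum."""
    return 1.0 + 1e-4 * (teff - 4200) * np.sin(10 * wavelengths) + 0.05 * logg

# "Observed" spectrum: one grid model plus Gaussian noise mimicking measurement error.
sigma_obs = 0.02
observed = synthetic_spectrum(4230, 1.5) + rng.normal(0, sigma_obs, wavelengths.size)

grid_teff = np.arange(4000, 4500, 50)
grid_logg = np.arange(1.0, 2.05, 0.25)
scores = {
    (t, g): np.sum((observed - synthetic_spectrum(t, g)) ** 2 / sigma_obs**2)
    for t in grid_teff for g in grid_logg
}
best = min(scores, key=scores.get)
print(f"Best grid point: Teff={best[0]} K, log g={best[1]:.2f}")
```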

    The neuroscience of intergroup threat and violence

    Get PDF
    The COVID-19 pandemic led to a global increase in hate crimes and xenophobia. In these uncertain times, real or imaginary threats can easily lead to intergroup conflict. Here, we integrate social neuroscience findings with classic social psychology theories into a framework to better understand how intergroup threat can lead to violence. The roles of moral disengagement, dehumanization, and intergroup schadenfreude in this process are discussed, together with their underlying neural mechanisms. We outline how this framework can inform social scientists and policy makers to help reduce the escalation of intergroup conflict and promote intergroup cooperation. The critical role of the media and public figures in these unprecedented times is highlighted as an important factor in achieving these goals.

    Generating Correlated and/or Overdispersed Count Data: A SAS Implementation

    Get PDF
    Analysis of longitudinal count data has long been done using a generalized linear mixed model (GLMM), in its Poisson-normal version, to account for correlation by specifying normal random effects. Univariate counts are often handled with the negative binomial (NEGBIN) model, which accounts for overdispersion by means of gamma random effects. Inherently, though, longitudinal count data commonly exhibit both correlation and overdispersion simultaneously, necessitating analysis methodology that can account for both. The introduction of the combined model (CM) by Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs, Verbeke, Demétrio, and Vieira (2010) serves this purpose, not only for count data but for the general exponential family of distributions. Here, a Poisson model is specified as the parent distribution of the data, with a normally distributed random effect at the subject or cluster level and/or a gamma distribution at the observation level. The GLMM and NEGBIN model are special cases. Data can be simulated from (1) the general CM, with random effects, or (2) its marginal version directly. This paper discusses an implementation of (1) in SAS software (SAS Inc. 2011). One needs to reflect on the mean of both the combined (hierarchical) and marginal models in order to generate correlated and/or overdispersed counts. A pre-specification of the desired marginal mean (in terms of covariates and marginal parameters), a marginal variance-covariance structure, and the hierarchical mean (in terms of covariates and regression parameters) is required. The implied hierarchical parameters, the variance-covariance matrix of the random effects, and the variance-covariance matrix of the overdispersion part are then derived, from which correlated Poisson data are generated. Sample calls of the SAS macro are presented, as well as the output.
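
    Since the paper's implementation is a SAS macro, the following Python/NumPy sketch only illustrates the hierarchical data-generating step of the combined model (Poisson counts with a normal random intercept at subject level and a multiplicative gamma overdispersion term at observation level); all parameter values are illustrative and do not correspond to the macro's interface.

```python
# Hierarchical data generation under the combined model (illustrative sketch;
# the paper's actual implementation is a SAS macro).
import numpy as np

rng = np.random.default_rng(2024)

n_subjects, n_times = 100, 5
beta0, beta1 = 1.0, 0.2          # fixed effects for intercept and time
sigma_b = 0.5                    # SD of the normal random intercept
alpha = 2.0                      # gamma shape; scale 1/alpha gives mean 1

time = np.arange(n_times)
b = rng.normal(0.0, sigma_b, size=(n_subjects, 1))                  # subject-level effect
theta = rng.gamma(alpha, 1.0 / alpha, size=(n_subjects, n_times))   # observation-level gamma

# Conditional (hierarchical) mean: gamma term times exp(linear predictor + random effect)
mu = theta * np.exp(beta0 + beta1 * time + b)
counts = rng.poisson(mu)

print("empirical mean per time point:    ", counts.mean(axis=0).round(2))
print("empirical variance per time point:", counts.var(axis=0).round(2))  # > mean: overdispersion
```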

    Comments on: Missing data methods in longitudinal studies: a review

    Get PDF
    Incomplete data are quite common in biomedical and other types of research, especially in longitudinal studies. During the last three decades, a vast amount of work has been done in the area. This has led, on the one hand, to a rich taxonomy of missing-data concepts, issues, and methods and, on the other hand, to a variety of data-analytic tools. Elements of the taxonomy include: missing-data patterns, mechanisms, and modeling frameworks; inferential paradigms; and sensitivity analysis frameworks. These are described in detail. A variety of concrete modeling devices is presented. To make matters concrete, two case studies are considered. The first one concerns quality of life among breast cancer patients, while the second one examines data from the Muscatine children’s obesity study.