On the asymptotic behavior of the contaminated sample mean

Author: Berckmoes Ben
Molenberghs Geert
Publication date: 10/12/2017

An observation of a cumulative distribution function $F$ with finite variance is said to be contaminated according to the inflated variance model if it has a large probability of coming from the original target distribution $F$, but a small probability of coming from a contaminating distribution that has the same mean and shape as $F$, though a larger variance. It is well known that in the presence of data contamination, the ordinary sample mean loses many of its good properties, making it preferable to use more robust estimators. From a didactical point of view, it is insightful to see to what extent an intuitive estimator such as the sample mean becomes less favorable in a contaminated setting. In this paper, we investigate under which conditions the sample mean, based on a finite number of independent observations of $F$ that are contaminated according to the inflated variance model, is a valid estimator for the mean of $F$. In particular, we examine to what extent this estimator is weakly consistent for the mean of $F$ and asymptotically normal. As classical central limit theory is generally not accurate enough to handle asymptotic normality in this setting, we invoke the more general approximate central limit theory developed by Berckmoes, Lowen, and Van Casteren (2013). Our theoretical results are illustrated by a specific example and a simulation study.

Comment: 14 pages, 1 figure

arXiv.org e-Print Archive

Lirias

Institutional Repository Universiteit Antwerpen
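
The inflated variance model described in the abstract above can be illustrated with a small simulation. This is only a minimal sketch under assumptions that are not taken from the paper: the target distribution $F$ is taken to be normal, the contamination probability `eps` is fixed, and the function names and the `inflation` factor are hypothetical.

```python
import random
import statistics


def contaminated_sample(n, mu, sigma, eps, inflation, rng):
    """Draw n observations; each one is contaminated with probability eps."""
    sample = []
    for _ in range(n):
        # Contaminated draws keep the mean mu but use an inflated standard deviation.
        scale = sigma * inflation if rng.random() < eps else sigma
        sample.append(rng.gauss(mu, scale))
    return sample


def mixture_variance(sigma, eps, inflation):
    """Variance of a single observation under the two-component mixture."""
    return sigma ** 2 * ((1 - eps) + eps * inflation ** 2)


rng = random.Random(0)  # fixed seed so the run is reproducible
data = contaminated_sample(20_000, mu=0.0, sigma=1.0, eps=0.1, inflation=3.0, rng=rng)
sample_mean = statistics.fmean(data)
print(sample_mean, mixture_variance(1.0, 0.1, 3.0))
```

With these assumed settings the mixture keeps the mean of the target distribution, so the sample mean stays centered on it while its variance grows by the factor `(1 - eps) + eps * inflation ** 2`.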

The analysis of correlated non-Gaussian outcomes from clusters of size two: non-multilevel-based alternatives?

Author: Loeys Tom
Molenberghs Geert
Publication date: 01/01/2012

In this presentation we discuss the analysis of clustered binary or count data when the cluster size is two. For Gaussian outcomes, linear mixed models that take the correlation within clusters into account are frequently used and well understood. Here we explore the potential of generalized linear mixed models (GLMMs) for the analysis of non-Gaussian outcomes that are possibly negatively correlated. Several approximation techniques (Gaussian quadrature, Laplace approximation, or linearization) that are available in standard software packages for these GLMMs are investigated. Despite the different modelling options related to these techniques, none of them performs satisfactorily in estimating fixed effects when the within-cluster correlation is negative and/or the number of clusters is relatively small. In contrast, a generalized estimating equations (GEE) approach for the analysis of non-Gaussian data turns out to have excellent overall performance. When using GEE, the robust score and Wald tests are recommended for small and large samples, respectively.

Ghent University Academic Bibliography

Formal and Informal Model Selection with Incomplete Data

Author: Beunckens Caroline
Molenberghs Geert
Verbeke Geert
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 27/08/2008

Model selection and assessment with incomplete data pose challenges in addition to the ones encountered with complete data. There are two main reasons for this. First, many models describe characteristics of the complete data, in spite of the fact that only an incomplete subset is observed. Direct comparison between model and data is then less than straightforward. Second, many commonly used models are more sensitive to assumptions than in the complete-data situation and some of their properties vanish when they are fitted to incomplete, unbalanced data. These and other issues are brought forward using two key examples, one of a continuous and one of a categorical nature. We argue that model assessment ought to consist of two parts: (i) assessment of a model's fit to the observed data and (ii) assessment of the sensitivity of inferences to unverifiable assumptions, that is, to how a model describes the unobserved data given the observed ones.

Comment: Published at http://dx.doi.org/10.1214/07-STS253 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Lirias

Crossref

On the sample mean after a group sequential trial

Author: Berckmoes Ben
Ivanova Anna
Molenberghs Geert
Publication date: 01/01/2018

A popular setting in medical statistics is a group sequential trial with independent and identically distributed normal outcomes, in which interim analyses of the sum of the outcomes are performed. Based on a prescribed stopping rule, one decides after each interim analysis whether the trial is stopped or continued. Consequently, the actual length of the study is a random variable. It is reported in the literature that the interim analyses may cause bias if one uses the ordinary sample mean to estimate the location parameter. For a generic stopping rule, which contains many classical stopping rules as a special case, explicit formulas for the expected length of the trial, the bias, and the mean squared error (MSE) are provided. It is deduced that, for a fixed number of interim analyses, the bias and the MSE converge to zero if the first interim analysis is performed not too early. In addition, optimal rates for this convergence are provided. Furthermore, under a regularity condition, asymptotic normality in total variation distance for the sample mean is established. A conclusion for naive confidence intervals based on the sample mean is derived. It is also shown how the developed theory naturally fits in the broader framework of likelihood theory in a group sequential trial setting. A simulation study underpins the theoretical findings.

Comment: 52 pages (supplementary data file included)

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen
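
The bias that interim analyses can induce in the sample mean, as described in the abstract above, can be explored with a short Monte Carlo sketch. The stopping rule used here (stop once the running sum exceeds `threshold * sqrt(n)`) is a hypothetical choice made for illustration, not the generic rule studied in the paper, and the function names are assumptions as well.

```python
import math
import random


def run_trial(rng, mu, sigma, looks, threshold):
    """Run one trial; looks is an increasing list of sample sizes at which to analyse.

    Returns (sample_mean, n), where n is the random length of the trial.
    """
    total, n = 0.0, 0
    for look in looks:
        while n < look:
            total += rng.gauss(mu, sigma)
            n += 1
        # Hypothetical rule: stop as soon as the running sum exceeds threshold * sqrt(n).
        if total > threshold * math.sqrt(n):
            break
    return total / n, n


def estimate_bias(mu, sigma, looks, threshold, reps, seed=0):
    """Monte Carlo estimate of E[sample mean] - mu under the stopping rule."""
    rng = random.Random(seed)
    means = [run_trial(rng, mu, sigma, looks, threshold)[0] for _ in range(reps)]
    return sum(means) / reps - mu


# Compare an early first look with a later one, keeping everything else fixed.
print(estimate_bias(0.0, 1.0, looks=[5, 100], threshold=2.0, reps=5000))
print(estimate_bias(0.0, 1.0, looks=[50, 100], threshold=2.0, reps=5000))
```

Setting `threshold` to infinity disables early stopping, which gives a fixed-length trial as a baseline against which the stopped trials can be compared.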

Discussion of Likelihood Inference for Models with Unobservables: Another View

Author: Kenward Michael G.
Molenberghs Geert
Verbeke Geert
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2009

Discussion of "Likelihood Inference for Models with Unobservables: Another View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303].

Comment: Published at http://dx.doi.org/10.1214/09-STS277A in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Lirias

Crossref

LSHTM Research Online

Estimating Stellar Parameters from Spectra using a Hierarchical Bayesian Approach

Author: C. Aerts
G. Molenberghs
GELFAND
L. Decin
Z. Shkedy
Publication venue: 'Wiley'
Publication date: 01/01/2007

A method is developed for fitting theoretically predicted astronomical spectra to an observed spectrum. Using a hierarchical Bayesian principle, the method takes both systematic and statistical measurement errors into account, which has not been done before in the astronomical literature. The goal is to estimate fundamental stellar parameters and their associated uncertainties. The non-availability of a convenient deterministic relation between stellar parameters and the observed spectrum, combined with the computational complexities this entails, necessitates the curtailment of the continuous Bayesian model to a reduced model based on a grid of synthetic spectra. A criterion for model selection based on the so-called predictive squared error loss function is proposed, together with a measure for the goodness-of-fit between observed and synthetic spectra. The proposed method is applied to the infrared 2.38--2.60 $\mu$m ISO-SWS data (Infrared Space Observatory - Short Wavelength Spectrometer) of the star $\alpha$ Bootis, yielding estimates for the stellar parameters: effective temperature $T_{\mathrm{eff}} = 4230 \pm 83$ K, gravity $\log g = 1.50 \pm 0.15$ dex, and metallicity [Fe/H] $= -0.30 \pm 0.21$ dex.

Comment: 15 pages, 8 figures, 5 tables. Accepted for publication in MNRAS

arXiv.org e-Print Archive

The neuroscience of intergroup threat and violence

Author: Lantos D
Molenberghs P
Publication venue: 'Elsevier BV'
Publication date: 15/09/2021

The COVID-19 pandemic led to a global increase in hate crimes and xenophobia. In these uncertain times, real or imaginary threats can easily lead to intergroup conflict. Here, we integrate social neuroscience findings with classic social psychology theories into a framework to better understand how intergroup threat can lead to violence. The roles of moral disengagement, dehumanization, and intergroup schadenfreude in this process are discussed, together with their underlying neural mechanisms. We outline how this framework can inform social scientists and policy makers to help reduce the escalation of intergroup conflict and promote intergroup cooperation. The critical role of the media and public figures in these unprecedented times is highlighted as an important factor in achieving these goals.

UCL Discovery

PubMed Central

Generating Correlated and/or Overdispersed Count Data: A SAS Implementation

Author: Kalema George
Molenberghs Geert
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 01/04/2016

Analysis of longitudinal count data has long been done using a generalized linear mixed model (GLMM), in its Poisson-normal version, to account for correlation by specifying normal random effects. Univariate counts are often handled with the negative-binomial (NEGBIN) model, which takes overdispersion into account by use of gamma random effects. Inherently though, longitudinal count data commonly exhibit both correlation and overdispersion simultaneously, necessitating analysis methodology that can account for both. The introduction of the combined model (CM) by Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs, Verbeke, Demétrio, and Vieira (2010) serves this purpose, not only for count data but for the general exponential family of distributions. Here, a Poisson model is specified as the parent distribution of the data with a normally distributed random effect at the subject or cluster level and/or a gamma distribution at the observation level. The GLMM and NEGBIN models are special cases. Data can be simulated from (1) the general CM, with random effects, or (2) its marginal version directly. This paper discusses an implementation of (1) in SAS software (SAS Inc. 2011). One needs to reflect on the mean of both the combined (hierarchical) and marginal models in order to generate correlated and/or overdispersed counts. A pre-specification of the desired marginal mean (in terms of covariates and marginal parameters), a marginal variance-covariance structure, and the hierarchical mean (in terms of covariates and regression parameters) is required. The implied hierarchical parameters, the variance-covariance matrix of the random effects, and the variance-covariance matrix of the overdispersion part are then derived, from which correlated Poisson data are generated. Sample calls of the SAS macro are presented as well as output.

Directory of Open Access Journals

Journal of Statistical Software
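
The hierarchical structure sketched in the abstract above (a Poisson parent distribution, a normal random effect per subject, and a gamma effect per observation) can be illustrated as follows. This is a rough Python sketch and not the paper's SAS macro: it uses an intercept-only mean, a mean-one gamma effect, and hypothetical function and parameter names, and it does not derive hierarchical parameters from a pre-specified marginal structure as the paper does.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small to moderate lam."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n_subjects, n_obs, intercept, re_var, gamma_shape, seed=0):
    """Simulate a list of per-subject count sequences from the hierarchical model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        # Shared normal effect: induces correlation within a subject.
        b = rng.gauss(0.0, math.sqrt(re_var))
        row = []
        for _ in range(n_obs):
            # Mean-one gamma effect (shape a, scale 1/a): induces overdispersion.
            theta = rng.gammavariate(gamma_shape, 1.0 / gamma_shape)
            row.append(poisson_draw(rng, theta * math.exp(intercept + b)))
        data.append(row)
    return data


def marginal_mean(intercept, re_var):
    """Marginal mean implied by a log link, a normal effect, and a mean-one gamma effect."""
    return math.exp(intercept + re_var / 2.0)


counts = simulate_counts(2000, 4, intercept=1.0, re_var=0.25, gamma_shape=2.0)
flat = [y for row in counts for y in row]
print(sum(flat) / len(flat), marginal_mean(1.0, 0.25))
```

Under these assumptions, dropping the gamma effect leaves a Poisson-normal GLMM, and dropping the normal effect leaves a negative-binomial model, which mirrors the special cases named in the abstract.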

Comments on: Missing data methods in longitudinal studies: a review

Author: Ibrahim Joseph
Molenberghs Geert
Publication date: 01/01/2009

Incomplete data are quite common in biomedical and other types of research, especially in longitudinal studies. During the last three decades, a vast amount of work has been done in the area. This has led, on the one hand, to a rich taxonomy of missing-data concepts, issues, and methods and, on the other hand, to a variety of data-analytic tools. Elements of taxonomy include: missing data patterns, mechanisms, and modeling frameworks; inferential paradigms; and sensitivity analysis frameworks. These are described in detail. A variety of concrete modeling devices is presented. To make matters concrete, two case studies are considered. The first one concerns quality of life among breast cancer patients, while the second one examines data from the Muscatine children’s obesity study.

Carolina Digital Repository