
    Pooling multiple imputations when the sample happens to be the population

    Current pooling rules for multiply imputed data assume infinite populations. In some situations this assumption is untenable, as every unit in the population has been observed, potentially leading to over-covered population estimates. We simplify the existing pooling rules for situations where the sampling variance is not of interest. We compare these rules to the conventional pooling rules and demonstrate their use in a situation where there is no sampling variance. Using the standard pooling rules in situations where sampling variance should not be considered leads to overestimation of the variance of the estimates of interest, especially when the amount of missingness is not very large. As a result, population estimates are over-covered, which may lead to a loss of statistical power. We conclude that the theory of multiple imputation can be extended to the situation where the sample happens to be the population. The simplified pooling rules can be easily implemented to obtain valid inference in cases where we have observed essentially all units, and in simulation studies addressing the missingness mechanism only.
    Comment: 6 pages, 1 figure, 1 table
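    The contrast between the two sets of pooling rules can be sketched in a few lines. This is a minimal Python illustration, assuming the simplified rule simply drops the within-imputation (sampling) component and keeps only the between-imputation variation; the function names are ours and the exact simplified formula is an assumption, not the paper's derivation.

```python
import numpy as np

def pool_rubin(estimates, within_vars):
    """Standard Rubin pooling over m completed-data analyses:
    total variance = Ubar + (1 + 1/m) * B."""
    m = len(estimates)
    qbar = np.mean(estimates)          # pooled point estimate
    ubar = np.mean(within_vars)        # within-imputation (sampling) variance
    b = np.var(estimates, ddof=1)      # between-imputation variance
    return qbar, ubar + (1 + 1 / m) * b

def pool_no_sampling_variance(estimates):
    """Sketch of a simplified rule for when the sample is the population:
    drop the within-imputation (sampling) term and keep only the variation
    caused by the missing data. Illustrative, not the paper's exact formula."""
    m = len(estimates)
    return np.mean(estimates), (1 + 1 / m) * np.var(estimates, ddof=1)
```

    For the same completed-data estimates, the simplified total variance is never larger than Rubin's, which matches the over-coverage the abstract attributes to the standard rules when there is no sampling variance.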

    Broken Stick Model for Irregular Longitudinal Data

    Many longitudinal studies collect data that have irregular observation times, often requiring the application of linear mixed models with time-varying outcomes. This paper presents an alternative that splits the quantitative analysis into two steps. The first step converts irregularly observed data into a set of repeated measures through the broken stick model. The second step estimates the parameters of scientific interest from the repeated measurements at the subject level. The broken stick model approximates each subject's trajectory by a series of connected straight lines. The breakpoints, specified by the user, divide the time axis into consecutive intervals common to all subjects. Specification of the model requires just three variables: time, measurement and subject. The model is a special case of the linear mixed model, with time as a linear B-spline and subject as the grouping factor. The main assumptions are: subjects are exchangeable, trajectories between consecutive breakpoints are straight, random effects follow a multivariate normal distribution, and unobserved data are missing at random. The R package brokenstick v2.5.0 offers tools to calculate, predict, impute and visualize broken stick estimates. The package supports two optimization methods, including options to constrain the variance-covariance matrix of the random effects. We demonstrate six applications of the model: detection of critical periods, estimation of time-to-time correlations, profile analysis, curve interpolation, multiple imputation, and personalized prediction of future outcomes by curve matching.
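    The two-step idea can be sketched for a single subject with plain least squares. This is a toy Python illustration, not the linear mixed model the brokenstick package fits: the data, breakpoints, and helper names are invented, and a per-subject fit ignores the random-effects pooling across subjects that the real model provides.

```python
import numpy as np

def hat_basis(times, knots):
    """Linear B-spline (hat function) basis: column j equals 1 at knots[j],
    0 at the neighbouring knots, and is piecewise linear in between."""
    eye = np.eye(len(knots))
    return np.column_stack(
        [np.interp(times, knots, eye[j]) for j in range(len(knots))]
    )

# Step 1: irregular observations (toy data) -> estimates at common breakpoints.
times = np.array([0.3, 1.1, 2.7, 4.2, 5.5])   # irregular observation times
y = np.array([1.0, 1.8, 2.2, 3.1, 3.0])       # measurements
knots = np.array([0.0, 2.0, 4.0, 6.0])        # user-chosen breakpoints
coef, *_ = np.linalg.lstsq(hat_basis(times, knots), y, rcond=None)
# coef[j] estimates the outcome at breakpoint j: a regular grid of
# repeated measures that step 2 can analyse with standard methods.
```

    Because each hat function peaks at one breakpoint, the fitted coefficients are directly interpretable as the subject's outcome level at that breakpoint.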

    The Contribution of the Intelligentsia to the Study of Social Memory

    An important goal of growth monitoring is to identify genetic disorders, diseases or other conditions that manifest themselves through abnormal growth. The two main conditions that can be detected by height monitoring are Turner's syndrome and growth hormone deficiency. Conditions or risk factors that can be detected by monitoring weight or body mass index include hypernatremic dehydration, celiac disease, cystic fibrosis and obesity. Monitoring infant head growth can be used to detect macrocephaly, developmental disorder and ill health in childhood. This paper describes statistical methods to obtain evidence-based referral criteria in growth monitoring. The referral criteria that we discuss are based either on anthropometric measurement(s) at a fixed age, using (1) a centile or a standard deviation score, (2) a standard deviation score corrected for parental height, (3) a likelihood ratio statistic, or (4) an ellipse, or on multiple measurements over time, using (5) a growth rate or (6) a growth curve model. We review the potential uses of these methods, and outline their strengths and limitations.
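    Criterion (1), referral on a centile or standard deviation score, reduces to a one-line computation. A minimal Python sketch, in which the reference mean, SD, and cutoff are illustrative placeholders rather than clinical values:

```python
def sds(measurement, ref_mean, ref_sd):
    """Standard deviation score relative to an age- and sex-specific reference."""
    return (measurement - ref_mean) / ref_sd

def refer_by_sds(measurement, ref_mean, ref_sd, cutoff=2.0):
    """Criterion (1): refer when the SDS falls outside +/- cutoff.
    The reference values and cutoff here are illustrative only."""
    return abs(sds(measurement, ref_mean, ref_sd)) > cutoff
```

    Criteria (2) to (6) build on the same score, for example by correcting the reference for parental height or by tracking how the score changes over time.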

    World Experience in Creating and Operating Free Economic Zones and the Possibilities of Its Use in Ukraine

    Today, by various estimates, between 400 and 2,000 free economic zones operate worldwide. The first FEZs were created in the USA under a 1934 act, in the form of foreign-trade zones. Their purpose was to stimulate foreign-trade activity through effective mechanisms for reducing customs costs. This chiefly meant lowering import tariffs on parts and components for automobile manufacturing. Warehouses, docks and airports were converted into foreign-trade zones. Enterprises operating in these zones were removed from US customs control if the goods imported into the zone were subsequently shipped to a third country. Customs costs were also reduced when products of US firms were "finished" in the zone for subsequent export. If goods left the zone for the US market, however, they had to pass through all customs procedures prescribed by the country's legislation.

    mice: Multivariate Imputation by Chained Equations in R

    The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, and the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solving applied incomplete-data problems.
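    The chained-equations idea itself is compact: regress each incomplete variable on all the others, redraw its missing entries, and sweep until stable. A toy numpy sketch of that loop, assuming normal linear models throughout; it omits the refinements mice actually uses (predictive mean matching, Bayesian parameter draws, predictor selection) and is not the package's algorithm.

```python
import numpy as np

def chained_equations(X, n_iter=10, rng=None):
    """One stochastic chained-equations pass over a numeric matrix:
    each incomplete column is regressed on the others and its missing
    entries are redrawn on every sweep."""
    rng = np.random.default_rng(rng)
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):               # start from column means
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = np.delete(X, j, axis=1)  # current values of the other columns
            A = np.column_stack([np.ones(len(X)), others])
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - A[obs] @ beta
            sigma = resid.std(ddof=A.shape[1]) if obs.sum() > A.shape[1] else 0.0
            # redraw missing entries from the fitted conditional model
            X[miss[:, j], j] = A[miss[:, j]] @ beta + rng.normal(0.0, sigma, miss[:, j].sum())
    return X
```

    Running the function m times with different seeds yields m completed data sets, whose separate analyses can then be combined with the pooling rules.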

    Looking back at the Gifi system of nonlinear multivariate analysis

    Gifi was the nom de plume for a group of researchers led by Jan de Leeuw at the University of Leiden. Between 1970 and 1990 the group produced a stream of highly innovative theoretical papers and computer programs in the area of nonlinear multivariate analysis. In an informal way this paper discusses the so-called Gifi system of nonlinear multivariate analysis, which comprises homogeneity analysis (closely related to multiple correspondence analysis) and its generalizations. The history is discussed, with attention to the scientific philosophy of the group, and links to machine learning are indicated.

    Identification of a Risk-Aware Decision Support System, Using Bank Lending as an Example

    A decision support system for bank lending to legal entities and private individuals, taking risk into account, is developed. The system is formalized using the mathematical apparatus of fuzzy sets, and a decision-making algorithm is developed on this basis.
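    The abstract gives no model details, so the following is only a generic Python illustration of what formalizing a credit decision with fuzzy sets can look like: triangular membership functions grade a hypothetical input (the applicant's debt ratio); the category names and breakpoints are invented for the example and are not from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership function: rises from 0 at a to 1 at b,
    then falls back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def credit_risk_grade(debt_ratio):
    """Hypothetical fuzzy grading of a debt ratio in [0, 1]; the
    categories and breakpoints are illustrative only."""
    return {
        "low":    triangular(debt_ratio, -0.01, 0.0, 0.4),
        "medium": triangular(debt_ratio, 0.2, 0.5, 0.8),
        "high":   triangular(debt_ratio, 0.6, 1.0, 1.01),
    }
```

    A rule base over such graded inputs, followed by defuzzification, then yields the lending decision.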

    The effect of high prevalence of missing data on estimation of the coefficients of a logistic regression model when using multiple imputation

    Background: Multiple imputation is frequently used to address missing data when conducting statistical analyses. There is a paucity of research into the performance of multiple imputation when the prevalence of missing data is very high. Our objective was to assess the performance of multiple imputation when estimating a logistic regression model when the prevalence of missing data for predictor variables is very high. Methods: Monte Carlo simulations were used to examine the performance of multiple imputation when estimating a multivariable logistic regression model. We varied the size of the analysis samples (N = 500, 1,000, 5,000, 10,000, and 25,000) and the prevalence of missing data (5–95% in increments of 5%). Results: In general, multiple imputation performed well across the range of scenarios. The exceptions were scenarios in which the sample size was 500 or 1,000 and the prevalence of missing data was at least 90%. In these scenarios, the estimated standard errors of the log-odds ratios were very large and did not accurately estimate the standard deviation of the sampling distribution of the log-odds ratio. Furthermore, in these settings, estimated confidence intervals tended to be conservative. In all other settings (i.e., sample sizes greater than 1,000, or a prevalence of missing data below 90%), multiple imputation allowed for accurate estimation of a logistic regression model. Conclusions: Multiple imputation can be used in many scenarios with a very high prevalence of missing data.
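    The small-sample breakdown is easy to see in miniature. A toy Python sketch of one problematic scenario (N = 500 with 90% MCAR missingness in the predictor); the coefficients and mechanism are illustrative, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(2024)
n, p_miss = 500, 0.90                    # one problematic scenario: N = 500, 90% missing
x = rng.normal(size=n)                   # single predictor
logit = -0.5 + 1.0 * x                   # illustrative coefficients, not the paper's
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))     # binary outcome for the logistic model
x_obs = np.where(rng.random(n) < p_miss, np.nan, x)  # MCAR missingness in the predictor
n_complete = int((~np.isnan(x_obs)).sum())
# Only about 50 of the 500 rows retain the predictor, so the imputation model
# and the between-imputation variance rest on very little information --
# the regime where the estimated standard errors become unstable.
```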

    How to relate potential outcomes: Estimating individual treatment effects under a specified partial correlation

    In most medical research, the average treatment effect is used to evaluate a treatment’s performance. However, precision medicine requires knowledge of individual treatment effects: What is the difference between a unit’s measurement under treatment and control conditions? In most treatment effect studies, such answers are not possible because the outcomes under both experimental conditions are not jointly observed. This makes the problem of causal inference a missing data problem. We propose to solve this problem by 
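    Because the two potential outcomes are never jointly observed, their correlation cannot be estimated from the data and must be specified by the analyst. A small Python sketch of the standard variance identity for a difference of correlated variables shows why that choice matters for individual treatment effects; the function name is ours.

```python
import numpy as np

def ite_sd(sd1, sd0, rho):
    """Standard deviation of the individual treatment effect Y(1) - Y(0)
    when the potential outcomes have standard deviations sd1 and sd0 and
    correlation rho. rho is not identifiable from the observed data, so
    it must be specified: Var(Y1 - Y0) = sd1^2 + sd0^2 - 2*rho*sd1*sd0."""
    return np.sqrt(sd1**2 + sd0**2 - 2 * rho * sd1 * sd0)
```

    With rho = 1 and equal spreads, every unit shares the same treatment effect; as rho decreases, individual effects spread further around the average treatment effect.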