Pooling multiple imputations when the sample happens to be the population
Current pooling rules for multiply imputed data assume infinite populations.
In some situations this assumption is untenable, as every unit in the
population has been observed, potentially leading to over-covered population
estimates. We simplify the existing pooling rules for situations where the
sampling variance is not of interest. We compare these rules to the
conventional pooling rules and demonstrate their use in a situation where there
is no sampling variance. Using the standard pooling rules in situations where
sampling variance should not be considered leads to overestimation of the
variance of the estimates of interest, especially when the amount of
missingness is not very large. As a result, population estimates are
over-covered, which may lead to a loss of statistical power. We conclude that
the theory of multiple imputation can be extended to the situation where the
sample happens to be the population. The simplified pooling rules can be easily
implemented to obtain valid inference in cases where we have observed
essentially all units and in simulation studies addressing the missingness
mechanism only.
Comment: 6 pages, 1 figure, 1 table
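The contrast between the two sets of pooling rules can be sketched in a few lines. This is a minimal sketch under stated assumptions: the conventional total variance follows Rubin's rules, T = Ū + B + B/m, and the simplified rule is assumed to drop the within-imputation (sampling) variance Ū, leaving T = B + B/m. Degrees of freedom are not handled, and the helper name `pool_estimates` is hypothetical.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances, population=False):
    """Pool m complete-data estimates and their variances (sketch).

    estimates: point estimates Q_1..Q_m from the m imputed datasets
    variances: their sampling variances U_1..U_m
    population: if True, treat the sample as the population and drop the
        within-imputation (sampling) variance; this rule is an assumption.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = mean(estimates)        # pooled point estimate
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    u_bar = mean(variances)        # average within-imputation variance
    if population:
        total = b + b / m          # assumed simplified rule: (1 + 1/m) B
    else:
        total = u_bar + b + b / m  # conventional Rubin's rules
    return q_bar, total


q, t_conventional = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
_, t_population = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], population=True)
```

With these toy numbers the population version is smaller by exactly Ū = 0.5, which illustrates why the conventional rules over-cover when there is no sampling variance.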
Broken Stick Model for Irregular Longitudinal Data
Many longitudinal studies collect data that have irregular observation times, often requiring the application of linear mixed models with time-varying outcomes. This paper presents an alternative that splits the quantitative analysis into two steps. The first step converts irregularly observed data into a set of repeated measures through the broken stick model. The second step estimates the parameters of scientific interest from the repeated measurements at the subject level. The broken stick model approximates each subject's trajectory by a series of connected straight lines. The breakpoints, specified by the user, divide the time axis into consecutive intervals common to all subjects. Specification of the model requires just three variables: time, measurement and subject. The model is a special case of the linear mixed model, with time as a linear B-spline and subject as the grouping factor. The main assumptions are that subjects are exchangeable, that trajectories between consecutive breakpoints are straight, that random effects follow a multivariate normal distribution, and that unobserved data are missing at random. The R package brokenstick v2.5.0 offers tools to calculate, predict, impute and visualize broken stick estimates. The package supports two optimization methods, including options to constrain the variance-covariance matrix of the random effects. We demonstrate six applications of the model: detection of critical periods, estimation of time-to-time correlations, profile analysis, curve interpolation, multiple imputation, and personalized prediction of future outcomes by curve matching.
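The "time as a linear B-spline" idea can be illustrated with hat functions: each breakpoint gets a basis function that is 1 at that breakpoint and falls linearly to 0 at its neighbours, so any weighted sum is a set of connected straight lines. The sketch below is hypothetical and not the brokenstick API; in particular, `fit_subject` fits one subject by ordinary least squares, whereas the actual model estimates all subjects jointly in a linear mixed model with random effects.

```python
import numpy as np


def hat_basis(t, breakpoints):
    """Linear B-spline (hat function) basis at time t for sorted breakpoints.

    Returns one weight per breakpoint; all weights are zero outside the range.
    """
    row = np.zeros(len(breakpoints))
    for j, k in enumerate(breakpoints):
        left = breakpoints[j - 1] if j > 0 else None
        right = breakpoints[j + 1] if j + 1 < len(breakpoints) else None
        if t == k:
            row[j] = 1.0
        elif left is not None and left < t < k:
            row[j] = (t - left) / (k - left)    # rising edge of the hat
        elif right is not None and k < t < right:
            row[j] = (right - t) / (right - k)  # falling edge of the hat
    return row


def fit_subject(times, values, breakpoints):
    """Per-subject least-squares estimate of the values at the breakpoints.

    Simplification (assumption): no random effects, so each subject needs
    enough observations; the mixed model instead borrows strength across subjects.
    """
    design = np.array([hat_basis(t, breakpoints) for t in times])
    coef, *_ = np.linalg.lstsq(design, np.asarray(values, dtype=float), rcond=None)
    return coef


def predict(t, coef, breakpoints):
    """Broken stick prediction: linear interpolation between breakpoint values."""
    return float(hat_basis(t, breakpoints) @ coef)


breaks = [0.0, 1.0, 2.0]
coef = fit_subject([0.0, 0.5, 1.0, 1.5, 2.0], [0.0, 1.0, 2.0, 2.0, 2.0], breaks)
```

The fitted coefficients are the subject's values at the breakpoints, which is what turns irregular observation times into a regular set of repeated measures.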
Contribution of the intelligentsia to the study of social memory
An important goal of growth monitoring is to identify genetic disorders, diseases or other conditions that manifest themselves through abnormal growth. The two main conditions that can be detected by height monitoring are Turner's syndrome and growth hormone deficiency. Conditions or risk factors that can be detected by monitoring weight or body mass index include hypernatremic dehydration, celiac disease, cystic fibrosis and obesity. Monitoring infant head growth can be used to detect macrocephaly, developmental disorders and ill health in childhood. This paper describes statistical methods to obtain evidence-based referral criteria in growth monitoring. The referral criteria that we discuss are based either on anthropometric measurement(s) at a fixed age, using (1) a Centile or a Standard Deviation Score, (2) a Standard Deviation Score corrected for parental height, (3) a Likelihood Ratio Statistic and (4) an ellipse, or on multiple measurements over time, using (5) a growth rate and (6) a growth curve model. We review the potential uses of these methods and outline their strengths and limitations.
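Criterion (1) depends on converting a measurement into a Standard Deviation Score and centile. One common way to do this, assumed here and not necessarily the method used in the paper, is Cole's LMS method, where a growth reference supplies age-specific parameters L (skewness), M (median) and S (coefficient of variation). The parameter values and the cutoff of 2 in `refer` are hypothetical placeholders, not referral criteria from the paper.

```python
import math
from statistics import NormalDist


def lms_sds(y, L, M, S):
    """Standard Deviation Score of measurement y under LMS parameters (assumed method)."""
    if L == 0:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def centile(z):
    """Centile (0-100) corresponding to an SDS, assuming z is standard normal."""
    return 100.0 * NormalDist().cdf(z)


def refer(z, cutoff=2.0):
    """Hypothetical referral rule: flag when the absolute SDS exceeds the cutoff."""
    return abs(z) > cutoff


# Hypothetical reference values for one age and sex.
z = lms_sds(52.0, L=1.0, M=50.0, S=0.04)
flagged = refer(z)
```

Evidence-based criteria would choose the cutoff from data on referral rates and detection of the target conditions; the fixed value here only shows where such a threshold enters.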
World experience in creating and operating free economic zones and the possibilities of applying it in Ukraine
Today, according to various estimates, between 400 and 2000 free economic zones operate worldwide. The first FEZs were created in the USA under an act of 1934, in the form of foreign trade zones. Their aim was to stimulate foreign trade through effective mechanisms for reducing customs costs, chiefly by cutting import tariffs on parts and components for automobile manufacturing. Warehouses, docks and airports were converted into foreign trade zones. Enterprises operating in these zones were exempt from US customs control if the goods imported into the zone were subsequently shipped to third countries. Customs costs were also reduced when the products of US firms underwent final processing ("finishing") in the zone for subsequent export. If, however, goods from the zone entered the USA, they had to pass through all customs procedures stipulated by the country's legislation.
mice: Multivariate Imputation by Chained Equations in R
The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, and the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solving applied incomplete data problems.
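The core of imputation by chained equations is a loop that imputes each incomplete variable in turn from all the others, repeated for several iterations. The sketch below is not the mice algorithm or its API. It handles only numeric columns and uses plain linear regression plus normal noise. mice's Bayesian linear regression method additionally draws the regression parameters, and mice defaults to predictive mean matching for numeric data. The helper name `chained_impute` is hypothetical. Running it with m different seeds yields m imputed datasets to analyze and pool.

```python
import numpy as np


def chained_impute(data, n_iter=10, seed=0):
    """Fill NaNs in a numeric 2-D array by chained linear regressions (sketch).

    Hypothetical helper, not the mice API: each incomplete column is
    regressed on all other columns, and missing cells are replaced by the
    prediction plus normal noise. Regression parameters are NOT drawn from
    their posterior, so between-imputation variance is understated.
    """
    rng = np.random.default_rng(seed)
    x = np.array(data, dtype=float)  # copy; observed cells are never overwritten
    miss = np.isnan(x)
    # Initialise: fill each column's missing cells with random observed values.
    for j in range(x.shape[1]):
        obs = x[~miss[:, j], j]
        x[miss[:, j], j] = rng.choice(obs, size=int(miss[:, j].sum()))
    # Cycle through the incomplete columns n_iter times.
    for _ in range(n_iter):
        for j in range(x.shape[1]):
            if not miss[:, j].any():
                continue
            design = np.column_stack([np.ones(len(x)), np.delete(x, j, axis=1)])
            rows = ~miss[:, j]
            beta, *_ = np.linalg.lstsq(design[rows], x[rows, j], rcond=None)
            resid = x[rows, j] - design[rows] @ beta
            dof = int(rows.sum()) - design.shape[1]
            sigma = np.sqrt(resid @ resid / dof) if dof > 0 else 0.0
            pred = design[miss[:, j]] @ beta
            x[miss[:, j], j] = pred + rng.normal(0.0, sigma, size=pred.shape)
    return x


data = [[1.0, 2.0], [2.0, np.nan], [3.0, 6.1], [4.0, 8.0], [np.nan, 10.2], [6.0, 11.9]]
imputations = [chained_impute(data, seed=s) for s in range(5)]  # m = 5 datasets
```

Each completed dataset would then be analyzed separately and the results combined with pooling rules, which is the step mice automates.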
Evaluation and prediction of individual growth trajectories
Background: Conventional growth charts offer limited guidance for tracking individual growth. Aim: To explore new approaches to improve the evaluation and prediction of individual growth trajectories. Subjects and methods: We generalise the conditional SDS gain to multiple historical measurements, using the Cole correlation model to find correlations at exact ages, the sweep operator to find regression weights, and a specified longitudinal reference. We explain the various steps of the methodology and validate and demonstrate the method using empirical data from the SMOCC study, with 1985 children measured during ten visits at ages 0 to 2 years. Results: The method performs according to statistical theory. We apply the method to estimate the referral rates for a given screening policy. We visualise the child's trajectory as an adaptive growth chart featuring two new graphical elements: amplitude (for evaluation) and flag (for prediction). The relevant calculations take about 1 millisecond per child. Conclusion: Longitudinal references capture the dynamic nature of child growth. The adaptive growth chart for individual monitoring works with exact ages, corrects for regression to the mean, has a known distribution at any pair of ages and is fast. We recommend the method for evaluating and predicting individual child growth.
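For a single earlier measurement, the conditional SDS gain is commonly written as (z2 - r*z1) / sqrt(1 - r^2), where r is the correlation between the two SDS values. With several earlier measurements the same idea becomes a regression of the current SDS on the earlier ones. The sketch below assumes the correlations are already known (in the paper they come from the Cole correlation model, which is not reproduced here) and solves for the regression weights with a linear solve in place of the sweep operator; the function name and the example correlations are hypothetical.

```python
import numpy as np


def conditional_sds(z_hist, z_now, r_hist, r_cross):
    """Conditional SDS of z_now given earlier SDS values z_hist (sketch).

    r_hist:  correlation matrix among the earlier SDS values
    r_cross: correlations between each earlier SDS and z_now
    Returns (conditional SDS, predicted SDS). The weights come from a linear
    solve here, standing in for the sweep operator used in the paper.
    """
    z_hist = np.asarray(z_hist, dtype=float)
    r_hist = np.atleast_2d(np.asarray(r_hist, dtype=float))
    r_cross = np.asarray(r_cross, dtype=float)
    weights = np.linalg.solve(r_hist, r_cross)  # regression weights
    predicted = float(weights @ z_hist)         # expected SDS at the current age
    resid_var = 1.0 - float(r_cross @ weights)  # variance not explained by history
    return (z_now - predicted) / np.sqrt(resid_var), predicted


# One earlier measurement with a hypothetical correlation of 0.6.
cond_one, pred_one = conditional_sds([1.0], 1.0, [[1.0]], [0.6])

# Two earlier measurements with hypothetical correlations.
r_hist_two = [[1.0, 0.7], [0.7, 1.0]]
r_cross_two = [0.5, 0.8]
_, pred_two = conditional_sds([0.2, 1.1], 0.0, r_hist_two, r_cross_two)
cond_two, _ = conditional_sds([0.2, 1.1], pred_two, r_hist_two, r_cross_two)
```

Because the predicted SDS is pulled toward zero by correlations below 1, the conditional score corrects for regression to the mean; the predicted value itself is the natural input for a forward-looking flag.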
Looking back at the Gifi system of nonlinear multivariate analysis
Gifi was the nom de plume for a group of researchers led by Jan de Leeuw at the University of Leiden. Between 1970 and 1990 the group produced a stream of highly innovative theoretical papers and computer programs in the area of nonlinear multivariate analysis. This paper informally discusses the so-called Gifi system of nonlinear multivariate analysis, which comprises homogeneity analysis (closely related to multiple correspondence analysis) and its generalizations. The history is discussed, with attention to the scientific philosophy of the group, and links to machine learning are indicated.