968 research outputs found
Measuring Inequality Using Censored Data: A Multiple Imputation Approach
To measure income inequality with right censored (topcoded) data, we propose multiple imputation for censored observations using draws from Generalized Beta of the Second Kind distributions to provide partially synthetic datasets analyzed using complete data methods. Estimation and inference use Reiter's (Survey Methodology 2003) formulae. Using Current Population Survey (CPS) internal data, we find few statistically significant differences in income inequality for pairs of years between 1995 and 2004. We also show that using CPS public use data with cell mean imputations may lead to incorrect inferences about inequality differences. Multiply-imputed public use data provide an intermediate solution.
Keywords: income inequality, topcoding, partially synthetic data, CPS, Current Population Survey, Generalized Beta of the Second Kind distribution
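The workflow this abstract describes (impute topcoded incomes from a fitted GB2 distribution, compute an inequality index on each completed dataset, then combine the results) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the rejection sampler, and the GB2 parameter names `a`, `b`, `p`, `q` are assumptions, and the combining rule shown is the commonly cited form for partially synthetic data (mean within-imputation variance plus between-imputation variance divided by the number of imputations).

```python
import random
import statistics


def draw_gb2_above(threshold, a, b, p, q, rng):
    """Draw one GB2(a, b, p, q) value above `threshold` by rejection sampling.

    Uses the fact that if Z ~ Beta(p, q), then b * (Z / (1 - Z)) ** (1 / a)
    follows a GB2(a, b, p, q) distribution. Parameter names are assumptions.
    """
    while True:
        z = rng.betavariate(p, q)
        if z >= 1.0:  # guard against division by zero
            continue
        x = b * (z / (1.0 - z)) ** (1.0 / a)
        if x > threshold:
            return x


def impute_topcoded(incomes, topcode, a, b, p, q, rng):
    """Replace every income at or above the topcode with a GB2 draw above it."""
    return [
        draw_gb2_above(topcode, a, b, p, q, rng) if y >= topcode else y
        for y in incomes
    ]


def gini(values):
    """Gini coefficient from the rank-weighted sum of the sorted values."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def combine_partially_synthetic(estimates, variances):
    """Combine m point estimates and their within-imputation variances.

    Returns (q_bar, T) with T = u_bar + b_m / m, where b_m is the
    between-imputation variance of the point estimates.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    b_m = statistics.variance(estimates)
    u_bar = statistics.fmean(variances)
    return q_bar, u_bar + b_m / m
```

In use, one would call `impute_topcoded` several times with GB2 parameters estimated from the data, compute `gini` and a variance estimate on each completed dataset, and pass the resulting lists to `combine_partially_synthetic`. Rejection sampling is chosen here only for brevity; it becomes slow when the topcode lies far in the upper tail.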
Missing.... presumed at random: cost-analysis of incomplete data
When collecting patient-level resource use data for statistical analysis, for some patients and in some categories of resource use, the required count will not be observed. Although this problem must arise in most reported economic evaluations containing patient-level data, it is rare for authors to detail how the problem was overcome. Statistical packages may default to handling missing data through a so-called complete case analysis, while some recent cost-analyses have appeared to favour an available case approach. Both of these methods are problematic: complete case analysis is inefficient and is likely to be biased; available case analysis, by employing different numbers of observations for each resource use item, generates severe problems for standard statistical inference. Instead we explore imputation methods for generating replacement values for missing data that will permit complete case analysis using the whole data set and we illustrate these methods using two data sets that had incomplete resource use information.
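The distinction between complete case and available case analysis drawn in this abstract can be made concrete with a small sketch. The record layout and the item names `gp_visits` and `inpatient_days` are hypothetical and chosen only for illustration; missing counts are represented as `None`.

```python
from statistics import fmean


def complete_case_means(records, items):
    """Mean of each item using only patients with every item observed."""
    complete = [r for r in records if all(r[i] is not None for i in items)]
    return {i: fmean([r[i] for r in complete]) for i in items}


def available_case_means(records, items):
    """Mean of each item using every patient for whom that item is observed."""
    return {
        i: fmean([r[i] for r in records if r[i] is not None]) for i in items
    }


# Hypothetical patient-level resource use counts.
records = [
    {"gp_visits": 2, "inpatient_days": 1},
    {"gp_visits": None, "inpatient_days": 3},
    {"gp_visits": 4, "inpatient_days": None},
]
items = ["gp_visits", "inpatient_days"]
```

With these three records, the complete case means rest on a single patient, while the available case means use two patients per item but a different pair for each item, which is the source of the inferential difficulty the abstract mentions.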
Missing Data in the Context of Student Growth
One property of student growth data that is often overlooked despite widespread prevalence is incomplete or missing observations. As students migrate in and out of school districts, opt out of standardized testing, or are absent on test days, there are many reasons student records are fractured. Missing data in growth models can bias model estimates and growth inferences. This study presents empirical explorations of how well missing data methodologies recover attributes of would-be complete student data used for teacher evaluation. Missing data methods are compared in the context of a Student Growth Percentiles (SGP) model used by several school systems for accountability purposes. Using a real longitudinal dataset, we evaluate the sensitivity of growth estimates to missing data and compare the following missing data methods: listwise deletion, likelihood-based imputation using an expectation-maximization algorithm, multiple imputation using a Markov Chain Monte Carlo method, multiple imputation using a predictive mean matching method, and inverse probability weighting. Methodological and practical consequences of missing data are discussed.
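Of the methods listed in this abstract, inverse probability weighting is the simplest to illustrate. The sketch below is a generic normalized weighted mean, not the study's procedure: it assumes each observed score comes with an estimated probability of being observed, and the function name and inputs are hypothetical.

```python
def ipw_mean(observed_values, response_probabilities):
    """Normalized inverse-probability-weighted mean of the observed values.

    Each observed value is weighted by 1 / (its probability of being
    observed), so that cases resembling the missing ones count for more.
    """
    weights = [1.0 / p for p in response_probabilities]
    weighted_sum = sum(w * y for w, y in zip(weights, observed_values))
    return weighted_sum / sum(weights)
```

For example, a score of 10 observed with probability 0.5 receives twice the weight of a score of 20 observed with probability 1.0, because it stands in for a similar student whose score is missing.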
Modeling longitudinal data with interval censored anchoring events
Indiana University-Purdue University Indianapolis (IUPUI)
In many longitudinal studies, the time scales upon which we assess the primary outcomes
are anchored by pre-specified events. However, these anchoring events are
often not observable and they are randomly distributed with unknown distribution.
Without direct observations of the anchoring events, the time scale used for analysis
is not available, and analysts will not be able to use the traditional longitudinal
models to describe the temporal changes as desired. Existing methods often make
either ad hoc or strong assumptions on the anchoring events, which are unverifiable
and prone to biased estimation and invalid inference.
Although researchers cannot observe the anchoring events directly, they can often ascertain an interval
that includes the unobserved anchoring events, i.e., the anchoring events are
interval censored. In this research, we proposed a two-stage method to fit commonly
used longitudinal models with interval censored anchoring events. In the first stage,
we obtain an estimate of the anchoring events distribution by a nonparametric method
using the interval censored data; in the second stage, we obtain the parameter estimates
as stochastic functionals of the estimated distribution. The construction of the
stochastic functional depends on model settings. In this research, we considered two
types of models. The first model was a distribution-free model, in which no parametric
assumption was made on the distribution of the error term. The second model was
likelihood based, which extended the classic mixed-effects models to the situation that the origin of the time scale for analysis was interval censored. For the purpose
of large-sample statistical inference in both models, we studied the asymptotic
properties of the proposed functional estimator using empirical process theory. Theoretically,
our method provided a general approach to study semiparametric maximum
pseudo-likelihood estimators in similar data situations. Finite sample performance of
the proposed method was examined through a simulation study. Computationally efficient
algorithms for computing the parameter estimates were provided. We applied
the proposed method to a real data analysis and obtained new findings that could
not be obtained using traditional mixed-effects models.
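The first stage described above, estimating the distribution of the anchoring event nonparametrically from interval censored data, can be illustrated with a self-consistency (EM-type) iteration. The abstract does not name the estimator used, so the choice of this algorithm, the fixed grid of candidate support points, and the function name are all assumptions made for illustration.

```python
def npmle_interval_censored(intervals, support, n_iter=200):
    """Estimate probability masses on `support` from intervals (left, right].

    Self-consistency iteration: each observation spreads one unit of mass
    over the support points inside its interval, in proportion to the
    current estimate, and the masses are then averaged over observations.
    """
    n = len(intervals)
    # alpha[i][j] is 1.0 when support point j lies in interval i.
    alpha = [
        [1.0 if left < s <= right else 0.0 for s in support]
        for left, right in intervals
    ]
    p = [1.0 / len(support)] * len(support)
    for _ in range(n_iter):
        new_p = [0.0] * len(support)
        for row in alpha:
            denom = sum(a * pj for a, pj in zip(row, p))
            if denom == 0.0:
                raise ValueError("an interval covers no support point with mass")
            for j, (a, pj) in enumerate(zip(row, p)):
                new_p[j] += a * pj / denom / n
        p = new_p
    return p
```

In the second stage, the estimated masses would serve as the distribution over which the model's parameter estimates are computed; that step depends on the specific longitudinal model and is not sketched here.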
Network dynamical stability analysis of homeostasis reveals "mallostasis": biological equilibria drifting towards worsening health with age
Using longitudinal study data, we dynamically model how aging affects
homeostasis in both mice and humans. We operationalize homeostasis as a
multivariate mean-reverting stochastic process. Our central hypothesis is that
homeostasis causes biomarkers to have stable equilibrium values, but that
deviations from equilibrium of one biomarker can affect other biomarkers
through an interaction network. These interactions preclude analysis of one
biomarker at a time. We therefore looked for age-related changes to homeostasis
using dynamic network stability analysis (eigen-analysis), which transforms
observed biomarker data into independent "natural" variables and determines
their associated recovery rates. Most natural variables remained near
equilibrium and were essentially constant in time. Some natural variables were
unable to equilibrate due to a gradual drift with age in their homeostatic
equilibrium, i.e. allostasis. This drift caused them to accumulate over the
lifespan course. These accumulating variables are natural aging variables.
Their rate of accumulation was correlated with risk of adverse outcomes: death
or dementia onset. We call this tendency for aging organisms to drift towards
an equilibrium position of ever-worsening health "mallostasis". We demonstrate
that the effects of mallostasis on observed biomarkers are spread out through
the interaction network. This could provide a redundancy mechanism to preserve
functioning until multi-system dysfunction emerges at advanced ages.
Comment: 11 pages and 5 figures + supplemental (30 pages, 2 tables and 17 figures)
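One way to read the eigen-analysis described in this abstract is through a linear mean-reverting model. The equations below are a sketch under that assumption; the symbols $W$, $P$, $\Lambda$, $\boldsymbol{\mu}(t)$, and the linear form of the equilibrium drift are illustrative and are not taken from the paper.

```latex
% Assumed linear mean-reverting model; all symbols are illustrative.
\begin{align}
  \mathrm{d}\mathbf{x}_t &= W\,\bigl(\mathbf{x}_t - \boldsymbol{\mu}(t)\bigr)\,\mathrm{d}t + \text{noise},
    && \text{(biomarkers coupled through the network } W\text{)} \\
  W &= P \Lambda P^{-1}, \quad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n),
    && \text{(eigen-decomposition)} \\
  \mathbf{z}_t &= P^{-1}\mathbf{x}_t, \quad \tilde{\boldsymbol{\mu}}(t) = P^{-1}\boldsymbol{\mu}(t),
    && \text{(natural variables)} \\
  \mathrm{d}z_{j,t} &= \lambda_j\,\bigl(z_{j,t} - \tilde{\mu}_j(t)\bigr)\,\mathrm{d}t + \text{noise},
    && \text{(decoupled deterministic part)} \\
  \tilde{\mu}_j(t) &= \tilde{\mu}_{j,0} + \tilde{\mu}_{j,1}\,t.
    && \text{(assumed linear drift of the equilibrium)}
\end{align}
```

Under these assumptions, stability requires $\operatorname{Re}(\lambda_j) < 0$, and $-\operatorname{Re}(\lambda_j)$ plays the role of the recovery rate of natural variable $z_j$. A natural variable with $\tilde{\mu}_{j,1} \neq 0$ and a slow recovery rate keeps chasing a moving equilibrium, which corresponds to the accumulation with age that the abstract describes.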