
    Measuring Inequality Using Censored Data: A Multiple Imputation Approach

    To measure income inequality with right-censored (topcoded) data, we propose multiple imputation for censored observations using draws from Generalized Beta of the Second Kind distributions to provide partially synthetic datasets analyzed using complete-data methods. Estimation and inference use Reiter's (Survey Methodology 2003) formulae. Using Current Population Survey (CPS) internal data, we find few statistically significant differences in income inequality for pairs of years between 1995 and 2004. We also show that using CPS public use data with cell mean imputations may lead to incorrect inferences about inequality differences. Multiply-imputed public use data provide an intermediate solution. Keywords: income inequality, topcoding, partially synthetic data, CPS, Current Population Survey, Generalized Beta of the Second Kind distribution
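
The abstract above relies on combining rules for partially synthetic data. As a rough illustration only, the sketch below pools an inequality estimate across m imputed datasets using the rules as they are commonly stated for partially synthetic data: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the average within-dataset variance plus the between-dataset variance divided by m. The function name and argument layout are hypothetical and do not come from the paper.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Pool m point estimates and their variances (hypothetical helper).

    Assumes the combining rules commonly stated for partially
    synthetic data: T = u_bar + b / m, with reference t-distribution
    degrees of freedom (m - 1) * (1 + u_bar / (b / m)) ** 2.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two estimates, each with a variance")

    q_bar = mean(estimates)    # pooled point estimate
    b = variance(estimates)    # between-imputation variance (m - 1 denominator)
    u_bar = mean(variances)    # average within-imputation variance

    total_var = u_bar + b / m
    # With no between-imputation variability the reference distribution
    # is effectively normal, so report infinite degrees of freedom.
    df = (m - 1) * (1 + u_bar / (b / m)) ** 2 if b > 0 else float("inf")
    return q_bar, total_var, df
```

Here `estimates` would hold, say, one Gini coefficient per imputed dataset and `variances` the matching sampling variances. Note that the between-imputation term enters as b / m, which differs from the (1 + 1/m) b term used when imputing ordinary missing data.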

    Missing.... presumed at random: cost-analysis of incomplete data

    When collecting patient-level resource use data for statistical analysis, for some patients and in some categories of resource use, the required count will not be observed. Although this problem must arise in most reported economic evaluations containing patient-level data, it is rare for authors to detail how the problem was overcome. Statistical packages may default to handling missing data through a so-called complete case analysis, while some recent cost-analyses have appeared to favour an available case approach. Both of these methods are problematic: complete case analysis is inefficient and is likely to be biased; available case analysis, by employing different numbers of observations for each resource use item, generates severe problems for standard statistical inference. Instead, we explore imputation methods for generating replacement values for missing data that will permit complete case analysis using the whole data set, and we illustrate these methods using two data sets that had incomplete resource use information.

    Missing Data in the Context of Student Growth

    One property of student growth data that is often overlooked despite its widespread prevalence is incomplete or missing observations. Student records become fragmented for many reasons: students migrate in and out of school districts, opt out of standardized testing, or are absent on test days. Missing data in growth models can bias model estimates and growth inferences. This study presents empirical explorations of how well missing data methodologies recover attributes of would-be complete student data used for teacher evaluation. Missing data methods are compared in the context of a Student Growth Percentiles (SGP) model used by several school systems for accountability purposes. Using a real longitudinal dataset, we evaluate the sensitivity of growth estimates to missing data and compare the following missing data methods: listwise deletion, likelihood-based imputation using an expectation-maximization algorithm, multiple imputation using a Markov Chain Monte Carlo method, multiple imputation using a predictive mean matching method, and inverse probability weighting. Methodological and practical consequences of missing data are discussed.
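
Two of the methods named in the abstract above, listwise deletion and inverse probability weighting, can be contrasted on a simple mean. The sketch below is an illustration only: it assumes each record's probability of being observed is already known (in practice it would be estimated, for example from a response model), it uses the normalized (Hájek-style) form of the weighted mean, and both function names are hypothetical. It is not the study's SGP pipeline.

```python
def listwise_mean(values):
    """Mean of the observed values only; records with None are dropped."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values")
    return sum(observed) / len(observed)


def ipw_mean(values, response_probs):
    """Inverse-probability-weighted mean (normalized weights).

    Each observed value is weighted by 1 / p, where p is the assumed
    probability that the record was observed, so records that resemble
    the missing ones count for more.
    """
    numerator = 0.0
    weight_sum = 0.0
    for value, p in zip(values, response_probs):
        if value is None:
            continue  # missing records contribute nothing directly
        if not 0 < p <= 1:
            raise ValueError("response probabilities must be in (0, 1]")
        weight = 1.0 / p
        numerator += weight * value
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no observed values")
    return numerator / weight_sum
```

When low values are also the ones most likely to go missing, the listwise mean is pulled upward, while the weighted mean gives the surviving low values extra weight to compensate.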

    Modeling longitudinal data with interval censored anchoring events

    Indiana University-Purdue University Indianapolis (IUPUI)
    In many longitudinal studies, the time scales upon which we assess the primary outcomes are anchored by pre-specified events. However, these anchoring events are often not observable, and they are randomly distributed with an unknown distribution. Without direct observations of the anchoring events, the time scale used for analysis is not available, and analysts cannot use traditional longitudinal models to describe the temporal changes as desired. Existing methods often make either ad hoc or strong assumptions about the anchoring events, which are unverifiable and prone to biased estimation and invalid inference. Although researchers cannot observe the anchoring events directly, they can often ascertain an interval that includes each unobserved event, i.e., the anchoring events are interval censored. In this research, we propose a two-stage method to fit commonly used longitudinal models with interval-censored anchoring events. In the first stage, we obtain an estimate of the distribution of the anchoring events by a nonparametric method using the interval-censored data; in the second stage, we obtain the parameter estimates as stochastic functionals of the estimated distribution. The construction of the stochastic functional depends on the model setting. We consider two types of models. The first is a distribution-free model, in which no parametric assumption is made about the distribution of the error term. The second is likelihood based and extends classic mixed-effects models to the situation in which the origin of the time scale for analysis is interval censored. For the purpose of large-sample statistical inference in both models, we study the asymptotic properties of the proposed functional estimator using empirical process theory. Theoretically, our method provides a general approach to studying semiparametric maximum pseudo-likelihood estimators in similar data situations. The finite-sample performance of the proposed method is examined through a simulation study. Computationally efficient algorithms for computing the parameter estimates are provided. We apply the proposed method to a real data analysis and obtain new findings that could not be obtained using traditional mixed-effects models.

    Network dynamical stability analysis of homeostasis reveals "mallostasis": biological equilibria drifting towards worsening health with age

    Using longitudinal study data, we dynamically model how aging affects homeostasis in both mice and humans. We operationalize homeostasis as a multivariate mean-reverting stochastic process. Our central hypothesis is that homeostasis causes biomarkers to have stable equilibrium values, but that deviations from equilibrium of one biomarker can affect other biomarkers through an interaction network. These interactions preclude analysis of one biomarker at a time. We therefore looked for age-related changes to homeostasis using dynamic network stability analysis (eigen-analysis), which transforms observed biomarker data into independent "natural" variables and determines their associated recovery rates. Most natural variables remained near equilibrium and were essentially constant in time. Some natural variables were unable to equilibrate due to a gradual drift with age in their homeostatic equilibrium, i.e. allostasis. This drift caused them to accumulate over the course of the lifespan. These accumulating variables are natural aging variables. Their rate of accumulation was correlated with risk of adverse outcomes: death or dementia onset. We call this tendency for aging organisms to drift towards an equilibrium position of ever-worsening health "mallostasis". We demonstrate that the effects of mallostasis on observed biomarkers are spread out through the interaction network. This could provide a redundancy mechanism to preserve functioning until multi-system dysfunction emerges at advanced ages. Comment: 11 pages and 5 figures + supplemental (30 pages, 2 tables and 17 figures)
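
The abstract above does not state its model equations, so the following is only a toy illustration of the general idea of a mean-reverting variable whose equilibrium drifts. It assumes a single variable updated in discrete time as x ← x − r·(x − μ_t) + noise, with an equilibrium μ_t that moves linearly by a fixed amount per step. The function name, the parameter names, and the linear drift are all assumptions made for this sketch and are not taken from the paper.

```python
import random


def simulate_drifting_equilibrium(recovery_rate, drift_per_step, n_steps,
                                  noise_sd=0.0, x0=0.0, mu0=0.0, seed=0):
    """Simulate one mean-reverting variable chasing a drifting equilibrium.

    Returns the gap (x - equilibrium) after each step. All names and the
    linear drift are illustrative assumptions, not the paper's model.
    """
    if not 0 < recovery_rate < 2:
        raise ValueError("recovery_rate must lie in (0, 2) for stability")
    rng = random.Random(seed)
    x = x0
    gaps = []
    for t in range(n_steps):
        mu_t = mu0 + drift_per_step * t
        # Pull x toward the current equilibrium, then add random noise.
        x = x - recovery_rate * (x - mu_t) + rng.gauss(0.0, noise_sd)
        # Measure the gap against the equilibrium at the next time point.
        gaps.append(x - (mu0 + drift_per_step * (t + 1)))
    return gaps
```

In the noise-free case the gap settles at −drift_per_step / recovery_rate, so in this toy model a slowly recovering variable trails its moving equilibrium by more than a quickly recovering one and keeps changing for as long as the equilibrium drifts.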