1,053 research outputs found

    Imputation Estimators Partially Correct for Model Misspecification

    Inference problems with incomplete observations often aim at estimating population properties of unobserved quantities. One simple way to accomplish this estimation is to impute the unobserved quantities of interest at the individual level and then take an empirical average of the imputed values. We show that this simple imputation estimator can provide partial protection against model misspecification. We illustrate the imputation estimators' robustness to model misspecification with three examples: mixture model-based clustering, estimation of genotype frequencies in population genetics, and estimation of Markovian evolutionary distances. In the final example, using a representative model misspecification, we demonstrate that in non-degenerate cases the imputation estimator asymptotically dominates the plug-in estimate. We conclude by outlining a Bayesian implementation of imputation-based estimation. (Comment: major rewrite; the beta-binomial example is removed, model-based clustering is added to the mixture model example, and the Bayesian approach is now illustrated with the genetics example.)
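
    The contrast between the two estimators is easy to sketch in code. The example below is a hedged toy illustration, not one of the paper's three examples: a measurement-error model X_i = Z_i + eps_i with a Normal working model for the latent Z_i (deliberately misspecified, since the true Z_i are drawn from a Gamma distribution), a known error standard deviation sigma_eps, and target quantity P(Z > c). The plug-in estimator evaluates the target under the fitted latent distribution alone; the imputation estimator imputes P(Z_i > c | X_i) for each individual and averages. Whether the imputation estimator actually helps depends on the form of the misspecification; the paper's asymptotic dominance result concerns its evolutionary-distance example.

        # Hedged sketch: plug-in vs. imputation estimator in a toy measurement-error
        # model. All modelling choices here (Gamma truth, Normal working model,
        # known error SD) are assumptions for illustration only.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        n, c, sigma_eps = 5_000, 1.0, 0.5

        # True latent distribution is skewed, so the Normal working model is wrong.
        z_true = rng.gamma(shape=2.0, scale=0.5, size=n)
        x = z_true + rng.normal(0.0, sigma_eps, size=n)

        # Fit the Normal working model for Z by moments: Var(Z) = Var(X) - Var(eps).
        mu_hat = x.mean()
        tau2_hat = max(x.var() - sigma_eps**2, 1e-8)

        # Plug-in estimator: evaluate the target under the fitted latent law only.
        plug_in = 1.0 - stats.norm.cdf(c, loc=mu_hat, scale=np.sqrt(tau2_hat))

        # Imputation estimator: impute P(Z_i > c | X_i) per individual, then average.
        # Under the Normal working model, Z_i | X_i is Normal with the usual shrinkage.
        w = tau2_hat / (tau2_hat + sigma_eps**2)
        post_mean = mu_hat + w * (x - mu_hat)
        post_sd = np.sqrt(w * sigma_eps**2)
        imputation = np.mean(1.0 - stats.norm.cdf(c, loc=post_mean, scale=post_sd))

        print(f"empirical P(Z > c):  {np.mean(z_true > c):.3f}")
        print(f"plug-in estimate:    {plug_in:.3f}")
        print(f"imputation estimate: {imputation:.3f}")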

    Statistical approaches for handling longitudinal and cross-sectional discrete data with missing values, focusing on multiple imputation and probability weighting.

    Doctor of Philosophy in Science. University of KwaZulu-Natal, Pietermaritzburg, 2018. Abstract available in PDF file.

    Item selection by Latent Class-based methods

    The evaluation of nursing homes is usually based on the administration of questionnaires made up of a large number of polytomous items. In such a context, the Latent Class (LC) model represents a useful tool for clustering subjects into homogeneous groups corresponding to different degrees of impairment of their health conditions. It is known that the performance of model-based clustering and the accuracy of the choice of the number of latent classes may be affected by the presence of irrelevant or noise variables. In this paper, we show the application of an item selection algorithm to real data collected within a project, named ULISSE, on the quality of life of elderly patients hosted in Italian nursing homes. This algorithm, which is closely related to that proposed by Dean and Raftery in 2010, aims to find the subset of items that provides the best clustering according to the Bayesian Information Criterion; at the same time, it allows us to select the optimal number of latent classes. Given the complexity of the ULISSE study, we validate the results by means of a sensitivity analysis to different specifications of the initial subset of items and of a resampling procedure.
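
    A minimal sketch of this kind of search is given below. It is a greedy, BIC-driven add/remove loop in the spirit of Dean and Raftery (2010), not the ULISSE implementation; bic_of_lc_model is a hypothetical, user-supplied helper that fits a latent class model on a given item subset with a given number of classes and returns its BIC (lower is taken as better here).

        # Hedged sketch: greedy BIC-based item selection for latent class models.
        # bic_of_lc_model(data, items, k) is a hypothetical, user-supplied function.
        def select_items(data, all_items, initial_items, n_classes_grid, bic_of_lc_model):
            def best_over_k(items):
                # Score an item subset by the best BIC over the candidate class numbers.
                return min((bic_of_lc_model(data, items, k), k) for k in n_classes_grid)

            selected = list(initial_items)
            best_bic, best_k = best_over_k(selected)
            improved = True
            while improved:
                improved = False
                # Candidate moves: add one unused item, or drop one selected item.
                candidates = [selected + [i] for i in all_items if i not in selected]
                if len(selected) > 1:
                    candidates += [[i for i in selected if i != j] for j in selected]
                for trial in candidates:
                    bic, k = best_over_k(trial)
                    if bic < best_bic:
                        selected, best_bic, best_k = trial, bic, k
                        improved = True
                        break  # restart the search from the improved subset
            return selected, best_k, best_bic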

    Principled missing data methods for researchers

    The impact of missing data on quantitative research can be serious, leading to biased parameter estimates, loss of information, decreased statistical power, increased standard errors, and weakened generalizability of findings. In this paper, we discussed and demonstrated three principled missing data methods (multiple imputation, full information maximum likelihood, and the expectation-maximization algorithm), applied to a real-world data set. Results were contrasted with those obtained from the complete data set and from the listwise deletion method. The relative merits of each method are noted, along with the common features they share. The paper concludes with an emphasis on the importance of statistical assumptions and recommendations for researchers. Quality of research will be enhanced if (a) researchers explicitly acknowledge missing data problems and the conditions under which they occurred, (b) principled methods are employed to handle missing data, and (c) the appropriate treatment of missing data is incorporated into the review standards of manuscripts submitted for publication.
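
    Of the three methods, multiple imputation is the easiest to illustrate in a few lines. The sketch below is a hedged example, not the paper's analysis: it multiply imputes a numeric data matrix with scikit-learn's (experimental) IterativeImputer, estimates the mean of one column from each completed data set, and pools the results with Rubin's rules. The toy data, the choice of m = 20 imputations, and the target quantity are assumptions for illustration.

        # Hedged sketch: multiple imputation of a numeric matrix plus Rubin's-rules
        # pooling of a single quantity (a column mean). Illustrative only.
        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        def pooled_mean(X_missing, col=0, m=20):
            n = X_missing.shape[0]
            ests, within = [], []
            for seed in range(m):
                imputer = IterativeImputer(sample_posterior=True, random_state=seed)
                X_imp = imputer.fit_transform(X_missing)
                x = X_imp[:, col]
                ests.append(x.mean())
                within.append(x.var(ddof=1) / n)   # within-imputation variance of the mean
            ests, within = np.array(ests), np.array(within)
            qbar = ests.mean()                     # pooled point estimate
            b = ests.var(ddof=1)                   # between-imputation variance
            total_var = within.mean() + (1 + 1 / m) * b
            return qbar, np.sqrt(total_var)        # estimate and its pooled standard error

        # Toy usage: bivariate normal data with ~30% of column 0 missing at random.
        rng = np.random.default_rng(1)
        X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=500)
        X[rng.random(500) < 0.3, 0] = np.nan
        print(pooled_mean(X, col=0, m=20))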

    The effectiveness of missing data techniques in principal component analysis

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2015. Exploratory data analysis (EDA) methods such as Principal Component Analysis (PCA) play an important role in statistical analysis. The analysis assumes that a complete dataset is observed; if the underlying data contain missing observations, the analysis cannot proceed until a method for handling those missing observations has been applied. Missing data are a problem in every area of research, but researchers tend to ignore the problem, even though missing observations can lead to incorrect results and conclusions. Many methods exist in the statistical literature for handling missing data, and there are many methods for PCA with missing data in particular, but few studies have compared these methods to determine which is most effective. In this study, the effectiveness of the Expectation Maximisation (EM) algorithm and the iterative PCA (iPCA) algorithm is assessed and compared against the well-known yet flawed methods of case-wise deletion (CW) and mean imputation. Two techniques for applying the multiple imputation (MI) method of Markov Chain Monte Carlo (MCMC) with the EM algorithm in a PCA context are suggested, and their effectiveness is evaluated against the other methods. The analysis is based on a simulated dataset, and the effectiveness of the methods is analysed using the sum of squared deviations (SSD) and the RV coefficient, a measure of similarity between two datasets. The results show that the MI technique applying PCA in the calculation of the final imputed values and the iPCA algorithm are the most effective techniques compared to the other techniques in the analysis.
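
    The core of the iPCA approach is short enough to sketch directly. The version below is a hedged, generic implementation of the usual iterative-PCA imputation loop (fill missing cells, fit a low-rank PCA reconstruction, overwrite only the missing cells, repeat until the fill stabilises), not the dissertation's code; the rank, tolerance, and iteration cap are illustrative defaults.

        # Hedged sketch: iterative PCA (iPCA) imputation of missing values in a
        # numeric matrix X (missing cells encoded as NaN). Illustrative only.
        import numpy as np

        def ipca_impute(X, n_components=2, max_iter=200, tol=1e-6):
            X = np.asarray(X, dtype=float)
            missing = np.isnan(X)
            col_means = np.nanmean(X, axis=0)
            X_filled = np.where(missing, col_means, X)   # start from mean imputation
            for _ in range(max_iter):
                mu = X_filled.mean(axis=0)
                Xc = X_filled - mu
                # Rank-k reconstruction via truncated SVD (the PCA step).
                U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
                recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
                new_fill = np.where(missing, recon, X)   # overwrite missing cells only
                if np.max(np.abs(new_fill - X_filled)) < tol:
                    return new_fill
                X_filled = new_fill
            return X_filled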

    Hedonic Price Indices for the Paris Housing Market

    In this paper, we calculate a transaction-based price index for apartments in Paris (France). The heterogeneous character of real estate is taken into account using a hedonic model, with the functional form specified by a general Box-Cox transformation. The dataset covers 84,686 transactions in the Paris housing market from 1990:01 to 1999:12, one of the largest samples ever used in comparable studies. Low correlations of the price index with stock and bond indices (in first differences) indicate diversification benefits from including real estate in a mixed-asset portfolio.
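
    The mechanics of a time-dummy hedonic index are easy to sketch. The example below is a hedged illustration, not the paper's specification: it fits a simple log-linear hedonic regression (rather than the general Box-Cox form) with monthly time dummies and reads the index off the exponentiated time coefficients. The column names (price, surface, rooms, arrondissement, month) are assumptions.

        # Hedged sketch: a log-linear hedonic regression with monthly time dummies.
        # The index for each month is the exponentiated time-dummy coefficient,
        # rebased so the first month equals 100. Column names are assumptions.
        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        def hedonic_index(df):
            model = smf.ols(
                "np.log(price) ~ np.log(surface) + rooms + C(arrondissement) + C(month)",
                data=df,
            ).fit()
            months = sorted(df["month"].unique())
            index = {months[0]: 100.0}             # base month
            for m in months[1:]:
                index[m] = 100.0 * np.exp(model.params[f"C(month)[T.{m}]"])
            return pd.Series(index)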

    New Technique for Imputing Missing Item Responses for an Ordinal Variable: Using Tennessee Youth Risk Behavior Survey as an Example.

    Surveys ordinarily ask questions on an ordinal scale and often result in missing data. We suggest a regression-based technique for imputing missing ordinal data. A multilevel cumulative logit model was used under the assumption that observed responses on certain key variables can serve as covariates for predicting missing item responses of an ordinal variable. Individual predicted probabilities at each response level were obtained, and the average predicted probabilities for each response level were used to randomly impute the missing responses using a uniform distribution. Finally, a likelihood-ratio chi-square statistic was used to compare the imputed and observed distributions. Two other established multiple imputation algorithms were run for comparison, and the performance of our imputation technique was comparable to both. Our method is simpler, does not involve any complex algorithms, and with further research could potentially be used as an imputation technique for missing ordinal variables.
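
    The final imputation step described above (turning per-level probabilities into random ordinal draws) takes only a few lines of code. The sketch below is a hedged illustration of that step alone; fitting the multilevel cumulative logit model is assumed to have happened upstream, and the example probabilities are made up.

        # Hedged sketch: impute missing ordinal responses from predicted level
        # probabilities by drawing a Uniform(0, 1) value per missing case and
        # assigning the level whose cumulative-probability interval contains it.
        import numpy as np

        def impute_ordinal(level_probs, n_missing, levels=None, seed=0):
            probs = np.asarray(level_probs, dtype=float)
            probs = probs / probs.sum()              # guard against rounding drift
            cum = np.cumsum(probs)
            rng = np.random.default_rng(seed)
            u = rng.random(n_missing)                # one uniform draw per missing value
            idx = np.searchsorted(cum, u, side="right")
            if levels is None:
                levels = np.arange(1, probs.size + 1)
            return np.asarray(levels)[idx]

        # Example with four response levels and five missing responses:
        print(impute_ordinal([0.10, 0.35, 0.40, 0.15], n_missing=5))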