279 research outputs found
Propensity Scoring after Multiple Imputation in a Retrospective Study on Adjuvant Radiation Therapy in Lymph-Node Positive Vulvar Cancer
Propensity scoring (PS) is an established tool to account for measured confounding in non-randomized studies. These methods are sensitive to missing values, which are a common problem in observational data. This work addresses the combination of multiple imputation of missing values with different propensity scoring techniques. For a sample of lymph node-positive vulvar cancer patients, we re-analyze associations between the application of radiotherapy and disease-related and non-related survival. Inverse-probability-of-treatment weighting (IPTW) and PS stratification are applied after multiple imputation by chained equations (MICE). Methodological issues are described in detail. Interpretation of the results and methodological limitations are discussed.
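As a rough illustration of the weighting step named in the abstract above, the following minimal sketch computes inverse-probability-of-treatment weights from propensity scores that have already been estimated. It assumes the standard weights targeting the average treatment effect (1/e for treated, 1/(1 - e) for untreated); the abstract does not state which weights or estimand the authors used, and the function name `iptw_weights` and the toy data are hypothetical.

```python
def iptw_weights(treated, propensity):
    """Return IPTW weights (ATE form, an assumption): 1/e if treated, 1/(1-e) otherwise."""
    weights = []
    for z, e in zip(treated, propensity):
        # Propensity scores must lie strictly between 0 and 1 for the weights to exist.
        if not 0.0 < e < 1.0:
            raise ValueError("propensity score must be strictly between 0 and 1")
        weights.append(1.0 / e if z else 1.0 / (1.0 - e))
    return weights


# Hypothetical toy data: treatment indicators and estimated propensity scores.
treated = [1, 0, 1, 0]
propensity = [0.5, 0.25, 0.8, 0.5]
weights = iptw_weights(treated, propensity)
```

With multiply imputed data, one common option is to estimate propensity scores and weights separately within each imputed dataset and pool the resulting effect estimates afterwards; whether the study above did this is not stated in the abstract.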
Improving Cardiovascular Disease Prediction by Integrating Imputation, Imbalance Resampling, and Feature Selection Techniques into Machine Learning Model
Cardiovascular disease (CVD) is the leading cause of death worldwide. Primary prevention relies on early prediction of disease onset. Using laboratory data from the National Health and Nutrition Examination Survey (NHANES) for the 2017-2020 timeframe (N = 7,974), we tested the ability of machine learning (ML) algorithms to classify individuals at risk. The ML models were evaluated on their classification performance after comparing four imputation, three imbalance resampling, and three feature selection techniques. Due to its popularity, we used the decision tree (DT) as the baseline. Integrating multiple imputation by chained equations (MICE) and synthetic minority oversampling with Tomek link down-sampling (SMOTETomek) into the model improved the area under the receiver operating characteristic curve (AUC-ROC) from 57% to 83%. Applying simultaneous perturbation feature selection and ranking (spFSR) reduced the number of feature predictors from 144 to 30 and the computational time by 22%. The best techniques were applied to six ML models, with eXtreme gradient boosting (XGBoost) achieving the highest accuracy of 93% and an AUC-ROC of 89%. The accuracy of our ML model in predicting CVD outperforms that of previous studies. We also highlight the important causes of CVD, which might be investigated further for potential effects on electronic health records.
An empirical comparison of possible combinations of imputation techniques and propensity score-based confounding adjustment in the presence of missing data
In observational studies, the estimate of the causal effect may be biased by the presence of confounders. Propensity score-based methods are commonly used to correct estimates for this bias. Another problem that often affects statistical analysis is missing data. The aim of this work is to compare the performance of the four main propensity score-based methods for handling confounding in combination with different techniques for imputing missing data. Through a series of simulation studies, propensity score-based matching, stratification, covariate adjustment, and inverse probability of treatment weighting were compared for estimating the average treatment effect on the treated (ATT) after imputing the missing data by unconditional mean imputation and several versions of multiple imputation by chained equations.
Foetal ultrasound measurement imputations based on growth curves versus multiple imputation chained equation (MICE)
Background: Ultrasound measures are valuable for epidemiologic studies of risk factors for growth restriction. Longitudinal measurements enable investigation of rates of change and identification of windows where growth is impacted more acutely. However, missing data can be problematic in these studies, limiting sample size, ability to characterise windows of vulnerability, and in some instances creating bias. We sought to compare a parametric linear mixed model (LMM) approach to multiple imputation in this setting with multiple imputation by chained equations (MICE) methodology. Methods: Ultrasound scans performed for clinical purposes were abstracted from women in the LIFECODES birth cohort (n = 1003) if they were close in time to three study visits (median 18, 26, and 35 weeks' gestation). We created imputed datasets using LMM and MICE and calculated associations between demographic factors and ultrasound parameters cross-sectionally and longitudinally. Results were compared with a complete-case analysis. Results: Most participants had ultrasounds at 18 weeks' gestation, and ~50% had measurements at 26 and 35 weeks; 100% had birthweight. Associations between demographic factors and ultrasound measures were similar in magnitude, but more precise, when either imputed dataset was used, compared with a complete-case analysis, in both the cross-sectional and longitudinal analyses. Conclusions: MICE, though ignoring the non-linear features of the trajectory and within-subject correlation, is able to provide reasonable imputation of foetal growth data when compared to LMM. Because it simultaneously imputes missing covariate data and does not require specification of variance structure as in LMM, MICE may be preferable for imputation in this setting. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/146300/1/ppe12486_am.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/146300/2/ppe12486.pd
Imputation Techniques in Machine Learning - A Survey
Machine learning plays a pivotal role in data analysis and information extraction. However, one common challenge encountered in this process is dealing with missing values. Missing data can enter datasets for a variety of reasons: errors during data collection and management, intentional omissions, or human error. Most machine learning models are not designed to handle missing values directly, so data imputation is usually needed before the data are fed into a model. Multiple techniques are available for imputing missing values, and the choice of technique should be made judiciously, considering various parameters. An inappropriate choice can distort the overall distribution of data values and subsequently degrade the model's performance. In this paper, various imputation methods, including mean, median, K-nearest neighbors (KNN)-based imputation, linear regression, MissForest, and MICE, are examined.
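The two simplest methods listed in the survey abstract above, mean and median imputation, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the survey's own implementation; the helper name `impute`, the use of `None` to mark a missing value, and the toy `ages` data are all assumptions made for the example.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    return [fill if v is None else v for v in values]


# Hypothetical toy column with two missing entries and one large value.
ages = [30, None, 50, 40, None, 100]
mean_filled = impute(ages, "mean")      # missing entries become 55
median_filled = impute(ages, "median")  # missing entries become 45
```

The large value pulls the mean above the median, which shows how the choice of fill value alone can shift the distribution of the completed column.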
Paediatr Perinat Epidemiol
Background: Ultrasound measures are valuable for epidemiologic studies of risk factors for growth restriction. Longitudinal measurements enable investigation of rates of change and identification of windows where growth is impacted more acutely. However, missing data can be problematic in these studies, limiting sample size, ability to characterise windows of vulnerability, and in some instances creating bias. We sought to compare a parametric linear mixed model (LMM) approach to multiple imputation in this setting with multiple imputation by chained equations (MICE) methodology. Methods: Ultrasound scans performed for clinical purposes were abstracted from women in the LIFECODES birth cohort (n = 1003) if they were close in time to three study visits (median 18, 26, and 35 weeks' gestation). We created imputed datasets using LMM and MICE and calculated associations between demographic factors and ultrasound parameters cross-sectionally and longitudinally. Results were compared with a complete-case analysis. Results: Most participants had ultrasounds at 18 weeks' gestation, and ~50% had measurements at 26 and 35 weeks; 100% had birthweight. Associations between demographic factors and ultrasound measures were similar in magnitude, but more precise, when either imputed dataset was used, compared with a complete-case analysis, in both the cross-sectional and longitudinal analyses. Conclusions: MICE, though ignoring the non-linear features of the trajectory and within-subject correlation, is able to provide reasonable imputation of foetal growth data when compared to LMM. Because it simultaneously imputes missing covariate data and does not require specification of variance structure as in LMM, MICE may be preferable for imputation in this setting.
P42 ES017198/ES/NIEHS NIH HHS/United States; T42 OH008455/OH/NIOSH CDC HHS/United States; R01 ES018872/ES/NIEHS NIH HHS/United States; ZIA103321/ES/NIEHS NIH HHS/United States; P30 ES017885/ES/NIEHS NIH HHS/United States
Generating the TAKEMOD input dataset - Cleaning, Merging and Imputations
TAKE - Reducing poverty through improving the take-up of social policie
Health-related quality of life of children and their parents 2 years after critical illness
Background: Pediatric intensive care unit (PICU) survivors are at risk for prolonged morbidities interfering with daily
life. The current study examined parent-reported health-related quality of life (HRQoL) in former critically ill children
and parents themselves and aimed to determine whether withholding parenteral nutrition (PN) in the first week of
critical illness affected children's and parents' HRQoL 2 years later.
Methods: Children who participated in the pediatric early versus late parenteral nutrition in critical illness (PEPaNIC)
trial and who were testable 2 years later (n = 1158) were included. Their HRQoL outcomes were compared with 405
matched healthy controls. At PICU admission, childre
- …