279 research outputs found

    Propensity Scoring after Multiple Imputation in a Retrospective Study on Adjuvant Radiation Therapy in Lymph-Node Positive Vulvar Cancer

    Propensity scoring (PS) is an established tool to account for measured confounding in non-randomized studies. These methods are sensitive to missing values, which are a common problem in observational data. This work addresses the combination of multiple imputation of missing values with different propensity scoring techniques. For a sample of lymph node-positive vulvar cancer patients, we re-analyze associations between the application of radiotherapy and disease-related and non-related survival. Inverse-probability-of-treatment-weighting (IPTW) and PS stratification are applied after multiple imputation by chained equations (MICE). Methodological issues are described in detail. Interpretation of the results and methodological limitations are discussed.
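    As a rough illustration of applying IPTW after multiple imputation, the sketch below computes unstabilized IPTW weights from given propensity scores, forms a weighted difference in outcome means, and pools per-imputation estimates with Rubin's rules. The abstract does not state how the authors estimated or pooled their results, so the function names, the choice of unstabilized weights, and the pooling step are all assumptions made for illustration.

```python
from statistics import mean, variance


def iptw_weights(treated, propensity):
    """Unstabilized weights: 1/e for treated units, 1/(1 - e) for untreated."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e)
            for t, e in zip(treated, propensity)]


def weighted_mean_difference(outcome, treated, weights):
    """Weighted outcome mean of the treated minus that of the untreated."""
    def wmean(group):
        num = sum(w * y for y, t, w in zip(outcome, treated, weights) if t == group)
        den = sum(w for t, w in zip(treated, weights) if t == group)
        return num / den
    return wmean(1) - wmean(0)


def pool_rubin(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputations."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

    In use, the weights and the effect estimate would be computed once per imputed dataset, and only the resulting estimates and their variances would be passed to `pool_rubin`.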

    Improving Cardiovascular Disease Prediction by Integrating Imputation, Imbalance Resampling, and Feature Selection Techniques into Machine Learning Model

    Cardiovascular disease (CVD) is the leading cause of death worldwide. Primary prevention relies on early prediction of disease onset. Using laboratory data from the National Health and Nutrition Examination Survey (NHANES) for the 2017-2020 timeframe (N = 7,974), we tested the ability of machine learning (ML) algorithms to classify individuals at risk. The ML models were evaluated based on their classification performance after comparing four imputation, three imbalance resampling, and three feature selection techniques. Due to its popularity, we used a decision tree (DT) as the baseline. Integrating multiple imputation by chained equations (MICE) and synthetic minority oversampling with Tomek link down-sampling (SMOTETomek) into the model improved the area under the receiver operating characteristic curve (AUC-ROC) from 57% to 83%. Applying simultaneous perturbation feature selection and ranking (spFSR) reduced the number of predictors from 144 to 30 and the computational time by 22%. The best techniques were applied to six ML models, with eXtreme gradient boosting (XGBoost) achieving the highest accuracy of 93% and an AUC-ROC of 89%. The accuracy of our ML model in predicting CVD outperforms that of previous studies. We also highlight the important causes of CVD, which might be investigated further for potential effects on electronic health records.

    An empirical comparison of possible combinations of imputation techniques and propensity-score-based treatment of confounding in the presence of missing data

    In observational studies, the estimate of a causal effect can be biased by the presence of confounders. Propensity score methods are commonly used to correct estimates for this bias. Another problem that often affects statistical analysis is missing data. This work aims to compare the performance of the four main propensity-score-based methods for handling confounding in combination with different techniques for imputing missing data. Through a series of simulation studies, propensity-score-based matching, stratification, covariate adjustment, and inverse probability of treatment weighting were compared for estimating the average treatment effect on the treated (ATT) after imputing the missing data with unconditional mean imputation and several versions of multiple imputation by chained equations.

    Foetal ultrasound measurement imputations based on growth curves versus multiple imputation chained equation (MICE)

    Background: Ultrasound measures are valuable for epidemiologic studies of risk factors for growth restriction. Longitudinal measurements enable investigation of rates of change and identification of windows where growth is impacted more acutely. However, missing data can be problematic in these studies, limiting sample size, ability to characterise windows of vulnerability, and in some instances creating bias. We sought to compare a parametric linear mixed model (LMM) approach to multiple imputation in this setting with multiple imputation by chained equation (MICE) methodology. Methods: Ultrasound scans performed for clinical purposes were abstracted from women in the LIFECODES birth cohort (n = 1003) if they were close in time to three study visits (median 18, 26, and 35 weeks’ gestation). We created imputed datasets using LMM and MICE and calculated associations between demographic factors and ultrasound parameters cross-sectionally and longitudinally. Results were compared with a complete-case analysis. Results: Most participants had ultrasounds at 18 weeks’ gestation, and ~50% had measurements at 26 and 35 weeks; 100% had birthweight. Associations between demographic factors and ultrasound measures were similar in magnitude, but more precise, when either imputed datasets were used, compared with a complete-case analysis, in both the cross-sectional or longitudinal analyses. Conclusions: MICE, though ignoring the non-linear features of the trajectory and within subject correlation, is able to provide reasonable imputation of foetal growth data when compared to LMM. Because it simultaneously imputes missing covariate data and does not require specification of variance structure as in LMM, MICE may be preferable for imputation in this setting. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/146300/1/ppe12486_am.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/146300/2/ppe12486.pd

    Imputation Techniques in Machine Learning – A Survey

    Machine learning plays a pivotal role in data analysis and information extraction. However, one common challenge in this process is dealing with missing values. Missing data can enter datasets for a variety of reasons: errors during data collection and management, intentional omissions, or human error. Most machine learning models are not designed to handle missing values directly. Consequently, it becomes essential to perform data imputation before feeding the data into a machine learning model. Multiple techniques are available for imputing missing values, and the choice of technique should be made judiciously, considering various parameters. An inappropriate choice can distort the overall distribution of data values and subsequently degrade the model's performance. In this paper, various imputation methods, including Mean, Median, K-nearest neighbors (KNN)-based imputation, Linear Regression, MissForest, and MICE, are examined.
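    To make the simplest two methods in that list concrete, here is a minimal sketch of mean and median imputation for a single numeric column, using only the Python standard library. The function name `impute_column` and the use of `None` to mark a missing value are assumptions for illustration and are not taken from the survey.

```python
from statistics import mean, median


def impute_column(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


# A single extreme value pulls the mean fill far from the typical observation,
# while the median fill stays close to it.
column = [1.0, None, 3.0, 100.0]
mean_filled = impute_column(column, "mean")
median_filled = impute_column(column, "median")
```

    Both strategies fill every gap in a column with one constant, which shrinks the column's variance; that is one way an imputation choice can distort the distribution of the data.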

    Paediatr Perinat Epidemiol

    Background: Ultrasound measures are valuable for epidemiologic studies of risk factors for growth restriction. Longitudinal measurements enable investigation of rates of change and identification of windows where growth is impacted more acutely. However, missing data can be problematic in these studies, limiting sample size, ability to characterise windows of vulnerability, and in some instances creating bias. We sought to compare a parametric linear mixed model (LMM) approach to multiple imputation in this setting with multiple imputation by chained equation (MICE) methodology. Methods: Ultrasound scans performed for clinical purposes were abstracted from women in the LIFECODES birth cohort (n = 1003) if they were close in time to three study visits (median 18, 26, and 35 weeks’ gestation). We created imputed datasets using LMM and MICE and calculated associations between demographic factors and ultrasound parameters cross-sectionally and longitudinally. Results were compared with a complete-case analysis. Results: Most participants had ultrasounds at 18 weeks’ gestation, and ~50% had measurements at 26 and 35 weeks; 100% had birthweight. Associations between demographic factors and ultrasound measures were similar in magnitude, but more precise, when either imputed datasets were used, compared with a complete-case analysis, in both the cross-sectional or longitudinal analyses. Conclusions: MICE, though ignoring the non-linear features of the trajectory and within subject correlation, is able to provide reasonable imputation of foetal growth data when compared to LMM. Because it simultaneously imputes missing covariate data and does not require specification of variance structure as in LMM, MICE may be preferable for imputation in this setting. Funding: P42 ES017198/ES/NIEHS NIH HHS/United States; T42 OH008455/OH/NIOSH CDC HHS/United States; R01 ES018872/ES/NIEHS NIH HHS/United States; ZIA103321/ES/NIEHS NIH HHS/United States; P30 ES017885/ES/NIEHS NIH HHS/United States. 2020-01-02. PMID 30016545; PMC6939297.

    Generating the TAKEMOD input dataset – Cleaning, Merging and Imputations

    TAKE - Reducing poverty through improving the take-up of social policie

    Health-related quality of life of children and their parents 2 years after critical illness

    Background: Pediatric intensive care unit (PICU) survivors are at risk for prolonged morbidities interfering with daily life. The current study examined parent-reported health-related quality of life (HRQoL) in former critically ill children and parents themselves and aimed to determine whether withholding parenteral nutrition (PN) in the first week of critical illness affected children’s and parents’ HRQoL 2 years later. Methods: Children who participated in the pediatric early versus late parenteral nutrition in critical illness (PEPaNIC) trial and who were testable 2 years later (n = 1158) were included. Their HRQoL outcomes were compared with 405 matched healthy controls. At PICU admission, childre