Systematic Review on Missing Data Imputation Techniques with Machine Learning Algorithms for Healthcare
Missing data is one of the most common issues encountered in the data cleaning process, especially when dealing with medical datasets. Real-world datasets are prone to being incomplete, inconsistent, noisy, and redundant for reasons such as human error, instrument failure, and adverse death. Accurately handling incomplete data therefore calls for sophisticated algorithms that impute the missing values. Many machine learning algorithms have been applied to impute missing data with plausible values. Among them, the k-nearest neighbors (KNN) algorithm has been widely adopted for missing-data imputation because of its robustness and simplicity, and it is a promising method that can outperform other machine learning methods. This paper provides a comprehensive review of the imputation techniques used to replace missing data. The goal of the review is to draw attention to potential improvements to existing methods and to give readers a better grasp of trends in imputation techniques.
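The KNN imputation idea mentioned above can be illustrated with a minimal sketch. It assumes each row is a list of floats with `None` marking a missing value, fills each gap with the unweighted mean of that feature across the k nearest rows that observe it, and measures distance only over features observed in both rows (rescaled by the fraction observed, similar in spirit to the nan-Euclidean distance behind scikit-learn's `KNNImputer`). The function name `knn_impute` and the toy data are hypothetical; this is not the method of any particular paper covered by the review.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each None with the mean of that column over the k nearest donor rows.

    Hypothetical helper for illustration: donors are other rows that observe the
    missing column; distance uses only features observed in both rows.
    """

    def distance(a: Row, b: Row) -> float:
        # Keep only feature pairs observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # Squared Euclidean distance over shared features, scaled up to the
        # full dimension so rows with fewer shared features are comparable.
        sq = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(sq * len(a) / len(shared))

    imputed: List[List[float]] = []
    for i, row in enumerate(data):
        new_row: List[float] = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Candidate donors: every other row that observes feature j.
            donors = [r for idx, r in enumerate(data) if idx != i and r[j] is not None]
            if not donors:
                raise ValueError(f"no donor row observes column {j}")
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            # Impute with the unweighted mean of the k nearest donors.
            new_row.append(sum(r[j] for r in nearest) / len(nearest))
        imputed.append(new_row)
    return imputed


if __name__ == "__main__":
    toy = [
        [1.0, 2.0, None],
        [1.1, 2.1, 3.0],
        [0.9, 1.9, 3.2],
        [5.0, 6.0, 9.0],
    ]
    # Row 0's missing third value is filled with the mean of 3.0 and 3.2 (about 3.1),
    # because rows 1 and 2 are far closer to row 0 than row 3 is.
    print(knn_impute(toy, k=2)[0])
```

Because Euclidean distance is scale-sensitive, features on very different scales (for example, age in years next to a lab value in mg/dL) would usually be standardized before a step like this. The sketch also recomputes distances for every missing cell, which is fine for illustration but quadratic in the number of rows.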
Visualization, Prediction, and Causal Inference: Applications in Healthcare
The recent wave of data collection in healthcare has opened up an ocean of possibilities to learn from this data and develop new exploratory, diagnostic, and prognostic methods. This thesis explores how three fields of statistics, (1) data visualization, (2) prediction, and (3) causal inference, can help us leverage this data to answer a wide range of questions in healthcare.
Part I of this thesis presents a software package called superheat that researchers can use to visualize complex datasets and multi-faceted modeling results. So far, its primary users have been in the medical research industry. In this thesis, we apply superheat in three case studies: (1) using a publicly available global organ donation database curated by the World Health Organization to understand and summarize global organ donation trends, (2) visualizing groups of topics that appear in text data scraped from Google News, and (3) examining the performance of a model designed to predict the brain's response to images using fMRI data. The theme of Part I is visualization in healthcare.
Part II introduces an analysis for predicting a patient's risk of developing a surgical site infection (SSI) following surgery. An SSI is an infection that occurs at the site of a surgery within 30 days after the operation, and SSIs are responsible for up to 30% of hospital-acquired infections. This method was developed in collaboration with healthcare professionals, including infectious disease experts and surgeons at UC Davis. The theme of Part II is prediction in healthcare.
Part III presents a novel application of instrumental variables in causal inference, asking whether a "survival-benefit"-based liver transplant allocation scheme could be effective. The conclusion is that while rethinking how organs are allocated could yield substantial benefit, implementing such a scheme, which relies on drawing causal inferences from complex observational data, is extremely difficult. The theme of Part III is causal inference in healthcare.
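Part III relies on instrumental variables. As a hedged illustration of the basic idea only, here is a minimal sketch of the textbook Wald estimator for a binary instrument; it is not the estimator used in the thesis, whose details the abstract does not give. It assumes the instrument `z` (0/1) shifts the treatment `d` (relevance) and affects the outcome `y` only through `d` (exclusion). The function name `wald_iv_estimate` and the toy data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def wald_iv_estimate(z: Sequence[int], d: Sequence[float], y: Sequence[float]) -> float:
    """Wald IV estimate: effect of z on y divided by effect of z on d.

    Hypothetical helper for illustration; assumes a binary instrument z in {0, 1}.
    """
    # Mean outcome and mean treatment uptake within each instrument arm.
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)
    if d1 == d0:
        # An instrument that does not move treatment cannot identify the effect.
        raise ValueError("instrument does not shift treatment uptake")
    return (y1 - y0) / (d1 - d0)


if __name__ == "__main__":
    z = [0, 0, 0, 0, 1, 1, 1, 1]
    d = [0, 0, 0, 1, 1, 1, 1, 0]
    y = [2 * di + 1 for di in d]  # toy outcome with a true treatment effect of 2
    print(wald_iv_estimate(z, d, y))
```

The ratio rescales the instrument's effect on the outcome by its effect on treatment uptake. With a binary treatment and the additional monotonicity assumption, it is usually interpreted as a local average treatment effect for units whose treatment responds to the instrument.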