Predicting length of stay (LOS) in a hospital post-surgery

Abstract

Dissertation presented as a partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.

The amount of time a patient stays in the hospital after surgery is a recurring problem for hospital management: a longer stay in the recovery room carries a high cost and consumes substantial hospital resources, manpower and equipment. The length of stay is difficult to predict precisely, since many external and internal factors account for a longer or shorter stay, and it is difficult for a team to weigh all of these factors and make the estimate manually. With the advancement of machine learning methods and models, this prediction can be made automatically. The aim of this study was to create a predictive model that looks at patient data and procedure data and predicts how long the patient will stay after surgery, making the hospital's current length-of-stay prediction more accurate and complementing the existing surgery scheduling and discharge system.

To achieve this objective, a data mining approach was implemented. The Python language was used, with particular emphasis on the Scikit-Learn, pandas and Seaborn packages. Tables from a relational database were processed and extracted to build a dataset. Exploratory data analysis was performed, and several model configurations were tested. The main differences between the models are outlier treatment, sampling techniques, feature scalers, feature engineering and the type of algorithm: Linear Regression, Decision Tree Regressor, Multilayer Perceptron Regressor, Random Forest Regressor, Light Gradient Boosting Machine Regressor and Gradient Boosting Regressor. A total of 32,993 hospital episodes were observed in this study. Of these, 2,006 were eliminated due to data anomalies, namely values that were wrong or impossible. The data was split into training and test sets.
The best performing model achieved an R² score of 0.73, obtained with the Light Gradient Boosting Machine Regressor using outlier removal, Robust Scaling and all the features in the dataset.
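The winning configuration (outlier removal, robust scaling, a gradient boosting regressor, R² evaluation on a held-out test set) can be sketched as below. This is a minimal illustration, not the thesis code: the data is synthetic stand-in data, the IQR rule for outlier removal is one common choice the abstract does not specify, and Scikit-Learn's GradientBoostingRegressor (one of the algorithms the study compared) stands in for LightGBM to keep the example self-contained.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for the patient/procedure features and LOS target.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))
y = 3.0 + 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)

# Outlier removal on the target via the 1.5*IQR rule (an assumed choice;
# the study does not state which rule it used).
q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
mask = (y >= q1 - 1.5 * iqr) & (y <= q3 + 1.5 * iqr)
X, y = X[mask], y[mask]

# Train/test split, then a pipeline of Robust Scaling + boosted trees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = Pipeline([
    ("scale", RobustScaler()),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])
model.fit(X_train, y_train)

# Same evaluation metric as the study: R² on the held-out test set.
score = r2_score(y_test, model.predict(X_test))
print(f"test R2: {score:.2f}")
```

Wrapping the scaler and the regressor in a single Pipeline ensures the scaling statistics are fit only on the training fold, avoiding leakage into the test-set R² estimate.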
