Interest in using machine learning to predict the likelihood of specific clinical outcomes has grown in recent years, particularly in emergency medicine. This thesis applies machine learning techniques to predict emergency department (ED) bouncebacks (unscheduled patient revisits to the ED within 72 hours), which significantly strain resources, contribute to overcrowding, and often highlight gaps in ED clinical care. The study specifically targets high-risk ED revisits that result in hospital admission.
Methods: Patient data from a major academic hospital in Lebanon were analyzed using several machine learning algorithms, including logistic regression, neural networks, LightGBM, and XGBoost. Frequency encoding was applied to high-cardinality categorical variables, such as diagnoses and chief complaints, to optimize model performance while limiting computational complexity. A temporal training strategy was used: models were trained on data from 2018 to 2022 and part of 2023, and predictions were made on the remaining portion of 2023. For the refined model, the dataset was reduced to the 48 most important features and supplemented with medication data to further improve prediction accuracy.
Results: In the initial analysis, XGBoost performed best among all models, achieving a sensitivity of 0.97, a precision of 0.83, and an AUC-ROC of 0.99. Key predictors identified by the model included arrival-to-disposition duration, Age, Diagnosis, BP_Systolic, and Chief Complaint. The refined XGBoost model, trained on the reduced feature set, achieved a sensitivity of 0.86, a specificity of 0.98, a precision of 0.71, and an AUC-ROC of 0.99 on the test set. It confirmed the importance of temporal variables, such as arrival-to-disposition duration and triage duration, alongside patient-specific and clinical features.
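The frequency encoding mentioned in the Methods can be illustrated with a minimal sketch. This is not the study's code: the function names and the toy chief-complaint values are hypothetical. Each category is replaced by its relative frequency, so a column with thousands of distinct values becomes a single numeric feature. The sketch fits the mapping on training values only and sends unseen categories to a default; whether the study handled unseen categories this way is an assumption.

```python
from collections import Counter

def fit_frequency_encoder(values):
    """Map each category to its relative frequency among the given values."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}

def encode(values, encoder, default=0.0):
    """Replace each category with its learned frequency.

    Categories not seen when fitting get `default` (an assumed convention).
    """
    return [encoder.get(v, default) for v in values]

# Toy chief complaints; illustrative values, not study data.
train_complaints = ["chest pain", "chest pain", "fever", "abdominal pain"]
test_complaints = ["fever", "headache"]

encoder = fit_frequency_encoder(train_complaints)
train_encoded = encode(train_complaints, encoder)
test_encoded = encode(test_complaints, encoder)
```

Unlike one-hot encoding, this adds no extra columns per category, which is consistent with the stated aim of limiting computational complexity.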
Although the refined model showed a decrease in sensitivity (from 0.97 to 0.86), it remained the preferred choice because it relies on the most influential features identified in the initial analysis. Furthermore, it was trained using a temporal validation approach that mirrors real-world clinical scenarios, where models are trained on past data and tested on unseen future cases. This strategy enhances the model’s generalizability and practical applicability in dynamic healthcare environments. At the same time, incorporating medication data further improved predictive performance.
Conclusion: This research highlights the potential of machine learning to support clinical decision-making by identifying high-risk ED revisits. The findings demonstrate that integrating medication data, using frequency encoding for high-cardinality features, and applying temporal validation strategies enhance model performance and generalizability. By optimizing feature representation and incorporating both historical and recent patient data, the initial and refined models provide a scalable framework for predictive modeling. These approaches contribute to more accurate identification of high-risk ED bouncebacks and support the future integration of machine learning into clinical practice.
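The temporal validation idea, training on past visits and testing on later ones, can be sketched as below. This is only an illustration under stated assumptions: the `visit_date` field, the toy visits, and the cutoff date are hypothetical, since the abstract does not say where within 2023 the split falls.

```python
from datetime import date

def temporal_split(visits, cutoff):
    """Return (train, test): visits before the cutoff and visits on or after it."""
    train = [v for v in visits if v["visit_date"] < cutoff]
    test = [v for v in visits if v["visit_date"] >= cutoff]
    return train, test

# Toy visits; illustrative values, not study data.
visits = [
    {"visit_id": 1, "visit_date": date(2018, 5, 14)},
    {"visit_id": 2, "visit_date": date(2021, 10, 3)},
    {"visit_id": 3, "visit_date": date(2023, 2, 11)},
    {"visit_id": 4, "visit_date": date(2023, 9, 20)},
    {"visit_id": 5, "visit_date": date(2023, 11, 2)},
]

CUTOFF = date(2023, 7, 1)  # hypothetical split point within 2023
train, test = temporal_split(visits, CUTOFF)

# Sanity check: every training visit precedes every test visit.
latest_train = max(v["visit_date"] for v in train)
earliest_test = min(v["visit_date"] for v in test)
```

In contrast to a random split, no visit in the test portion predates a training visit, so the evaluation resembles deploying a model on future patients.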