A Data-Driven and Explainable Machine Learning Approach to Predict Prolonged Hospitalization in Brazilian SARS Patients

de Oliveira, Fernando Henrique Moura; Rodrigues, Cleyton Mario de Oliveira

research article

oai:ojs.poli.br:article/3537

Authors: Fernando Henrique Moura de Oliveira
Cleyton Mario de Oliveira Rodrigues
Publication date: 18 February 2026
Publisher: Escola Politécnica de Pernambuco
Abstract

Severe Acute Respiratory Syndrome (SARS) continues to pose a substantial public health challenge in Brazil, with prolonged hospitalizations increasing pressure on healthcare resources. This study used Brazil’s national SIVEP-Gripe surveillance system, a comprehensive repository of anonymized, individual-level records for SARS cases, including influenza and other respiratory viruses, to develop and evaluate machine learning models. Using data from 2024, we constructed a preprocessed dataset of 64,238 hospitalized patient records with 32 independent variables, all of which are available at the time of patient admission. The prediction target was prolonged hospital length of stay (PLOS), defined as a stay longer than 7 days. Three tree-based ensemble algorithms (Random Forest, XGBoost, and CatBoost) were trained after data preprocessing and robust imputation, using stratified 5-fold cross-validation with AUC maximization. The models showed moderate but consistent predictive performance, with AUC values around 0.65. XGBoost achieved the best balance between sensitivity and specificity, while Random Forest achieved higher recall for prolonged-stay cases. Explainable AI analysis using SHAP values identified asthma, age, oxygen saturation, and geographic region as the most influential predictors. These findings underscore the potential of explainable machine learning to support early hospital resource planning using routinely collected surveillance data. Future research should incorporate dynamic and clinical progression variables to further improve predictive performance and real-world applicability.
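To make the evaluation terms in the abstract concrete, the following is a minimal sketch, not the authors' code, of the binary PLOS target (stay longer than 7 days), a stratified fold assignment, the AUC, and sensitivity/specificity at a cut-off. All function names, the toy lengths of stay, and the toy scores are hypothetical, and only the Python standard library is used; a real pipeline would more likely rely on an established machine learning library.

```python
import random
from collections import defaultdict


def label_plos(length_of_stay_days, threshold_days=7):
    """Binary target: 1 if the stay exceeds the threshold (PLOS > 7 days)."""
    return 1 if length_of_stay_days > threshold_days else 0


def stratified_folds(labels, k=5, seed=0):
    """Assign each index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity (recall on prolonged stays) and specificity at a cut-off.

    Assumes both classes are present in `labels`.
    """
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: lengths of stay in days and made-up model scores.
stays = [3, 10, 5, 8, 2, 12, 7, 9, 4, 15]
labels = [label_plos(d) for d in stays]
scores = [0.2, 0.7, 0.6, 0.4, 0.1, 0.9, 0.5, 0.3, 0.35, 0.8]

folds = stratified_folds(labels, k=5)
print("AUC:", roc_auc(labels, scores))
print("sensitivity, specificity:", sensitivity_specificity(labels, scores, 0.5))
```

An AUC around 0.65, as reported in the abstract, means that a randomly chosen prolonged-stay patient receives a higher score than a randomly chosen short-stay patient about 65% of the time. Sensitivity and specificity depend on the chosen cut-off, which is one reason two models with similar AUC can differ in recall.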

Last time updated on 13/03/2026

This paper was published in University of Pernambuco - Engineering School/ Editorial System Journals.

Licence: http://creativecommons.org/licenses/by-nc/4.0