Severe Acute Respiratory Syndrome (SARS) continues to pose a substantial public health challenge in Brazil, with prolonged hospitalizations increasing pressure on healthcare resources. This study used Brazil’s national SIVEP-Gripe surveillance system, a comprehensive repository of anonymized, individual-level records for SARS cases including influenza and other respiratory viruses, to develop and evaluate machine learning models. From the 2024 data, we constructed a preprocessed dataset of 64,238 hospitalized patient records described by 32 independent variables, all of which are available at the time of patient admission. The prediction target is prolonged hospital length of stay (PLOS > 7 days). Three tree-based ensemble algorithms (Random Forest, XGBoost, and CatBoost) were trained after data preprocessing and robust imputation, using stratified 5-fold cross-validation with AUC as the optimization metric. The models showed moderate but consistent predictive performance, with AUC values around 0.65. XGBoost achieved the best balance between sensitivity and specificity, while Random Forest achieved higher recall for prolonged-stay cases. Explainable AI analysis using SHAP values identified asthma, age, oxygen saturation, and geographic region as the most influential predictors. These findings underscore the potential of explainable machine learning approaches to support early hospital resource planning using routinely collected surveillance data. Future research should incorporate dynamic and clinical progression variables to further improve predictive performance and real-world applicability.
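To make the evaluation protocol concrete, the following is a minimal sketch of stratified 5-fold cross-validation scored by AUC. It is not the study's pipeline: the data are synthetic, the single `ages` feature and the helper names (`stratified_folds`, `auc`, `fit_bin_rates`) are hypothetical, and a simple per-age-bin prevalence scorer stands in for the Random Forest, XGBoost, and CatBoost models so that the example runs with the Python standard library alone.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each index to one of k folds, dealing each class out round-robin
    so that every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher than
    a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_bin_rates(ages, labels, width=10):
    """Stand-in 'model': prevalence of the positive class per age bin,
    plus the overall prevalence as a fallback for unseen bins."""
    totals, positives = {}, {}
    for a, y in zip(ages, labels):
        b = a // width
        totals[b] = totals.get(b, 0) + 1
        positives[b] = positives.get(b, 0) + y
    overall = sum(labels) / len(labels)
    return {b: positives[b] / totals[b] for b in totals}, overall


# Synthetic cohort (assumption): 1 = prolonged stay (> 7 days), with risk rising with age.
rng = random.Random(42)
ages = [rng.randint(0, 99) for _ in range(2000)]
labels = [1 if rng.random() < 0.2 + 0.004 * a else 0 for a in ages]

folds = stratified_folds(labels, k=5, seed=1)
fold_aucs = []
for held_out in folds:
    held = set(held_out)
    train = [i for i in range(len(labels)) if i not in held]
    # Fit on the four training folds only, then score the held-out fold.
    rates, overall = fit_bin_rates([ages[i] for i in train],
                                   [labels[i] for i in train])
    scores = [rates.get(ages[i] // 10, overall) for i in held_out]
    fold_aucs.append(auc([labels[i] for i in held_out], scores))

mean_auc = sum(fold_aucs) / len(fold_aucs)
print("per-fold AUC:", [round(a, 3) for a in fold_aucs])
print("mean AUC:", round(mean_auc, 3))
```

Stratification matters here because the prolonged-stay class is typically the minority: dealing each class out separately keeps its share nearly identical in every fold, so the per-fold AUC values are comparable and their mean is a reasonable summary.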