Cost-sensitive ordinal classification methods to predict SARS-CoV-2 pneumonia severity
Objective: To study the suitability of cost-sensitive ordinal artificial intelligence-machine learning (AI-ML) strategies in the prognosis of SARS-CoV-2 pneumonia severity.
Materials & methods: Observational, retrospective, longitudinal, cohort study in 4 hospitals in Spain. Information regarding demographic and clinical status was supplemented by socioeconomic data and air pollution exposures. We proposed AI-ML algorithms for ordinal classification via ordinal decomposition and for cost-sensitive learning via resampling techniques. For performance-based model selection, we defined a custom score including per-class sensitivities and asymmetric misprognosis costs. 260 distinct AI-ML models were evaluated via 10 repetitions of 5×5 nested cross-validation with hyperparameter tuning. Model selection was followed by the calibration of predicted probabilities. Final overall performance was compared against five well-established clinical severity scores and against a ‘standard’ (non-cost sensitive, non-ordinal) AI-ML baseline. In our best model, we also evaluated its explainability with respect to each of the input variables.
Results: The study enrolled n=1548 patients: 712 experienced low, 238 medium, and 598 high clinical severity. 131 variables were collected, becoming d=148 features after categorical encoding. Model selection resulted in our best-performing AI-ML pipeline having: a) no imputation of missing data, b) no feature selection (i.e. using the full set of features), c) ‘Ordered Partitions’ ordinal decomposition, d) cost-based reimbalance, and e) a Histogram-based Gradient Boosting classifier. This best model (calibrated) obtained a median accuracy of 68.1% [67.3%, 68.8%] (95% confidence interval), a balanced accuracy of 57.0% [55.6%, 57.9%], and an overall area under the curve (AUC) of 0.802 [0.795, 0.808]. In our dataset, it outperformed all five clinical severity scores and the ‘standard’ AI-ML baseline.
Discussion & conclusion: We conducted an exhaustive exploration of AI-ML methods designed for both ordinal and cost-sensitive classification, motivated by a real-world application domain (clinical severity prognosis) in which these topics arise naturally. Our model with the best classification performance successfully exploited the ordering information of the ground truth classes while coping with imbalance and asymmetric costs. However, these ordinal and cost-sensitive aspects are seldom explored in the literature.
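As an illustration of ordinal decomposition, the sketch below shows one common way to set up an ‘Ordered Partitions’ scheme for three severity levels: the ordinal target is split into K-1 binary questions of the form "is severity greater than level k?", and the binary probabilities are recombined into per-class probabilities. The function names and the example probabilities are hypothetical, and this is not necessarily the exact variant used in the study.

```python
from typing import List, Sequence


def ordered_partition_targets(y: Sequence[int], n_classes: int) -> List[List[int]]:
    """Return K-1 binary target vectors; vector k marks samples with label > k."""
    return [[1 if label > k else 0 for label in y] for k in range(n_classes - 1)]


def combine_cumulative(p_greater: Sequence[float]) -> List[float]:
    """Turn P(y > k), k = 0..K-2, into class probabilities P(y = k), k = 0..K-1."""
    # Pad with P(y > -1) = 1 and P(y > K-1) = 0 so every class is a difference.
    padded = [1.0] + list(p_greater) + [0.0]
    # Independently trained binary models can give non-monotone estimates,
    # so negative differences are clipped to zero and the result renormalised.
    raw = [max(padded[k] - padded[k + 1], 0.0) for k in range(len(padded) - 1)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical labels: 0 = low, 1 = medium, 2 = high severity.
labels = [0, 2, 1, 0, 2]
targets = ordered_partition_targets(labels, n_classes=3)
# targets[0] answers "severity > low?", targets[1] answers "severity > medium?".

# Hypothetical outputs of the two binary models for one patient.
probs = combine_cumulative([0.7, 0.4])
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Each binary target can be handed to any probabilistic classifier; only the recombination step depends on the class ordering.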
Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques
With the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and the factors which determine diverse clinical evolutions. Here we focused on a fully data-driven exploration regarding which factors (clinical or otherwise) were most informative for SARS-CoV-2 pneumonia severity prediction via machine learning (ML).
In particular, feature selection (FS) techniques, designed to reduce the dimensionality of data, allowed us to characterize which of our variables were the most useful for ML prognosis.
We conducted a multi-centre clinical study, enrolling n=1548 patients hospitalized due to SARS-CoV-2 pneumonia, of whom 712, 238, and 598 experienced low, medium and high-severity evolutions, respectively. Up to 106 patient-specific clinical variables were collected at admission, although 14 of them had to be discarded for containing ⩾60% missing values. Alongside 7 socioeconomic attributes and 32 exposures to air pollution (chronic and acute), these became d=148 features after variable encoding.
We addressed this ordinal classification problem both as an ML classification task and as a regression task. Two imputation techniques for missing data were explored, along with a total of 166 unique FS algorithm configurations: 46 filters, 100 wrappers and 20 embedded methods. Of these, 21 setups achieved satisfactory bootstrap stability (⩾0.70) with reasonable computation times: 16 filters, 2 wrappers, and 3 embedded methods.
The subsets of features selected by each technique showed modest Jaccard similarities with one another. However, they consistently pointed to the importance of certain explanatory variables. Namely: the patient’s C-reactive protein (CRP), pneumonia severity index (PSI), respiratory rate (RR) and oxygen levels (saturation SpO2, quotients SpO2/RR and arterial SatO2/FiO2), the neutrophil-to-lymphocyte ratio (NLR) and, to a certain extent, the separate neutrophil and lymphocyte counts, lactate dehydrogenase (LDH), and procalcitonin (PCT) levels in blood.
A remarkable agreement was found a posteriori between our strategy and independent clinical research works investigating risk factors for COVID-19 severity. Hence, these findings stress the suitability of this type of fully data-driven approach for knowledge extraction, as a complement to clinical perspectives.
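To make the comparison of selected subsets concrete, the sketch below computes pairwise Jaccard similarities between feature subsets and the variables on which all techniques agree. The three subsets are invented for illustration (they only reuse variable names mentioned above) and do not reproduce the study's actual selections; the function names are likewise hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(subsets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of techniques (names sorted)."""
    return {
        (m1, m2): jaccard(subsets[m1], subsets[m2])
        for m1, m2 in combinations(sorted(subsets), 2)
    }


# Hypothetical selections by three FS techniques.
selected = {
    "filter": {"CRP", "PSI", "RR", "SpO2"},
    "wrapper": {"CRP", "LDH", "NLR", "SpO2"},
    "embedded": {"CRP", "PCT", "PSI"},
}
similarities = pairwise_jaccard(selected)

# Variables chosen by every technique, despite the modest pairwise overlap.
consensus = set.intersection(*selected.values())
```

In this toy case the pairwise similarities stay at or below 0.4 while one variable is still selected by every technique, which is the kind of pattern the abstract describes.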
Impact of outdoor air pollution on severity and mortality in COVID-19 pneumonia
The relationship between exposure to air pollution and the severity of coronavirus disease 2019 (COVID-19) pneumonia and other outcomes is poorly understood. Beyond age and comorbidity, risk factors for adverse outcomes including death have been poorly studied. The main objective of our study was to examine the relationship between exposure to outdoor air pollution and the risk of death in patients with COVID-19 pneumonia using individual-level data. The secondary objective was to investigate the impact of air pollutants on gas exchange and systemic inflammation in this disease. This cohort study included 1548 patients hospitalised for COVID-19 pneumonia between February and May 2020 in one of four hospitals. Local agencies supplied daily data on environmental air pollutants (, , , , and ) and meteorological conditions (temperature and humidity) in the year before hospital admission (from January 2019 to December 2019). Daily exposure to pollution and meteorological conditions by individual postcode of residence was estimated using geospatial Bayesian generalised additive models. The influence of air pollution on pneumonia severity was studied using generalised additive models which included: age, sex, Charlson comorbidity index, hospital, average income, air temperature and humidity, and exposure to each pollutant. Additionally, generalised additive models were generated to explore the effect of air pollution on C-reactive protein (CRP) level and SpO2/FiO2 at admission. According to our results, both risk of COVID-19 death and CRP level increased significantly with median exposure to , , and , while higher exposure to , and was associated with lower SpO2/FiO2 ratios. In conclusion, after controlling for socioeconomic, demographic and health-related variables, we found evidence of a significant positive relationship between air pollution and mortality in patients hospitalised for COVID-19 pneumonia. Additionally, inflammation (CRP) and gas exchange (SpO2/FiO2) in these patients were significantly related to exposure to air pollution.
Quantitative impact of air pollution on the probability of death from SARS-CoV-2 pneumonia
Introduction
The available scientific evidence indicates that outdoor air pollution could aggravate the severity of COVID-19 and, consequently, increase the probability of death.
Materials and methods
Observational, longitudinal, retrospective, multicentre cohort study in 4 hospitals: 2 in Bizkaia (1 urban, 1 urban-rural), Valencia and Barcelona (urban). Admissions for SARS-CoV-2 pneumonia during the first epidemic peak of COVID-19 (February-May 2020) were included.
To determine exposure to PM and NO pollution, the data published by the regional air quality agencies were obtained for 2019 and the first half of 2020. A Generalised Additive Model (GAM) was used to estimate the daily pollutant level in each postcode as a function of the geographic coordinates and altitude of the measuring stations [Figure 1]. Chronic exposure was characterised by the mean and maximum over 2019; acute exposure by the mean and maximum over the 7 days before admission.
The odds ratio (OR) of death versus survival in our cohort was studied. It was modelled with a logistic-regression GAM, incorporating sex, age and pollutant as fixed effects, hospital as a random effect, and the Charlson comorbidity index as a smooth function via penalised splines.
Results
Of the 1548 patients recruited, 243 (15.7%) died during hospitalisation and/or within 30 days of admission. According to the models [Table 1], there is statistically significant evidence that chronic exposure to PM and NO increases the probability of death from SARS-CoV-2 pneumonia. After adjusting for sex, age and Charlson index (all factors positively related to the OR of death) as well as for hospital, each 10 μg/m³ increase in the PM level (annual maximum) raises the OR by 10.5%, in linear proportion to the increase in pollution. Likewise, each additional 10 μg/m³ of NO2 (annual mean) raises the OR by 35.7%; each additional 10 μg/m³ of acute exposure to NO2 (mean over the week before admission) by 62.9%; and of NO (weekly maximum) by 34.4%.
Conclusions
The effects of sex, age, Charlson index and hospital were quantified and adjusted for. With these factors held equal, increases in chronic and acute exposure to PM and NO raise the probability of death from SARS-CoV-2 pneumonia in a linear and statistically significant manner.
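As a worked illustration of how a reported "percentage increase in OR per 10 μg/m³" can be rescaled, the sketch below assumes the pollutant enters the logistic model as a linear term, so that the log-odds are linear in exposure and the OR compounds multiplicatively. The function names are hypothetical; only the 10.5% figure comes from the abstract.

```python
import math


def beta_per_unit(or_increase_pct: float, step: float = 10.0) -> float:
    """Log-odds coefficient per 1 unit of exposure, given the % OR increase per `step` units."""
    return math.log(1.0 + or_increase_pct / 100.0) / step


def odds_ratio(beta: float, delta: float) -> float:
    """Odds ratio associated with an exposure increase of `delta` units."""
    return math.exp(beta * delta)


# Reported above: +10.5% in OR per 10 μg/m³ of PM (annual maximum).
beta_pm = beta_per_unit(10.5)

or_10 = odds_ratio(beta_pm, 10.0)  # recovers the reported OR of 1.105
or_20 = odds_ratio(beta_pm, 20.0)  # 1.105 squared under the log-linear assumption
```

Under this assumption a 20 μg/m³ increase corresponds to an OR of about 1.22 rather than 1.21, because odds ratios multiply while log-odds add.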
Predicting the severity of SARS-CoV-2 pneumonia from clinical information and air pollution using artificial intelligence
Introduction
Outdoor air pollution has been linked to greater severity of respiratory infections. Including it in predictive algorithms could therefore add information for predicting the severity of SARS-CoV-2 pneumonia.
Materials and methods
Observational, longitudinal, retrospective, multicentre cohort study in 4 hospitals. Admissions for SARS-CoV-2 pneumonia during the first epidemic peak of COVID-19 (February-May 2020) were included.
Up to 93 clinical, laboratory and radiological variables were collected for each patient (sex, age, weight, comorbidities, symptoms, physiological variables in the emergency department, blood tests, blood gas analysis, etc.). In addition, the levels of exposure to pollution by PM, PM, O, NO, NO, NO, SO and CO in the patient's postcode were calculated. Three severity levels were defined according to the clinical course of the pneumonia [Table 1].
To predict this severity, an artificial intelligence (AI) algorithm of the ‘Random Forest’ type was developed, with class balancing and automatic tuning of its internal parameters. The algorithm was trained and evaluated using 20 repetitions of 10-fold cross-validation (90% training, 10% validation), with random stratification by hospital and severity.
Results
On the validation sets, the algorithm reached an average predictive capacity (area under the ROC curve) of AUC=0.834 for severity level 0, AUC=0.724 for level 1 and AUC=0.850 for level 2 [Figure 1]. Without the pollutant information, its predictive capacity degraded slightly (AUCs = 0.829, 0.722 and 0.844, respectively).
Conclusions
Our AI algorithm can satisfactorily predict the evolution of pneumonia severity, particularly for the mildest and the most severe cases. The AI algorithm extracts the most relevant rules mainly from each individual's clinical, laboratory and radiological information; nevertheless, incorporating exposure to pollutants slightly improves its predictive capacity. The impact of pollution may already be reflected in the blood tests, through its effect on the patient's inflammation levels (PCT, CRP, LDH, etc.).
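The per-level AUCs reported above can be read as one-vs-rest scores: each severity level in turn is treated as the positive class against the other two. The sketch below computes such scores in plain Python from the rank interpretation of the AUC. It is a minimal sketch under that assumption, with hypothetical function names and invented probabilities, and it is not the evaluation code of the study.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if sp > sn else 0.5 if sp == sn else 0.0
        for sp in pos
        for sn in neg
    )
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> List[float]:
    """One AUC per class, using that class's predicted probability as the score."""
    n_classes = len(proba[0])
    return [
        binary_auc([row[k] for row in proba], [label == k for label in y_true])
        for k in range(n_classes)
    ]


# Hypothetical validation fold: true severity levels and predicted probabilities.
y_true = [0, 0, 1, 2, 2, 1]
proba = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]
aucs = one_vs_rest_auc(y_true, proba)
```

With repeated cross-validation, these per-class values would be computed on each validation fold and then averaged.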