Features selected in ⩾80% of cases by the stable MI filters: nFS = 20 or 40.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

(a) MI Classif, kNN imputer: nFS = 20. (b) MI Regress, kNN imputer: nFS = 20. (c) MI Classif, iterative imputer: nFS = 20. (d) MI Regress, iterative imputer: nFS = 20. (e) MI Classif, kNN imputer: nFS = 40. (f) MI Regress, kNN imputer: nFS = 40. (g) MI Classif, iterative imputer: nFS = 40. (h) MI Regress, iterative imputer: nFS = 40.

FigShare
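The MI filter panels above were run on data completed by either a kNN imputer or an iterative imputer, neither of which is described further in this listing. As an illustration only, the stdlib-Python sketch below shows one way a kNN imputer can work, loosely modeled on scikit-learn's `KNNImputer` with uniform weights and the nan-Euclidean distance. The function names `nan_euclidean` and `knn_impute` and the default `k` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over coordinates present in both rows, scaled up by
    (number of coordinates / number of shared coordinates)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(len(a) / len(shared) * squared)


def knn_impute(X: List[Row], k: int = 5) -> List[Row]:
    """Replace each missing entry by the mean of that column over the k
    nearest rows (nan-Euclidean distance) that observe the column."""
    out = [list(row) for row in X]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows with column j observed (original values only,
            # so freshly imputed values are never reused as donors).
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for r, other in enumerate(X)
                if r != i and other[j] is not None
            )
            nearest = [v for dist, v in donors if math.isfinite(dist)][:k]
            if not nearest:
                # Fallback: plain column mean when no donor shares a coordinate.
                nearest = [other[j] for other in X if other[j] is not None]
            if nearest:
                out[i][j] = sum(nearest) / len(nearest)
    return out
```

For example, `knn_impute([[1.0, 2.0], [1.1, None], [5.0, 10.0], [1.2, 2.2]], k=2)` fills the gap with the mean of 2.0 and 2.2, because the first and last rows are closest on the shared coordinate.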

Stability for the filter algorithms.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

With the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and of the factors that determine its diverse clinical evolutions. Here we focused on a fully data-driven exploration of which factors (clinical or otherwise) were most informative for predicting SARS-CoV-2 pneumonia severity via machine learning (ML). In particular, feature selection (FS) techniques, designed to reduce the dimensionality of data, allowed us to characterize which of our variables were the most useful for ML prognosis. We conducted a multi-centre clinical study, enrolling n = 1548 patients hospitalized due to SARS-CoV-2 pneumonia, of whom 792, 238, and 518 experienced low-, medium-, and high-severity evolutions, respectively. Up to 106 patient-specific clinical variables were collected at admission, although 14 of them had to be discarded for containing ⩾60% missing values. Together with 7 socioeconomic attributes and 32 exposures to air pollution (chronic and acute), these became d = 148 features after variable encoding. We addressed this ordinal classification problem both as an ML classification task and as an ML regression task. Two imputation techniques for missing data were explored, along with a total of 166 unique FS algorithm configurations: 46 filters, 100 wrappers, and 20 embedded methods. Of these, 21 setups achieved satisfactory bootstrap stability (⩾0.70) with reasonable computation times: 16 filters, 2 wrappers, and 3 embedded methods. The subsets of features selected by each technique showed modest Jaccard similarities across them. However, they consistently pointed out the importance of certain explanatory variables, namely: the patient's C-reactive protein (CRP); pneumonia severity index (PSI); respiratory rate (RR) and oxygen levels (saturation SpO2, and the quotients SpO2/RR and arterial SatO2/FiO2); the neutrophil-to-lymphocyte ratio (NLR) and, to a certain extent, the neutrophil and lymphocyte counts separately; lactate dehydrogenase (LDH); and procalcitonin (PCT) levels in blood. A remarkable agreement was found a posteriori between our strategy and independent clinical research works investigating risk factors for COVID-19 severity. Hence, these findings stress the suitability of this type of fully data-driven approach for knowledge extraction, as a complement to clinical perspectives.

FigShare
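The abstract above reports a bootstrap stability threshold of ⩾0.70 but does not say, in this listing, which stability measure was used. One widely used choice is the estimator of Nogueira, Sechidis and Brown (2018), which compares how often each feature is picked across M bootstrap runs with what random selection of the same average size would give. The sketch below assumes that measure purely for illustration; the function name `nogueira_stability` is hypothetical.

```python
from typing import List, Set


def nogueira_stability(subsets: List[Set[int]], d: int) -> float:
    """Stability of M selected feature subsets drawn from d features
    (Nogueira et al., 2018); 1.0 means every run selected the same subset."""
    M = len(subsets)
    if M < 2:
        raise ValueError("need at least two bootstrap subsets")
    # p[f]: fraction of runs that selected feature f
    p = [sum(f in s for s in subsets) / M for f in range(d)]
    # Unbiased variance of each feature's selection indicator
    s2 = [M / (M - 1) * pf * (1 - pf) for pf in p]
    # Average subset size and the variance expected under random selection
    k_bar = sum(len(s) for s in subsets) / M
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("undefined when no feature or every feature is selected")
    return 1 - (sum(s2) / d) / denom
```

Identical subsets in every run give 1.0, while two disjoint halves of four features give -1.0; under this measure, a configuration would count as stable when the value reaches 0.70.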

Top-20 selected features.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

Number of the 21 stable FS configurations that selected each feature in ⩾80% of the M = 100 bootstrap iterations.

FigShare
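The count described in the caption above can be reproduced in two steps: first keep, for each configuration, the features chosen in at least 80% of its bootstrap runs, then count how many configurations keep each feature. The stdlib-Python sketch below assumes each configuration's runs are available as a list of feature-name sets; the function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def robust_features(runs: List[Set[str]], threshold: float = 0.8) -> Set[str]:
    """Features selected in at least `threshold` of the bootstrap runs."""
    counts = Counter(f for run in runs for f in run)
    return {f for f, c in counts.items() if c / len(runs) >= threshold}


def configuration_votes(configs: Dict[str, List[Set[str]]],
                        threshold: float = 0.8) -> Counter:
    """For each feature, the number of configurations that select it robustly."""
    votes: Counter = Counter()
    for runs in configs.values():
        votes.update(robust_features(runs, threshold))
    return votes
```

With M = 100 runs per configuration and 21 configurations, each feature's vote would range from 0 to 21.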

Features selected in ⩾80% of cases by the stable RBA filters: All of them without imputation.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

(a) ReliefF (k = 100): nFS = 5. (b) MultiSURF: nFS = 5. (c) ReliefF (k = 100): nFS = 10. (d) MultiSURF: nFS = 10. (e) ReliefF (k = 100): nFS = 20. (f) MultiSURF: nFS = 20. (g) ReliefF (k = 100): nFS = 40. (h) MultiSURF: nFS = 40.

FigShare
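The RBA panels above use ReliefF with k = 100 neighbours and MultiSURF, without further detail here. To show what the k parameter controls, the stdlib-Python sketch below implements a simplified ReliefF: every row serves once as the reference instance, distances are Manhattan on min-max-scaled numeric features, and misses from each other class are weighted by that class's prior, in the usual multiclass form. This is an illustrative assumption, not the implementation used for these figures, and MultiSURF is not covered.

```python
from typing import List, Sequence


def relieff(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 10) -> List[float]:
    """Simplified ReliefF feature weights for numeric features and a discrete
    class label. Every row serves once as the reference instance."""
    n, d = len(X), len(X[0])
    lo = [min(row[a] for row in X) for a in range(d)]
    # Range of each feature (1.0 for constant features to avoid dividing by 0).
    span = [(max(row[a] for row in X) - lo[a]) or 1.0 for a in range(d)]

    def diff(a: int, r1: Sequence[float], r2: Sequence[float]) -> float:
        # Difference on feature a, scaled to [0, 1].
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1: Sequence[float], r2: Sequence[float]) -> float:
        # Manhattan distance on the scaled features.
        return sum(diff(a, r1, r2) for a in range(d))

    classes = sorted(set(y))
    prior = {c: sum(1 for v in y if v == c) / n for c in classes}
    w = [0.0] * d
    for i in range(n):
        # Other rows ordered by distance to the reference row i.
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist(X[i], X[j]))
        # k nearest hits (same class) pull weights down when features differ.
        hits = [j for j in others if y[j] == y[i]][:k]
        for a in range(d):
            w[a] -= sum(diff(a, X[i], X[j]) for j in hits) / (n * k)
        for c in classes:
            if c == y[i]:
                continue
            # k nearest misses from class c push weights up, scaled by its prior.
            misses = [j for j in others if y[j] == c][:k]
            scale = prior[c] / (1.0 - prior[y[i]])
            for a in range(d):
                w[a] += scale * sum(diff(a, X[i], X[j]) for j in misses) / (n * k)
    return w
```

On a toy set where feature 0 equals the class label and feature 1 is unrelated noise, feature 0 receives the larger weight; keeping the nFS highest-weighted features would then act as the filter.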

Number of selected features for algorithms with non-fixed nFS.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

Number of selected features for algorithms with non-fixed nFS.

FigShare

On-line supplementary materials.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

Report (PDF): (S.A) Data: List of features; (S.B) Data: Cohort characteristics; (S.C) Methods: Hyperparameters; (S.D) Results: Stability; (S.E) Results: Computation times.

FigShare

Stability for the wrapper algorithms.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

FigShare

Jaccard similarity index between feature subsets.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

Computed for all pairs of stable algorithms, grouped by nFS specification. Results were averaged over M = 100 bootstrap samples. (a) nFS = 5. (b) nFS = 10. (c) nFS = 20. (d) nFS = 40. (e) nFS not pre-fixed.

FigShare
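The Jaccard index of two feature subsets A and B is |A ∩ B| / |A ∪ B|. The stdlib-Python sketch below computes it for every pair of algorithms and averages it over bootstrap samples, as the caption above describes. It assumes (not stated in this listing) that all algorithms were run on the same bootstrap samples, so that subsets can be paired by sample index; the function names and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty subsets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(runs: Dict[str, List[Set[str]]]) -> Dict[Tuple[str, str], float]:
    """runs[name][m] is the subset chosen by algorithm `name` on bootstrap m.
    Returns the Jaccard index of every algorithm pair, averaged over m."""
    out: Dict[Tuple[str, str], float] = {}
    for x, y in combinations(sorted(runs), 2):
        if len(runs[x]) != len(runs[y]):
            raise ValueError("algorithms must share the same bootstrap samples")
        out[(x, y)] = mean(jaccard(a, b) for a, b in zip(runs[x], runs[y]))
    return out
```

For instance, two algorithms that agree exactly on one sample and share one of three distinct features on another have a mean Jaccard index of 2/3.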

Features selected in ⩾80% of cases by the stable RFE wrappers (a,b) and embeddeds (c–e): All of them with the kNN imputer.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

(a) RFE: nFS = 5. (b) RFE: nFS = 20. (c) L1-LR: C = 0.005. (d) Lasso: α = 0.050. (e) Lasso: α = 0.075.

FigShare

Flow chart for the included and excluded variables, and feature encoding.

Author: Antoni Torres (238439)
Dae-Jin Lee (12225080)
Fernando García-García (15285203)
Inmaculada Arostegui (35360)
Isabel Urrutia Landa (15285209)
Joaquín Martínez-Minaya (9242480)
José María Quintana (9244111)
Miren Hayet-Otero (15285200)
Mónica Nieves Ermecheo (15285212)
Pedro Pablo España Yandiola (15285206)
Rafael Zalacain Jorge (15285215)
Rosario Menéndez (322031)
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/04/2023

Flow chart for the included and excluded variables, and feature encoding.

FigShare
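The flow chart itself is not reproduced in this listing. According to the study's abstract, clinical variables with ⩾60% missing values were discarded, and the remaining variables were encoded into d = 148 features. The stdlib-Python sketch below illustrates both steps, assuming one-hot encoding for categorical variables (the encoding scheme is not specified here); the function names and variable names are hypothetical.

```python
from typing import Dict, List, Optional

Column = List[Optional[str]]  # None marks a missing value


def drop_sparse(columns: Dict[str, Column], max_missing: float = 0.6) -> Dict[str, Column]:
    """Discard variables whose fraction of missing values is >= max_missing."""
    return {
        name: values
        for name, values in columns.items()
        if sum(v is None for v in values) / len(values) < max_missing
    }


def one_hot(name: str, values: Column) -> Dict[str, List[Optional[int]]]:
    """Expand a categorical variable into 0/1 indicator columns;
    missing entries stay None so that they can be imputed later."""
    levels = sorted({v for v in values if v is not None})
    return {
        f"{name}={level}": [None if v is None else int(v == level) for v in values]
        for level in levels
    }
```

Keeping missing entries as None rather than as an extra category leaves them for the imputation step.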
