15 research outputs found

    Comparing the Min–Max–Median/IQR Approach with the Min–Max Approach, Logistic Regression and XGBoost, maximising the Youden index

    Although linearly combining multiple variables can provide adequate diagnostic performance, some algorithms become computationally demanding when the number of variables is high. Liu et al. proposed the min–max approach, which linearly combines the minimum and maximum values of the biomarkers, is computationally tractable and has been shown to be optimal in certain scenarios. We developed the Min–Max–Median/IQR algorithm under Youden index optimisation which, although more computationally intensive, remains tractable and includes more information. The aim of this work is to compare the performance of these algorithms with two well-known machine learning algorithms, logistic regression and XGBoost, which have proven efficient in various fields of application, particularly in the health sector. The comparison is performed on a wide range of scenarios of simulated symmetric or asymmetric data, as well as on real clinical diagnosis data sets. The results indicate which algorithms perform better in binary classification problems, depending on the scenario.

    Anomaly Detection in Time Series (Detección de Anomalías en Series Temporales)

    Anomaly detection is one of the most popular topics in data science because of its many practical applications. In particular, the study of anomalies in time series is a problem that has been widely researched and developed over time, drawing both on statistical techniques and on the machine learning and deep learning algorithms that have emerged over the years. However, very few works in the literature compare anomaly detection techniques drawn from statistical methods, machine learning algorithms and deep learning algorithms. For this reason, this work analyses and develops algorithms from the three fields mentioned above, complemented by transformations of the time series, with the aim of analysing the effectiveness of each algorithm for different situations and anomalies. The performance of each algorithm is analysed on public sets of time series with identified anomalies belonging to different categories, developing a valid mathematical formalism for carrying out this task and using metrics capable of adequately representing the performance of anomaly detection models.

    A stepwise algorithm for linearly combining biomarkers under Youden Index maximisation

    Combining multiple biomarkers to provide predictive models with greater discriminatory ability has received attention in recent years. Choosing the probability threshold that corresponds to the highest combined marker accuracy is key in disease diagnosis. The Youden index is a statistical metric that provides an appropriate synthetic index of diagnostic accuracy and a good criterion for choosing a cut-off point to dichotomize a biomarker. In this study, we present a new stepwise algorithm for linearly combining continuous biomarkers to maximize the Youden index. To investigate the performance of our algorithm, we analyzed a wide range of simulated scenarios and compared its performance with that of five other linear combination methods in the literature (a stepwise approach introduced by Yin and Tian, the min-max approach, logistic regression, a parametric approach under multivariate normality and a non-parametric kernel smoothing approach). The results show that our proposed stepwise approach performs similarly to the other algorithms in normal simulated scenarios and outperforms all of them in non-normal simulated scenarios. In scenarios where the biomarkers have the same means but different covariance matrices for the diseased and non-diseased populations, the min-max approach outperforms the rest. The methods were also applied to two real datasets (to discriminate Duchenne muscular dystrophy and prostate cancer), where our algorithm also showed a higher predictive ability in the prostate cancer database.
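
    As an illustration of the cut-off criterion mentioned in this abstract, the following is a minimal sketch of choosing the threshold that maximises the empirical Youden index J = sensitivity + specificity - 1 for a single marker. It is not the authors' stepwise algorithm; the function name `youden_cutoff` and the convention that higher scores indicate disease (label 1) are assumptions made for the example.

```python
import numpy as np

def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximising the empirical Youden index.

    Assumption for this sketch: a subject is classified as diseased
    when score >= cutoff, and labels are 1 (diseased) or 0 (healthy).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    diseased = scores[labels == 1]
    healthy = scores[labels == 0]

    best_cutoff, best_j = None, -np.inf
    # Only observed values need to be tried as candidate cut-offs.
    for c in np.unique(scores):
        sensitivity = np.mean(diseased >= c)
        specificity = np.mean(healthy < c)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # strict '>' keeps the smallest maximising cut-off
            best_cutoff, best_j = float(c), float(j)
    return best_cutoff, best_j
```

    For perfectly separated toy data such as scores `[0.1, 0.2, 0.3, 0.8, 0.9, 1.0]` with labels `[0, 0, 0, 1, 1, 1]`, the sketch selects the cut-off 0.8 with J = 1.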

    Incorporating a New Summary Statistic into the Min–Max Approach: A Min–Max–Median, Min–Max–IQR Combination of Biomarkers for Maximising the Youden Index

    Linearly combining multiple biomarkers is a common practice that can provide better diagnostic performance. When the number of biomarkers is sufficiently high, the computational burden becomes a problem. Liu et al. proposed a distribution-free approach (the min–max approach) that linearly combines the minimum and maximum values of the biomarkers, involving only a single coefficient search. However, combining only the minimum and maximum biomarker values may not be sufficient in terms of discrimination. In this paper, we propose a new approach that extends that of Liu et al. by incorporating a new summary statistic, specifically the median or the interquartile range (min–max–median and min–max–IQR approaches), in order to find the optimal combination that maximises the Youden index. Although this approach is more computationally intensive than the one proposed by Liu et al., it includes more information and the number of parameters to be estimated remains reasonable. We compare the performance of the proposed approaches (min–max–median and min–max–IQR) with the min–max approach and logistic regression. For this purpose, a wide range of simulated data scenarios were explored. We also apply the approaches to two real datasets (Duchenne Muscular Dystrophy and Small for Gestational Age).
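
    The single-coefficient search described in this abstract can be sketched as follows: each subject's biomarkers are reduced to their maximum and minimum, and one coefficient `alpha` in the score max + alpha * min is chosen to maximise the empirical Youden index. This is only an illustrative sketch, not the authors' implementation; the function names, the grid range [-3, 3], the grid size, and the convention that higher scores indicate disease are all assumptions.

```python
import numpy as np

def max_youden(scores, labels):
    """Largest empirical Youden index J = sens + spec - 1 over observed cut-offs."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return max(np.mean(pos >= c) + np.mean(neg < c) - 1.0
               for c in np.unique(scores))

def min_max_fit(X, y, alphas=None):
    """Grid-search alpha in score = max(x) + alpha * min(x), computed per row.

    X: (n_subjects, n_biomarkers) array; y: 0/1 labels.
    The default grid is a hypothetical choice for this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    if alphas is None:
        alphas = np.linspace(-3.0, 3.0, 121)

    x_max, x_min = X.max(axis=1), X.min(axis=1)
    best_alpha, best_j = None, -np.inf
    for a in alphas:
        j = max_youden(x_max + a * x_min, y)
        if j > best_j:
            best_alpha, best_j = float(a), float(j)
    return best_alpha, best_j
```

    Because only one coefficient is searched regardless of the number of biomarkers, the cost of the grid search does not grow with the number of coefficients to estimate, which is the computational advantage the abstract refers to. A min–max–median or min–max–IQR variant would add a second coefficient for the extra summary statistic.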

    Study of new Deep Learning models for the analysis and understanding of large amounts of data (Estudio de nuevos modelos de Deep Learning para el análisis y comprensión de grandes cantidades de datos)

    Neural networks are becoming established as a method for solving problems that are difficult to model, from predictive behaviour to language models. This work studies what a neural network is and how it is modelled mathematically, with particular emphasis on networks designed for language processing. Encoder-Decoder models are explained, together with Transformer modules, studying the currently most popular ones and comparing their results on various tasks, and ending with a more in-depth study of one model.

    Analysis and Curation of the Database of a Colo-Rectal Cancer Screening Program

    Data collection in health program databases is prone to errors that might hinder its use to identify risk indicators and to support optimal decision-making in health services. This is the case in colo-rectal cancer (CRC) screening programs when trying to optimize the cut-off point used to select the patients who will undergo a colonoscopy, especially when the supply of colonoscopies is insufficient or demand is temporarily excessive. It is therefore necessary to establish “good practice” guidelines for data collection, management and analysis. With the aim of improving the redesign of a regional CRC screening program platform, we performed an exhaustive analysis of the data collected and proposed a set of recommendations for its correct maintenance. We also curated the available data in order to obtain a clean source of information that would allow proper future analyses. We present here the results of this study, showing the importance of the design of the database and of the user interface in avoiding redundancies, keeping consistency and checking known correlations, with the final aim of providing quality data that support correct decisions.

    Big data and machine learning to improve European grapevine moth (Lobesia botrana) predictions

    Machine Learning (ML) techniques can be used to convert Big Data into valuable information for agri-environmental applications, such as predictive pest modeling. Lobesia botrana (Denis & Schiffermüller) 1775 (Lepidoptera: Tortricidae) is one of the main pests of grapevine, causing high productivity losses in some vineyards worldwide. This work focuses on the optimization of the Touzeau model, a classical correlation model between temperature and L. botrana development, using data-driven models. Data collected from field observations were combined with 30 GB of registered weather data updated every 30 min to train the ML models and make predictions on this pest’s flights, as well as to assess the accuracy of both Touzeau and ML models. The results obtained highlight a much higher F1 score of the ML models in comparison with the Touzeau model. The best-performing model was an artificial neural network of four layers, which considered several variables together and not only the temperature, taking advantage of the ability of ML models to find relationships in nonlinear systems. Despite the room for improvement of artificial intelligence-based models, the process and results presented herein highlight the benefits of ML applied to agricultural pest management strategies.

    A clinical decision web to predict ICU admission or death for patients hospitalised with Covid-19 using machine learning algorithms

    The purpose of the study was to build a predictive model for estimating the risk of ICU admission or mortality among patients hospitalized with COVID-19 and to provide a user-friendly tool to assist clinicians in the decision-making process. The study cohort comprised 3623 patients with confirmed COVID-19 who were hospitalized in the SALUD hospital network of Aragon (Spain), which includes 23 hospitals, between February 2020 and January 2021, a period that includes several pandemic waves. Up to 165 variables were analysed, including demographics, comorbidity, chronic drugs, vital signs, and laboratory data. To build the predictive models, different techniques and machine learning (ML) algorithms were explored: multilayer perceptron, random forest, and extreme gradient boosting (XGBoost). A dimensionality reduction procedure was used to reduce the number of features to 20, ensuring feasible use of the tool in practice. Our model was validated both internally and externally. We also assessed its calibration and provided an analysis of the optimal cut-off points depending on the metric to be optimized. The best performing algorithm was XGBoost. The final model achieved good discrimination for the external validation set (AUC = 0.821, 95% CI 0.787–0.854) and accurate calibration (slope = 1, intercept = −0.12). A cut-off of 0.4 provides a sensitivity and specificity of 0.71 and 0.78, respectively. In conclusion, we built a risk prediction model from a large amount of data from several pandemic waves, which had good calibration and discrimination ability. We also created a user-friendly web application that can aid rapid decision-making in clinical practice.

    Changes in severity, mortality, and virus genome among a Spanish cohort of patients hospitalized with SARS-CoV-2

    Comparing pandemic waves could aid in understanding the evolution of COVID-19. The objective of the present study was to compare the characteristics and outcomes of patients hospitalized for COVID-19 in different pandemic waves in terms of severity and mortality. We performed an observational retrospective cohort study of 5,220 patients hospitalized with SARS-CoV-2 infection from February to September 2020 in Aragon, Spain. We compared ICU admissions and 30-day mortality, clinical characteristics, and risk factors of the first and second waves of COVID-19. The SARS-CoV-2 genome was also analyzed in 236 samples. Patients in the first wave (n = 2,547) were older (median age 74 years [IQR 60–86] vs. 70 years [53–85]; p < 0.001) and had worse clinical and analytical parameters related to severe COVID-19 than patients in the second wave (n = 2,673). The probability of ICU admission at 30 days was 16% and 10% (p < 0.001) and the cumulative 30-day mortality rates 38% and 32% in the first and second wave, respectively (p = 0.007). Survival differences were observed among patients aged 60 to 80 years. We also found some variability among death risk factors and the viral genome between waves. Therefore, the two analyzed COVID-19 pandemic waves were different in terms of disease severity and mortality.

    All Roads Lead to Rome: Results of Non-Invasive Respiratory Therapies Applied in a Tertiary-Care Hospital Without an Intermediate Care Unit During the COVID-19 Pandemic

    Introduction. Non-invasive respiratory therapies (NIRT) were widely used in the first wave of the COVID-19 pandemic, in settings that varied according to the resources available. The objective was to report 90-day survival, and the factors associated with it, in patients treated with NIRT in a tertiary-care centre without an Intermediate Respiratory Care Unit. The secondary objective was to compare the outcomes of the different therapies. Methods. Observational study of patients treated with NIRT outside an Intensive Care or Intermediate Respiratory Care Unit setting, diagnosed with COVID-19 and with acute respiratory distress syndrome according to radiological and SpO2/FiO2 ratio criteria. A multivariate logistic regression model was developed to determine the independently associated variables, and the outcomes of high-flow nasal cannula therapy and continuous positive airway pressure were compared. Results. A total of 107 patients were treated, and 85 (79.4%) survived at 90 days. Before starting NIRT, the mean SpO2/FiO2 ratio was 119.8 ± 59.4. A higher SOFA score was significantly associated with mortality (OR 2.09; 95% CI 1.34–3.27), whereas self-proning was a protective factor (OR 0.23; 95% CI 0.06–0.91). High-flow nasal cannula therapy was used in 63 subjects (58.9%), and continuous positive airway pressure in 41 (38.3%). No differences were found between them. Conclusion. Approximately four out of five patients treated with NIRT survived at 90 days, and no significant differences were found between high-flow nasal cannula therapy and continuous positive airway pressure.