39 research outputs found

    Derivation of a Cost-Sensitive COVID-19 Mortality Risk Indicator Using a Multistart Framework

    The overall global death rate for COVID-19 patients has escalated to 2.13% after more than a year of worldwide spread. Despite strong research on the infection pathogenesis, the molecular mechanisms involved in a fatal course are still poorly understood. Machine learning constitutes a perfect tool to develop algorithms for predicting a patient’s hospitalization outcome at triage. This paper presents a probabilistic model, referred to as a mortality risk indicator, able to assess the risk of a fatal outcome for new patients. The model was derived from a database of 2,547 patients from the first COVID-19 wave in Spain. Model learning was tackled through a five-multistart configuration that guaranteed good generalization power and low-variance error estimators. The training algorithm made use of a class weighting correction to account for the mortality class imbalance and two regularization learners, logistic and lasso regressors. Outcome probabilities were adjusted to obtain cost-sensitive predictions by minimizing the type II error. Our mortality indicator returns both a binary outcome and a three-stage mortality risk level. The estimated AUC across multistarts reaches an average of 0.907. At the optimal cutoff for the binary outcome, the model attains an average sensitivity of 0.898, with a 0.745 specificity. An independent set of 121 patients later released from the same consortium attained perfect sensitivity (1), with a 0.759 specificity when predicted by our model. Best performance for the indicator is achieved when the prediction’s time horizon is within two weeks since admission to hospital.
In addition to a strong predictive performance, the set of selected features highlights the relevance of several underrated molecules in COVID-19 research, such as blood eosinophils, bilirubin, and urea levels. AXA Research Fund project "Early prognosis of COVID-19 infections via machine learning"; Basque Government special funding on Mathematical Modelling Applied to Health.
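The cost-sensitive adjustment described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes predicted death probabilities are already available, and it picks the cutoff that minimizes a weighted count of errors in which a missed death (type II error) costs more than a false alarm. The function names, the cost values, and the toy data are all hypothetical.

```python
from typing import List, Tuple


def sensitivity_specificity(
    probs: List[float], labels: List[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), predicting death when prob >= cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cutoff)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < cutoff)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cutoff)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def cost_sensitive_cutoff(
    probs: List[float],
    labels: List[int],
    fn_cost: float = 5.0,  # assumed cost of a missed death (type II error)
    fp_cost: float = 1.0,  # assumed cost of a false alarm (type I error)
) -> float:
    """Pick the cutoff that minimizes fn_cost * FN + fp_cost * FP.

    Candidate cutoffs are the observed probabilities; ties keep the lowest one.
    """
    best_cutoff = 0.0
    best_cost = float("inf")
    for cutoff in sorted(set(probs)):
        fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < cutoff)
        fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= cutoff)
        cost = fn_cost * fn + fp_cost * fp
        if cost < best_cost:
            best_cutoff, best_cost = cutoff, cost
    return best_cutoff


# Toy data (hypothetical): label 1 = deceased, 0 = survivor.
toy_probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_cutoff = cost_sensitive_cutoff(toy_probs, toy_labels)
toy_sens, toy_spec = sensitivity_specificity(toy_probs, toy_labels, toy_cutoff)
```

Raising `fn_cost` relative to `fp_cost` pushes the cutoff down, trading specificity for sensitivity, which is the direction of the trade-off the abstract reports.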

    Learning a Battery of COVID-19 Mortality Prediction Models by Multi-objective Optimization

    The COVID-19 pandemic is continuously evolving with drastically changing epidemiological situations which are approached with different decisions: from the reduction of fatalities to even the selection of patients with the highest probability of survival in critical clinical situations. Motivated by this, a battery of mortality prediction models with different performances has been developed to assist physicians and hospital managers. Logistic regression, one of the most popular classifiers within the clinical field, has been chosen as the basis for the generation of our models. Whilst a standard logistic regression only learns a single model focusing on improving accuracy, we propose to extend the possibilities of logistic regression by focusing on sensitivity and specificity. Hence, the log-likelihood function, used to calculate the coefficients in the logistic model, is split into two objective functions: one representing the survivors and the other for the deceased class. A multi-objective optimization process is undertaken on both functions in order to find the Pareto set, composed of models not improved by another model in both objective functions simultaneously. The individual optimization of either sensitivity (deceased patients) or specificity (survivors) criteria may be conflicting objectives because the improvement of one can imply the worsening of the other. Nonetheless, this conflict guarantees the output of a battery of diverse prediction models. Furthermore, a specific methodology for the evaluation of the Pareto models is proposed. As a result, a battery of COVID-19 mortality prediction models is obtained to assist physicians in decision-making for specific epidemiological situations. This research is supported by the Basque Government (IT1504-22, Elkartek) through the BERC 2022–2025 program and BMTF project, and by the Ministry of Science, Innovation and Universities: BCAM Severo Ochoa accreditation SEV-2017-0718 and PID2019-104966GB-I00.
Furthermore, the work is also supported by the AXA Research Fund project “Early prognosis of COVID-19 infections via machine learning”
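The split of the log-likelihood into two objectives can be sketched as follows. This is only an illustration under stated assumptions, not the authors' optimizer: it evaluates given coefficient vectors rather than searching for them, it uses the standard Pareto dominance rule (a model is dropped if another is at least as good on both objectives and strictly better on one), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def class_log_likelihoods(
    weights: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]
) -> Tuple[float, float]:
    """Split the logistic log-likelihood into (survivor term, deceased term).

    weights[0] is the intercept; label 1 = deceased, 0 = survivor.
    The two terms add up to the usual full log-likelihood.
    """
    ll_survivors = 0.0
    ll_deceased = 0.0
    for features, label in zip(X, y):
        z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
        p = 1.0 / (1.0 + math.exp(-z))  # modelled probability of death
        if label == 1:
            ll_deceased += math.log(p)
        else:
            ll_survivors += math.log(1.0 - p)
    return ll_survivors, ll_deceased


def pareto_front(scores: Sequence[Tuple[float, float]]) -> List[int]:
    """Return indices of score pairs not dominated by any other pair.

    Both objectives are maximized.
    """
    front = []
    for i, (a1, a2) in enumerate(scores):
        dominated = any(
            b1 >= a1 and b2 >= a2 and (b1 > a1 or b2 > a2)
            for j, (b1, b2) in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front
```

Scoring a set of candidate coefficient vectors with `class_log_likelihoods` and passing the resulting pairs to `pareto_front` yields the kind of model battery the abstract describes: each surviving model represents a different trade-off between fitting survivors and fitting the deceased class.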

    Identification of a biomarker panel for colorectal cancer diagnosis

    Background: Malignancies arising in the large bowel cause the second largest number of deaths from cancer in the Western world. Despite the progress made during the last decades, colorectal cancer remains one of the most frequent and deadly neoplasias in Western countries. Methods: A genomic study of human colorectal cancer has been carried out on a total of 31 tumoral samples, corresponding to different stages of the disease, and 33 non-tumoral samples. The study was carried out by hybridisation of the tumour samples against a reference pool of non-tumoral samples using Agilent Human 1A 60-mer oligo microarrays. The results obtained were validated by qRT-PCR. In the subsequent bioinformatics analysis, gene networks were built by means of Bayesian classifiers, variable selection and bootstrap resampling. The consensus among all the induced models produced a hierarchy of dependences and, thus, of variables. Results: After an exhaustive pre-processing stage to ensure data quality (missing value imputation, probe quality, data smoothing and intraclass variability filtering), the final dataset comprised a total of 8,104 probes. Next, a supervised classification approach and data analysis were carried out to obtain the most relevant genes. Two of them are directly involved in cancer progression and in particular in colorectal cancer. Finally, a supervised classifier was induced to classify new unseen samples. Conclusions: We have developed a tentative model for the diagnosis of colorectal cancer based on a biomarker panel. Our results indicate that the gene profile described herein can discriminate between non-cancerous and cancerous samples with 94.45% accuracy using different supervised classifiers (AUC values between 0.955 and 0.997).

    A review of estimation of distribution algorithms in bioinformatics

    Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems across a broad range of bioinformatics tasks. Genetic algorithms, the most well-known and representative evolutionary search technique, have been the subject of the major part of such applications. Estimation of distribution algorithms (EDAs) offer a novel evolutionary paradigm that constitutes a natural and attractive alternative to genetic algorithms. They make use of a probabilistic model, learnt from the promising solutions, to guide the search process. In this paper, we set out a basic taxonomy of EDA techniques, underlining the nature and complexity of the probabilistic model of each EDA variant. We review a set of innovative works that make use of EDA techniques to solve challenging bioinformatics problems, emphasizing the EDA paradigm's potential for further research in this domain.
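To make the EDA loop concrete, here is a minimal sketch of one of the simplest variants, the univariate marginal distribution algorithm (UMDA), applied to a toy bit-counting problem. The choice of UMDA, the parameter values, and the clamping of probabilities are illustrative assumptions; the review itself covers variants with richer probabilistic models.

```python
import random
from typing import Callable, List


def umda(
    fitness: Callable[[List[int]], float],
    n_bits: int,
    pop_size: int = 60,
    n_select: int = 20,
    generations: int = 40,
    seed: int = 0,
) -> List[int]:
    """Maximize `fitness` over bit strings with a univariate EDA.

    The probabilistic model is a vector of independent per-bit probabilities,
    re-estimated each generation from the best `n_select` individuals.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # start from the uniform distribution
    best: List[int] = [0] * n_bits
    best_fit = float("-inf")
    for _ in range(generations):
        # Sample a new population from the current model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) > best_fit:
            best, best_fit = population[0], fitness(population[0])
        # Learn the model from the promising solutions (truncation selection).
        selected = population[:n_select]
        probs = [
            # Clamp to keep some diversity and avoid premature fixation.
            min(max(sum(ind[i] for ind in selected) / n_select, 0.05), 0.95)
            for i in range(n_bits)
        ]
    return best


# Toy problem: maximize the number of ones in a 20-bit string.
best_solution = umda(fitness=sum, n_bits=20)
```

Unlike a genetic algorithm, there is no crossover or mutation operator here: new candidates come only from sampling the learnt distribution, which is the defining trait of the paradigm described above.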

    Discretization of expression quantitative trait loci in association analysis between genotypes and expression data

    Expression quantitative trait loci are used as a tool to identify genetic causes of natural variation in gene expression. Only in a few cases is the expression of a gene controlled by a variant on a single genetic marker. There is a plethora of different complexity levels of interaction effects within markers, within genes and between markers and genes. This complexity challenges biostatisticians and bioinformaticians every day and makes findings hard to come by. As a way to simplify analysis and better control confounders, we tried a new approach for association analysis between genotypes and expression data. We sought to understand whether discretization of expression data can be useful in genome-transcriptome association analyses. By discretizing the dependent variable, algorithms for learning classifiers from data as well as performing block selection were used to help understand the relationship between the expression of a gene and genetic markers. We present the results of using this approach to detect new possible causes of expression variation of DRB5, a gene playing an important role within the immune system. Together with expression of gene DRB5 obtained from the classical microarray technology, we have also measured DRB5 expression by using the more recent next-generation sequencing technology. A supplementary website including a link to the software with the method implemented can be found at http://bios.ugr.es/DRB5

    Regularized logistic regression and multi-objective variable selection for classifying MEG data

    This paper addresses the question of maximizing classifier accuracy for classifying task-related mental activity from magnetoencephalography (MEG) data. We propose the use of different sources of information and introduce an automatic channel selection procedure. To determine an informative set of channels, our approach combines a variety of machine learning algorithms: feature subset selection methods, classifiers based on regularized logistic regression, information fusion, and multiobjective optimization based on probabilistic modeling of the search space. The experimental results show that our proposal is able to improve classification accuracy compared to approaches whose classifiers use only one type of MEG information or for which the set of channels is fixed a priori.

    What is behind a summary-evaluation decision?

    Research in psychology has reported that, among the variety of possibilities for assessment methodologies, summary evaluation offers a particularly adequate context for inferring text comprehension and topic understanding. However, grades obtained in this methodology are hard to quantify objectively. Therefore, we carried out an empirical study to analyze the decisions underlying human summary-grading behavior. The task consisted of expert evaluation of summaries produced in critically relevant contexts of summarization development, and the resulting data were modeled by means of Bayesian networks using an application called Elvira, which allows for graphically observing the predictive power (if any) of the resultant variables. Thus, in this article, we analyzed summary-evaluation decision making in a computational framework.

    Gene Expression Profiling in Limb-Girdle Muscular Dystrophy 2A

    Limb-girdle muscular dystrophy type 2A (LGMD2A) is a recessive genetic disorder caused by mutations in calpain 3 (CAPN3). Calpain 3 plays different roles in muscular cells, but little is known about its functions or in vivo substrates. The aim of this study was to identify the genes showing an altered expression in LGMD2A patients and the possible pathways they are implicated in. Ten muscle samples from LGMD2A patients in whom the molecular diagnosis was ascertained were investigated using array technology to analyze gene expression profiling as compared to ten normal muscle samples. Upregulated genes were mostly those related to extracellular matrix (different collagens), cell adhesion (fibronectin), muscle development (myosins and melusin) and signal transduction. It is therefore suggested that different proteins located or participating in the costameric region are implicated in processes regulated by calpain 3 during skeletal muscle development. Genes participating in the ubiquitin proteasome degradation pathway were found to be deregulated in LGMD2A patients, suggesting that regulation of this pathway may be under the control of calpain 3 activity. As frizzled-related protein (FRZB) is upregulated in LGMD2A muscle samples, it could be hypothesized that β-catenin regulation is also altered at the Wnt signaling pathway, leading to an incorrect myogenesis. Conversely, expression of most transcription factor genes was downregulated (MYC, FOS and EGR1). Finally, the upregulation of IL-32 and immunoglobulin genes may induce the eosinophil chemoattraction explaining the inflammatory findings observed in presymptomatic stages. The obtained results try to shed some light on identification of novel therapeutic targets for limb-girdle muscular dystrophies.

    Unveiling relevant non-motor Parkinson’s disease severity symptoms using a machine learning approach

    Objective: Is it possible to predict the severity staging of a Parkinson’s disease (PD) patient using scores of non-motor symptoms? This is the kickoff question for a machine learning approach to classify two widely known PD severity indexes using individual tests from a broad set of non-motor PD clinical scales only. Methods: The Hoehn & Yahr index and clinical impression of severity index are global measures of PD severity. They constitute the labels to be assigned in two supervised classification problems using only non-motor symptom tests as predictor variables. Such predictors come from a wide range of PD symptoms, such as cognitive impairment, psychiatric complications, autonomic dysfunction or sleep disturbance. The classification was coupled with a feature subset selection task using an advanced evolutionary algorithm, namely an estimation of distribution algorithm. Results: Results show how five different classification paradigms using a wrapper feature selection scheme are capable of predicting each of the class variables with estimated accuracy in the range of 72–92%. In addition, classification into the main three severity categories (mild, moderate and severe) was split into dichotomic problems where binary classifiers perform better and select different subsets of non-motor symptoms. The number of jointly selected symptoms throughout the whole process was low, suggesting a link between the selected non-motor symptoms and the general severity of the disease. Conclusion: Quantitative results are discussed from a medical point of view, reflecting a clear translation to the clinical manifestations of PD. Moreover, results include a brief panel of non-motor symptoms that could help clinical practitioners to identify patients who are at different stages of the disease from a limited set of symptoms, such as hallucinations, fainting, inability to control body sphincters or believing in unlikely facts