12 research outputs found

    A synthetic dataset of liver disorder patients

    The data in this article include 10,000 synthetic patients with liver disorders, characterized by 70 different variables, including clinical features and patient outcomes such as hospital admission or surgery. Patient data are generated to simulate real patient data as closely as possible, using a publicly available Bayesian network describing a causal model for liver disorders. By varying the network parameters, we also generated an additional set of 500 patients with characteristics that deviated from the initial patient population. We provide an overview of the synthetic data generation process and the associated scripts for generating the cohorts. This dataset can be useful for training and validating machine learning models, especially under the effect of dataset shift between training and testing sets.
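As a rough illustration of the generation idea described in this abstract, the sketch below draws patients from a toy two-node Bayesian network by ancestral sampling, then draws a second, shifted cohort by changing one network parameter. The node names, probabilities, and function names are all hypothetical and are not taken from the published network or scripts.

```python
import random

def sample_patient(rng, p_disease, p_admit_given):
    """Ancestral sampling: draw the parent node first, then the child given the parent."""
    disease = rng.random() < p_disease
    admitted = rng.random() < p_admit_given[disease]
    return {"disease": disease, "admitted": admitted}

def sample_cohort(n, p_disease, p_admit_given, seed):
    """Generate n synthetic patients from the toy network."""
    rng = random.Random(seed)
    return [sample_patient(rng, p_disease, p_admit_given) for _ in range(n)]

def prevalence(cohort, key):
    """Fraction of patients for whom the given variable is True."""
    return sum(patient[key] for patient in cohort) / len(cohort)

# Hypothetical conditional probability table: P(admitted | disease).
P_ADMIT = {True: 0.7, False: 0.1}

# Base cohort, and a smaller cohort sampled after shifting the disease prior.
base = sample_cohort(10_000, p_disease=0.2, p_admit_given=P_ADMIT, seed=0)
shifted = sample_cohort(500, p_disease=0.6, p_admit_given=P_ADMIT, seed=1)

print(prevalence(base, "disease"), prevalence(shifted, "disease"))
```

Changing only the prior of a parent node also changes the marginal distribution of its children, which is one simple way a shifted cohort can arise from the same network structure.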

    Predicting emerging SARS-CoV-2 variants of concern through a One Class dynamic anomaly detection algorithm

    Objectives: The objective of this study is the implementation of an automatic procedure to detect, on a weekly basis, new SARS-CoV-2 variants and non-neutral variants (variants of concern (VOC) and variants of interest (VOI)). Methods: We downloaded spike protein primary sequences from the public resource GISAID and we represented each sequence as k-mer counts. For each week since 1 July 2020, we evaluate whether each sequence represents an anomaly based on a One Class support vector machine (SVM) classification algorithm trained on neutral protein sequences collected from February to June 2020. Results: We assess the ability of the One Class classifier to detect known VOC and VOI, such as Alpha, Delta or Omicron, ahead of their official classification by health authorities. In the median case, the classifier predicts a non-neutral variant as an outlier 10 weeks before the official date of designation as VOC/VOI. Discussion: The identification of non-neutral variants during a pandemic usually relies on indicators that become available over time, such as the changing population size of a variant. Automatic variant surveillance systems based on protein sequences can enhance the fast identification of variants of potential concern. Conclusion: Machine learning, and in particular One Class SVM classification, can support the detection of potential VOC/VOI variants during an evolving pandemic.
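The k-mer representation mentioned in the Methods can be sketched in a few lines. This is only an illustration under assumptions: the value k=3, the toy amino-acid strings, and the helper names are hypothetical, and the abstract does not state which k or preprocessing the authors used.

```python
from collections import Counter

def kmer_counts(sequence, k=3):
    """Count every overlapping substring of length k in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def to_vector(counts, vocabulary):
    """Turn a k-mer Counter into a fixed-length vector over a shared vocabulary."""
    return [counts.get(kmer, 0) for kmer in vocabulary]

# Toy strings for illustration only (not real GISAID data).
sequences = ["MFVFLVLLPLVSS", "MFVFLVLLPLVSN"]

counts = [kmer_counts(seq) for seq in sequences]
# Shared, sorted vocabulary so that every sequence maps to the same columns.
vocabulary = sorted(set().union(*counts))
vectors = [to_vector(c, vocabulary) for c in counts]

print(vocabulary)
print(vectors)
```

Vectors of this kind could then be passed to a one-class classifier (for example scikit-learn's `OneClassSVM`) fitted only on sequences assumed to be neutral, so that sequences falling outside the learned region are flagged as anomalies.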

    Evaluating pointwise reliability of machine learning prediction

    Interest in Machine Learning applications to tackle clinical and biological problems is increasing. This is driven by promising results reported in many research papers, the increasing number of AI-based software products, and by the general interest in Artificial Intelligence to solve complex problems. It is therefore important to improve the quality of machine learning output and add safeguards to support its adoption. In addition to regulatory and logistical strategies, a crucial aspect is to detect when a Machine Learning model is not able to generalize to new unseen instances, which may originate from a population distant from the training population or from an under-represented subpopulation. As a result, the prediction of the machine learning model for these instances may often be wrong, given that the model is applied outside its “reliable” space of work, leading to decreasing trust among the final users, such as clinicians. For this reason, when a model is deployed in practice, it would be important to advise users when the model's predictions may be unreliable, especially in high-stakes applications, including those in healthcare. Yet, reliability assessment of each machine learning prediction is still poorly addressed. Here, we review approaches that can support the identification of unreliable predictions, we harmonize the notation and terminology of relevant concepts, and we highlight and extend possible interrelationships and overlap among concepts. We then demonstrate, on simulated and real data for ICU in-hospital death prediction, a possible integrative framework for the identification of reliable and unreliable predictions. To do so, our proposed approach implements two complementary principles, namely the density principle and the local fit principle. The density principle verifies that the instance we want to evaluate is similar to the training set. The local fit principle verifies that the trained model performs well on training subsets that are more similar to the instance under evaluation. Our work can contribute to consolidating work in machine learning, especially in medicine.
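The two principles named in this abstract can be illustrated with a nearest-neighbour sketch. This is one possible reading, not the authors' framework: the use of Euclidean distance, the k-nearest-neighbour formulation, the thresholds, and all names below are assumptions made for illustration.

```python
import math

def nearest(train_X, x, k):
    """Return (distance, index) pairs for the k training points closest to x."""
    return sorted((math.dist(row, x), i) for i, row in enumerate(train_X))[:k]

def density_ok(train_X, x, k, max_mean_dist):
    """Density principle: x should lie close to its k nearest training points."""
    neigh = nearest(train_X, x, k)
    return sum(d for d, _ in neigh) / len(neigh) <= max_mean_dist

def local_fit_ok(train_X, train_y, predict, x, k, min_accuracy):
    """Local fit principle: the model should be accurate on training points near x."""
    neigh = nearest(train_X, x, k)
    correct = sum(predict(train_X[i]) == train_y[i] for _, i in neigh)
    return correct / len(neigh) >= min_accuracy

def is_reliable(train_X, train_y, predict, x, k=2, max_mean_dist=1.0, min_accuracy=0.8):
    """A prediction is treated as reliable only if both principles hold."""
    return (density_ok(train_X, x, k, max_mean_dist)
            and local_fit_ok(train_X, train_y, predict, x, k, min_accuracy))

# Hypothetical toy training set and a hypothetical threshold "model".
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6)]
train_y = [0, 0, 0, 0, 1, 0]

def predict(point):
    return int(point[0] > 3)

print(is_reliable(train_X, train_y, predict, (0.5, 0.5)))  # dense region, model fits locally
print(is_reliable(train_X, train_y, predict, (5, 5.5)))    # dense enough, but poor local fit
print(is_reliable(train_X, train_y, predict, (10, 10)))    # far from all training data
```

Keeping the two checks separate makes it possible to report why a prediction was flagged: either the instance is unlike the training data, or the model is known to perform poorly in that neighbourhood.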

    MALDI mass spectrometry imaging shows a gradual change in the proteome landscape during mouse ovarian folliculogenesis

    Our knowledge regarding the role proteins play in the mutual relationship among oocytes, surrounding follicle cells, stroma, and the vascular network inside the ovary is still poor and obtaining insights into this context would significantly aid our understanding of folliculogenesis. Here, we describe a spatial proteomics approach to characterize the proteome of individual follicles at different growth stages in a whole prepubertal 25-day-old mouse ovary. A total of 401 proteins were identified by nano-scale liquid chromatography-electrospray ionization-tandem mass spectrometry (nLC-ESI-MS/MS), 69 with a known function in ovary biology, as demonstrated by earlier proteomics studies. Enrichment analysis highlighted significant KEGG and Reactome pathways, with apoptosis, developmental biology, PI3K-Akt, epigenetic regulation of gene expression, and extracellular matrix organization being well represented. Then, correlating these data with the spatial information provided by matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI) on 276 follicles enabled the protein profiles of single follicle types to be mapped within their native context, highlighting 94 proteins that were detected throughout the secondary to the pre-ovulatory transition. Statistical analyses identified a group of 37 proteins that showed a gradual quantitative change during follicle differentiation, comprising 10 with a known role in follicle growth (NUMA1, TPM2), oocyte germinal vesicle-to-metaphase II transition (SFPQ, ACTBL, MARCS, NUCL), ovulation (GELS, CO1A2), and preimplantation development (TIF1B, KHDC3). The proteome landscape identified includes molecules of known function in the ovary, but also those whose specific role is emerging. Altogether, this work demonstrates the utility of performing spatial proteomics in the context of the ovary and offers sound bases for more in-depth investigations that aim to further unravel its spatial proteome

    Cytoplasmic movements of the early human embryo: imaging and artificial intelligence to predict blastocyst development

    Research question: Can artificial intelligence and advanced image analysis extract and harness novel information derived from cytoplasmic movements of the early human embryo to predict development to blastocyst? Design: In a proof-of-principle study, 230 human preimplantation embryos were retrospectively assessed using an artificial neural network. After intracytoplasmic sperm injection, embryos underwent time-lapse monitoring for 44 h. For comparison, standard embryo assessment of each embryo by a single embryologist was carried out to predict development to blastocyst stage based on a single picture frame taken at 42 h of development. In the experimental approach, in embryos that developed to blastocyst or were destined to arrest, cytoplasm movement velocity was recorded by time-lapse monitoring during the first 44 h of culture and analysed with a Particle Image Velocimetry algorithm to extract quantitative information. Three main artificial intelligence approaches, the k-Nearest Neighbour, the Long-Short Term Memory Neural Network and the hybrid ensemble classifier, were used to classify the embryos. Results: Blind operator assessment classified each embryo in terms of ability to develop to blastocyst, with 75.4% accuracy, 76.5% sensitivity, 74.3% specificity, 74.3% precision and 75.4% F1 score. Integration of results from artificial intelligence models with the blind operator classification resulted in 82.6% accuracy, 79.4% sensitivity, 85.7% specificity, 84.4% precision and 81.8% F1 score. Conclusions: The present study suggests the possibility of predicting human blastocyst development at early cleavage stages by detection of cytoplasm movement velocity and artificial intelligence analysis. This indicates the importance of the dynamics of the cytoplasm as a novel and valuable source of data to assess embryo viability.
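For readers unfamiliar with the five metrics reported in the Results, the sketch below shows how each is derived from a binary confusion matrix. The counts used are hypothetical and chosen only for easy arithmetic; the abstract does not report the study's confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall: true positives among actual positives
    specificity = tn / (tn + fp)            # true negatives among actual negatives
    precision = tp / (tp + fp)              # true positives among predicted positives
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```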

    CardioVAI: An automatic implementation of ACMG-AMP variant interpretation guidelines in the diagnosis of cardiovascular diseases

    Variant interpretation for the diagnosis of genetic diseases is a complex process. The American College of Medical Genetics and Genomics, with the Association for Molecular Pathology, has proposed a set of evidence-based guidelines to support variant pathogenicity assessment and reporting in Mendelian diseases. Cardiovascular disorders are a field of application of these guidelines, but practical implementation is challenging due to the genetic heterogeneity of these diseases and the complexity of information sources that need to be integrated. Decision support systems able to automate variant interpretation in the light of specific disease domains are in demand. We implemented CardioVAI (Cardio Variant Interpreter), an automated system for guidelines-based variant classification in cardiovascular-related genes. Different omics resources were integrated to assess the pathogenicity of every genomic variant in 72 cardiovascular disease-related genes. We validated our method on benchmark datasets of variants assessed with high confidence, reaching pathogenicity and benignity concordance up to 83 and 97.08%, respectively. We compared CardioVAI to similar methods and analyzed the main differences in terms of guidelines implementation. Finally, we made CardioVAI available as a web resource (http://cardiovai.engenome.com/) that allows users to further specialize guidelines recommendations.