14 research outputs found
Transforming Clinical Data into Actionable Prognosis Models: Machine-Learning Framework and Field-Deployable App to Predict Outcome of Ebola Patients
We introduce a machine-learning framework and field-deployable app to predict the outcome of Ebola patients from their initial clinical symptoms. Recent work from other authors also points to clinical factors that can be used to better understand patient prognosis, but there is currently no predictive model that can be deployed in the field to assist health care workers. Mobile apps for clinical diagnosis and prognosis allow the use of more complex models than the scoring protocols that have traditionally been favored by clinicians, such as Apgar and MTS. Furthermore, the WHO Ebola Interim Assessment Panel has recently concluded that innovative tools for data collection, reporting, and monitoring are needed for a better response in future outbreaks. However, incomplete clinical data will continue to be a serious problem until more robust and standardized data collection systems are in place. Our app demonstrates how systematic data collection could lead to actionable knowledge, which in turn would trigger more and better collection, further improving the prognosis models and the app, essentially creating a virtuous cycle. Organismic and Evolutionary Biolog
Large-Scale Automatic Feature Selection for Biomarker Discovery in High-Dimensional OMICs Data
The identification of biomarker signatures in omics molecular profiling is usually performed to predict outcomes in a precision medicine context, such as patient disease susceptibility, diagnosis, prognosis, and treatment response. To identify these signatures, we have developed a biomarker discovery tool called BioDiscML. From a collection of samples and their associated characteristics, i.e., the biomarkers (e.g., gene expression, protein levels, clinico-pathological data), BioDiscML exploits various feature selection procedures to produce signatures associated with machine learning models that efficiently predict a specified outcome. To this end, BioDiscML uses a large variety of machine learning algorithms to select the best combination of biomarkers for predicting categorical or continuous outcomes from highly unbalanced datasets. The software has been implemented to automate all machine learning steps, including data pre-processing, feature selection, model selection, and performance evaluation. BioDiscML is delivered as a stand-alone program and is available for download at https://github.com/mickaelleclercq/BioDiscML
Identification of a Transcriptomic Prognostic Signature by Machine Learning Using a Combination of Small Cohorts of Prostate Cancer
Determining which treatment to provide to men with prostate cancer (PCa) is a major challenge for clinicians. Currently, clinical risk stratification for PCa is based on clinico-pathological variables such as Gleason grade, stage, and prostate-specific antigen (PSA) levels. Transcriptomic data, however, have the potential to enable more precise approaches to predicting the evolution of the disease. High-quality RNA sequencing (RNA-seq) datasets with clinical data and follow-up long enough to allow the discovery of biochemical recurrence (BCR) biomarkers are small and rare. In this study, we propose a machine learning approach that is robust to batch effects and enables the discovery of highly predictive signatures despite using small datasets. Gene expression data were extracted from three RNA-seq datasets totaling 171 PCa patients. Data were re-analyzed using a single pipeline to ensure uniformity. A total of 14 classifiers were tested with various parameters to identify the best model and gene signature to predict BCR. Using a random forest model, we identified a signature composed of only three genes (JUN, HES4, PPDPF) that predicts BCR with better accuracy [74.2%, balanced error rate (BER) = 27%] than the clinico-pathological variables (69.2%, BER = 32%) currently used to predict PCa evolution. This score is in the range of studies that predicted BCR in a single cohort with a higher number of patients. We showed that it is possible to merge and analyze different small and heterogeneous datasets together to obtain a better signature than if they were analyzed individually, thus reducing the need for very large cohorts. This study demonstrates the feasibility of pooling several small datasets into a larger one to identify a predictive genomic signature that would benefit PCa patients.
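The abstract above reports both accuracy and balanced error rate (BER). As a minimal sketch of how the two metrics relate for a binary outcome such as BCR, the snippet below uses the common definition of BER as the mean of the per-class error rates. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the study, and the function names are assumptions rather than part of any published pipeline.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_error_rate(tp, fn, tn, fp):
    """Mean of the false-negative rate and the false-positive rate."""
    false_negative_rate = fn / (tp + fn)  # errors among true positives
    false_positive_rate = fp / (tn + fp)  # errors among true negatives
    return 0.5 * (false_negative_rate + false_positive_rate)


# Hypothetical counts for an unbalanced cohort (NOT data from the study):
# 50 patients with recurrence, 100 without.
tp, fn, tn, fp = 30, 20, 90, 10

acc = accuracy(tp, fn, tn, fp)
ber = balanced_error_rate(tp, fn, tn, fp)
print(f"accuracy = {acc:.3f}, BER = {ber:.3f}")
```

Because BER weights each class equally, it can differ noticeably from 1 minus accuracy when the classes are unbalanced, which is why the two figures are typically reported side by side.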
Investigation of the Genus Flavobacterium as a Reservoir for Fish-Pathogenic Bacterial Species: the Case of Flavobacterium collinsii
Bacteria of the genus Flavobacterium are recovered from a large variety of environments. Among the described species, Flavobacterium psychrophilum and Flavobacterium columnare cause considerable losses in fish farms. Alongside these well-known fish-pathogenic species, isolates belonging to the same genus recovered from diseased or apparently healthy wild, feral, and farmed fish have been suspected to be pathogenic. Here, we report the identification and genomic characterization of a Flavobacterium collinsii isolate (TRV642) retrieved from rainbow trout spleen. A phylogenetic tree of the genus built by aligning the core genome of 195 Flavobacterium species revealed that F. collinsii stands within a cluster of species associated with diseased fish, the closest one being F. tructae, which was recently confirmed as pathogenic. We evaluated the pathogenicity of F. collinsii TRV642 as well as of Flavobacterium bernardetii F-372T, another recently described species reported as a possible emerging pathogen. Following intramuscular injection challenges in rainbow trout, no clinical signs or mortalities were observed with F. bernardetii. F. collinsii showed very low virulence but was isolated from the internal organs of survivors, indicating that the bacterium is able to survive inside the host and may provoke disease in fish under compromised conditions such as stress and/or wounds. Our results suggest that members of a phylogenetic cluster of fish-associated Flavobacterium species may be opportunistic fish pathogens causing disease under specific circumstances. IMPORTANCE Aquaculture has expanded significantly worldwide in recent decades and accounts for half of human fish consumption. However, infectious fish diseases are a major bottleneck for its sustainable development, and the increasing number of bacterial species recovered from diseased fish raises great concern.
The current study revealed phylogenetic associations with ecological niches among the Flavobacterium species. We also focused on Flavobacterium collinsii, which belongs to a group of putative pathogenic species. The genome contents revealed a versatile metabolic repertoire suggesting the use of diverse nutrient sources, a characteristic of saprophytic or commensal bacteria. In a rainbow trout experimental challenge, the bacterium survived inside the host, likely escaping clearance by the immune system but without provoking massive mortality, suggesting opportunistic pathogenic behavior. This study highlights the importance of experimentally evaluating the pathogenicity of the numerous bacterial species retrieved from diseased fi
Prediction of lipomatous soft tissue malignancy on MRI: comparison between machine learning applied to radiomics and deep learning
Objectives: Malignancy of lipomatous soft-tissue tumours is suspected on magnetic resonance imaging (MRI), and the diagnosis requires a biopsy. The aim of this study is to compare the performance of MRI radiomic machine learning (ML) analysis with that of deep learning (DL) in predicting malignancy in patients with lipomas or atypical lipomatous tumours. Methods: The cohort included 145 patients affected by lipomatous soft-tissue tumours with histology and a fat-suppressed gadolinium contrast-enhanced T1-weighted MRI pulse sequence. Images were collected between 2010 and 2019 across 78 centres with non-uniform protocols (three different magnetic field strengths (1.0, 1.5 and 3.0 T) on 16 MR systems commercialised by four vendors (General Electric, Siemens, Philips, Toshiba)). Two approaches were compared: (i) ML from radiomic features with and without batch correction; and (ii) DL from images. Performance was assessed using 10 cross-validation folds on a test set and then on external validation data. Results: The best DL model was obtained using ResNet50, resulting in an area under the curve (AUC) of 0.87 ± 0.11 (95% CI 0.65–1). For ML/radiomics, AUCs reached 0.83 ± 0.12 (95% CI 0.59–1) and 0.99 ± 0.02 (95% CI 0.95–1) on the test cohort using gradient boosting without and with batch-effect correction, respectively. On the external cohort, the AUC of the gradient boosting model was 0.80, and for an optimised decision threshold, sensitivity and specificity were 100% and 32%, respectively. Conclusions: In this context of limited observations, batch-effect-corrected ML/radiomics approaches outperformed DL-based models.
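The abstract above reports sensitivity and specificity at an optimised decision threshold. The following minimal sketch shows how moving the threshold on a classifier's predicted scores trades one metric against the other. The scores and labels are hypothetical toy values, not data from the study, and the function name is an assumption made for illustration.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as malignant (1), then score the result."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities and true labels (1 = malignant).
scores = [0.9, 0.6, 0.5, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.3, 0.55):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold catches every malignant case in this toy data at the cost of more false positives, which mirrors the kind of trade-off a high-sensitivity operating point implies.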
Transcriptome architecture and regulation at environmental transitions in flavobacteria: the case of an important fish pathogen
The family Flavobacteriaceae (phylum Bacteroidetes) is a major component of soil, marine and freshwater ecosystems. In this understudied family, Flavobacterium psychrophilum is a freshwater pathogen that infects salmonid fish worldwide, with critical environmental and economic impact. Here, we report an extensive transcriptome analysis that established the genome map of transcription start sites and transcribed regions, predicted alternative sigma factor regulons and regulatory RNAs, and documented gene expression profiles across 32 biological conditions mimicking the pathogen life cycle. The results link genes to environmental conditions and phenotypic traits and provide insights into gene regulation, highlighting similarities with better-known bacteria and original characteristics linked to the phylogenetic position and the ecological niche of the bacterium. In particular, osmolarity appears to be a signal for the transition between free-living and within-host programs, and expression patterns of secreted proteins shed light on probable virulence factors. Further investigations showed that a newly discovered sRNA widely conserved in the genus, Rfp18, is required for precise expression of proteases. By pointing to proteins and regulatory elements probably involved in host–pathogen interactions, metabolic pathways, and molecular machineries, the results suggest many directions for future research; a website is made available to facilitate their use in filling knowledge gaps on flavobacteria.
Immune-focused multi-omics analysis of prostate cancer: leukocyte Ig-Like receptors are associated with disease progression
Prostate cancer (PCa) immunotherapy has shown limited efficacy so far, even in advanced-stage cancers. The success rate of PCa immunotherapy might be improved by approaches better adapted to the immunobiology of the disease. The objective of this study was to perform a multi-omics analysis to identify immune genes associated with PCa progression, in order to better characterize PCa immunobiology and propose new immunotherapeutic targets. mRNA, miRNA, methylation, copy number aberration, and single nucleotide variant datasets from The Cancer Genome Atlas PRAD cohort were analyzed after filtering for genes associated with immunity. Sparse partial least squares-discriminant analyses were performed to identify features associated with biochemical recurrence (BCR) in each type of omics data. Selected features predicted BCR with a balanced error rate (BER) of 0.20 to 0.51 in single-omics analyses and of 0.05 in multi-omics analyses. Among the features associated with BCR were genes from the Leukocyte Immunoglobulin-Like Receptor (LILR) family, which are immune checkpoints with immunotherapeutic potential. Using Multivariate INTegrative (MINT) analysis, the association of five LILR genes with BCR was quantified in a combination of three RNA-seq datasets and confirmed with Kaplan-Meier analysis both in these datasets and in an independent RNA-seq dataset. Finally, immunohistochemistry showed that a high number of LILRB1-positive cells within the tumors predicted long-term adverse outcomes. Thus, tumors characterized by abnormal expression of LILR genes have an elevated risk of recurring after definitive local therapy. The immunotherapeutic potential of these regulators to stimulate the immune response against PCa should be evaluated in pre-clinical models.