9 research outputs found

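    Using bioinformatics tools to characterize the immunological landscape of prostate cancer (original title: L'utilisation des outils bioinformatiques pour caractériser le paysage immunologique du cancer de la prostate)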

    As part of my PhD, I developed applied data-analysis approaches to perform a multi-omics analysis of prostate cancer (CaP). The project had two distinct parts, corresponding to the two articles included in the body of this document. In the first part, I retrieved omics data of different types associated with CaP (RNA-Seq, methylation, CNA, SNA, miRNA, clinical data) and prepared them with a dedicated bioinformatics pipeline. I then used these data to look for new immune checkpoints associated with biochemical recurrence (BCR) in CaP, using an approach based on principal component analysis (PCA) and partial least squares (PLS) regression. This analysis highlighted a specific family of immune checkpoints, the LILR family, which is a potential target family for immunotherapy. In the second part, I used the same data to develop a machine learning (ML) analysis protocol. The aim was to show that it is possible to predict from RNA-Seq data whether or not patients will relapse. I showed that very good prediction scores can be reached even with small datasets, and that current ML algorithms cope well with the technical variability arising from the diversity of data sources in CaP. It is therefore possible to use the biobanks currently held by research institutions around the world to create larger datasets.

    Large-Scale Automatic Feature Selection for Biomarker Discovery in High-Dimensional OMICs Data

    The identification of biomarker signatures in omics molecular profiling is usually performed to predict outcomes in a precision medicine context, such as patient disease susceptibility, diagnosis, prognosis, and treatment response. To identify these signatures, we have developed a biomarker discovery tool called BioDiscML. From a collection of samples and their associated characteristics, i.e., the biomarkers (e.g., gene expression, protein levels, clinico-pathological data), BioDiscML exploits various feature selection procedures to produce signatures associated with machine learning models that efficiently predict a specified outcome. To this end, BioDiscML uses a large variety of machine learning algorithms to select the best combination of biomarkers for predicting categorical or continuous outcomes from highly unbalanced datasets. The software automates all machine learning steps, including data pre-processing, feature selection, model selection, and performance evaluation. BioDiscML is delivered as a stand-alone program and is available for download at https://github.com/mickaelleclercq/BioDiscML

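    Comparative study of clustering models for multivariate time series from connected medical devices (original title: Étude comparative de modèles de clustering de séries temporelles multivariées issues d'objets médicaux connectés)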

    In healthcare, patient data are often collected as multivariate time series, providing a comprehensive view of a patient's health status over time. These data are generally sparse and episodic, but connected medical devices can increase their frequency. The goal is to create patient profiles from these time series in an unsupervised manner. In the absence of labels, a predictive model can be used to predict future values while forming a latent cluster space, which is evaluated on predictive performance. Using real data from the company Withings, we compare two approaches: the static clustering model MAGMACLUST, which assigns one cluster to the entire time series, and the dynamic clustering model DGM², which allows an individual's group membership to change over time.

    Identification of a Transcriptomic Prognostic Signature by Machine Learning Using a Combination of Small Cohorts of Prostate Cancer

    Determining which treatment to provide to men with prostate cancer (PCa) is a major challenge for clinicians. Currently, clinical risk stratification for PCa is based on clinico-pathological variables such as Gleason grade, stage and prostate-specific antigen (PSA) levels. Transcriptomic data have the potential to enable more precise approaches to predicting the evolution of the disease. However, high-quality RNA sequencing (RNA-seq) datasets with clinical data and follow-up long enough to allow the discovery of biochemical recurrence (BCR) biomarkers are small and rare. In this study, we propose a machine learning approach that is robust to batch effects and enables the discovery of highly predictive signatures despite using small datasets. Gene expression data were extracted from three RNA-seq datasets totaling 171 PCa patients. Data were re-analyzed using a single pipeline to ensure uniformity. A total of 14 classifiers were tested with various parameters to identify the best model and gene signature to predict BCR. Using a random forest model, we identified a signature composed of only three genes (JUN, HES4, PPDPF) that predicts BCR with better accuracy [74.2%, balanced error rate (BER) = 27%] than the clinico-pathological variables (69.2%, BER = 32%) currently used to predict PCa evolution. This score is in the range of studies that predicted BCR in a single cohort with a higher number of patients. We showed that it is possible to merge and analyze different small and heterogeneous datasets together to obtain a better signature than if they were analyzed individually, thus reducing the need for very large cohorts. This study demonstrates the feasibility of combining several small datasets into a larger one to identify a predictive genomic signature that would benefit PCa patients.
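The abstract above reports both accuracy and balanced error rate (BER). As a minimal sketch of why the two can differ on unbalanced cohorts, the snippet below computes BER as the mean of the per-class error rates. The labels are hypothetical toy data, not values from the study, and the helper names `balanced_error_rate` and `accuracy` are illustrative rather than part of any published pipeline.

```python
from collections import Counter


def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates (1 - recall, averaged over classes)."""
    totals = Counter(y_true)
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: 1 = biochemical recurrence, 0 = no recurrence.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A degenerate classifier that always predicts "no recurrence".
y_pred = [0] * 10

print(accuracy(y_true, y_pred))             # 0.8
print(balanced_error_rate(y_true, y_pred))  # 0.5
```

In this toy case the classifier looks reasonable by accuracy but misses every recurrence, which BER exposes because each class contributes equally regardless of its size.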

    Deictic directionals revisited in the light of advances in typology

    This study explores the issue of Associated Motion (hereafter AM) in five languages spoken in Africa and Asia. We investigate grammatical morphemes whose function is to add a motion process to the event encoded in the verb expressing the main (non-motion) event, and to specify the temporal sequence of these two events (motion-prior-to-action or motion-subsequent-to-action). We show that an AM analysis adequately accounts for the function of morphemes previously considered as directionals in Wolof and Burmese, whereas in Sereer, Northern Mandarin and Japanese, AM markers are concurring with morphemes marking deictic orientation. Our results support recent studies showing that AM is a widespread linguistic phenomenon, and thus raise the question of the place of AM in a typology of motion events.

    Immune-focused multi-omics analysis of prostate cancer: leukocyte Ig-Like receptors are associated with disease progression

    Prostate cancer (PCa) immunotherapy has shown limited efficacy so far, even in advanced-stage cancers. The success rate of PCa immunotherapy might be improved by approaches better adapted to the immunobiology of the disease. The objective of this study was to perform a multi-omics analysis to identify immune genes associated with PCa progression, in order to better characterize PCa immunobiology and propose new immunotherapeutic targets. mRNA, miRNA, methylation, copy number aberration, and single nucleotide variant datasets from The Cancer Genome Atlas PRAD cohort were analyzed after filtering for genes associated with immunity. Sparse partial least squares-discriminant analyses were performed to identify features associated with biochemical recurrence (BCR) in each type of omics data. Selected features predicted BCR with a balanced error rate (BER) of 0.20 to 0.51 in single-omics analyses and of 0.05 in multi-omics analyses. Among the features associated with BCR were genes from the Leukocyte Immunoglobulin-Like Receptor (LILR) family, which are immune checkpoints with immunotherapeutic potential. Using Multivariate INTegrative (MINT) analysis, the association of five LILR genes with BCR was quantified in a combination of three RNA-seq datasets and confirmed with Kaplan-Meier analysis both in these datasets and in an independent RNA-seq dataset. Finally, immunohistochemistry showed that a high number of LILRB1-positive cells within the tumors predicted long-term adverse outcomes. Thus, tumors characterized by abnormal expression of LILR genes have an elevated risk of recurring after definitive local therapy. The immunotherapeutic potential of these regulators to stimulate the immune response against PCa should be evaluated in pre-clinical models.
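The abstract above mentions Kaplan-Meier analysis of recurrence. For readers unfamiliar with the estimator, here is a minimal sketch of the product-limit calculation in plain Python. The follow-up times and event flags are hypothetical toy values, and the function `kaplan_meier` is an illustrative helper, not the authors' code.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the recurrence-free probability.

    times  : follow-up time for each patient
    events : 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients whose event is observed at exactly time t.
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        # Everyone leaving the risk set at time t (events and censored).
        leaving = sum(1 for ti in times if ti == t)
        if d > 0:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy cohort: follow-up in months, 1 = recurrence, 0 = censored.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])  # [(5, 0.8), (8, 0.6), (12, 0.3)]
```

Censored patients never lower the curve directly; they only shrink the risk set used at later event times, which is why the drop at month 12 is larger than the earlier ones.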

    Novel cytonuclear combinations modify Arabidopsis thaliana seed physiology and vigor

    Dormancy and germination vigor are complex traits of primary importance for adaptation and agriculture. Intraspecific variation in cytoplasmic genomes and cytonuclear interactions were previously reported to affect germination in Arabidopsis using novel cytonuclear combinations that disrupt co-adaptation between natural variants of nuclear and cytoplasmic genomes. However, neither specific aspects of dormancy and germination vigor nor the parental contributions to the genetic effects were thoroughly explored. Here, we specifically assessed dormancy, germination performance and longevity of seeds from Arabidopsis plants with natural and new genomic compositions. All three traits were modified by cytonuclear reshuffling. Both the depth and the release rate of dormancy could be modified by a change of cytoplasm. Significant changes in dormancy and germination performance due to specific cytonuclear interacting combinations mainly occurred in opposite directions, consistent with the idea that a single physiological consequence of the new genetic combination affected both traits oppositely. However, this was not always the case. Interestingly, the ability of parental accessions to contribute to significant cytonuclear interactions modifying the germination phenotype differed depending on whether they provided the nuclear or the cytoplasmic genetic compartment. The observed deleterious effects of novel cytonuclear combinations (in comparison with the nuclear parent) were consistent with a contribution of cytonuclear interactions to germination adaptive phenotypes. More surprisingly, we also observed favorable effects of novel cytonuclear combinations, suggesting that suboptimal genetic combinations exist in natural populations for these traits. Reduced sensitivity to exogenous ABA and faster endogenous ABA decay during germination were observed in a novel cytonuclear combination that also exhibited enhanced longevity and better germination performance compared to its natural nuclear parent. Taken together, our results strongly support the view that cytoplasmic genomes represent an additional resource of natural variation for breeding seed vigor traits.