15 research outputs found

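    Identification de biomarqueurs et d’ARN non codants par des approches basées sur l’intelligence computationnelle [Identification of biomarkers and noncoding RNAs using approaches based on computational intelligence]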

    Cancer remains a leading health problem worldwide. Cancer classification has traditionally been based on the morphological study of tumors. However, tumors with similar histological appearances can respond differently to therapy, which indicates differences in tumor characteristics at the molecular level. The development of a novel, reliable and accurate method for classifying tumors is therefore essential for more successful diagnosis and treatment. Molecular biomarkers offer new ways of understanding disease processes and the manner in which medicines work to counteract disease. In the last few years, researchers have paid growing attention to biomarker identification owing to its importance in genomics and personalized medicine. In this thesis, we address the problem of biomarker discovery at two levels: genomics and transcriptomics. We are first interested in the problem of selecting robust and accurate signatures from gene expression data, which relies heavily on the feature selection algorithms used. The main objective is to achieve high performance in computer-aided diagnosis (CAD) by selecting a few genes with high predictive power and high sensitivity to variations in real clinical tests. For that purpose, we have investigated ensemble-based methods and parallel cooperative metaheuristics, which have received increasing attention because they can give higher accuracy and stability than a single algorithm can achieve. Accordingly, we propose a parallel ensemble-based feature selection method based on a meta-ensemble of filters (MPME-FS) for biomarker discovery from gene expression data. Then, we propose a hybrid wrapper/filter feature selection method, called CPM-FS, based on the parallel cooperation of metaheuristics and a filter-based mechanism for both the initialization and the repair of solutions. After that, we propose an ensemble-based wrapper gene selection method that builds on CPM-FS and a wrapper-based consensus function in order to take gene dependencies into account. Experiments on 12 publicly available cancer datasets have shown that our approaches outperform recent state-of-the-art methods in terms of predictive performance. They also provide robust selection according to the different similarity measures. Biological interpretation of the selected signatures indicates that the proposed methods select highly informative genes for cancer diagnosis. In the second part of this thesis, we propose an integrative approach for the prediction of noncoding RNAs (ncRNAs), molecules that play an important role in post-transcriptional gene regulation, which highlights their importance as putative markers and their impact on the development and progression of many diseases. The proposed approach examines several types of genomic and epigenomic properties that can be used to characterize these molecules. We have developed a generic tool called IncRId that takes all the reviewed heterogeneous features into account in a modular and easily extensible way and that could be used and adapted for predicting any type of ncRNA. Our method also makes it possible to study the validity of each given feature in each of the candidate species. Then, we present an application example focusing on the prediction of piRNAs. We reviewed and extracted from the literature a large number of piRNA features that have been observed experimentally in several species. We implemented these features in a tool, called IpiRId, to study the pertinence of each feature in each of the studied species: human, mouse and fly. IpiRId prediction results attain more than 90% accuracy, outperforming all existing tools. The IpiRId software and the web server of our tool are freely available to academic users at: https://EvryRNA.ibisc.univ-evry.fr
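
As an illustration of the ensemble-of-filters idea mentioned in the abstract above, here is a minimal sketch that aggregates the gene rankings produced by several filters using their mean rank. This is not the MPME-FS algorithm itself, whose details the abstract does not give; the function names (`rank_genes`, `ensemble_select`), the mean-rank aggregation rule, and the toy scores are all assumptions made for illustration.

```python
def rank_genes(scores):
    """Map each gene to its rank (0 = best) under one filter's scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered)}


def ensemble_select(filter_scores, k):
    """Select k genes by mean rank across filters (hypothetical aggregation rule).

    filter_scores: list of dicts, one per filter, mapping gene -> relevance
    score (higher is better). All dicts must cover the same genes.
    """
    ranks = [rank_genes(scores) for scores in filter_scores]
    genes = list(filter_scores[0])
    mean_rank = {g: sum(r[g] for r in ranks) / len(ranks) for g in genes}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(genes, key=lambda g: (mean_rank[g], g))[:k]


# Toy scores from two hypothetical filters.
filter_a = {"g1": 0.9, "g2": 0.5, "g3": 0.1}
filter_b = {"g1": 0.7, "g2": 0.8, "g3": 0.2}
selected = ensemble_select([filter_a, filter_b], k=2)
```

Averaging ranks rather than raw scores keeps filters with different score scales comparable, which is one common reason ensembles of filters tend to give more stable selections than a single filter.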

    In silico prediction of RNA secondary structure

    The secondary structure of an RNA molecule represents the base-pairing interactions within the molecule and fundamentally determines its overall structure. In this chapter, we give an overview of the main approaches and existing tools for predicting RNA secondary structures, as well as methods for identifying noncoding RNAs from genomic sequences or RNA sequencing data. We then focus on the identification of a well-known class of small noncoding RNAs, namely microRNAs, which play very important roles in many biological processes by regulating gene expression post-transcriptionally and whose dysregulation has been shown to be involved in several human diseases.
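
To make the idea of secondary-structure prediction concrete, the sketch below implements the classic Nussinov-style dynamic program, which maximizes the number of nested base pairs. It is a textbook baseline, not one of the tools surveyed in the chapter; the function name, the allowed pair set (Watson-Crick plus G-U wobble), and the `min_loop` default are assumptions chosen for illustration.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired bases required between
    two paired positions (hairpin loop constraint).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))  # prints 3
```

Pair counting ignores thermodynamics; practical predictors typically minimize free energy instead, but the recursion has the same shape.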

    A Novel Integrative Approach for Non-coding RNA Classification Based on Deep Learning

    Background: Molecular biomarkers offer new ways to understand many disease processes. Noncoding RNAs (ncRNAs) play a crucial role in several cellular activities that are highly correlated with many human diseases, especially cancer. The classification and identification of ncRNAs have become a critical issue owing to their use as biomarkers in many human diseases. Objective: Most existing computational tools for ncRNA classification handle only one type of ncRNA. They are based on structural information or on specific known features, and they suffer from a lack of significant and validated features. The performance of these methods is therefore not always satisfactory. Methods: We propose a novel approach named imCnC for ncRNA classification based on multisource deep learning, which integrates several data sources, such as genomic and epigenomic data, to identify several ncRNA types. We also propose an optimization technique that visualizes the feature patterns extracted by the multisource CNN model in order to measure the epigenomic features of each ncRNA type. Results: Computational results on a dataset of 16 human ncRNA classes downloaded from RFAM show that imCnC outperforms the existing tools, achieving an accuracy of 94.18%. In addition, our method makes it possible to discover new ncRNA features by using an optimization technique to measure and visualize the feature patterns of the imCnC classifier.

    IpiRId: Integrative approach for piRNA prediction using genomic and epigenomic data

    Many computational tools have been proposed over the last two decades for predicting piRNAs, which are molecules with an important role in post-transcriptional gene regulation. However, these tools are mostly based on a single feature that is generally related to the sequence. Discoveries in the domain of piRNAs are still at an early stage, and recent publications have revealed many new properties. Here, we propose an integrative approach for piRNA prediction in which several types of genomic and epigenomic properties that can be used to characterize these molecules are examined. We reviewed and extracted from the literature a large number of piRNA features that have been observed experimentally in several species. These features are represented by different kernels in a Multiple Kernel Learning based approach, implemented within an object-oriented framework. The resulting tool, called IpiRId, attains more than 90% accuracy on the different species tested (human, mouse and fly), outperforming all existing tools. In addition, our method makes it possible to study the validity of each given feature in a given species. Finally, the developed tool is modular and easily extensible, and can be adapted for predicting other types of ncRNAs. The IpiRId software and the user-friendly web-based server of our tool are now freely available to academic users at: https://evryrna.ibisc.univ-evry.fr/evryrna/
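
The abstract says each feature is represented by its own kernel and that the kernels are combined through Multiple Kernel Learning. The sketch below shows one simple way to combine kernels: a convex combination whose weights come from kernel-target alignment. This heuristic is a stand-in chosen for brevity, not the procedure used by IpiRId, and every function name and data value here is hypothetical.

```python
import math


def linear_kernel(X):
    """Gram matrix of dot products between all pairs of samples."""
    return [[sum(a * b for a, b in zip(x, z)) for z in X] for x in X]


def rbf_kernel(X, gamma=1.0):
    """Gram matrix of Gaussian (RBF) similarities between all pairs of samples."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))
             for z in X] for x in X]


def frobenius_inner(A, B):
    """Sum of element-wise products of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K, y):
    """Kernel-target alignment between Gram matrix K and labels y in {-1, +1}."""
    Y = [[yi * yj for yj in y] for yi in y]
    denom = math.sqrt(frobenius_inner(K, K) * frobenius_inner(Y, Y))
    return frobenius_inner(K, Y) / denom if denom > 0 else 0.0


def combine_kernels(kernels, y):
    """Weight each kernel by its (non-negative) alignment and sum them."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total > 0:
        weights = [s / total for s in scores]
    else:  # fall back to uniform weights if no kernel aligns with the labels
        weights = [1.0 / len(kernels)] * len(kernels)
    n = len(y)
    combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels))
                 for j in range(n)] for i in range(n)]
    return weights, combined


# Toy data: two features per sample, labels in {-1, +1}.
X = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]
y = [1, 1, -1, -1]
weights, K = combine_kernels([linear_kernel(X), rbf_kernel(X, gamma=2.0)], y)
```

The learned weights give a rough indication of how informative each kernel, and hence each feature, is for the labels, which mirrors the abstract's point about studying the validity of each feature. The combined matrix could then be passed to any classifier that accepts a precomputed kernel.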

    Performance comparison.

    5-fold cross-validation results of IpiRId and other existing tools according to: Accuracy (Acc), Sensitivity (Se), Specificity (Sp), Precision (Pre) and F1 score (F1).
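
For reference, the five measures named in this caption can be computed from the counts of a binary confusion matrix, as in the sketch below. These are the standard definitions; the function name and the example counts are made up for illustration and are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Acc, Se, Sp, Pre and F1 from binary confusion-matrix counts."""

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    se = ratio(tp, tp + fn)   # sensitivity: positives correctly found
    sp = ratio(tn, tn + fp)   # specificity: negatives correctly rejected
    pre = ratio(tp, tp + fp)  # precision: predicted positives that are true
    return {
        "Acc": ratio(tp + tn, tp + fp + tn + fn),
        "Se": se,
        "Sp": sp,
        "Pre": pre,
        "F1": ratio(2 * pre * se, pre + se),  # harmonic mean of Pre and Se
    }


# Hypothetical counts for one cross-validation fold.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

In a 5-fold cross-validation, these values would typically be computed on each held-out fold and then averaged.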