18 research outputs found

    Sur l'utilisation active de la diversité dans la construction d'ensembles de classifieurs. Application à la détection de fumées nocives sur site industriel

    Discussions about the influence of diversity when designing Multiple Classifier Systems have been an active topic in Machine Learning in recent years. One possible way of designing Multiple Classifier Systems is to select the ensemble members from a large pool of classifiers according to predefined criteria, which is known as the Overproduce and Choose paradigm, also called Ensemble Pruning. The objective of this PhD thesis is to study the trade-off between accuracy and diversity in multiple classifier systems and to provide some answers regarding the elusive behavior of diversity when it is used explicitly in ensemble learning algorithms. We start by reviewing some well-known Machine Learning algorithms and ensemble learning techniques from the literature. We then present in detail the concept of diversity and the way it is used by certain ensemble learning algorithms. We propose a genetic heuristic to design multiple classifier systems by controlling the trade-off between diversity and accuracy when selecting individual classifiers. We compare the proposed genetic selection with several heuristics described in the literature for building multiple classifier systems under the Overproduce and Choose paradigm. The observations we draw from several experiments on UCI datasets lead us to propose specific conditions under which it might be worth using diversity explicitly during the design stage of multiple classifier systems. We also show that the effectiveness of the Overproduce and Choose paradigm relies mainly on the stability of a given problem. The application of our research work concerns the development of a supervised classification system to monitor atmospheric pollution around industrial complexes. This system is based on image-processing analysis of visual scenes recorded by cameras and aims at detecting dangerous smoke trails emitted by steelworks or chemical factories.
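    The Overproduce and Choose idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's genetic heuristic: a simple greedy forward selection is swapped in for brevity, and the diversity measure (mean pairwise disagreement), the weighting parameter `alpha`, and all function names are assumptions made for illustration only.

```python
from itertools import combinations


def accuracy(preds, labels):
    """Fraction of samples that a single classifier labels correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(a, b):
    """Fraction of samples on which two classifiers predict differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def ensemble_score(members, pool, labels, alpha):
    """Weighted sum of mean member accuracy and mean pairwise disagreement.

    alpha = 1.0 uses accuracy only; lower values give diversity more weight.
    """
    mean_acc = sum(accuracy(pool[i], labels) for i in members) / len(members)
    pairs = list(combinations(members, 2))
    if pairs:
        mean_div = sum(disagreement(pool[i], pool[j]) for i, j in pairs) / len(pairs)
    else:
        mean_div = 0.0  # a single classifier has no pairwise diversity
    return alpha * mean_acc + (1 - alpha) * mean_div


def greedy_select(pool, labels, size, alpha=0.5):
    """Choose `size` classifiers from an overproduced pool, one at a time.

    `pool` is a list of prediction vectors, one per candidate classifier,
    computed on a validation set whose true labels are `labels`.
    """
    selected = []
    remaining = list(range(len(pool)))
    while remaining and len(selected) < size:
        best = max(
            remaining,
            key=lambda i: ensemble_score(selected + [i], pool, labels, alpha),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

    With `alpha=1.0` the selection ignores diversity and may pick near-identical classifiers; lowering `alpha` rewards members that disagree with those already chosen, which is the trade-off the abstract refers to.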

    Sur l'utilisation active de la diversité dans la construction d'ensembles classifieurs (application à la détection de fumées nocives sur site industriel)

    The influence of diversity when designing Multiple Classifier Systems is an active topic of discussion in Machine Learning. One possible way of designing Multiple Classifier Systems is to select the ensemble members from a large pool of classifiers according to predefined criteria, which is known as the Overproduce and Choose paradigm. The objective of this PhD thesis is to study the trade-off between accuracy and diversity in multiple classifier systems. We review some well-known Machine Learning algorithms and ensemble learning techniques from the literature, and we present in detail the concept of diversity and the way it is used by certain ensemble learning algorithms. We propose a genetic heuristic to design multiple classifier systems by controlling the trade-off between diversity and accuracy when selecting individual classifiers. We compare the proposed genetic selection with several heuristics described in the literature for building multiple classifier systems under the Overproduce and Choose paradigm. The application of our research work concerns the development of a supervised classification system to monitor atmospheric pollution around industrial complexes. This system is based on image-processing analysis of visual scenes recorded by cameras and aims at detecting dangerous smoke trails emitted by steelworks or chemical factories.

    Genomic hotspots but few recurrent fusion genes in breast cancer.

    The advent of next-generation sequencing technologies has boosted interest in exploring the role of fusion genes in the development and progression of solid tumors. In breast cancer, most of the detected gene fusions seem to be "passenger" events, while the presence of recurrent and driver fusions is still under study. We performed RNA sequencing in 55 well-characterized breast cancer samples and 10 adjacent normal breast tissues, complemented by an analysis of SNP array data. We explored the presence of fusion genes and defined their association with breast cancer subtypes, clinical-pathologic characteristics and copy number aberrations. Overall, 370 fusions were detected across the majority of the samples. HER2+ samples had significantly more fusions than triple negative and luminal subtypes. The number of fusions was correlated with histological grade, Ki67 and tumor size. Clusters of fusion genes were observed across the genome, and a significant correlation of fusions with copy number aberrations, and more specifically amplifications, was also revealed. Despite the large number of fusion events, only a few were recurrent, while recurrent individual genes forming fusions with different partners were also detected, including the estrogen receptor 1 gene in the previously detected ESR1-CCDC170 fusion. Overall, we detected novel gene fusion events and confirmed previously reported fusions. Genomic hotspots of fusion genes, differences between subtypes and a small number of recurrent fusions are the most relevant characteristics of these events in breast cancer. Further investigation is necessary to comprehend the biological significance of these fusions.

    No significant viral transcription detected in whole breast cancer transcriptomes

    Background: Studies evaluating the presence of viral sequences in breast cancer (BC), including various strains of human papillomavirus and human herpes virus, have yielded conflicting results. Most were based on RT-PCR and in situ hybridization. Methods: In this report we searched for expressed viral sequences in 58 BC transcriptomes using five distinct in silico methods. In addition, we complemented our RNA sequencing results with exome sequencing, PCR and immunohistochemistry (IHC) analyses. A control sample was used to test our in silico methods. Results: All of the computational methods correctly detected viral sequences in the control sample. We identified a small number of viral sequences belonging to human herpesvirus 4 and 6 and Merkel cell polyomavirus. The extremely low expression levels, two orders of magnitude lower than in a typical hepatitis B virus infection in hepatocellular carcinoma, did not suggest active infections. The presence of viral elements was confirmed in sample-matched exome sequences, but could not be confirmed by PCR or IHC. Conclusions: Our results show that no viral sequences are expressed in significant amounts in the BC samples investigated. The presence of non-transcribed viral DNA cannot be excluded.

    New global analysis of the microRNA transcriptome of primary tumors and lymph node metastases of papillary thyroid cancer

    Background: Papillary Thyroid Cancer (PTC) is the most prevalent type of endocrine cancer. Its incidence has rapidly increased in recent decades, but little is known regarding its complete microRNA transcriptome (miRNome). In addition, there is a need for molecular biomarkers allowing improved PTC diagnosis. Methods: We performed small RNA deep-sequencing of 3 PTC, their matching normal tissues and lymph node metastases (LNM). We designed a new bioinformatics framework to handle each aspect of the miRNome: whole expression profiles, isomiR distributions, non-templated addition distributions, RNA editing or mutation. Results were validated experimentally by qRT-PCR on normal samples, tumors and LNM from 14 independent patients and in silico using the dataset from The Cancer Genome Atlas (small RNA deep-sequencing of 59 normal samples, 495 PTC, and 8 LNM). Results: We confirmed previously described up-regulations of microRNAs in PTC, such as miR-146b-5p or miR-222-3p, but we also identified down-regulated microRNAs, such as miR-7-5p or miR-30c-2-3p. We showed that these down-regulations are linked to the tumorigenesis process of thyrocytes. We selected the 14 most down-regulated microRNAs in PTC and we showed that they are potential biomarkers of PTC samples. Nevertheless, they can distinguish histological classical variants and follicular variants of PTC in the TCGA dataset. In addition, 12 of the 14 down-regulated microRNAs are significantly less expressed in aggressive PTC compared to non-aggressive PTC. We showed that the associated aggressive expression profile is mainly due to the presence of the BRAF V600E mutation. In general, primary tumors and LNM presented similar microRNA expression profiles, but specific variations such as the down-regulation of miR-7-2-3p and miR-30c-2-3p in LNM were observed. Investigations of the 5p-to-3p arm expression ratios, non-templated additions or isomiR distributions revealed no major implication in the PTC tumorigenesis process or LNM appearance. Conclusions: Our results showed that down-regulated microRNAs can be used as new potential common biomarkers of PTC and to distinguish the main subtypes of PTC. MicroRNA expression can be linked to the development of LNM of PTC. The bioinformatics framework that we have developed can be used as a starting point for the global analysis of any microRNA deep-sequencing data in an unbiased way.

    Human-specific NOTCH2NL genes expand cortical neurogenesis through Delta/Notch regulation

    The cerebral cortex underwent rapid expansion and increased complexity during recent hominid evolution. Gene duplications constitute a major evolutionary force, but their impact on human brain development remains unclear. Using tailored RNA sequencing (RNA-seq), we profiled the spatial and temporal expression of hominid-specific duplicated (HS) genes in the human fetal cortex and identified a repertoire of 35 HS genes displaying robust and dynamic patterns during cortical neurogenesis. Among them, NOTCH2NL, human-specific paralogs of the NOTCH2 receptor, stood out for their ability to promote cortical progenitor maintenance. NOTCH2NL promote the clonal expansion of human cortical progenitors, ultimately leading to higher neuronal output. At the molecular level, NOTCH2NL function by activating the Notch pathway through inhibition of cis Delta/Notch interactions. Our study uncovers a large repertoire of recently evolved genes active during human corticogenesis and reveals how human-specific NOTCH paralogs may have contributed to the expansion of the human cortex. Human-specific NOTCH2NL expands cortical progenitors and neuronal output and thus may have contributed to the expansion of the human cortex.

    Transfer of clinically relevant gene expression signatures in breast cancer: from Affymetrix microarray to Illumina RNA-Sequencing technology

    Background: Microarrays have revolutionized breast cancer (BC) research by enabling studies of gene expression on a transcriptome-wide scale. Recently, RNA-Sequencing (RNA-Seq) has emerged as an alternative for precise readouts of the transcriptome. To date, no study has compared the ability of the two technologies to quantify clinically relevant individual genes and microarray-derived gene expression signatures (GES) in a set of BC samples encompassing the known molecular BC subtypes. To accomplish this, the RNA from 57 BCs representing the four main molecular subtypes (triple negative, HER2 positive, luminal A, luminal B) was profiled with Affymetrix HG-U133 Plus 2.0 chips and sequenced using the Illumina HiSeq 2000 platform. The correlations of three clinically relevant BC genes, six molecular subtype classifiers, and a selection of 21 GES were evaluated. Results: A total of 16,097 genes common to the two platforms were retained for downstream analysis. Gene-wise comparison of microarray and RNA-Seq data revealed that 52% had a Spearman's correlation coefficient greater than 0.7, with highly correlated genes displaying significantly higher expression levels. We found excellent correlation between microarray and RNA-Seq for the estrogen receptor (ER; rs = 0.973; 95% CI: 0.971-0.975), progesterone receptor (PgR; rs = 0.95; 0.947-0.954), and human epidermal growth factor receptor 2 (HER2; rs = 0.918; 0.912-0.923), while a few discordances between ER and PgR quantified by immunohistochemistry and RNA-Seq/microarray were observed. All the subtype classifiers evaluated agreed well (Cohen's kappa coefficients >0.8) and all the proliferation-based GES showed excellent Spearman correlations between microarray and RNA-Seq (all rs >0.965). Immune-, stroma- and pathway-based GES showed a lower correlation relative to prognostic signatures (all rs >0.6). Conclusions: To our knowledge, this is the first study to report a systematic comparison of RNA-Seq to microarray for the evaluation of single genes and GES clinically relevant to BC. According to our results, the vast majority of single gene biomarkers and well-established GES can be reliably evaluated using the RNA-Seq technology.
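    The gene-wise comparison reported in this abstract rests on Spearman's rank correlation between the two platforms. The following is a minimal sketch of that calculation, not the authors' pipeline: the function names, the dictionary layout (gene name mapped to per-sample expression values, with samples in the same order on both platforms) and the helper `fraction_above` are assumptions made for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if a vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rs: the Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


def fraction_above(microarray, rnaseq, threshold=0.7):
    """Share of genes common to both platforms whose rs exceeds the threshold."""
    genes = microarray.keys() & rnaseq.keys()
    hits = sum(spearman(microarray[g], rnaseq[g]) > threshold for g in genes)
    return hits / len(genes)
```

    Because the statistic depends only on ranks, it is insensitive to the different measurement scales of the two platforms (fluorescence intensities versus read counts), which is one reason a rank correlation suits this kind of cross-platform comparison.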

    Intratumor heterogeneity and clonal evolution in an aggressive PTC and matched metastases.

    The contribution of intratumor heterogeneity to thyroid metastatic cancers is still unknown. The clonal relations between the primary thyroid tumors and lymph nodes (LN) or distant metastases are also poorly understood. The objective of this study was to determine the phylogenetic relationships between matched primary thyroid tumor and metastases. We searched for non-synonymous single nucleotide variants (nsSNVs), gene fusions, alternative transcripts and loss of heterozygosity (LOH) by paired-end massively parallel sequencing of cDNA (RNA-Seq) in a patient diagnosed with an aggressive papillary thyroid cancer (PTC). Seven tumor samples of a stage IVc PTC patient were analyzed by RNA-Seq: two foci from the primary tumor, four foci from two LN metastases and one focus from a pleural metastasis. A large panel of other thyroid tumors was used for Sanger sequencing screening. We identified seven new nsSNVs. Some of these were early events clonally present in both the primary PTC and the three matched metastases. Other nsSNVs were private to the primary tumor, the LN metastases and/or the pleural metastasis. Three new gene fusions were identified. A novel cancer-specific KAZN alternative transcript was detected in this aggressive PTC and in dozens of additional thyroid tumors. The pleural metastasis harbored an exclusive whole chromosome 19 LOH. We present the first deep sequencing study comparing the mutational spectra in a PTC and both LN and distant metastases. This study provides novel findings concerning intratumor heterogeneity, clonal evolution and metastatic dissemination in thyroid cancer.