27 research outputs found

    Leveraging epigenome-based subgroups to uncover mechanism of Tumorigenesis

    No full text
    Current disease classification is coarse and would benefit from unsupervised approaches driven by biochemical data. Challenging the top-down, histopathological classification of tumors, we suggest that DNA methylation profiling is a useful tool for classifying tumors into meaningful groups according to their epigenome or cancer state. In turn, subgroups identified by this bottom-up, data-driven approach should share related genetic alterations, such as mutations in genes involved in the same pathway. In this thesis, we show the validity of this approach by recovering groups of patients that share common genetic alterations. First, we show that unsupervised clustering of DNA methylation data segregates ETMRs, which share a common gene fusion, from other related tumors. The same held in pediatric glioblastoma, in which samples that clustered together shared the same driver mutation. Next, we used the same approach to group samples by their DNA methylation profile, including samples from patients with an unknown genetic driver event. Uncharacterized samples that grouped with samples carrying known drivers were investigated further, focusing on genes involved in the same pathways as the known drivers. This guided strategy uncovered novel oncogenic mutations in giant cell tumors of the bone (GCTs) and a new head and neck squamous cell carcinoma (HNSCC) entity enriched for specific oncogenic mutations. Finally, we leveraged results from this approach to identify H3K36me2 as a key player in the H3K36 methylation pathway and to examine its role in cancer. Profiling this understudied epigenomic mark allowed us to uncover its role in establishing DNA methylation.
This thesis describes examples of using DNA methylation, unsupervised clustering and an integrative genomics approach in different cancers, in an attempt to highlight the utility of data-driven cancer diagnosis and research.
Current disease classification lacks nuance and would benefit from unbiased approaches grounded in patients' biomedical data. Accordingly, while challenging the top-down histopathological classification of tumors, this thesis proposes a bottom-up approach that uses DNA methylation as the basis for assigning tumors to biomedically relevant groups according to their epigenomic state. We postulate that these groups, defined by their epigenomic characteristics, should consist of samples with related genetic alterations, such as mutations in genes involved in the same molecular signaling pathways. This thesis demonstrates the validity of this approach, which does identify groups of patients sharing analogous genetic alterations. First, we show that applying an unsupervised clustering algorithm clearly distinguishes ETMRs, a subgroup of tumors defined by a characteristic gene fusion. The same result is obtained in pediatric glioblastomas, where samples carrying the same genetic alteration cluster together. Next, we apply the same unsupervised clustering approach to methylation data from a set of tumors that includes samples with no identified causal mutation. We then profiled the uncharacterized samples assigned to subgroups containing characterized samples, focusing our analysis on genes involved in the same molecular signaling pathways as the genes carrying known driver mutations.
This guided analysis strategy allowed us to discover new driver mutations in GCT, as well as a new HNSCC subgroup enriched for specific oncogenic mutations. Finally, we used the results of this approach to identify H3K36me2 as a key player in the H3K36 methylation pathway and demonstrated its role in oncogenesis. This thesis, focused on the example of cancer, details several examples demonstrating the usefulness of a novel analysis method based on DNA methylation, unsupervised classification algorithms, and integrative genomics techniques. Through these examples, this work highlights the fundamental value of research and diagnostics based on data-driven approaches.
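The bottom-up grouping described in this abstract can be illustrated with a small sketch. The abstract does not name the clustering algorithm, so average-linkage agglomerative clustering with a 1 - Pearson correlation distance is an assumption here, and the sample names and beta values are invented toy data, not results from the thesis.

```python
import math


def pearson_distance(u, v):
    """Return 1 - Pearson correlation between two methylation profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    if s_uu == 0 or s_vv == 0:
        return 1.0  # a constant profile carries no correlation signal
    return 1.0 - s_uv / math.sqrt(s_uu * s_vv)


def cluster_samples(profiles, n_clusters):
    """Average-linkage agglomerative clustering (assumed method).

    profiles: dict mapping sample name -> list of beta values (0..1),
    one value per CpG probe, same probe order for every sample.
    """
    names = sorted(profiles)
    dist = {
        (a, b): pearson_distance(profiles[a], profiles[b])
        for a in names for b in names if a < b
    }

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(min(a, b), max(a, b)) for a in c1 for b in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        candidates = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(candidates, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: four samples, four probes.
toy_profiles = {
    "s1": [0.90, 0.80, 0.10, 0.20],
    "s2": [0.85, 0.90, 0.15, 0.10],
    "s3": [0.10, 0.20, 0.90, 0.80],
    "s4": [0.20, 0.10, 0.80, 0.90],
}
groups = cluster_samples(toy_profiles, n_clusters=2)
```

In the workflow the abstract describes, an uncharacterized sample that lands in the same group as samples with a known driver would then be screened for alterations in genes from the same pathway.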

    mRMRe: an R package for parallelized mRMR ensemble feature selection

    No full text
    Motivation: Feature selection is one of the main challenges in analyzing high-throughput genomic data. Minimum redundancy maximum relevance (mRMR) is a particularly fast feature selection method for finding a set of both relevant and complementary features. Here we describe the mRMRe R package, in which the mRMR technique is extended by using an ensemble approach in order to better explore the feature space and build more robust predictors. To deal with the computational complexity of the ensemble approach, the main functions of the package are implemented and parallelized in C using the OpenMP API. Results: Our ensemble mRMR implementations outperform the classical mRMR approach in terms of prediction accuracy. They identify genes more relevant to the biological context and may lead to richer biological interpretations. The parallelized functions included in the package show significant gains in terms of run-time speed when compared to previously released packages. Availability: The R package mRMRe is available on CRAN and is provided open source under the Artistic-2.0 License. The code used to generate all the results reported in this application note is available from Supplementary File 1. Contact: [email protected] Supplementary Information: Supplementary information is available at Bioinformatics online.
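To make the mRMR idea concrete, here is a minimal pure-Python sketch of greedy selection followed by a bootstrap ensemble. It is not the mRMRe package or its API. The Gaussian mutual-information estimate from Pearson correlation, the bootstrap resampling strategy, and all function and feature names are assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        return 0.0
    return s_xy / math.sqrt(s_xx * s_yy)


def mutual_information(x, y):
    """Assumed estimator: MI = -0.5 * ln(1 - rho^2) for Gaussian variables."""
    rho_sq = min(pearson(x, y) ** 2, 1.0 - 1e-12)  # avoid log(0)
    return -0.5 * math.log(1.0 - rho_sq)


def mrmr(features, target, k):
    """Greedy mRMR: maximise relevance minus mean redundancy.

    features: dict mapping feature name -> list of values.
    """
    relevance = {name: mutual_information(v, target) for name, v in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


def bootstrap_mrmr(features, target, k, n_runs, seed=0):
    """Assumed ensemble strategy: rerun mRMR on bootstrap resamples of the samples."""
    rng = random.Random(seed)
    n = len(target)
    runs = []
    for _ in range(n_runs):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = {name: [v[i] for i in idx] for name, v in features.items()}
        runs.append(mrmr(resampled, [target[i] for i in idx], k))
    return runs


# Hypothetical toy data: g2 is nearly a copy of g1, g3 carries complementary signal.
toy_features = {
    "g1": [1, 2, 3, 4, 5, 6, 7, 8],
    "g2": [1, 2, 3, 4, 5, 6, 7, 9],
    "g3": [1, -1, 1, -1, 1, -1, 1, -1],
}
toy_target = [3, 0, 5, 2, 7, 4, 9, 6]  # g1 + 2 * g3
selected = mrmr(toy_features, toy_target, k=2)
```

On this toy data, ranking by relevance alone would pick `g1` and then `g2`, while the redundancy penalty makes mRMR prefer the complementary `g3` as the second feature.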

    Identification of FAT3 as a new candidate gene for adolescent idiopathic scoliosis

    No full text
    In an effort to identify rare alleles associated with adolescent idiopathic scoliosis (AIS), whole-exome sequencing was performed on a discovery cohort of 73 unrelated patients and 70 age- and sex-matched controls, all of French-Canadian ancestry. A collapsing gene burden test was performed to analyze rare protein-altering variants using case–control statistics. Since no single gene achieved statistical significance, targeted exon sequencing was performed for the 24 genes with the smallest p values in an independent replication cohort of unrelated, severely affected females with AIS and sex-matched controls (N = 96 each). An excess of rare, potentially protein-altering variants was noted in one particular gene, FAT3, although it did not achieve statistical significance. Independently, we sequenced the exomes of all members of a rare multiplex family of three affected sisters and unaffected parents. All three sisters were compound heterozygous for two rare protein-altering variants in FAT3. The parents were single heterozygotes for each variant. The two variants in the family were also present in our discovery cohort. A second validation step used another independent replication cohort of 258 unrelated AIS patients who had reached skeletal maturity and 143 healthy controls to genotype nine FAT3 gene variants, including the two variants previously identified in the multiplex family: p.L517S (rs139595720) and p.L4544F (rs187159256). Interestingly, two FAT3 variants, rs139595720 (genotype A/G) and rs80293525 (genotype C/T), were enriched in severe scoliosis cases (4.5% and 2.7%, respectively) compared to milder cases (1.4% and 0.7%) and healthy controls (1.6% and 0.8%). Our results implicate FAT3 as a new candidate gene in the etiology of AIS
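A collapsing burden test of the kind mentioned in this abstract can be sketched as follows. The abstract does not state which case-control statistic was used, so a two-sided Fisher's exact test on carrier counts is an assumption, and the sample identifiers, gene label, and carrier data are invented toy values rather than study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(
        prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def burden_test(qualifying, cases, controls, gene):
    """Collapse rare variants per gene: a sample counts once if it carries any.

    qualifying: dict mapping sample id -> set of genes in which that sample
    carries at least one rare protein-altering variant (hypothetical input format).
    """
    case_carriers = sum(1 for s in cases if gene in qualifying.get(s, set()))
    control_carriers = sum(1 for s in controls if gene in qualifying.get(s, set()))
    table = (
        case_carriers,
        len(cases) - case_carriers,
        control_carriers,
        len(controls) - control_carriers,
    )
    return table, fisher_exact_two_sided(*table)


# Hypothetical toy cohort: 4 cases, 4 controls, one gene label.
toy_cases = ["c1", "c2", "c3", "c4"]
toy_controls = ["k1", "k2", "k3", "k4"]
toy_qualifying = {
    "c1": {"GENE_A"},
    "c2": {"GENE_A"},
    "c3": {"GENE_A"},
    "k1": {"GENE_A"},
}
toy_table, toy_p = burden_test(toy_qualifying, toy_cases, toy_controls, "GENE_A")
```

In an exome-wide analysis the same test would be run once per gene, so the resulting p values would need a multiple-testing correction before any gene could be called significant.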

    Comparison and validation of genomic predictors for anticancer drug sensitivity

    No full text
    An enduring challenge in personalized medicine lies in selecting the right drug for each individual patient. While testing of drugs on patients in large trials is the only way to assess their clinical efficacy and toxicity, we dramatically lack resources to test the hundreds of drugs currently under development. Therefore the use of preclinical model systems has been intensively investigated as this approach enables response to hundreds of drugs to be tested in multiple cell lines in parallel. Two large-scale pharmacogenomic studies recently screened multiple anticancer drugs on over 1000 cell lines. We propose to combine these datasets to build and robustly validate genomic predictors of drug response. We compared five different approaches for building predictors of increasing complexity. We assessed their performance in cross-validation and in two large validation sets, one containing the same cell lines present in the training set and another dataset composed of cell lines that have never been used during the training phase. Sixteen drugs were found in common between the datasets. We were able to validate multivariate predictors for three out of the 16 tested drugs, namely irinotecan, PD-0325901, and PLX4720. Moreover, we observed that response to 17-AAG, an inhibitor of Hsp90, could be efficiently predicted by the expression level of a single gene, NQO1. These results suggest that genomic predictors could be robustly validated for specific drugs. If successfully validated in patients' tumor cells, and subsequently in clinical trials, they could act as companion tests for the corresponding drugs and play an important role in personalized medicine
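The train-then-validate design described in this abstract can be sketched for the simplest case, a predictor built from the expression of a single gene. The abstract does not specify the model form or the performance metric, so ordinary least squares and the concordance index are assumptions, and all numbers below are invented toy values.

```python
def fit_univariate(expression, response):
    """Least-squares fit of response = slope * expression + intercept."""
    n = len(expression)
    mean_x, mean_y = sum(expression) / n, sum(response) / n
    s_xx = sum((x - mean_x) ** 2 for x in expression)
    if s_xx == 0:
        raise ValueError("expression is constant; cannot fit a slope")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expression, response))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def predict(model, expression):
    slope, intercept = model
    return [slope * x + intercept for x in expression]


def concordance_index(predicted, observed):
    """Fraction of comparable pairs ranked in the same order (assumed metric).

    Pairs with tied observations are skipped; tied predictions count as 0.5.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(observed)):
        for j in range(i + 1, len(observed)):
            if observed[i] == observed[j]:
                continue
            comparable += 1
            product = (predicted[i] - predicted[j]) * (observed[i] - observed[j])
            if product > 0:
                concordant += 1.0
            elif product == 0:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical training set (one study) and independent validation set (another).
train_expression = [1.0, 2.0, 3.0, 4.0]
train_response = [2.0, 4.0, 6.0, 8.0]
valid_expression = [5.0, 1.0, 3.0]
valid_response = [9.0, 3.0, 7.0]

model = fit_univariate(train_expression, train_response)
valid_score = concordance_index(predict(model, valid_expression), valid_response)
```

Fitting on one dataset and scoring on cell lines never seen during training is what separates a validated predictor from one that only performs well in cross-validation.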

    Comparison and validation of genomic predictors for anticancer drug sensitivity

    No full text
    BACKGROUND: An enduring challenge in personalized medicine lies in selecting the right drug for each individual patient. While testing of drugs on patients in large trials is the only way to assess their clinical efficacy and toxicity, we dramatically lack resources to test the hundreds of drugs currently under development. Therefore the use of preclinical model systems has been intensively investigated as this approach enables response to hundreds of drugs to be tested in multiple cell lines in parallel. METHODS: Two large-scale pharmacogenomic studies recently screened multiple anticancer drugs on over 1000 cell lines. We propose to combine these datasets to build and robustly validate genomic predictors of drug response. We compared five different approaches for building predictors of increasing complexity. We assessed their performance in cross-validation and in two large validation sets, one containing the same cell lines present in the training set and another dataset composed of cell lines that have never been used during the training phase. RESULTS: Sixteen drugs were found in common between the datasets. We were able to validate multivariate predictors for three out of the 16 tested drugs, namely irinotecan, PD-0325901, and PLX4720. Moreover, we observed that response to 17-AAG, an inhibitor of Hsp90, could be efficiently predicted by the expression level of a single gene, NQO1. CONCLUSION: These results suggest that genomic predictors could be robustly validated for specific drugs. If successfully validated in patients’ tumor cells, and subsequently in clinical trials, they could act as companion tests for the corresponding drugs and play an important role in personalized medicine