29 research outputs found

    Identifying and correcting epigenetics measurements for systematic sources of variation

    Background: Methylation measures quantified by microarray techniques can be affected by systematic variation arising from the technical processing of samples, which may compromise measurement accuracy and bias the estimate of the association under investigation. Quantifying the contribution of systematic sources of variation is challenging in datasets characterized by hundreds of thousands of features. In this study, we introduce a method previously developed for the analysis of metabolomics data to evaluate how well existing normalizing techniques correct for unwanted variation. The Illumina Infinium HumanMethylation450K array was used to acquire methylation levels at over 421,000 CpG sites for 902 participants in a case-control study on breast cancer nested within the EPIC cohort. The principal component partial R-square (PC-PR2) analysis was used to identify and quantify the variability attributable to potential systematic sources of variation. Three correction techniques were applied: ComBat, surrogate variable analysis (SVA), and a linear regression model to compute residuals. The impact of each correction method on the association between smoking status and DNA methylation levels was evaluated, and results were compared with findings from a large meta-analysis. Results: A sizeable proportion of systematic variability due to variables expressing ‘batch’ and ‘sample position’ within ‘chip’ was identified, with partial R² values equal to 9.5% and 11.4% of total variation, respectively. After application of ComBat or the residuals method, the contribution was 1.3% and 0.2%, respectively. The SVA technique reduced the variability due to ‘batch’ (1.3%) and ‘sample position’ (0.6%), and diminished the variability attributable to ‘chip’ within a batch (0.9%). More sites were significantly associated with smoking status after the ComBat or residuals corrections (k = 600 and k = 427, respectively) than after the SVA correction (k = 96). Conclusions: The three correction methods removed systematic variation in DNA methylation data, as assessed by the PC-PR2, which proved a useful tool for exploring variability in high-dimensional data. SVA produced more conservative findings than ComBat in the association between smoking and DNA methylation.
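
The residuals approach mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single categorical batch variable, ordinary least squares on batch indicators, and re-centring on the overall feature mean. The function name `residualize` and the toy data are hypothetical.

```python
import numpy as np

def residualize(values, batch):
    """Remove batch-level mean shifts from each feature (illustrative only).

    values: (n_samples, n_features) array of methylation measures.
    batch:  length n_samples sequence of batch labels (assumed categorical).
    """
    values = np.asarray(values, dtype=float)
    labels, codes = np.unique(np.asarray(batch).ravel(), return_inverse=True)
    codes = codes.ravel()
    # One indicator column per batch, so no separate intercept is needed.
    design = np.zeros((values.shape[0], labels.size))
    design[np.arange(values.shape[0]), codes] = 1.0
    # Least-squares fit of every feature on the batch indicators at once.
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    # Residuals, shifted back to the overall mean of each feature.
    return values - design @ coef + values.mean(axis=0)

# Toy example: the second batch carries an artificial offset of +2.
rng = np.random.default_rng(0)
batch = np.array([0] * 5 + [1] * 5)
values = rng.normal(size=(10, 3))
values[batch == 1] += 2.0
corrected = residualize(values, batch)
```

After correction, both batches share the same per-feature mean. A real analysis would typically add biological covariates to the model so that the signal of interest is not regressed out along with the batch effect.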

    Association of selenoprotein and selenium pathway genotypes with risk of colorectal cancer and interaction with selenium status

    Selenoprotein genetic variations and suboptimal selenium (Se) levels may contribute to the risk of colorectal cancer (CRC) development. We examined the association between CRC risk and genotype for single nucleotide polymorphisms (SNPs) in selenoprotein and Se metabolic pathway genes. Illumina Goldengate assays were designed and resulted in the genotyping of 1040 variants in 154 genes from 1420 cases and 1421 controls within the European Prospective Investigation into Cancer and Nutrition (EPIC) study. Multivariable logistic regression revealed an association of 144 individual SNPs from 63 Se pathway genes with CRC risk. However, regarding the selenoprotein genes, only TXNRD1 rs11111979 retained borderline statistical significance after adjustment for correlated tests (P-ACT = 0.10; P-ACT significance threshold was P < 0.1). SNPs in Wingless/Integrated (Wnt) and Transforming growth factor (TGF) beta-signaling genes (FRZB, SMAD3, SMAD7) from pathways affected by Se intake were also associated with CRC risk after multiple testing adjustments. Interactions with Se status (using existing serum Se and Selenoprotein P data) were tested at the SNP, gene, and pathway levels. Pathway analyses using the modified Adaptive Rank Truncated Product method suggested that genes and gene x Se status interactions in antioxidant, apoptosis, and TGF-beta signaling pathways may be associated with CRC risk. This study suggests that SNPs in the Se pathway alone or in combination with suboptimal Se status may contribute to CRC development.

    Association of Selenoprotein and Selenium Pathway Genotypes with Risk of Colorectal Cancer and Interaction with Selenium Status

    Selenoprotein genetic variations and suboptimal selenium (Se) levels may contribute to the risk of colorectal cancer (CRC) development. We examined the association between CRC risk and genotype for single nucleotide polymorphisms (SNPs) in selenoprotein and Se metabolic pathway genes. Illumina Goldengate assays were designed and resulted in the genotyping of 1040 variants in 154 genes from 1420 cases and 1421 controls within the European Prospective Investigation into Cancer and Nutrition (EPIC) study. Multivariable logistic regression revealed an association of 144 individual SNPs from 63 Se pathway genes with CRC risk. However, regarding the selenoprotein genes, only TXNRD1 rs11111979 retained borderline statistical significance after adjustment for correlated tests (P-ACT = 0.10; P-ACT significance threshold was P < 0.1). SNPs in Wingless/Integrated (Wnt) and Transforming growth factor (TGF) beta-signaling genes (FRZB, SMAD3, SMAD7) from pathways affected by Se intake were also associated with CRC risk after multiple testing adjustments. Interactions with Se status (using existing serum Se and Selenoprotein P data) were tested at the SNP, gene, and pathway levels. Pathway analyses using the modified Adaptive Rank Truncated Product method suggested that genes and gene x Se status interactions in antioxidant, apoptosis, and TGF-beta signaling pathways may be associated with CRC risk. This study suggests that SNPs in the Se pathway alone or in combination with suboptimal Se status may contribute to CRC development.

    Novel Common Genetic Susceptibility Loci for Colorectal Cancer

    BACKGROUND: Previous genome-wide association studies (GWAS) have identified 42 loci (P < 5 × 10⁻⁸) associated with risk of colorectal cancer (CRC). Expanded consortium efforts facilitating the discovery of additional susceptibility loci may capture unexplained familial risk. METHODS: We conducted a GWAS in CRC case and control subjects of European descent using a discovery-replication design, followed by examination of novel findings in a multiethnic sample (cumulative n = 163 315). In the discovery stage (36 948 case subjects/30 864 control subjects), we identified genetic variants with a minor allele frequency of 1% or greater associated with risk of CRC using logistic regression followed by a fixed-effects inverse variance weighted meta-analysis. All novel independent variants reaching genome-wide statistical significance (two-sided P < 5 × 10⁻⁸) were tested for replication in separate European ancestry samples (12 952 case subjects/48 383 control subjects). Next, we examined the generalizability of discovered variants in East Asians, African Americans, and Hispanics (12 085 case subjects/22 083 control subjects). Finally, we examined the contributions of novel risk variants to familial relative risk and examined the prediction capabilities of a polygenic risk score. All statistical tests were two-sided. RESULTS: The discovery GWAS identified 11 variants associated with CRC at P < 5 × 10⁻⁸, of which nine (at 4q22.2/5p15.33/5p13.1/6p21.31/6p12.1/10q11.23/12q24.21/16q24.1/20q13.13) independently replicated at a P value of less than .05. Multiethnic follow-up supported the generalizability of discovery findings. These results demonstrated a 14.7% increase in familial relative risk explained by common risk alleles from 10.3% (95% confidence interval [CI] = 7.9% to 13.7%; known variants) to 11.9% (95% CI = 9.2% to 15.5%; known and novel variants). A polygenic risk score identified 4.3% of the population at an odds ratio for developing CRC of at least 2.0. CONCLUSIONS: This study provides insight into the architecture of common genetic variation contributing to CRC etiology and improves risk prediction for individualized screening.

    utilisation des signatures génomiques et épigenomiques dans le but d’identifier des marqueurs d’expositions exogènes et d’évaluer leur rôle dans l’étiologie du cancer

    Context and aim: Several risk factors have been identified for cancer, and it has been estimated that more than 40% of cases in developed countries are preventable through the modulation of known modifiable risk factors. The overall objective of this thesis was to demonstrate that the analysis of genomic and epigenomic data, integrated with well-characterised exposure and lifestyle data, may be used to identify markers of environmental exposures and lifestyle and may help improve our understanding of cancer aetiology. Results: We first describe how genomic and epigenomic signatures can be used to identify markers of exposure and decipher the aetiology of cancer. Then, we adopt the mutational signatures framework to contribute to the debate about the “bad luck” hypothesis for cancer and demonstrate that tobacco-related mutations are more strongly correlated with cancer risk than random mutations. We introduce a probabilistic model for the simulation of mutational signature data and compare the performance of the available methods for the identification of mutational signatures using both simulated and real data. Additionally, we introduce a new method for the identification of such signatures. Finally, we use methylation array data in an epidemiological study within the E3N cohort to investigate the association between methylation in DNA from blood and exposure to brominated flame retardants and per- and polyfluoroalkyl substances, two classes of organic pollutants that are known endocrine-disrupting chemicals. Overall, our study does not provide evidence of methylation alterations at the level of the whole genome, in regions, or in single CpGs. Suggestive evidence of alterations in the methylation of genes within plausible biological pathways (e.g. androgen response) warrants further investigation. Conclusion: Our work on the methodological aspects of mutational signature research introduces an original framework for measuring the performance of tools for the identification of mutational signatures that may serve as a reference for future methodological or applied research. Our applications of both mutational signature and methylome research demonstrate the usefulness of such tools to assess exposures and elucidate their role in cancer aetiology.

    Applications of Genomic and Epigenomic Signatures to Identify Markers of Exogenous Exposures and Elucidate their Potential Role in Cancer Aetiology

    Context and aim: Several risk factors have been identified for cancer, and it has been estimated that more than 40% of cases in developed countries are preventable through the modulation of known modifiable risk factors. The overall objective of this thesis was to demonstrate that the analysis of genomic and epigenomic data, integrated with well-characterised exposure and lifestyle data, may be used to identify markers of environmental exposures and lifestyle and may help improve our understanding of cancer aetiology. Results: We first describe how genomic and epigenomic signatures can be used to identify markers of exposure and decipher the aetiology of cancer. Then, we adopt the mutational signatures framework to contribute to the debate about the “bad luck” hypothesis for cancer and demonstrate that tobacco-related mutations are more strongly correlated with cancer risk than random mutations. We introduce a probabilistic model for the simulation of mutational signature data and compare the performance of the available methods for the identification of mutational signatures using both simulated and real data. Additionally, we introduce a new method for the identification of such signatures. Finally, we use methylation array data in an epidemiological study within the E3N cohort to investigate the association between methylation in DNA from blood and exposure to brominated flame retardants and per- and polyfluoroalkyl substances, two classes of organic pollutants that are known endocrine-disrupting chemicals. Overall, our study does not provide evidence of methylation alterations at the level of the whole genome, in regions, or in single CpGs. Suggestive evidence of alterations in the methylation of genes within plausible biological pathways (e.g. androgen response) warrants further investigation. Conclusion: Our work on the methodological aspects of mutational signature research introduces an original framework for measuring the performance of tools for the identification of mutational signatures that may serve as a reference for future methodological or applied research. Our applications of both mutational signature and methylome research demonstrate the usefulness of such tools to assess exposures and elucidate their role in cancer aetiology.

    Computational tools to detect signatures of mutational processes in DNA from tumours: A review and empirical comparison of performance

    Mutational signatures refer to patterns in the occurrence of somatic mutations that might be uniquely ascribed to a particular mutational process. Tumour mutation catalogues can reveal mutational signatures but are often consistent with the mutation spectra produced by a variety of mutagens. To date, after the analysis of tens of thousands of exomes and genomes from about 40 different cancer types, tens of mutational signatures, each characterized by a unique probability profile across the 96 trinucleotide-based mutation types, have been identified, validated and catalogued. At the same time, several competing methods have been developed either to quantify the contribution of catalogued signatures in a given cancer sequence or to identify new signatures from a sample of cancer sequences. A review of existing computational tools has recently been published to guide researchers and practitioners through their mutational signature analyses, but other tools have been introduced since its publication, and a systematic evaluation and comparison of the performance of such tools is still lacking. To fill this gap, we have carried out an empirical evaluation of the main packages available to date, using both real and simulated data. Among other results, our empirical study shows that the identification of signatures is more difficult for cancers characterized by multiple signatures, each having a small contribution. This work suggests that detection methods based on probabilistic models, especially EMu and bayesNMF, generally perform better than NMF-based methods.
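
The count of 96 trinucleotide-based mutation types follows from 6 pyrimidine-referenced substitution classes combined with 4 possible 5' flanking bases and 4 possible 3' flanking bases. The sketch below enumerates them. The bracket labels such as `A[C>T]G` are a commonly used notation assumed here for illustration, and the function names are hypothetical rather than taken from any of the tools reviewed.

```python
from itertools import product

BASES = "ACGT"
# Substitutions are conventionally reported from the pyrimidine (C or T)
# of the mutated base pair, which gives six classes.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

def mutation_types():
    """Return the 96 labels as 5'-flank[ref>alt]3'-flank, e.g. 'A[C>T]G'."""
    return [
        f"{five}[{sub}]{three}"
        for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]

def catalogue(mutations):
    """Count observed mutations per type; unseen types stay at zero."""
    counts = dict.fromkeys(mutation_types(), 0)
    for label in mutations:
        counts[label] += 1
    return counts

types = mutation_types()
example = catalogue(["A[C>T]G", "A[C>T]G", "T[T>C]A"])
```

A signature in this framework is a probability vector over these 96 labels, and a tumour's mutation catalogue is a vector of counts over the same labels.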