393 research outputs found

    Comprehensive analysis of methylation data in non-model plant species

    One of the goals of plant epigenetics is detecting differential methylation that may occur following specific treatments or in variable environments. This can be achieved at single-base resolution with standard methods for whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS). Another important goal is to exploit sequencing methods in combination with bisulfite treatment to associate genetics and epigenetics with phenotypic traits. In the past 19 years, this has become possible using so-called genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS), the latter of which aim to reveal potential biomarkers linking phenotypic traits and epigenetic variation. In practice, such studies rely on software packages or “bioinformatics pipelines” which make the requisite computational processes routine and reliable. This thesis describes several such pipelines, developed within the framework of EpiDiverse, an Innovative Training Network (ITN) (https://epidiverse.eu/, accessed on 1 May 2021) carrying out comprehensive studies on pipelines for WGBS, differentially methylated region (DMR), EWAS, and single nucleotide polymorphism (SNP) analyses. Here I introduce the benchmark study of DMR tools, the EWAS pipeline, and the bioinformatics pipelines implemented within the EpiDiverse toolkit. First, by benchmarking seven DMR tools (metilene, methylKit, MOABS, DMRcate, Defiant, BSmooth, MethylSig) on simulated datasets from four plant species (Aethionema arabicum, Arabidopsis thaliana, Picea abies, and Physcomitrium patens), my coauthors and I showed that metilene has superior performance in terms of overall precision and recall. We therefore set it as the default DMR caller in the EpiDiverse DMR pipeline.
Afterward, I introduced extended features of the EWAS pipeline beyond the GEM R package, e.g., graphical outputs, novel missing data imputation, and compatibility with new input types. I then examined the effect of missing data using Picea abies (Norway spruce) data and showed that the pipeline performs sensible missing data imputation. Furthermore, I obtained a significant overlap between the results of the pipeline and those of the Quercus lobata (valley oak) analysis. Following extensive benchmarking with various tools, a group of pipelines was made publicly available as the EpiDiverse toolkit, which is suited to researchers working with WGBS datasets (https://github.com/EpiDiverse, accessed on 1 May 2021).

    Advancing the analysis of bisulfite sequencing data in its application to ecological plant epigenetics

    The aim of this thesis is to bridge the gap between the state-of-the-art bioinformatic tools and resources, currently at the forefront of epigenetic analysis, and their emerging applications to non-model species in the context of plant ecology. New, high-resolution research tools are presented; first in a specific sense, by providing new genomic resources for a selected non-model plant species, and also in a broader sense, by developing new software pipelines to streamline the analysis of bisulfite sequencing data, in a manner which is applicable to a wide range of non-model plant species. The selected species is the annual field pennycress, Thlaspi arvense, which belongs to the same lineage of the Brassicaceae as the closely related model species, Arabidopsis thaliana, and yet does not benefit from such extensive genomic resources. It is one of three key species in a Europe-wide initiative to understand how epigenetic mechanisms contribute to natural variation, stress responses and long-term adaptation of plants. To this end, this thesis provides a high-quality, chromosome-level assembly for T. arvense, alongside a rich complement of feature annotations of particular relevance to the study of epigenetics. The genome assembly encompasses a hybrid approach, involving both PacBio continuous long reads and circular consensus sequences, alongside Hi-C sequencing, PCR-free Illumina sequencing and genetic maps. The result is a significant improvement in contiguity over the existing draft state from earlier studies. Much of the basis for building an understanding of epigenetic mechanisms in non-model species centres around the study of DNA methylation, and in particular the analysis of bisulfite sequencing data to bring methylation patterns into nucleotide-level resolution.
In order to maintain a broad level of comparison between T. arvense and the other selected species under the same initiative, a suite of software pipelines has also been developed, covering mapping, the quantification of methylation values, differential methylation between groups, and epigenome-wide association studies. Furthermore, presented herein is a novel algorithm which can facilitate accurate variant calling from bisulfite sequencing data using conventional approaches, such as FreeBayes or Genome Analysis ToolKit (GATK), which until now was feasible only with specifically adapted software. This enables researchers to obtain high-quality genetic variants, often essential for contextualising the results of epigenetic experiments, without the need for additional sequencing libraries alongside. Each of these aspects is thoroughly benchmarked, integrated into a robust workflow management system, and adheres to the principles of FAIR (Findability, Accessibility, Interoperability and Reusability). Finally, further consideration is given to the unique difficulties presented by population-scale data, and a number of concepts and ideas are explored in order to improve the feasibility of such analyses. In summary, this thesis introduces new high-resolution tools to facilitate the analysis of epigenetic mechanisms, specifically relating to DNA methylation, in non-model plant data. In addition, thorough benchmarking standards are applied, showcasing the range of technical considerations which are of principal importance when developing new pipelines and tools for the analysis of bisulfite sequencing data.
The complete “Epidiverse Toolkit” is available at https://github.com/EpiDiverse and will continue to be updated and improved in the future.
Table of contents:
ABSTRACT
ACKNOWLEDGEMENTS
1 INTRODUCTION 1.1 ABOUT THIS WORK 1.2 BIOLOGICAL BACKGROUND 1.2.1 Epigenetics in plant ecology 1.2.2 DNA methylation 1.2.3 Maintenance of 5mC patterns in plants 1.2.4 Distribution of 5mC patterns in plants 1.3 TECHNICAL BACKGROUND 1.3.1 DNA sequencing 1.3.2 The case for a high-quality genome assembly 1.3.3 Sequence alignment for NGS 1.3.4 Variant calling approaches
2 BUILDING A SUITABLE REFERENCE GENOME 2.1 INTRODUCTION 2.2 MATERIALS AND METHODS 2.2.1 Seeds for the reference genome development 2.2.2 Sample collection, library preparation, and DNA sequencing 2.2.3 Contig assembly and initial scaffolding 2.2.4 Re-scaffolding 2.2.5 Comparative genomics 2.3 RESULTS 2.3.1 An improved reference genome sequence 2.3.2 Comparative genomics 2.4 DISCUSSION
3 FEATURE ANNOTATION FOR EPIGENOMICS 3.1 INTRODUCTION 3.2 MATERIALS AND METHODS 3.2.1 Tissue preparation for RNA sequencing 3.2.2 RNA extraction and sequencing 3.2.3 Transcriptome assembly 3.2.4 Genome annotation 3.2.5 Transposable element annotations 3.2.6 Small RNA annotations 3.2.7 Expression atlas 3.2.8 DNA methylation 3.3 RESULTS 3.3.1 Transcriptome assembly 3.3.2 Protein-coding genes 3.3.3 Non-coding loci 3.3.4 Transposable elements 3.3.5 Small RNA 3.3.6 Pseudogenes 3.3.7 Gene expression atlas 3.3.8 DNA Methylation 3.4 DISCUSSION
4 BISULFITE SEQUENCING METHODS 4.1 INTRODUCTION 4.2 PRINCIPLES OF BISULFITE SEQUENCING 4.3 EXPERIMENTAL DESIGN 4.4 LIBRARY PREPARATION 4.4.1 Whole Genome Bisulfite Sequencing (WGBS) 4.4.2 Reduced Representation Bisulfite Sequencing (RRBS) 4.4.3 Target capture bisulfite sequencing 4.5 BIOINFORMATIC ANALYSIS OF BISULFITE DATA 4.5.1 Quality Control 4.5.2 Read Alignment 4.5.3 Methylation Calling 4.6 ALTERNATIVE METHODS
5 FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS 5.1 INTRODUCTION 5.2 MATERIALS AND METHODS 5.2.1 Reference species 5.2.2 Natural accessions 5.2.3 Read simulation 5.2.4 Read alignment 5.2.5 Mapping rates 5.2.6 Precision-recall 5.2.7 Coverage deviation 5.2.8 DNA methylation analysis 5.3 RESULTS 5.4 DISCUSSION 5.5 A PIPELINE FOR WGBS ANALYSIS
6 THERE AND BACK AGAIN: INFERRING GENOMIC INFORMATION 6.1 INTRODUCTION 6.1.1 Implementing a new approach 6.2 MATERIALS AND METHODS 6.2.1 Validation datasets 6.2.2 Read processing and alignment 6.2.3 Variant calling 6.2.4 Benchmarking 6.3 RESULTS 6.4 DISCUSSION 6.5 A PIPELINE FOR SNP VARIANT ANALYSIS
7 POPULATION-LEVEL EPIGENOMICS 7.1 INTRODUCTION 7.2 CHALLENGES IN POPULATION-LEVEL EPIGENOMICS 7.3 DIFFERENTIAL METHYLATION 7.3.1 A pipeline for case/control DMRs 7.3.2 A pipeline for population-level DMRs 7.4 EPIGENOME-WIDE ASSOCIATION STUDIES (EWAS) 7.4.1 A pipeline for EWAS analysis 7.5 GENOTYPING-BY-SEQUENCING (EPIGBS) 7.5.1 Extending the epiGBS pipeline 7.6 POPULATION-LEVEL HAPLOTYPES 7.6.1 Extending the EpiDiverse/SNP pipeline
8 CONCLUSION
APPENDICES A. SUPPLEMENT: BUILDING A SUITABLE REFERENCE GENOME B. SUPPLEMENT: FEATURE ANNOTATION FOR EPIGENOMICS C. SUPPLEMENT: FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS D. SUPPLEMENT: INFERRING GENOMIC INFORMATION
BIBLIOGRAPH

    CRISPR/Cas-mediated editing of cis-regulatory elements for crop improvement

    To improve future agricultural production, major technological advances are required to increase crop production and yield. Targeting the coding region of genes via the Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated Protein (CRISPR/Cas) system has been well established and has enabled the rapid generation of transgene-free plants, which can lead to crop improvement. The emergence of the CRISPR/Cas system has also enabled scientists to achieve cis-regulatory element (CRE) editing and, consequently, engineering endogenous critical CREs to modulate the expression of target genes. Recent genome-wide association studies have identified the domestication of natural CRE variants to regulate complex agronomic quantitative traits and have allowed for their engineering via the CRISPR/Cas system. Although engineering plant CREs can be advantageous to drive gene expression, there are still many limitations to its practical application. Here, we review the current progress in CRE editing and propose future strategies to effectively target CREs for transcriptional regulation for crop improvement

    Identification and computational analysis of differential H3K27me3 targets between Arabidopsis thaliana accessions

    Histone H3 lysine 27 trimethylation (H3K27me3) and lysine 9 dimethylation (H3K9me2) are two independent repressive chromatin modifications in Arabidopsis thaliana. H3K27me3 is established and maintained by Polycomb repressive complexes, whereas H3K9me2 is catalyzed by histone methyltransferases SUVH(4-6). H3K27me3 mostly targets protein-coding genes in euchromatin, whose repression is reversible. H3K9me2 mainly targets transposons and repetitive sequences, which should be constitutively silenced. Both marks can spread to flanking regions after initialization, and they have been shown to be mutually exclusive in distribution in the Arabidopsis genome. In this study, the extent of natural variation of H3K27me3 in two accessions of Arabidopsis thaliana, Landsberg erecta (Ler) and Columbia (Col), and their hybrids was analyzed using chromatin immunoprecipitation followed by microarray or sequencing analysis (ChIP-chip and ChIP-seq). A computational workflow was implemented that includes remapping of probes to the Col and Ler genome assemblies in order to exclude differential signals due to genome polymorphisms. The majority of genes that are H3K27me3 targets in Col are also targets in Ler and the F1 of reciprocal crosses. A small number of Ler-specific H3K27me3 targets were detected and validated by independent ChIP-PCR, whereas the Col-specific targets have not been confirmed so far. Ler-specific H3K27me3 targets showed allele-specific H3K27me3 in both hybrids, consistent with a cis-regulatory mechanism for establishing H3K27me3. Five Ler-specific H3K27me3 targets were marked by H3K4me3 in Col. Consistent with the activating role of H3K4me3 during transcription, the differential H3K27me3 of these five genes accords with the expression variation between the two accessions. For the majority of Ler-specific H3K27me3 targets, no expression could be detected in Col, Ler or 17 other Arabidopsis accessions.
Instead of H3K27me3, the antagonistic mark H3K9me2 and other heterochromatic features were observed at these loci in Col. More frequently than expected, transposable elements were found neighboring these loci in Col, and in many cases these transposable elements are missing in the Ler genome assembly. We propose a model where a transposon insertion specific to Col results in recruitment of H3K9me2, which spreads to neighboring genes already in a repressed state through H3K27me3, resulting in Ler-specific H3K27me3 as the ancestral state

    Exploitation of epigenetic variation of crop wild relatives for crop improvement and agrobiodiversity preservation

    Crop wild relatives (CWRs) are recognized as the best potential source of traits for crop improvement. However, successful crop improvement using CWR relies on identifying variation in genes controlling desired traits in plant germplasms and subsequently incorporating them into cultivars. Epigenetic diversity may provide an additional layer of variation within CWR and can contribute novel epialleles for key traits for crop improvement. There is emerging evidence that epigenetic variants of functional and/or agronomic importance exist in CWR gene pools. This provides a rationale for the conservation of epigenotypes of interest, thus contributing to agrobiodiversity preservation through conservation and (epi)genetic monitoring. Concepts and techniques of classical and modern breeding should consider integrating recent progress in epigenetics, initially by identifying their association with phenotypic variations and then by assessing their heritability and stability in subsequent generations. New tools available for epigenomic analysis offer the opportunity to capture epigenetic variation and integrate it into advanced (epi)breeding programmes. Advances in -omics have provided new insights into the sources and inheritance of epigenetic variation and enabled the efficient introduction of epi-traits from CWR into crops using epigenetic molecular markers, such as epiQTLs

    Perspectives on the Application of Next-generation Sequencing to the Improvement of Africa’s Staple Food Crops

    The persistent challenge of insufficient food, unbalanced nutrition, and deteriorating natural resources in the most vulnerable nations, characterized by fast population growth, calls for utilization of innovative technologies to curb constraints of crop production. Enhancing genetic gain by using a multipronged approach that combines conventional and genomic technologies for the development of stress-tolerant varieties with high yield and nutritional quality is necessary. The advent of next-generation sequencing (NGS) technologies holds the potential to dramatically impact the crop improvement process. NGS enables whole-genome sequencing (WGS) and re-sequencing, transcriptome sequencing, metagenomics, as well as high-throughput genotyping, which can be applied for genomic selection (GS). It can also be applied to diversity analysis, genetic and epigenetic characterization of germplasm, and pathogen detection, identification, and elimination. High-throughput phenotyping, integrated data management, and decision support tools form the necessary supporting environment for effective utilization of genome sequence information. It is important that these opportunities for mainstreaming innovative breeding strategies, enabled by cutting-edge “Omics” technologies, are seized in Africa; however, several constraints must be addressed before the benefit of NGS can be fully realized. African breeding programs must have access to high-throughput genotyping facilities, and capacity in the application of genomic selection and marker-assisted breeding must be built and supported by capacity in genomic analysis and bioinformatics. This chapter demonstrates how interventions with NGS-enabled innovative strategies can be applied to increase genetic gain with insights from the Consortium of International Agricultural Research (CGIAR) in general and the International Institute of Tropical Agriculture (IITA) in particular

    Characterizing short-term evolution of DNA methylation in A. thaliana using next-generation sequencing

    DNA sequence mutations are the principal source of natural variation. Over the last few decades, however, an increasing number of studies have suggested that epigenetic components can also underlie differences in phenotypic traits. These epigenetic marks allow a flexible modulation of gene activity without changes in the DNA sequence. One of the most prominent epigenetic modifications is DNA methylation, which consists of cytosines that carry an additional methyl group. Such chemical marks can be inherited across cell divisions and generations, and there are many durable methylation differences between individuals, so-called epimutations. These originate mainly from three different sources: most epimutations are coupled to genetic mutations, yet they can also arise spontaneously, or they can be induced by environmental stimuli. The latter case enables rapid adaptation to changing environments, which in the short term is usually not possible via genetic mutations. A current debate revolves around the question of whether adaptive environmentally induced epimutations can be heritable, which would contradict the random mutagenesis assumption of Darwinian evolutionary theory. However, the experimental setup of most studies that have examined epigenetic variation did not allow the clear separation of different sources of variable methylation. These studies typically did not inspect genome-wide genetic variation, or did not monitor environmentally induced changes for more than one or two generations. Thus it has remained largely unresolved how frequently methylation differences arise spontaneously on the whole-genome level, and how strongly and durably environmental conditions impact the methylation landscape. This work addresses these questions in the model plant Arabidopsis thaliana.
I present whole-genome DNA methylation analyses at base-pair resolution of two different populations, originating from unique experimental settings that largely eliminate specific sources of epimutations. Investigation of genetically quasi identical lines propagated for thirty generations in uniform greenhouse conditions – thus largely without genetic and environmental influences – revealed that spontaneously occurring epimutations emerged frequently, but seemed to be largely short-lived. Plants with minimal genetic divergence that had grown in diverse natural sites over a previously uncharted time period of over one hundred years exhibited a methylation pattern that was largely stable on the whole-genome level and that was in many aspects intriguingly similar to that of the greenhouse-grown lines. Thus, environmentally induced epimutations seem to be only minor contributors to heritable methylation differences, which challenges published claims of broad-scale inheritance of adaptive epigenetic variation. This thesis also provides technical and methodological advances of next-generation sequencing (NGS) data analysis. To gauge the genome-wide genetic influence on epimutations, this work provides an iterative workflow that maximizes the detection of a wide range of DNA sequence variants using short NGS reads by integrating several different genetic variation detection approaches. Finally, while previous epigenetic studies in plants, due to rather simplistic statistical testing, largely revealed a biased picture of differential methylation in the genome, this work introduces a comprehensive DNA methylation pipeline for NGS data that includes a novel approach to obtain more sensitive and more unbiased calls of differentially methylated regions. 
Together, this work presents advanced computational methods to profile genome-wide genetic and methylation variation, and inspects the rate and spectrum of naturally occurring methylation changes, thus contributing to elucidating the role of epimutations in evolution