69 research outputs found

    The collapse of gene complement following whole genome duplication

    Background: Genome amplification through duplication or proliferation of transposable elements has its counterpart in genome reduction, by elimination of DNA or by gene inactivation. Whether loss is primarily due to excision of random-length DNA fragments or the inactivation of one gene at a time is controversial. Reduction after whole genome duplication (WGD) represents an inexorable collapse in gene complement. Results: We compare fifteen genomes descending from six eukaryotic WGD events 20-450 Mya. We characterize the collapse over time through the distribution of runs of reduced paralog pairs in duplicated segments. Descendant genomes of the same WGD event behave as replicates. Choice of paralog pairs to be reduced is random except for some resistant regions of contiguous pairs. For those paralog pairs that are reduced, conserved copies tend to concentrate on one chromosome. Conclusions: Both the contiguous regions of reduction-resistant pairs and the concentration of runs of single-copy genes on a single chromosome are evidence of transcriptional co-regulation, dosage sensitivity or other functional interaction constraining the reduction process. These constraints and their evolution over time show a consistent pattern across evolutionary domains and a highly reproducible pattern, as replicates, for the several descendants of a single WGD.
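
    The run-based characterization mentioned in this abstract can be illustrated with a minimal sketch. It assumes each duplicated segment is encoded as a list of booleans, one per ancestral paralog pair, where True means the pair has been reduced to a single copy. That encoding, the function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import groupby


def reduced_run_lengths(pair_states):
    """Return the lengths of maximal runs of consecutive reduced pairs.

    pair_states: iterable of booleans in chromosomal order,
    True = pair reduced to a single copy (assumed encoding).
    """
    return [
        sum(1 for _ in group)
        for is_reduced, group in groupby(pair_states)
        if is_reduced
    ]


def run_length_distribution(segments):
    """Pool run lengths over all segments into {run length: count}."""
    counts = Counter()
    for states in segments:
        counts.update(reduced_run_lengths(states))
    return dict(sorted(counts.items()))


# Hypothetical toy data: two duplicated segments.
segments = [
    [True, True, False, True, False, False, True, True, True],
    [False, True, True, False, True],
]
distribution = run_length_distribution(segments)
print(distribution)
```

    Comparing such distributions between descendants of the same WGD event is one way to check whether they behave as replicates, as the abstract reports.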

    Dynamics and Adaptive Benefits of Protein Domain Emergence and Arrangements during Plant Genome Evolution

    Plant genomes are generally very large, mostly paleopolyploid, and have numerous gene duplicates and complex genomic features such as repeats and transposable elements. Many of these features have been hypothesized to enable plants, which cannot easily escape environmental challenges, to rapidly adapt. Another mechanism, which has recently been well described as a major facilitator of rapid adaptation in bacteria, animals, and fungi but not yet for plants, is modular rearrangement of protein-coding genes. Due to the high precision of profile-based methods, rearrangements can be well captured at the protein level by characterizing the emergence, loss, and rearrangements of protein domains, their structural, functional, and evolutionary building blocks. Here, we study the dynamics of domain rearrangements and explore their adaptive benefit in 27 plant and 3 algal genomes. We use a phylogenomic approach by which we can explain the formation of 88% of all arrangements by single-step events, such as fusion, fission, and terminal loss of domains. We find many domains are lost along every lineage, but at least 500 domains are novel, that is, they are unique to green plants and emerged more or less recently. These novel domains duplicate and rearrange more readily within their genomes than ancient domains and are overproportionally involved in stress response and developmental innovations. Novel domains more often affect regulatory proteins and show a higher degree of structural disorder than ancient domains. Whereas a relatively large and well-conserved core set of single-domain proteins exists, long multi-domain arrangements tend to be species-specific. We find that duplicated genes are more often involved in rearrangements. Although fission events typically impact metabolic proteins, fusion events often create new signaling proteins essential for environmental sensing. 
Taken together, the high volatility of single domains and complex arrangements in plant genomes demonstrates the importance of modularity for the environmental adaptability of plants.

    The evolutionary significance of gene and genome duplications

    Network and multi-scale signal analysis for the integration of large omic datasets: applications in Populus trichocarpa

    Poplar species are promising sources of cellulosic biomass for biofuels because of their fast growth rate, high cellulose content and moderate lignin content. There is an increasing movement toward integrating multiple layers of ’omics data in a systems biology approach to understand gene-phenotype relationships and assist in plant breeding programs. This dissertation involves the use of network and signal processing techniques for the combined analysis of these various data types, for the goals of (1) increasing fundamental knowledge of P. trichocarpa and (2) facilitating the generation of hypotheses about target genes and phenotypes of interest. A data integration “Lines of Evidence” method is presented for the identification and prioritization of target genes involved in functions of interest. A new post-GWAS method, Pleiotropy Decomposition, is presented, which extracts pleiotropic relationships between genes and phenotypes from GWAS results, allowing for identification of genes with signatures favorable to genome editing. Continuous wavelet transform signal processing analysis is applied in the characterization of genome distributions of various features (including variant density, gene density, and methylation profiles) in order to identify chromosome structures such as the centromere. This resulted in the approximate centromere locations on all P. trichocarpa chromosomes, which had previously not been adequately reported in the scientific literature. Discrete wavelet transform signal processing followed by correlation analysis was applied to genomic features from various data types including transposable element density, methylation density, SNP density, gene density, centromere position and putative ancestral centromere position. Subsequent correlation analysis of the resulting wavelet coefficients identified scale-specific relationships between these genomic features, and provided insights into the evolution of the genome structure of P. trichocarpa.
These methods have provided strategies both to increase fundamental knowledge about the P. trichocarpa system and to identify new target genes related to biofuels targets. We intend that these approaches will ultimately be used in designing better plants for more efficient and sustainable production of bioenergy.

    Diversity and Evolution of Short Interspersed Nuclear Elements (SINEs) in Angiosperm and Gymnosperm Species and their Application as molecular Markers for Genotyping

    Short interspersed nuclear elements (SINEs) are small non-autonomous and heterogeneous retrotransposons, widespread in animals and plants and usually differentially propagated in related species resulting in genome-specific copy numbers. Within the monocots, the Poaceae (sweet grasses) is the largest and economically most important plant family. The distribution of 24 Poaceae SINE (PoaS) families, five of which show a subfamily structure, was analyzed in five important cereals (Oryza sativa, Triticum aestivum, Hordeum vulgare, Sorghum bicolor, Zea mays), the energy crop Panicum virgatum and the model grass Brachypodium distachyon. The comparative investigation of SINE abundance and sequence diversity within Poaceae species provides insights into their species-specific diversification and amplification. The PoaS families and subfamilies fall into two length and structural categories: simple SINEs of up to 180 bp and dimeric SINEs larger than 240 bp. Of 24 PoaS families, 20 are structurally related across species, in particular either in their 5′ or 3′ regions. Hence, reshuffling between SINEs, likely caused by nested insertions of full-length and truncated copies, is an important evolutionary mechanism of SINE formation. Most striking, the recently evolved homodimeric SINE family PoaS-XIV occurs exclusively in wheat (T. aestivum) and consists of two tandemly arranged PoaS-X.1 copies. Exemplary for deciduous tree species, the evolutionary history of SINE populations was examined in six Salicaceae genomes (Populus deltoides, Populus euphratica, Populus tremula, Populus tremuloides, Populus trichocarpa, Salix purpurea). Four of eleven Salicaceae SINE (SaliS) families exhibit a subfamily organization. The SaliS families consist of two groups, differing in their phylogenetic distribution pattern, sequence similarity and 3’ end structure.
These groups probably emerged at different evolutionary periods of time: during the ‘salicoid duplication’ (~ 65 million years ago) in the Salix-Populus progenitor, and during the separation of the genus Salix (~ 45 - 65 million years ago), respectively. Similar to the PoaS families, the majority of the 20 SaliS families and subfamilies share regions of sequence similarity, providing evidence for SINE emergence by reshuffling. Furthermore, they also contain an evolutionarily young dimeric SINE family (SaliS-V), amplified only in two poplar genomes. The special feature of the Salicaceae SINEs is the contrast of the conservation of 5’ start motifs across species and SINE families compared to the high variability of 3’ ends within the SINE families, differing in sequence and length, presumably resulting from mutations in the poly(A) tail as a possible route for SINE elongation. Periods of increased transpositional activity promote the dissemination of novel 3’ ends. Thereby, evolutionarily older motifs are displaced leading to various 3’ end subpopulations within the SaliS families. Opposed to the PoaS families with a largely equal ratio of poly(A) to poly(T) tail SINEs, the SaliS families are exclusively terminated by adenine stretches. Among retrotransposon-based markers, SINEs are highly suitable for the development of molecular markers due to their unidirectional insertion and random distribution mainly in euchromatic genome regions, together with an easy and fast detection of the heterogeneous SINE families. As a prerequisite for the development of SINE-derived inter-SINE amplified polymorphism (ISAP) markers, 13 novel Theaceae SINE families (TheaS-I - TheaS-VII, TheaS-VIII.1 and TheaS-VIII.2, TheaS-IX - TheaS-XIII) were identified in the angiosperm tree species Camellia japonica. Moreover, six Pinaceae SINE families (PinS-I.1 and PinS-I.2, PinS-II – PinS-VI) were detected in the gymnosperm species Larix decidua. 
Compared to the SaliS and PoaS families, structural relationships are less frequent within the TheaS families and absent in the PinS families. The ISAP analysis revealed the genetic identity of Europe’s oldest historical camellia (C. japonica) trees indicating their vegetative propagation from the same ancestor specimen, which was probably the first living camellia on European ground introduced to England in the 18th century. Historical sources locate the native origin of this ancestral camellia specimen either in the Chinese province Yunnan or at the Japanese Gotō Islands. Comparative ISAPs showed no agreement with the Gotō camellia sample pool and appropriate Chinese reference samples were not available. However, the initial experiments demonstrated the potential of ISAP to resolve variations among natural populations. The ISAP application on angiosperm trees also concerned fast growing Populus clones grown in short rotation coppice plantations for energy production. The species-specific P. tremula ISAP primers might also be applied for the discrimination of hybrid poplar clones involving P. tremuloides genome portions, since SINEs of these two species are highly related. However, due to lineage-specific SINE evolution during speciation, cross-species applications are generally only successful to a limited extent. The analysis of poplar hybrids composed of P. maximowiczii with either P. trichocarpa or P. nigra based on P. tremula ISAP primers showed a strongly reduced resolution. In forestry, hybrid larch (e.g. Larix × eurolepis) genotypes have to be selected from the offspring of Japanese (Larix kaempferi) and European larch (Larix decidua) crosses, as they exhibit superior growth rates compared to the parental species. Initial ISAP-based examinations of European larch genotypes provided less polymorphic banding patterns, probably resulting from general high levels of synteny and collinearities reported for gymnosperm species.
Hence, the ISAP was combined with the AFLP technique to form the novel marker system inter-SINE-restriction site amplified polymorphism (ISRAP). The amplicons originating from genomic regions between SINEs and EcoRI cleavage sites were visualized with the sensitive capillary gel electrophoresis. The ISRAP assays, based on EcoRI adapter primers combined with two different SINE-derived primers, resulted in a sufficient number of polymorphic peaks to distinguish the L. decidua genotypes investigated. Compared to ISAPs, the ISRAP approach provides the required resolution to differentiate highly similar larch genotypes.

    Study of the role of plant nuclear envelope and lamina-like components in nuclear and chromatin organisation using 3D imaging

    The linker of nucleoskeleton and cytoskeleton (LINC) complex is an evolutionarily well-conserved protein bridge connecting the cytoplasmic and nuclear compartments across the nuclear membrane. While recent data supports its function in nuclear morphology and meiosis, its implication for chromatin organisation has been less studied in plants. The first aim of this work was to develop NucleusJ, a simple and user-friendly ImageJ plugin dedicated to the characterisation of nuclear morphology and chromatin organisation in 3D. NucleusJ quantifies 15 parameters including shape and size of nuclei as well as intra-nuclear objects and their position within the nucleus. A step-by-step documentation is available for self-training, together with data sets of nuclei with different nuclear organisation. Several improvements are ongoing to release a new version of this plugin. In a second part of this work, 3D imaging methods have been used to investigate nuclear morphology and chromatin organisation in interphase nuclei of the plant model Arabidopsis thaliana, in which heterochromatin domains cluster in conspicuous chromatin regions called chromocentres. Chromocentres form a repressive chromatin environment contributing to the transcriptional silencing of repeated sequences, a general mechanism needed for genome stability. Quantitative measurements of the 3D position of chromocentres in the nucleus indicate that most chromocentres are situated in close proximity to the periphery of the nucleus but that this distance can be altered according to nuclear volume or in specific mutants affecting the LINC complex. Finally, the LINC complex is proposed to contribute to proper chromatin organisation and positioning since its alteration is associated with the release of transcriptional silencing as well as decompaction of heterochromatic sequences.
The last part of this work takes advantage of available genomic sequences and RNA-seq data to explore the evolution of NE proteins in plants and propose a minimal requirement to build the simplest functional NE. Altogether, the work achieved in this thesis associates genetics, molecular biology, bioinformatics and imaging to better understand the contribution of the nuclear envelope to nuclear morphology and chromatin organisation and suggests the functional implication of the LINC complex in these processes.

    Assessing the impact of alternative splicing on the diversity and evolution of the proteome in plants

    Splicing is one of the key processing steps during the maturation of a gene’s primary transcript into the mRNA molecule used as a template for protein production. Splicing involves the removal of segments called introns and re-joining of the remaining segments called exons. It is by now well established that not always the same segments are removed from a gene’s primary transcript during the splicing process. The consequence of this splicing variation, termed Alternative Splicing (AS), is that multiple distinct mature mRNA molecules can be produced from a single gene. One of the two biological roles that are ascribed to AS is that of a mechanism which enables an organism to produce multiple functionally distinct proteins from a single gene. Alternatively, AS can serve as a means for controlling gene expression at the post-transcriptional level. Although many clear examples have been reported for both roles, the extent to which AS increases the functional diversity of the proteome, regulates gene expression or simply reflects noise in splicing machinery is not well known. Determining the full functional impact of AS by designing and performing wet-lab experiments for all AS events is unfeasible and bioinformatics approaches have therefore widely been used for studying the impact of AS at a genome-wide scale. In this thesis four bioinformatics studies are presented that were aimed at determining the extent to which AS is used in plants as a mechanism for producing multiple distinct functional proteins from a single gene. Each chapter uses a different method for analyzing specific properties of AS. Under the premise that functional genetic features are more likely to be conserved than non-functional ones, AS events that are present in two or more species are more likely to be biologically relevant than those that are confined to a single species. 
In chapter 2 we analyzed the conservation of AS by performing a comparative analysis between three divergent plant species. The results of that study indicated that the vast majority of AS events do not persist over long periods of evolution. We concluded, based on this lack of conservation, that AS only has a limited impact on the functional diversity of the proteome in plants. Following this conclusion, it can be hypothesized that the variation that AS induces at the transcriptome level is not likely to be manifested at the protein level. In chapter 3 we tested this hypothesis by analyzing two independent proteomics datasets. This type of data can be used to directly identify proteins present in a biological sample. Our results indicated that the variation induced by AS at the transcriptome level is also manifested at the protein level. We concluded that either many AS events have a confined species-specific (not conserved) function or simply produce protein variants that are stable enough to escape rapid turn-over. Another method for determining whether AS increases the functional diversity of the proteome is by determining whether protein sequence variations that are typically induced by AS are common within the plant kingdom. We found (chapter 4) that this is not the case in plants and concluded that novel functions do not frequently arise through AS. We also found that most of the AS-induced variation is lost, as is the case for redundant gene copies, within a very short evolutionary time period. One limitation of genome-wide analyses is that these capture only the more general patterns. However, the functional impact of AS can be very different in different genes or gene families. In order to fully assess the functional impact of AS, it is therefore important to also study the process within the functional context of individual genes or gene families. In chapter 5 we demonstrated this concept by performing a detailed analysis of AS within the MADS-box gene family.
We were able to provide clues as to how AS might impact the protein-protein interaction capabilities of individual MADS proteins. Some of our predictions were supported by experimental evidence. We further showed how AS can serve as an evolutionary mechanism for experimenting with novel functions (novel interactions) without the explicit loss of existing functions. The overall conclusion, based on the performed analyses, is as follows: AS primarily is a consequence of noise in the splicing machinery and results in an increased diversity of the proteome. However, only a small fraction of the proteins resulting from AS will have beneficial functions and are subsequently selected for during evolution. The large remaining fraction is, as is the case for redundant gene copies, lost within a very short evolutionary time period after its emergence.

    Genetics, Genomics and Biotechnology of Plant Cytoplasmic Organelles

    The papers included in this Special Issue address a variety of important aspects of Genetics, Genomics and Biotechnology of Plant Cytoplasmic Organelles, including new advances in the sequencing of both mitochondrial and chloroplast genomes using Next-Generation Sequencing technology in plant species and algae including important crop and tree species, an in vitro culture protocol, and identification of a core module of genes involved in plastid development. In particular, the published studies focus on the description of adaptive evolution, elucidate mitochondrial mRNA processing, highlight the effect of the domestication process on plastome variability and report the development of molecular markers. A meta-analysis of recently published genome-wide expression studies allowed the identification of novel nuclear genes involved in the complex and still unrevealed mechanisms at the basis of communication between chloroplast and nucleus (retrograde signalling) during plastid development (biogenic control). Finally, an optimized regeneration protocol useful in plastid transformation of recalcitrant species, such as sugarcane, has been reported.

    Annotation and comparative analysis of fungal genomes: a hitchhiker's guide to genomics

    This thesis describes several genome-sequencing projects such as those from the fungi Laccaria bicolor S238N-H82, Glomus intraradices DAOM 197198, Melampsora larici-populina 98AG31, Puccinia graminis, Pichia pastoris GS115 and Candida bombicola, as well as the one of the haptophyte Emiliania huxleyi CCMP1516. These species are important organisms in many respects, for instance: L. bicolor and G. intraradices are symbiotic fungi growing in association with trees and occupy an important ecological niche for promoting tree growth; M. larici-populina and P. graminis are two devastating fungi threatening plants; the tiny yeast P. pastoris is the major protein production platform in the pharmaceutical industry; the biosurfactant production yeast C. bombicola is likely to provide a low ecotoxicity detergent and E. huxleyi occupies a unique phylogenetic position among chromalveolates and contributes to the global carbon cycle system. The completion of the genome sequence and the subsequent functional studies broaden our understanding of these complex biological systems and promote the species as possible model organisms. However, it is commonly observed that genome sequencing projects are launched with lots of enthusiasm but are often frustratingly difficult to finish. Part of the reason is the ever-increasing expectations regarding quality delivery (both with respect to data and analyses). The Introductory Chapter aims to provide an overview of how best to conduct a genome sequencing project. It explains the importance of understanding the basic biology and genetics of the target organism. It also discusses the latest developments in new (next) generation high throughput sequencing (HTS) technologies, how to handle the data and their applications. The emergence of the new HTS technologies brings the whole of biology research to a new frontier.
For instance, with the help of the new sequencing technologies, we were able to sequence the genome of our interest, namely Pichia pastoris. This tiny yeast, the analysis of which forms the bulk of this thesis, is an important heterologous production platform because its methanol assimilation properties make it ideally suited for large scale industrial production. The unique protein assembly pathway of P. pastoris also attracts much basic research interest. We used the new HTS method to sequence and assemble the GS115 genome into four chromosomes and made it publicly available to the research community (Chapter 2 and Chapter 3). The public release of the GS115 genome broadened interest in the comparison of GS115 and its parental strains. By sequencing the parental strain of GS115 with different new sequencing platforms, we identified several point mutations in the coding genes that likely contribute to the higher protein translocation efficiency in GS115. The sequence divergence and copy number variation of rDNA between strains also explain the difference in protein production efficiency (Chapter 4). Before 2008, the Sanger sequencing method was the only technology to obtain high quality complete genomes of eukaryotes. Because of the high cost of the Sanger method, regarding the other genome projects discussed in this thesis, it was necessary to team up with many other partners and to rely on the U.S. Department of Energy Joint Genome Institute (DOE-JGI) and the Broad Institute to generate the genome sequence. The M. larici-populina strain 98AG31 and the Puccinia graminis f. sp. tritici strain CRL 75-36-700-3 are two devastating basidiomycete ‘rusts’ that infect poplar and wheat, respectively. Lineage-specific gene family expansions in these two rusts highlight their possible role in the obligate biotrophic life-style. Two large sets of effector-like small-secreted proteins with different primary sequence structures were identified in each organism.
The in planta-induced transcriptomic data showed upregulation of these lineage-specific genes and they are likely involved in establishing the rust-host interaction. An additional immunolocalization study on M. larici-populina confirmed the accumulation of some candidate effectors in the haustoria and infection hyphae, which is described in Chapter 5.

    Reciprocal Informants: Using Fungal Bioinformatics, Genomics, and Ecology to tie Mechanisms to Ecosystems

    University of Minnesota Ph.D. dissertation. August 2019. Major: Plant and Microbial Biology. Advisors: Peter Kennedy, H Kistler. 1 computer file (PDF); viii, iv, 126 pages. Across both wild and human-structured ecosystems, fungi interact with every plant species on earth. From mycorrhizal mutualisms to harmless endophytes and deadly pathogens, the results of these interactions can mean the difference between a plant’s ability to grow and flourish, or languish and expire. Fungal-host dynamics are not static traits, either over evolutionary time or during the lifetime of individuals where ecological context dependency shapes the outcomes of fungal-host interactions. Understanding the ecological and genetic factors that structure plant-fungal relationships has wide ranging consequences for ecosystems, agro-ecosystems, and human health. However, it’s not well understood how complex genetic mechanisms and ecological pressures work in concert to structure the outcomes of fungal-host interactions, particularly among fungal mutualists. This dissertation contributes to this understanding by investigating how fungal-host relationships are regulated at two levels: broadly, investigating the ecology of fungal-host systems, and specifically, investigating the genetic and genomic basis of how these interactions are mediated. I begin Chapter 1 from the perspective of fungal ecology, investigating the influence of neighborhood (the surrounding plant community) on host specificity patterns using the host-specialist ectomycorrhizal (ECM) genus Suillus. The number of host species that a given fungal species will associate with, and how closely related these host species are, is the study of fungal host specificity. While some fungi associate with only a single species of host (high host specificity), most associate with tens or hundreds of host species (low host specificity).
Fungi in the genus Suillus are famous for their high host specificity, primarily associating with plants in the family Pinaceae (particularly White Pines, Red Pines and Larches). Using a combination of field sampling, sequencing, and colonization bioassays, I present evidence that one species, S. subaureus, has undergone a novel host-expansion onto Angiosperms, and argue that neighborhood effects influence ECM colonization outcomes over both space and time. In Chapter 2, I expand from fungal ecology into fungal genomes. Using genome mining and comparative genomics, I look for signatures of ECM host specificity using 19 genome sequenced Suillus species in relation to 1) other (non-Suillus) ECM fungi and 2) an intrageneric comparison between Suillus that specialize on Red Pine, White Pine or Larch. I present evidence for the involvement of several molecular classes in regulating Suillus host specificity including species specific small secreted proteins, G-protein coupled receptors, and terpene secondary metabolites. Finally, in Chapter 3, I use the genomic and bioinformatic tool sets developed in Chapters 1 and 2 to expand my analysis across the fungal phylogeny and ask questions about a potential molecular correlate of fungal guild and trophic mode: ribosomal DNA (rDNA) copy number. To do this, I developed a bioinformatic pipeline to estimate rDNA copy number variation from whole genome sequence data, and applied it to a phylogenetically and ecologically diverse set of 91 fungal genomes. I present evidence that rDNA copy number is inversely associated with phylogenetic distance, but displays a high level of variation, spanning an order of magnitude in Suillus alone, with no detectable correlation to guild occupation or genome size.
Taken together, the work presented here shows that genomic and bioinformatic approaches used in concert with classical ecological methodologies offer great potential to expand our understanding of the two-way influence of ecosystem-level processes and gene-level mechanisms in structuring plant-fungal interactions.
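
    The abstract does not describe how the pipeline estimates rDNA copy number. As an illustration only, one widely used idea is to divide the typical read depth over the rDNA locus by the typical read depth over regions assumed to be single-copy. The sketch below follows that idea; the function name and the depth values are hypothetical and are not taken from the dissertation.

```python
from statistics import median


def estimate_copy_number(rdna_depths, single_copy_depths):
    """Estimate copy number as a ratio of median read depths.

    rdna_depths: per-position read depths over the rDNA locus.
    single_copy_depths: per-position read depths over reference
    regions assumed to occur once per genome (an assumption of
    this sketch, not a detail given in the abstract).
    """
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")
    return median(rdna_depths) / baseline


# Hypothetical toy depths, for illustration only.
rdna_depths = [4100, 3950, 4020, 4200, 3980]
single_copy_depths = [48, 52, 50, 49, 51]
estimate = estimate_copy_number(rdna_depths, single_copy_depths)
print(round(estimate, 1))
```

    Medians are used here instead of means so that a few extreme positions, such as collapsed repeats or low-complexity stretches, have less influence on the ratio.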