
    Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plants

    For most sequenced flowering plants, multiple whole-genome duplications (WGDs) are found. Duplicated genes following a WGD often have different fates: they can quickly disappear again, be retained for long(er) periods, or subsequently undergo small-scale duplications. However, how expression, epigenetic regulation, and functional constraints are associated with these different gene fates following a WGD still requires further investigation, because successive WGDs in angiosperms complicate gene trajectories. In this study, we investigate lotus (Nelumbo nucifera), an angiosperm with a single WGD at the K–Pg boundary. Using intraspecific-synteny identification improved by a chromosome-level assembly, together with transcriptome and bisulfite sequencing, we explore not only the fundamental distinctions in genomic features, expression, and methylation patterns of genes with different fates after a WGD but also the factors that shape post-WGD expression divergence and expression bias between duplicates. We found that, after a WGD, genes that returned to single copies show the highest levels and breadth of expression, gene body methylation, and intron numbers, whereas the long-retained duplicates exhibit the highest degrees of protein–protein interactions and protein lengths and the lowest methylation in gene flanking regions. For those long-retained duplicate pairs, the degree of expression divergence correlates with their sequence divergence, degree in protein–protein interactions, and expression level, whereas their biases in expression level, which reflect subgenome dominance, are associated with the bias of subgenome fractionation. Overall, our study on the paleopolyploid nature of lotus highlights the impact of different functional constraints on gene fate and duplicate divergence following a single WGD in plants.

    Anopheline salivary protein genes and gene families: an evolutionary overview after the whole genome sequence of sixteen Anopheles species

    Background: Mosquito saliva is a complex cocktail whose pharmacological properties play an essential role in blood feeding by counteracting the host physiological response to tissue injury. Moreover, vector-borne pathogens are transmitted to vertebrates and exposed to their immune system in the context of mosquito saliva, which, by virtue of its immunomodulatory properties, can modify the local environment at the feeding site and eventually affect pathogen transmission. In addition, the host antibody response to salivary proteins may be used to assess human exposure to mosquito vectors. Even though the role of quite a few mosquito salivary proteins has been clarified in the last decade, we still do not know the physiological role of many of them, nor the extent of their involvement in the complex interactions taking place between the mosquito vectors, the pathogens they transmit and the vertebrate host. The recent release of the genomes of 16 Anopheles species offered the opportunity to gain insights into the function and evolution of salivary protein families in anopheline mosquitoes. Results: Orthologues of fifty-three Anopheles gambiae salivary proteins were retrieved and annotated from 18 additional anopheline species belonging to the three subgenera Cellia, Anopheles, and Nyssorhynchus. Our analysis included 824 full-length salivary proteins from 24 different families and allowed the identification of 79 novel salivary genes and the re-annotation of 379 incorrect predictions. The comparative, structural and phylogenetic analyses yielded an unprecedented view of the anopheline salivary repertoires and of their evolution over 100 million years of anopheline radiation, shedding light on the mechanisms and evolutionary forces that contributed to shaping the anopheline sialomes.
    Conclusions: We provide here a comprehensive description, classification and evolutionary overview of the main anopheline salivary protein families and identify two novel candidate markers of human exposure to malaria vectors worldwide. This anopheline sialome catalogue, which is easily accessible as a hyperlinked spreadsheet, is expected to be useful to the vector biology community and to improve the capacity to gain a deeper understanding of mosquito salivary proteins, facilitating their possible exploitation for epidemiological and/or pathogen-vector-host interaction studies.

    Tissue-Specificity of Gene Expression Diverges Slowly between Orthologs, and Rapidly between Paralogs.

    The ortholog conjecture implies that functional similarity between orthologous genes is higher than between paralogs. It has been supported using expression levels and Gene Ontology term analysis, although the evidence was rather weak and there were also conflicting reports. In this study of 12 species we provide strong evidence of high conservation in tissue-specificity between orthologs, in contrast to low conservation between within-species paralogs. This allows us to shed new light on the evolution of gene expression patterns. While there have been several studies of the correlation of expression between species, little is known about the evolution of tissue-specificity itself. Ortholog tissue-specificity is strongly conserved between all tetrapod species, with the lowest Pearson correlation (r = 0.66) between mouse and frog. Tissue-specificity correlation decreases strongly with divergence time. Paralogs in human show much lower conservation, even for recent primate-specific paralogs. When both paralogs of a pair from the ancient whole-genome duplication are tissue-specific, they are often specific to different tissues, whereas other tissue-specific paralogs are mostly specific to the same tissue. The same patterns are observed using human or mouse as the focal species, and are robust to the choice of datasets and thresholds. Our results support the following model of evolution: in the absence of duplication, tissue-specificity evolves slowly, and tissue-specific genes do not change their main tissue of expression; after small-scale duplication, the less expressed paralog loses the ancestral specificity, leading to an immediate difference between paralogs; over time, both paralogs become more broadly expressed, but remain poorly correlated. Finally, there is a small number of paralog pairs that stay tissue-specific, with the same main tissue of expression, for at least 300 million years.

    Draft genomes of two Artocarpus plants, jackfruit (A. heterophyllus) and breadfruit (A. altilis)

    Two of the most economically important plants in the Artocarpus genus are jackfruit (A. heterophyllus Lam.) and breadfruit (A. altilis (Parkinson) Fosberg). Both species are long-lived trees that have been cultivated for thousands of years in their native regions. Today they are grown throughout tropical to subtropical areas as an important source of starch and other valuable nutrients. There are hundreds of breadfruit varieties native to Oceania, of which the most commonly distributed types are seedless triploids. Jackfruit is likely native to the Western Ghats of India and produces one of the largest tree-borne fruit structures (reaching up to 45 kg). To date, there is limited genomic information for these two economically important species. Here, we generated 273 Gb and 227 Gb of raw data from jackfruit and breadfruit, respectively. The high-quality reads from jackfruit were assembled into 162,440 scaffolds totaling 982 Mb with 35,858 genes. Similarly, the breadfruit reads were assembled into 180,971 scaffolds totaling 833 Mb with 34,010 genes. A total of 2822 and 2034 expanded gene families were found in jackfruit and breadfruit, respectively, enriched in pathways including starch and sucrose metabolism, photosynthesis, and others. The copy number of several starch synthesis-related genes was found to be increased in jackfruit and breadfruit compared to closely related species, and their tissue-specific expression might be related to the sugar-rich and starch-rich characteristics of these fruits. Overall, the publication of high-quality genomes for jackfruit and breadfruit provides information about their specific composition and the underlying genes involved in sugar and starch metabolism.

    Predicting the Impact of Alternative Splicing on Plant MADS Domain Protein Function

    Several genome-wide studies have demonstrated that alternative splicing (AS) significantly increases transcriptome complexity in plants. However, the impact of AS on the functional diversity of proteins is difficult to assess using genome-wide approaches. The availability of detailed sequence annotations for specific genes and gene families allows for a more detailed assessment of the potential effect of AS on their function. One example is the plant MADS-domain transcription factor family, members of which interact to form protein complexes that function in transcription regulation. Here, we perform an in silico analysis of the potential impact of AS on the protein-protein interaction capabilities of MIKC-type MADS-domain proteins. We first confirmed the expression of transcript isoforms resulting from predicted AS events. Expressed transcript isoforms were considered functional if they were likely to be translated and if their corresponding AS events either had an effect on predicted dimerisation motifs or occurred in regions known to be involved in multimeric complex formation, or, otherwise, if their effect was conserved across species. Nine out of twelve MIKC MADS-box genes predicted to produce multiple protein isoforms harbored putative functional AS events according to those criteria. AS events with conserved effects were only found at the borders of or within the K-box domain. We illustrate how AS can contribute to the evolution of interaction networks through an example of selective inclusion of a recently evolved interaction motif in the MADS AFFECTING FLOWERING1-3 (MAF1–3) subclade. Furthermore, we demonstrate the potential effect of an AS event in SHORT VEGETATIVE PHASE (SVP), which results in the deletion of a short sequence stretch including a predicted interaction motif, by overexpressing the fully spliced and the alternatively spliced SVP transcripts.
    For most of the AS events, we were able to formulate hypotheses about the potential impact on the interaction capabilities of the encoded MIKC protein.

    The low recombining pericentromeric region of barley restricts gene diversity and evolution but not gene expression

    The low-recombining pericentromeric region of the barley genome contains roughly a quarter of the species' genes, embedded in low-recombining DNA that is rich in repeats and repressive chromatin signatures. We have investigated the effects of pericentromeric region residency upon the expression, diversity and evolution of these genes. We observe no significant difference in average transcript level or developmental RNA specificity between the barley pericentromeric region and the rest of the genome. In contrast, all of the evolutionary parameters studied here show evidence of compromised gene evolution in this region. First, genes within the pericentromeric region of wild barley show reduced diversity and significantly weakened purifying selection compared with the rest of the genome. Second, gene duplicates (ohnolog pairs) derived from the cereal whole-genome duplication event ca. 60 Mya have been completely eliminated from the barley pericentromeric region. Third, local gene duplication in the pericentromeric region is reduced by 29% relative to the rest of the genome. Thus, the pericentromeric region of barley is a permissive environment for gene expression but has restricted gene evolution in a sizeable fraction of barley's genes.

    The amphioxus genome and the evolution of the chordate karyotype

    Lancelets ('amphioxus') are the modern survivors of an ancient chordate lineage, with a fossil record dating back to the Cambrian period. Here we describe the structure and gene content of the highly polymorphic, approximately 520-megabase genome of the Florida lancelet Branchiostoma floridae, and analyse it in the context of chordate evolution. Whole-genome comparisons illuminate the murky relationships among the three chordate groups (tunicates, lancelets and vertebrates), and allow not only reconstruction of the gene complement of the last common chordate ancestor but also partial reconstruction of its genomic organization, as well as a description of two genome-wide duplications and subsequent reorganizations in the vertebrate lineage. These genome-scale events shaped the vertebrate genome and provided additional genetic variation for exploitation during vertebrate evolution.

    The Carcinoembryonic Antigen Gene Family

    The molecular cloning of carcinoembryonic antigen (CEA) and several cross-reacting antigens reveals a basic domain structure for the whole family, which shows structural similarities to the immunoglobulin superfamily. The CEA family consists of approximately 10 genes, which are localized in two clusters on chromosome 19. So far, mRNA species have been identified for five of these genes, and they show tissue-dependent variation in transcriptional activity. Expression of some of these genes in heterologous systems has been achieved, allowing the localization of some epitopes. The characterization of a CEA gene family in the rat and its comparison with the human counterpart have been used to develop an evolutionary model.

    Evolutionary analyses of orphan genes in mouse lineages in the context of de novo gene birth

    Gene birth is the process through which new genes appear. For a long time it was argued that the natural way of generating new genes was from copies of existing genes, and the possibility of de novo gene emergence was neglected. However, recent evidence has forced a reconsideration of old models, and de novo gene birth has gained recognition as a widespread phenomenon. De novo gene birth is the process by which a non-genic sequence gains gene-like features through only a few mutations. The following work is a compilation of analyses that seek to highlight the importance and prevalence of de novo gene birth in genomes, suggesting that it is a continuously ongoing process that becomes particularly relevant upon ecological shifts. In the first chapter, I showed through phylostratigraphic analyses that new genes are substantially simpler than older ones, a trend that was consistent across several features and organisms and is suggestive of a frequent emergence of new genes through non-duplicative processes. In addition, I detected a strong association between gene birth and high transcriptional activity and chromosomal proximity. As part of this work, I also used phylostratigraphy to evaluate a different model of gene birth, the overprinting of alternative reading frames. In the following chapters of this dissertation, I used high-throughput sequencing of transcriptomes and genomes to ask questions about the origin and change of genes at shorter divergence times than previously examined, ranging from nearly 3000 years to 10 million years of divergence. I was able to detect the theoretically predicted effects of short time-scale comparisons on the rate of protein evolution. I also provide evidence that genes of different ages show different selective constraints even after only a few thousand years of divergence. Finally, in the last part of this thesis, I evaluated the role of transcription in gene birth dynamics.
    Transcription appears to be a pervasive feature of genomes, as most of the genome showed some level of transcription. In terms of de novo gene birth, I identified 663 candidate loci from the presence and absence of transcription. Analyses of these candidate loci indicated that gains are rather stable, meaning that subsequent losses were rarely found. In agreement with previous studies, I confirmed the role of testis as a driver of new genes. These results indicate that transcription is not a limiting factor in the emergence of new genes, and that our knowledge of the key regulatory elements of transcription and their turnover is still too limited to explain why new genes seem to arise at a higher rate than they decay.

    Contents
    Summary of the thesis
    Summary of the dissertation (German version)
    Acknowledgements
    General introduction
      A brief historic perspective on the concepts of gene birth
      Gene duplication is the main source of new genes
      Orphan genes and the genomics era
      Phylostratigraphy and the continuous emergence of new genes
      Not all genes come from other genes
      Considering gene birth from molecular and evolutionary perspectives
      Overprinting: true innovation from existing genes
      The life cycle of genes
      Overview
    Chapter 1: Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution
      Introduction
      Results
        Phylostratigraphy of mouse genes
        Genomic features across ages
        Chromosomal distribution
        Association with transcriptionally active sites
        Testis expressed genes
        Alternative reading frames
      Discussion
        De novo evolution versus duplication-divergence
        Regulatory evolution
        Overprinting
      Conclusion
      Methods
        Phylostratigraphy
        Gene structure analyses
        Transcription associated regions
        Expression data for testis
        Secondary reading frames
      Acknowledgements
    Chapter 2: Sequencing of genomes and transcriptomes of closely related mouse species
      Introduction
        Using wild mice to understand gene birth at the transcriptome level
        Phylogeographic distribution of the samples
      Methods
        Biological material
        Transcriptome sequencing
        Genome sequencing
        Raw data processing
        Transcriptome read mapping, annotation and quantification
        Genome read mapping
      Available resources
    Chapter 3: Differential selective constraints across phylogenetic ages and their impact on the turnover of protein-coding genes
      Introduction
      Methods
        Transcriptome assembly
        Generation of ortholog pairs and rate analyses
        Overlapping genes
        Reading frame polymorphism detection and annotation
        Statistical analyses
      Results
        Rate differences between genes of different ages
        Overlapping genes are an unlikely source of bias
        Impact of reading frame polymorphisms across phylogenetic time
      Discussion
      Acknowledgements
    Chapter 4: A transcriptomics approach to the gain and loss of de novo genes in mouse lineages
      Introduction
        How is a gene made?
        The early phase of new gene emergence
        Pervasive transcription and junk-DNA as raw material for new genes
      Methods
        Transcriptome presence/absence matrix and mapping of gains and losses
      Results
        How much of the mouse genome has evidence of transcription?
        Genome-wide transcription: gain and loss dynamics
        Phylogenetic patterns in genome-wide transcription
        How much of the genome is transcribed in a lineage-specific way?
        Identification of cases of de novo transcripts
        Quantification of gain rates for curated genes
        What are the dynamics of transcription loss in known genes?
        Where are new genes expressed?
      Discussion
        Pervasive transcription can provide material for new genes
        Asymmetry in gains and losses of transcription
        From transcribed protogenes to de novo genes
        Differences in expression levels
        Testis as a niche for new genes
      Conclusion
    Concluding remarks
    Perspectives
    References
    Chapter contributions
    Appendices
      Appendix A. Phylostratigraphic maps
      Appendix B. Curation data from orphan genes
      Appendix C. Functional annotation clusters based on known genes with loss of expression
      Appendix D. Transcriptome information and statistics
    Curriculum Vitae
    Affidavit