
    Identification and characterization of pseudogenes in the rice gene complement

    Background: The Osa1 Genome Annotation of rice (Oryza sativa L. ssp. japonica cv. Nipponbare) is the product of a semi-automated pipeline that does not explicitly predict pseudogenes. As a result, it is likely to mis-annotate pseudogenes as functional genes. A total of 22,033 gene models in Osa1 Release 5 were investigated as potential pseudogenes because they exhibit at least one feature potentially indicative of pseudogenes: lack of transcript support, a short coding region, a long untranslated region, or, for genes residing within a segmentally duplicated region, the lack of a paralog or a significantly shorter corresponding paralog.
    Results: A total of 1,439 pseudogenes were identified among the genes with pseudogene features. They were characterized by similarity to fully supported gene models and by the presence of frameshifts or premature translational stop codons. A significant difference in length between duplicated genes within segmentally duplicated regions was the best indicator of pseudogenization. Of the 816 pseudogenes for which a probable origin could be determined, 75% originated from gene duplication events and 25% from retrotransposition events. Twelve percent of the pseudogenes were expressed. Finally, the F-box protein, BTB/POZ protein, terpene synthase, chalcone synthase and cytochrome P450 families were found to harbor large numbers of pseudogenes.
    Conclusion: These pseudogenes still have a detectable open reading frame and are thus distinct from pseudogenes detected within intergenic regions, which typically lack definable open reading frames. The families containing the highest numbers of pseudogenes are fast-evolving families involved in ubiquitination and secondary metabolism.
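One of the disablement criteria above, an in-frame premature translational stop codon, can be illustrated on a single coding sequence. The sketch below uses a hypothetical helper, `find_disablements`, and assumes a CDS string that is in frame from its first base and uses the standard genetic code. It is not the authors' pipeline, which also required similarity to fully supported gene models, and treating a length that is not a multiple of three as a frameshift signal is only a crude proxy.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_disablements(cds):
    """Flag simple coding-sequence disablements in a CDS read from its first base.

    Hypothetical helper for illustration only. It reports:
      * premature_stops: 0-based codon indices of in-frame stop codons that
        occur before the final complete codon;
      * length_not_multiple_of_3: a crude hint of a frameshift.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon anywhere except the last position truncates the protein.
    premature = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    return {
        "premature_stops": premature,
        "length_not_multiple_of_3": len(seq) % 3 != 0,
    }
```

For example, `find_disablements("ATGAAATAGCCCTGA")` reports an in-frame TAG at codon index 2, ahead of the terminal TGA.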

    Investigating the Role of Gene Duplication in Ribosomal Protein Evolution and Testing a Model of Duplicate Gene Retention in Mammals

    Since Susumu Ohno’s seminal work in 1970, gene duplication has been widely recognized as the origin of multi-gene families and a major mechanism of evolutionary change. The forces that govern the evolution of gene families through the retention or loss of duplicated genes have been the subject of much inquiry and debate. The key challenge in this debate is accounting for the retention of duplicate genes when, in the absence of some countervailing selective pressure leading to their retention, population genetics predicts that the overwhelming majority of duplicated genes should be lost. To investigate the generation and retention of duplicate genes in mammals, the Nelson lab annotated duplication events in five mammalian genomes. We classified each event by duplication mechanism and duplicate gene fate. This led to two important and unexpected findings: first, half of all conserved duplicates are generated by RNA-based duplication (retroduplication) events; second, ribosomal protein genes constitute one of the largest classes of conserved duplicated genes in mammals, with the majority of these duplicates being RNA-based. The work in this thesis begins by identifying and characterizing all gene duplicates of mammalian ribosomal protein gene (RPG) families. We found an unexpectedly large number of intact retroduplicates (RTs), which cannot be readily explained by Ohno’s classic gene duplication trajectories. Hence, we propose a novel gene duplication model, Duplication Purification and Inactivation (DPI), which could account for this phenomenon and ultimately complement other established models. Specifically, we hypothesize that dominant negative phenotypes prevent fixation of missense mutations in duplicated genes, thereby extending the survival of intact copies in the genome.
Together, this thesis work provides a comprehensive history of ribosomal protein evolution in mammals, comprises a body of evidence that meets or exceeds that available for any other model of duplicate retention, and establishes the impact of forces that could influence the fate of every gene duplication event. Thus, the work described here has the potential to provide one of the most rigorously tested and widely applicable models of duplicate gene retention since Ohno first articulated the problem in the 1970s.

    Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plants

    Multiple whole-genome duplications (WGDs) have been found in most sequenced flowering plants. Genes duplicated by a WGD can have different fates: they may be lost again quickly, be retained for long(er) periods, or subsequently undergo small-scale duplications. However, how expression, epigenetic regulation, and functional constraints are associated with these different fates following a WGD still requires further investigation, because successive WGDs in angiosperms complicate the gene trajectories. In this study, we investigate lotus (Nelumbo nucifera), an angiosperm with a single WGD at the K–Pg boundary. Using improved identification of intraspecific synteny based on a chromosome-level assembly, together with transcriptome and bisulfite sequencing, we explore not only the fundamental distinctions in genomic features, expression, and methylation patterns of genes with different fates after a WGD but also the factors that shape post-WGD expression divergence and expression bias between duplicates. We found that, after the WGD, genes that returned to single copy show the highest expression levels and breadth, gene body methylation, and intron numbers, whereas long-retained duplicates exhibit the highest degrees of protein–protein interaction, the greatest protein lengths, and the lowest methylation in gene flanking regions. For long-retained duplicate pairs, the degree of expression divergence correlates with their sequence divergence, degree of protein–protein interaction, and expression level, whereas their biases in expression level, which reflect subgenome dominance, are associated with the bias of subgenome fractionation. Overall, our study of the paleopolyploid nature of lotus highlights the impact of different functional constraints on gene fate and duplicate divergence following a single WGD in plants.

    Landscape of gene transposition-duplication within the Brassicaceae family

    © The Author(s) 2018. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. We developed the CLfinder-OrthNet pipeline, which detects co-linearity among multiple closely related genomes, finds orthologous gene groups, and encodes the evolutionary history of each orthologue group into a representative network (OrthNet). Using a search based on network topology, we identified 1,394 OrthNets that included gene transposition-duplication (tr-d) events, out of 17,432 OrthNets identified in six Brassicaceae genomes. Occurrences of tr-d shared by subsets of the Brassicaceae genomes mirrored the divergence times between the genomes and their repeat contents. The majority of tr-d events resulted in truncated open reading frames (ORFs) in the duplicated loci. However, duplicates with complete ORFs were significantly more frequent than expected from random events; these were derived from older tr-d events and had a higher chance of being expressed. We also found an enrichment of tr-d events with complete loss of intergenic sequence conservation between the original and duplicated loci. Finally, we identified tr-d events found uniquely in the two extremophytes among the six Brassicaceae genomes, including tr-d of SALT TOLERANCE 32 and ZINC TRANSPORTER 3, which relate to their adaptive evolution. CLfinder-OrthNet provides a flexible toolkit to compare gene order, visualize evolutionary paths among orthologues as networks, and identify gene loci that share an evolutionary history.

    The use of microarray technology to analyse the evolution and functional divergence of duplicated genes


    Assessing the genomic evidence for conserved transcribed pseudogenes under selection

    Background: Transcribed pseudogenes are copies of protein-coding genes that have accumulated indicators of coding-sequence decay (such as frameshifts and premature stop codons) but nonetheless remain transcribed. Recent experimental evidence indicates that transcribed pseudogenes may regulate the expression of homologous genes through antisense interference or the generation of small interfering RNAs (siRNAs). Here, we assessed the genomic evidence for such transcribed pseudogenes of potential functional importance in the human genome. The most obvious indicators of such functional importance are significant evidence of conservation and selection pressure.
    Results: A variety of pseudogene annotations from multiple sources were pooled and filtered to obtain a subset of sequences that have significant mid-sequence disablements (frameshifts and premature stop codons) and clear evidence of full-length mRNA transcription. We found 1,750 such transcribed pseudogene annotations (TPAs) in the human genome (~11.5% of human pseudogene annotations). We checked for syntenic conservation of TPAs in other mammals (rhesus monkey, mouse, rat, dog and cow). About half of the human TPAs are conserved in rhesus monkey but, strikingly, very few (~3%) are conserved in mouse. The TPAs conserved in rhesus monkey show evidence of selection pressure (relative to surrounding intergenic DNA) on (i) their GC content and (ii) their rate of nucleotide substitution. This holds despite Ka/Ks distributions (ratios of non-synonymous to synonymous substitution rates) that are consistent with a lack of protein-coding ability. Furthermore, we identified 68 human TPAs that are syntenically conserved in at least two other mammals.
Interestingly, we observe three TPA sequences conserved in dog that have an intermediate character (i.e., evidence of both protein-coding ability and pseudogenicity), and we discuss the implications of this.
    Conclusion: Through evolutionary analysis, we have identified candidate sequences for functional human transcribed pseudogenes and have pinpointed 68 strong candidates for further investigation as potentially functional transcribed pseudogenes across multiple mammalian species.
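The Ka/Ks ratio cited in the results above can be estimated in several ways, and the abstract does not say which estimator the authors used. One standard option, shown here only as an illustrative formulation, is the Nei–Gojobori counting method with a Jukes–Cantor correction, where N and S are the numbers of non-synonymous and synonymous sites in the aligned coding sequences and N_d and S_d are the observed numbers of non-synonymous and synonymous differences:

```latex
p_N = \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S}

K_a = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_N\right), \qquad
K_s = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_S\right), \qquad
\omega = \frac{K_a}{K_s}
```

Under this formulation, a ratio near 1 is expected when the encoded protein is under no constraint, a ratio well below 1 indicates purifying selection on the protein, and a ratio above 1 suggests positive selection. A TPA with a ratio near 1 is therefore consistent with a lack of protein-coding ability, even if other features of the sequence are conserved.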

    Chromatin accessibility dynamics in the Arabidopsis root epidermis and endodermis during cold acclimation

    Understanding cell-type-specific transcriptional responses to environmental conditions is limited by a lack of knowledge of how epigenetic dynamics control transcription. Additionally, cell-type analyses are limited by the difficulty of applying current technologies to single cell types. A novel DNase-seq protocol and analysis procedure, termed DNase-DTS, was developed to identify DNase I hypersensitive sites (DHSs) in the Arabidopsis epidermis and endodermis under control and cold acclimation conditions. Thousands of DHSs were identified within each cell type and experimental condition. DHSs showed a strong association with gene expression, DNA methylation, and histone modifications. A priori mapping of existing DNA-binding motifs within accessible genes and within the cold C-repeat/dehydration-responsive element-binding factor pathway yielded distinct motif-mapping patterns. In summary, this collection of cold-acclimation-induced chromatin accessibility sites in the endodermis and epidermis may be used to understand mechanisms of gene expression and to inform the design of synthetic promoters.

    Thamodaran. P

    Most genes are biallelically expressed, but imprinted genes exhibit monoallelic expression depending on their parental origin. Genomic imprinting is controlled differently in flowering plants and mammals: for instance, in plants imprinted genes are specifically activated by demethylation rather than targeted for silencing, and imprinted gene expression occurs in the endosperm. Imprinting also displays sexual dimorphism, such as differences in the timing of imprint establishment and an RNA-based silencing mechanism for paternally repressed imprinted genes. Within imprinted regions, the unusual occurrence and distribution of various types of repetitive elements may act as genomic imprinting signatures. Imprinting regulation at many loci probably involves insulator-protein-dependent higher-order chromatin interactions and/or non-coding-RNA-mediated mechanisms. Placenta-specific imprinting, however, involves repressive histone modifications and non-coding RNAs. The higher-order chromatin interactions involve differentially methylated domains (DMDs) with sex-specific methylation, which act as scaffolds for imprinting and regulate allele-specific imprinted gene expression. Paternally methylated differentially methylated regions (DMRs) contain fewer CpGs than maternally methylated DMRs. The non-coding-RNA-mediated mechanisms include C/D RNAs and microRNAs, which are involved in RNA-guided post-transcriptional RNA modification and RNA-mediated gene silencing, respectively. The maintenance and reprogramming of imprinting are not significantly affected by reduced expression of Dicer1, and the evolution of imprinting might be related to the acquisition of DNMT3L (DNA methyltransferase 3-like) by a common ancestor of eutherians and marsupials. The common features of diverse imprinting control elements and the evolutionary significance of imprinting remain to be identified.

    COMPARATIVE STUDY OF GENOMIC FEATURES OF EVOLUTIONARILY YOUNG GENE DUPLICATES

    Gene duplication is considered a major contributor to genome evolution and functional diversity. Differences in genomic features (such as structural resemblance, transcriptional orientation, and genomic location) between the members of a gene duplicate pair may indicate the likely duplication mechanism, as well as the evolutionary fates the paralogs may experience. Beyond these genomic features, molecular genetic features, such as differences in codon usage and expression level, may provide further insight into functional changes between paralogs. In this dissertation, multiple genomic analyses were conducted to evaluate the differences in genomic and genetic properties between duplicate copies, in order to understand the effect duplication mechanisms may have on the divergence of duplicate pairs. Chapter Two focuses on differing patterns of sequence asymmetry, codon usage, and gene expression levels between the members of gene duplicate pairs belonging to two populations of paralogs in Saccharomyces cerevisiae: ohnologs, which arose via a whole-genome duplication (WGD), and small segmental duplication (SSD) paralogs. Ohnologs are shown to have more highly conserved gene order (synteny) than SSD paralogs, despite their greater evolutionary age. Within SSD pairs, the derived paralog (the copy with lower synteny) seems to evolve faster, exhibiting both a lower codon adaptation index (CAI) and lower expression levels than the ancestral copy. Although synteny and evolutionary rate differences were not coupled in ohnolog pairs, the relationship between evolutionary rate asymmetry, CAI, and expression levels was similar to that observed in SSD pairs. These results indicate that codon usage contributes to rate asymmetry in the evolution of gene duplicates in both ohnologs and SSD paralogs, whereas differences in synteny (as experienced by SSD pairs but not by ohnologs) affect rate asymmetry only in SSD pairs.
This may imply relaxed selection on the codon usage and expression of derived copies, potentially leading to the acquisition of novel functions over time. Chapters Three and Four focus on the effects of structural resemblance and other genomic features on young gene duplicate pairs within the Homo sapiens (human) and Pan troglodytes (chimpanzee) genomes. The results imply that the majority of gene duplicates in both species are structurally complete duplications encompassing the entire coding region of a gene. The chimpanzee genome additionally contains a larger fraction of retrotransposed young gene duplicates (46%) than the human genome (13%), which may be due to differences in genome architecture, such as mobile element content, between the two genomes. While RNA-mediated processes give rise to most inter-chromosomal paralogs, DNA-mediated paralogs reside largely on the same chromosome, in which case inter-paralog distance does not increase over time. These results, in conjunction with previous studies in nematodes, yeast, and flies, suggest that the structural resemblance types and locations of duplicates are closely linked to the mechanism by which paralog pairs arise. This also holds for closely related species, as illustrated by the comparison of the human and chimpanzee genomes. Together, these studies illustrate the effects of duplication span (Chapter Two) and duplication mechanism (Chapters Three and Four) on the location, synteny, structural resemblance types, and functionality of gene duplicates in different genomes. The findings imply that differences in duplication mechanisms between species can have significant effects on genome evolution and on the divergence between even closely related taxa.
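The codon adaptation index (CAI) compared between paralogs in Chapter Two is conventionally the geometric mean, over a gene's codons, of each codon's relative adaptiveness w: its count in a reference set of highly expressed genes divided by the count of the most common synonymous codon. The following is a minimal sketch under stated assumptions: the codon families and reference counts are toy values, the function names are hypothetical, and codons absent from the reference receive a pseudocount of 0.5 so that no weight is zero. The dissertation's actual reference set is not given here.

```python
import math

# Toy synonymous-codon families (an illustrative subset of the standard code).
# Single-codon amino acids such as Met (ATG) and Trp (TGG) are conventionally
# excluded from CAI because their weight is always 1.
FAMILIES = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def relative_adaptiveness(ref_counts, families, pseudocount=0.5):
    """Return w(codon) = count(codon) / max count among its synonymous codons.

    Codons missing from (or zero in) ref_counts get `pseudocount` instead,
    an assumption that keeps every weight strictly positive.
    """
    weights = {}
    for codons in families.values():
        counts = {c: ref_counts.get(c, 0) or pseudocount for c in codons}
        top = max(counts.values())
        for codon, n in counts.items():
            weights[codon] = n / top
    return weights


def cai(cds, weights):
    """Geometric mean of w over the in-frame codons of `cds` that have a weight."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

With toy reference counts of 100 AAA and 10 AAG, for instance, `cai("AAGAAA", weights)` is the geometric mean of 0.1 and 1.0 (√0.1 ≈ 0.316). A lower CAI for a derived paralog means it uses codons less typical of highly expressed genes; working in log space avoids underflow on long genes.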

    Multispecies Analysis of Expression Pattern Diversification in the Recently Expanded Insect Ly6 Gene Family

    The deposited version is the "MBE Advance Access" article published by Oxford University Press on March 4, 2015. It is a post-print and includes supplementary materials within the PDF, although some supplementary materials are not present in the uploaded version.
    Gene families often consist of members with diverse expression domains reflecting their functions in a wide variety of tissues. However, how the expression of individual members, and thus their tissue-specific functions, diversified during the course of gene family expansion is not well understood. In this study, we approached this question by analyzing the duplication history and transcriptional evolution of a rapidly expanding subfamily of insect Ly6 genes. We analyzed different insect genomes and identified seven Ly6 genes that originated from a single ancestor through sequential duplication within the higher Diptera. We then determined how the original embryonic expression pattern of the founding gene diversified by characterizing its tissue-specific expression in the beetle Tribolium castaneum, the butterfly Bicyclus anynana, and the mosquito Anopheles stephensi, and the expression of its duplicates in three higher dipteran species representing various stages of the duplication history (Megaselia abdita, Ceratitis capitata, and Drosophila melanogaster). Our results revealed that frequent neofunctionalization episodes contributed to the increased expression breadth of this subfamily, and that these events occurred after duplication and speciation events at comparable frequencies. In addition, at each duplication node, we consistently found asymmetric expression divergence: one paralog inherited most of the tissue specificities of the founder gene, whereas the other evolved drastically reduced expression domains.
Our approach attests to the power of combining a well-established duplication history with comprehensive coverage of representative species to obtain unequivocal information about the dynamics of gene expression evolution in gene families.
    IAEA Seibersdorf (Austria); USDA; Duke University; McGill University; DRGC (Kyoto); DSHB (Iowa); Toulouse RIO Imaging Platform; Fundação para a Ciência e a Tecnologia grants: (SFRH/BPD/75139/2010, ANR-13-ISV7-0001-01, ANR-13-ISV7-0001-02, FCT-ANR/BIA-ANM/0003/2013); Fundação Calouste Gulbenkian; Instituto Gulbenkian de Ciência; Agence Nationale de la Recherche (ANR).