1,881 research outputs found

    Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plants

    Get PDF
    For most sequenced flowering plants, multiple whole-genome duplications (WGDs) are found. Duplicated genes following WGD often have different fates that can quickly disappear again, be retained for long(er) periods, or subsequently undergo small-scale duplications. However, how different expression, epigenetic regulation, and functional constraints are associated with these different gene fates following a WGD still requires further investigation due to successive WGDs in angiosperms complicating the gene trajectories. In this study, we investigate lotus (Nelumbo nucifera), an angiosperm with a single WGD during the K–pg boundary. Based on improved intraspecific-synteny identification by a chromosome-level assembly, transcriptome, and bisulfite sequencing, we explore not only the fundamental distinctions in genomic features, expression, and methylation patterns of genes with different fates after a WGD but also the factors that shape post-WGD expression divergence and expression bias between duplicates. We found that after a WGD genes that returned to single copies show the highest levels and breadth of expression, gene body methylation, and intron numbers, whereas the long-retained duplicates exhibit the highest degrees of protein–protein interactions and protein lengths and the lowest methylation in gene flanking regions. For those long-retained duplicate pairs, the degree of expression divergence correlates with their sequence divergence, degree in protein–protein interactions, and expression level, whereas their biases in expression level reflecting subgenome dominance are associated with the bias of subgenome fractionation. Overall, our study on the paleopolyploid nature of lotus highlights the impact of different functional constraints on gene fate and duplicate divergence following a single WGD in plant

    Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants

    Get PDF
    Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNS is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research

    Rapid Bursts of \u3ci\u3eAndrogen-Binding Protein (Abp)\u3c/i\u3e Gene Duplication Occurred Independently in Diverse Mammals

    Get PDF
    Background The draft mouse (Mus musculus) genome sequence revealed an unexpected proliferation of gene duplicates encoding a family of secretoglobin proteins including the androgen-binding protein (ABP) α, β and γ subunits. Further investigation of 14 α-like (Abpa) and 13 β- or γ-like (Abpbg) undisrupted gene sequences revealed a rich diversity of developmental stage-, sex- and tissue-specific expression. Despite these studies, our understanding of the evolution of this gene family remains incomplete. Questions arise from imperfections in the initial mouse genome assembly and a dearth of information about the gene family structure in other rodents and mammals. Results Here, we interrogate the latest \u27finished\u27 mouse (Mus musculus) genome sequence assembly to show that the Abp gene repertoire is, in fact, twice as large as reported previously, with 30 Abpa and 34 Abpbg genes and pseudogenes. All of these have arisen since the last common ancestor with rat (Rattus norvegicus). We then demonstrate, by sequencing homologs from species within the Mus genus, that this burst of gene duplication occurred very recently, within the past seven million years. Finally, we survey Abp orthologs in genomes from across the mammalian clade and show that bursts of Abp gene duplications are not specific to the murid rodents; they also occurred recently in the lagomorph (rabbit, Oryctolagus cuniculus) and ruminant (cattle, Bos taurus) lineages, although not in other mammalian taxa. Conclusion We conclude that Abp genes have undergone repeated bursts of gene duplication and adaptive sequence diversification driven by these genes\u27 participation in chemosensation and/or sexual identification

    A new reference genome assembly for the microcrustacean Daphnia pulex

    Get PDF
    Comparing genomes of closely related genotypes from populations with distinct demographic histories can help reveal the impact of effective population size on genome evolution. For this purpose, we present a high quality genome assembly of Daphnia pulex (PA42), and compare this with the first sequenced genome of this species (TCO), which was derived from an isolate from a population with >90% reduction in nucleotide diversity. PA42 has numerous similarities to TCO at the gene level, with an average amino acid sequence identity of 98.8 and >60% of orthologous proteins identical. Nonetheless, there is a highly elevated number of genes in the TCO genome annotation, with similar to 7000 excess genes appearing to be false positives. This view is supported by the high GC content, lack of introns, and short length of these suspicious gene annotations. Consistent with the view that reduced effective population size can facilitate the accumulation of slightly deleterious genomic features, we observe more proliferation of transposable elements (TEs) and a higher frequency of gained introns in the TCO genome

    Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

    Get PDF
    We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ~32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene

    Strategies for Reliable Exploitation of Evolutionary Concepts in High Throughput Biology

    Get PDF
    The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology

    SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    Get PDF
    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/
    corecore