19,197 research outputs found

    Birth and death of gene overlaps in vertebrates

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Between five and fourteen per cent of genes in the vertebrate genomes do overlap sharing some intronic and/or exonic sequence. It was observed that majority of these overlaps are not conserved among vertebrate lineages. Although several mechanisms have been proposed to explain gene overlap origination the evolutionary basis of these phenomenon are still not well understood. Here, we present results of the comparative analysis of several vertebrate genomes. The purpose of this study was to examine overlapping genes in the context of their evolution and mechanisms leading to their origin.</p> <p>Results</p> <p>Based on the presence and arrangement of human overlapping genes orthologs in rodent and fish genomes we developed 15 theoretical scenarios of overlapping genes evolution. Analysis of these theoretical scenarios and close examination of genomic sequences revealed new mechanisms leading to the overlaps evolution and confirmed that many of the vertebrate gene overlaps are not conserved. This study also demonstrates that repetitive elements contribute to the overlapping genes origination and, for the first time, that evolutionary events could lead to the loss of an ancient overlap.</p> <p>Conclusion</p> <p>Birth as well as most probably death of gene overlaps occurred over the entire time of vertebrate evolution and there wasn't any rapid origin or 'big bang' in the course of overlapping genes evolution. The major forces in the gene overlaps origination are transposition and exaptation. Our results also imply that origin of overlapping genes is not an issue of saving space and contracting genomes size.</p

    The early expansion and evolutionary dynamics of POU class genes.

    Get PDF
    The POU genes represent a diverse class of animal-specific transcription factors that play important roles in neurogenesis, pluripotency, and cell-type specification. Although previous attempts have been made to reconstruct the evolution of the POU class, these studies have been limited by a small number of representative taxa, and a lack of sequences from basally branching organisms. In this study, we performed comparative analyses on available genomes and sequences recovered through "gene fishing" to better resolve the topology of the POU gene tree. We then used ancestral state reconstruction to map the most likely changes in amino acid evolution for the conserved domains. Our work suggests that four of the six POU families evolved before the last common ancestor of living animals-doubling previous estimates-and were followed by extensive clade-specific gene loss. Amino acid changes are distributed unequally across the gene tree, consistent with a neofunctionalization model of protein evolution. We consider our results in the context of early animal evolution, and the role of POU5 genes in maintaining stem cell pluripotency

    The evolution, distribution and diversity of endogenous circoviral elements in vertebrate genomes

    Get PDF
    Circoviruses (family Circoviridae) are small, non-enveloped viruses that have short, single-stranded DNA genomes. Circovirus sequences are frequently recovered in metagenomic investigations, indicating that these viruses are widespread, yet they remain relatively poorly understood. Endogenous circoviral elements (CVe) are DNA sequences derived from circoviruses that occur in vertebrate genomes. CVe are a useful source of information about the biology and evolution of circoviruses. In this study, we screened 362 vertebrate genome assemblies in silico to generate a catalog of CVe loci. We identified a total of 179 CVe sequences, most of which have not been reported previously. We show that these CVe loci reflect at least 19 distinct germline integration events. We determine the structure of CVe loci, identifying some that show evidence of potential functionalization. We also identify orthologous copies of CVe in snakes, fish, birds, and mammals, allowing us to add new calibrations to the timeline of circovirus evolution. Finally, we observed that some ancient CVe group robustly with contemporary circoviruses in phylogenies, with all sequences within these groups being derived from the same host class or order, implying a hitherto underappreciated stability in circovirus-host relationships. The openly available dataset constructed in this investigation provides new insights into circovirus evolution, and can be used to facilitate further studies of circoviruses and CVe

    TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates

    Get PDF
    Transposed elements (TEs) are mobile genetic sequences. During the evolution of eukaryotes TEs were inserted into active protein-coding genes, affecting gene structure, expression and splicing patterns, and protein sequences. Genomic insertions of TEs also led to creation and expression of new functional non-coding RNAs such as micro- RNAs. We have constructed the TranspoGene database, which covers TEs located inside proteincoding genes of seven species: human, mouse, chicken, zebrafish, fruit fly, nematode and sea squirt. TEs were classified according to location within the gene: proximal promoter TEs, exonized TEs (insertion within an intron that led to exon creation), exonic TEs (insertion into an existing exon) or intronic TEs. TranspoGene contains information regarding specific type and family of the TEs, genomic and mRNA location, sequence, supporting transcript accession and alignment to the TE consensus sequence. The database also contains host gene specific data: gene name, genomic location, Swiss-Prot and RefSeq accessions, diseases associated with the gene and splicing pattern. In addition, we created microTranspoGene: a database of human, mouse, zebrafish and nematode TEderived microRNAs. The TranspoGene and micro- TranspoGene databases can be used by researchers interested in the effect of TE insertion on the eukaryotic transcriptome

    Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs

    Get PDF
    Background: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, the detection of short translation products (e. g. coded from small Open Reading Frames, sORFs) is very difficult as the short length makes it hard to distinguish true coding ORFs from ORFs occurring by chance. Nevertheless, over the past few years many such non-canonical genes (with ORFs < 100 AAs) have been discovered in different organisms like Arabidopsis thaliana, Saccharomyces cerevisiae, and Drosophila melanogaster. Thanks to advances in sequencing, bioinformatics and computing power, it is now possible to scan the genome in unprecedented scrutiny, for example in a search of this type of small ORFs. Results: Using bioinformatics methods, we performed a systematic search for putatively functional sORFs in the Mus musculus genome. A genome-wide scan detected all sORFs which were subsequently analyzed for their coding potential, based on evolutionary conservation at the AA level, and ranked using a Support Vector Machine (SVM) learning model. The ranked sORFs are finally overlapped with ribosome profiling data, hinting to sORF translation. All candidates are visually inspected using an in-house developed genome browser. In this way dozens of highly conserved sORFs, targeted by ribosomes were identified in the mouse genome, putatively encoding micropeptides. Conclusion: Our combined genome-wide approach leads to the prediction of a comprehensive but manageable set of putatively coding sORFs, a very important first step towards the identification of a new class of bioactive peptides, called micropeptides
    • …
    corecore