3 research outputs found

    The evolution of the tape measure protein: units, duplications and losses

    Background: A large family of viruses that infect bacteria, called phages, is characterized by long tails used to inject DNA into their victims' cells. The tape measure protein got its name because the length of the corresponding gene is proportional to the length of the phage's tail, a fact shown by copying or splicing out parts of the DNA in exemplar species. A natural question is whether there exist units for these tape measures, and whether different tape measures have different units and lengths. Such units would allow us to retrace the evolution of tape measure proteins using their duplication/loss history. The vast number of sequenced phage genomes allows us to attack this problem with a comparative genomics approach.

    Results: Here we describe a subset of phages whose tape measure proteins contain variable numbers of an 11-amino-acid sequence repeat, aligned using sequence similarity, structural properties, and simple arithmetic. This subset provides a unique opportunity for the combinatorial study of phage evolution, without the added uncertainties of multiple alignments, which are trivial in this case, or of protein functions, which are well established. We give a heuristic that reconstructs the duplication history of these sequences, using divergent strains to discriminate between mutations that occurred before and after speciation, or lineage divergence. The heuristic is based on an efficient algorithm that gives an exhaustive enumeration of all possible parsimonious reconstructions of the duplication/speciation history of a single nucleotide. Finally, we present a method that makes it possible, where the data permit, to discriminate between duplication and loss events.

    Conclusions: Establishing the evolutionary history of viruses is difficult, in part due to extensive recombination and gene transfer, and to high mutation rates that often erase detectable similarity between homologous genes. In this paper, we introduce new tools to address this problem.
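    The "simple arithmetic" behind fixed-length repeat units can be illustrated with a short Python sketch. It is not the authors' heuristic: it only cuts a repeat region into consecutive 11-residue units, builds a per-position consensus, and counts how far each unit is from that consensus. The function names and the toy sequence are hypothetical, and the sketch assumes the repeat region has already been located and contains no indels.

```python
from collections import Counter

UNIT_LENGTH = 11  # repeat length in amino acids, as stated in the abstract


def split_units(region: str, unit: int = UNIT_LENGTH) -> list[str]:
    """Cut a repeat region into consecutive fixed-length units."""
    if len(region) % unit != 0:
        raise ValueError(
            f"region length {len(region)} is not a multiple of {unit}"
        )
    return [region[i:i + unit] for i in range(0, len(region), unit)]


def consensus(units: list[str]) -> str:
    """Most common residue at each position across all units."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*units))


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length units differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy region: three made-up 11-residue units, not real phage data.
toy_region = "AKLTGDEVNSQ" "AKLTGDEVNSQ" "AKMTGDEVNSR"

units = split_units(toy_region)
cons = consensus(units)
distances = [mismatches(u, cons) for u in units]

print(len(units), cons, distances)
```

    Because the units have a fixed length, they line up column by column without a separate alignment step, which is the sense in which the abstract calls the multiple alignments trivial.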

    Evolutionary History Reconstruction for Mammalian Complex Gene Clusters

    Clusters of genes that evolved from single progenitors via repeated segmental duplications present significant challenges to the generation of a truly complete human genome sequence. Such clusters can confound both accurate sequence assembly and downstream computational analysis, yet they represent a hotbed of functional innovation, making them of extreme interest. We have developed an algorithm for reconstructing the evolutionary history of gene clusters using only human genomic sequence data, which allows the tempo of large-scale evolutionary events in human gene clusters to be estimated. We further propose an extension of the method that simultaneously reconstructs the evolutionary histories of orthologous gene clusters in multiple primates, which will facilitate primate comparative sequencing studies that aim to reconstruct their evolutionary history more fully.