
    The evolution of the tape measure protein: units, duplications and losses

    Background: A large family of viruses that infect bacteria, called phages, is characterized by long tails used to inject DNA into their victims' cells. The tape measure protein got its name because the length of the corresponding gene is proportional to the length of the phage's tail, a fact shown by copying or splicing out parts of the DNA in exemplar species. A natural question is whether there exist units for these tape measures, and whether different tape measures have different units and lengths. Such units would allow us to retrace the evolution of tape measure proteins using their duplication/loss history. The vast number of sequenced phage genomes allows us to attack this problem with a comparative genomics approach.
    Results: Here we describe a subset of phages whose tape measure proteins contain variable numbers of an 11-amino-acid sequence repeat, aligned using sequence similarity, structural properties, and simple arithmetic. This subset provides a unique opportunity for the combinatorial study of phage evolution, without the added uncertainties of multiple alignments, which are trivial in this case, or of protein functions, which are well established. We give a heuristic that reconstructs the duplication history of these sequences, using divergent strains to discriminate between mutations that occurred before and after speciation, or lineage divergence. The heuristic is based on an efficient algorithm that exhaustively enumerates all possible parsimonious reconstructions of the duplication/speciation history of a single nucleotide. Finally, we present a method that, when possible, discriminates between duplication and loss events.
    Conclusions: Establishing the evolutionary history of viruses is difficult, in part due to extensive recombination and gene transfers, and to high mutation rates that often erase detectable similarity between homologous genes. In this paper, we introduce new tools to address this problem.

    The Tandem Duplication Distance Is NP-Hard

    In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment; this can be represented as the string operation AXB → AXXB. Tandem exon duplications have been found in many species such as human, fly or worm, and have been widely studied in computational biology. The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet, compute the minimum number of tandem duplications required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard, settling the 16-year-old open problem. We further show that this hardness holds even if all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.
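
    To make the definition above concrete, here is a minimal sketch that computes the TD distance for very short strings by breadth-first search over all tandem duplications. It is not the method of the paper: it is an exponential brute-force illustration, and the function names `tandem_duplications` and `td_distance` are hypothetical, chosen only for this example.

```python
from collections import deque


def tandem_duplications(s, max_len):
    """Yield every string obtained from s by one tandem duplication
    (AXB -> AXXB) whose length does not exceed max_len."""
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            # X = s[i:j]; duplicating it adds len(X) characters.
            if n + (j - i) <= max_len:
                yield s[:j] + s[i:j] + s[j:]


def td_distance(s, t):
    """Return the minimum number of tandem duplications turning s into t,
    or None if t cannot be reached from s.

    Brute-force BFS; only practical for toy inputs. Duplications never
    shorten a string, so strings longer than t are pruned."""
    if s == t:
        return 0
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        current, dist = queue.popleft()
        for nxt in tandem_duplications(current, len(t)):
            if nxt == t:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    print(td_distance("abc", "abcbc"))  # one duplication of X = "bc"
    print(td_distance("ab", "aabb"))    # duplicate "a", then "b"
    print(td_distance("ab", "ba"))      # unreachable
```

    Because each duplication only lengthens the string, the search space is finite, but it still grows exponentially with the length of T, which is consistent with the hardness result stated in the abstract.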

    Integrated multiple sequence alignment

    Sammeth M. Integrated multiple sequence alignment. Bielefeld (Germany): Bielefeld University; 2005. The thesis presents enhancements for automated and manual multiple sequence alignment: existing alignment algorithms are made more easily accessible and new algorithms are designed for difficult cases. Firstly, we introduce the QAlign framework, a graphical user interface for multiple sequence alignment. It comprises several state-of-the-art algorithms and exposes their parameters through convenient dialogs. An alignment viewer with guided editing functionality can also highlight or print regions of the alignment. Phylogenetic features are also provided, e.g., distance-based tree reconstruction methods, corrections for multiple substitutions and a tree viewer. The modular concept and the platform-independent implementation make the framework easy to extend. Further, we develop a constrained version of the divide-and-conquer alignment such that it can be restricted by anchors found earlier with local alignments. It can be shown that this method shares attributes of both local and global aligners, in the quality of results as well as in the computation time. We further modify the local alignment step to work on bipartite (or even multipartite) sets for sequences where repeats overshadow valuable sequence information. In the end, a technique is established that can accurately align sequences containing possibly repeated motifs. Finally, another algorithm is presented that compares tandem repeat sequences by aligning them with respect to their possible repeat histories. We describe an evolutionary model including tandem duplications and excisions, and give an exact algorithm to compare two sequences under this model.

    Ancient properties of spider silks revealed by the complete gene sequence of the prey-wrapping silk protein (AcSp1).

    Spider silk fibers have impressive mechanical properties and are primarily composed of highly repetitive structural proteins (termed spidroins) encoded by a single gene family. Most characterized spidroin genes are incompletely known because of their extreme size (typically >9 kb) and repetitiveness, limiting understanding of the evolutionary processes that gave rise to their unusual gene architectures. The only complete spidroin genes characterized thus far form the dragline in the Western black widow, Latrodectus hesperus. Here, we describe the first complete gene sequence encoding the aciniform spidroin AcSp1, the primary component of spider prey-wrapping fibers. L. hesperus AcSp1 contains a single enormous (∼19 kb) exon. The AcSp1 repeat sequence is exceptionally conserved between two widow species (∼94% identity) and between widows and distantly related orb-weavers (∼30% identity), consistent with a history of strong purifying selection on its amino acid sequence. Furthermore, the 16 repeats (each 371-375 amino acids long) found in black widow AcSp1 are, on average, >99% identical at the nucleotide level. A combination of stabilizing selection on amino acid sequence, selection on silent sites, and intragenic recombination likely explains the extreme homogenization of AcSp1 repeats. In addition, phylogenetic analyses of spidroin paralogs support a gene duplication event occurring concomitantly with specialization of the aciniform glands and the tubuliform glands, which synthesize egg-case silk. With repeats that are dramatically different in length and amino acid composition from dragline spidroins, our L. hesperus AcSp1 expands the knowledge base for developing silk-based biomimetic technologies.

    Bacterial microevolution and the Pangenome

    The comparison of multiple genome sequences sampled from a bacterial population reveals considerable diversity in both the core and the accessory parts of the pangenome. This diversity can be analysed in terms of microevolutionary events that took place since the genomes shared a common ancestor, especially deletion, duplication, and recombination. We review the basic modelling ingredients used implicitly or explicitly when performing such a pangenome analysis. In particular, we describe a basic neutral phylogenetic framework of bacterial pangenome microevolution, which is not incompatible with evaluating the role of natural selection. We survey the different ways in which pangenome data is summarised in order to be included in microevolutionary models, as well as the main methodological approaches that have been proposed to reconstruct pangenome microevolutionary history.

    Gene & Genome Duplication in Acanthamoeba Polyphaga Mimivirus

    Gene duplication is key to molecular evolution in all three domains of life and may be the first step in the emergence of new gene function. It is a well-recognized feature in large DNA viruses, but has not been studied extensively in the largest known virus to date, the recently discovered Acanthamoeba Polyphaga Mimivirus. Here we present a systematic analysis of gene and genome duplication events in the Mimivirus genome. We find that one third of the Mimivirus genes are related to at least one other gene in the Mimivirus genome, either through a large segmental genome duplication event that occurred in the more remote past, or through more recent gene duplication events, which often occur in tandem. This shows that gene and genome duplication played a major role in shaping the Mimivirus genome. Using multiple alignments together with remote homology detection methods based on Hidden Markov Model comparison, we assign putative functions to some of the paralogous gene families. We suggest that a large part of the duplicated Mimivirus gene families are likely to interfere with important host cell processes, such as transcription control, protein degradation, and cell regulatory processes. Our findings support the view that large DNA viruses are complex evolving organisms, possibly deeply rooted within the tree of life, and oppose the paradigm that viral evolution is dominated by lateral gene acquisition, at least as far as large DNA viruses are concerned.

    Isolation of a transcriptionally active element of high copy number retrotransposons in sweetpotato genome

    Many plant retrotransposons have been characterized, but only three families (Tnt1, Tto1 and Tos17) have been demonstrated to be transpositionally competent. We followed a novel approach that enabled us to identify an active element of the Ty1-copia retrotransposon family with an estimated 400 copies in the sweetpotato genome. DNA sequences of Ty1-copia reverse transcriptase (RTase) from the sweetpotato genome were analyzed, and a group of retrotransposon copies probably formed by recent transposition events was further analyzed. 3’RACE on callus cDNA amplified transcripts containing long terminal repeats (LTR) of this group. The sequence-specific amplification polymorphism (S-SAP) patterns of the LTR sequence in the genomic DNA were compared between a normal plant and callus lines derived from it. A callus-specific S-SAP product was found into which the retrotransposon detected by the 3’RACE had been transposed, apparently during cell culture. We conclude that our approach provides an effective way to identify active elements of retrotransposons with high copy numbers.