
    Amino acid substitution during functionally constrained divergent evolution of protein sequences

    In aligning homologous protein sequences, it is generally assumed that amino acid substitutions occurring later in time are independent of those that occurred earlier, i.e., that patterns of mutation are similar at low and high sequence divergence. This assumption is examined here and shown to be incorrect in an interesting way. Separate mutation matrices were constructed for aligned protein sequence pairs at divergences ranging from 5 to 100 PAM units (point accepted mutations per 100 aligned positions). From these, the corresponding log-odds (Dayhoff) matrices, normalized to 250 PAM units, were constructed. The matrices show that the genetic code strongly influences accepted point mutations at early stages of divergence, while the chemical properties of the side chains dominate at more advanced stages.
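    The abstract mentions normalizing mutation matrices to 250 PAM units and deriving log-odds (Dayhoff) scores. The sketch below illustrates the generic Dayhoff-style step under stated assumptions; it is not the paper's exact procedure. It assumes a hypothetical 3-letter alphabet with uniform frequencies and a hand-made 1-PAM matrix (rows sum to 1, 1% of positions change), raises it to the 250th power, and computes scores S_ij = 10 log10(M_ij / f_j). The function names are illustrative only.

```python
import math

Matrix = list[list[float]]


def mat_mult(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m: Matrix, p: int) -> Matrix:
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while p > 0:
        if p & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        p >>= 1
    return result


def expected_change(m: Matrix, freqs: list[float]) -> float:
    """Fraction of positions expected to change: sum_i f_i * (1 - M_ii)."""
    return sum(f * (1.0 - m[i][i]) for i, f in enumerate(freqs))


def log_odds(m: Matrix, freqs: list[float], scale: float = 10.0) -> Matrix:
    """Dayhoff-style score S_ij = scale * log10(M_ij / f_j), M_ij = Pr(i -> j)."""
    n = len(m)
    return [[scale * math.log10(m[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: 3 letters, uniform frequencies, exactly 1 PAM.
freqs = [1 / 3, 1 / 3, 1 / 3]
pam1 = [[0.990, 0.005, 0.005],
        [0.005, 0.990, 0.005],
        [0.005, 0.005, 0.990]]
assert abs(expected_change(pam1, freqs) - 0.01) < 1e-12

pam250 = mat_power(pam1, 250)   # extrapolate 1 PAM -> 250 PAM
scores = log_odds(pam250, freqs)
for row in scores:
    print(" ".join(f"{s:7.3f}" for s in row))
```

    With this toy matrix, identities score slightly positive and substitutions slightly negative; with real data the off-diagonal scores would differ by residue pair, which is where effects such as genetic-code proximity or side-chain chemistry show up.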

    Robust Padé Approximation via SVD

    Optimal Arrangement of Keys in a Hash Table

    Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries

    We present the first thorough practical study of Lempel-Ziv-78 and Lempel-Ziv-Welch computation based on trie data structures. With a careful selection of trie representations, we can beat well-tuned popular trie data structures such as Judy, m-Bonsai, or Cedar.
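    To make the role of the trie concrete, here is a minimal sketch of LZ78 and LZW factorization driven by a trie, using the simplest possible representation (one Python dict of children per node). This is an illustrative assumption, not one of the representations studied in the paper; the point is only that each factor is found by walking down the trie and each new phrase is added as a single new child node. Function names are hypothetical.

```python
def lz78_factorize(text: str) -> list[tuple[int, str]]:
    """LZ78 parse: each factor is (id of longest known phrase, next char).
    Node 0 is the trie root, i.e. the empty phrase."""
    children: list[dict[str, int]] = [{}]  # children[node][ch] -> child node id
    factors: list[tuple[int, str]] = []
    node = 0
    for ch in text:
        nxt = children[node].get(ch)
        if nxt is not None:
            node = nxt  # phrase still in the dictionary: descend
        else:
            # New phrase = phrase(node) + ch, stored as a new trie leaf.
            children[node][ch] = len(children)
            children.append({})
            factors.append((node, ch))
            node = 0  # restart at the root for the next factor
    if node != 0:
        # Text ended inside a known phrase: emit it with no extra character.
        factors.append((node, ""))
    return factors


def lz78_decode(factors: list[tuple[int, str]]) -> str:
    phrases = [""]  # phrase id -> string, mirroring the encoder's node ids
    for ref, ch in factors:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])


def lzw_encode(text: str, alphabet: str) -> list[int]:
    """LZW: the dictionary starts with every single character (codes 0..sigma-1)."""
    root = {c: i for i, c in enumerate(alphabet)}  # children of the root
    children: list[dict[str, int]] = [{} for _ in alphabet]  # children[code][ch] -> code
    codes: list[int] = []
    node = None  # None = at the root, nothing consumed for this phrase yet
    for ch in text:
        if ch not in root:
            raise ValueError(f"character {ch!r} not in alphabet")
        nxt = root[ch] if node is None else children[node].get(ch)
        if nxt is not None:
            node = nxt
        else:
            codes.append(node)                  # emit longest known phrase
            children[node][ch] = len(children)  # register phrase + ch
            children.append({})
            node = root[ch]                     # next phrase starts with ch
    if node is not None:
        codes.append(node)
    return codes


def lzw_decode(codes: list[int], alphabet: str) -> str:
    if not codes:
        return ""
    phrases = list(alphabet)
    prev = phrases[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # code == len(phrases): the encoder used the phrase it had just
        # created, which must be prev + prev[0].
        cur = phrases[code] if code < len(phrases) else prev + prev[0]
        phrases.append(prev + cur[0])
        out.append(cur)
        prev = cur
    return "".join(out)


print(lz78_factorize("abababa"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
print(lzw_encode("abababa", "ab"))  # [0, 1, 2, 4]
```

    The only trie operations used are "find child by character" and "append child", so swapping the dict-based nodes for a more compact or cache-friendly representation changes speed and memory but not the output.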

    OMA 2011: orthology inference among 1000 complete genomes

    OMA (Orthologous MAtrix) is a database that identifies orthologs among publicly available, complete genomes. Initiated in 2004, the project is at its 11th release. It now includes 1000 genomes, making it one of the largest resources of its kind. Here, we describe recent developments in terms of species covered, the algorithmic pipeline (in particular the treatment of alternative splicing), and new features of the web interface (OMA Browser) and programming interface (SOAP API). In the second part, we review the various representations provided by OMA and their typical applications. The database is publicly accessible at http://omabrowser.org.

    Fast estimation of the difference between two PAM/JTT evolutionary distances in triplets of homologous sequences

    BACKGROUND: The estimation of the difference between two evolutionary distances within a triplet of homologs is a common operation that is used, for example, to determine which of two sequences is closer to a third one. The most accurate method is currently maximum likelihood over the entire triplet. However, this approach is relatively time-consuming. RESULTS: We show that an alternative estimator, based on pairwise estimates and therefore much faster to compute, has almost the same statistical power as the maximum likelihood estimator. We also provide a numerical approximation for its variance, which could otherwise only be estimated through an expensive re-sampling approach such as bootstrapping. An extensive simulation demonstrates that the approximation delivers precise confidence intervals. To illustrate the possible applications of these results, we show how they improve the detection of asymmetric evolution and the identification of the closest relative to a given sequence in a group of homologs. CONCLUSION: The results presented in this paper constitute a basis for large-scale protein cross-comparisons of pairwise evolutionary distances.
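    As a rough illustration of how a pairwise-based difference and its variance turn into a decision about which of A or B is closer to C, the sketch below uses a plain normal-approximation confidence interval. It does not reproduce the paper's numerical variance approximation: the variances Var(d_AC), Var(d_BC) and the covariance Cov(d_AC, d_BC) are assumed to be supplied from elsewhere (the covariance is generally non-zero because both distances involve C). The distance values are hypothetical, and the helper names are illustrative.

```python
import math
from statistics import NormalDist


def distance_difference_ci(d_ac: float, d_bc: float,
                           var_ac: float, var_bc: float, cov: float,
                           level: float = 0.95) -> tuple[float, tuple[float, float]]:
    """Difference d_AC - d_BC with a normal-approximation confidence interval.

    Var(d_AC - d_BC) = Var(d_AC) + Var(d_BC) - 2 Cov(d_AC, d_BC).
    """
    diff = d_ac - d_bc
    var = var_ac + var_bc - 2.0 * cov
    if var < 0:
        raise ValueError("inconsistent variance/covariance inputs")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    half_width = z * math.sqrt(var)
    return diff, (diff - half_width, diff + half_width)


def closer_to_c(d_ac: float, d_bc: float, var_ac: float, var_bc: float,
                cov: float, level: float = 0.95):
    """Return 'A' or 'B' if the interval excludes 0, else None (unresolved)."""
    _, (lo, hi) = distance_difference_ci(d_ac, d_bc, var_ac, var_bc, cov, level)
    if hi < 0:
        return "A"
    if lo > 0:
        return "B"
    return None


# Hypothetical PAM distances and (co)variances, not data from the paper.
diff, (lo, hi) = distance_difference_ci(80.0, 95.0, 16.0, 25.0, 10.0)
print(f"d_AC - d_BC = {diff:.1f} PAM, 95% CI [{lo:.1f}, {hi:.1f}]")
print(closer_to_c(80.0, 95.0, 16.0, 25.0, 10.0))
```

    Ignoring the covariance term (setting it to 0) would overstate the variance of the difference here and make the test more conservative, which is one reason an estimate of that shared component matters.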