74 research outputs found

    Kalign – an accurate and fast multiple sequence alignment algorithm

    BACKGROUND: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics. RESULTS: We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods. CONCLUSION: Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.

    Structural relatedness of lysis proteins from colicinogenic plasmids and icosahedral coliphages.

    The host-lysis-inducing functions of phi X174 protein E and MS2 protein L were recently shown to reside on the N-terminal and C-terminal halves of the two respective lysis proteins. In the present study it is shown that the small lysis proteins encoded in various colicinogenic plasmids share local sequence similarities and certain structural characteristics with the essential peptides of their coliphage-coded counterparts. Despite their dissimilar sizes and origins, it is suggested that the colicinogenic lysis proteins are functionally analogous and evolutionarily related to those of icosahedral single-stranded DNA and RNA phages.

    Filtration efficiency in statistical algorithms for rapid homology search

    When searching for local homologies in long sequences (homology search in nucleotide and amino acid sequence banks, selection of optimal oligonucleotide probes, etc.), the need for a "rapid" homology search becomes acute. The quadratic complexity of dynamic programming algorithms (of the Needleman–Wunsch and Sellers type) forces the use of filtration methods, which make it possible to quickly reject sequences with a low level of homology (among the filtration methods, l-tuple analysis and the statistical method of Mironov–Alexandrov have been used). However, no theoretical substantiation of such algorithms has yet been given. The present work introduces the notion of filtration efficiency and estimates the efficiency of several filters. It is shown that in l-tuple analysis the filtration efficiency is associated with the potential extension of the original four-letter alphabet. Formulas that allow the filtration parameters to be chosen are presented.
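
    The idea of l-tuple filtration mentioned in this abstract can be illustrated with a minimal sketch: count the length-l substrings two sequences share and reject a candidate that shares too few, so that the quadratic dynamic-programming step runs only on the survivors. The function names, the value of `l`, and the `min_shared` threshold below are illustrative assumptions. This is not the Mironov–Alexandrov method, and it does not reproduce the paper's efficiency formulas.

```python
from collections import Counter


def ltuple_counts(seq: str, l: int) -> Counter:
    """Count every overlapping substring of length l in seq."""
    # A sequence shorter than l yields an empty range, hence an empty Counter.
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


def shared_ltuples(a: str, b: str, l: int) -> int:
    """Number of l-tuples common to a and b, counted with multiplicity."""
    # Counter intersection keeps the minimum count of each shared l-tuple.
    return sum((ltuple_counts(a, l) & ltuple_counts(b, l)).values())


def passes_filter(query: str, candidate: str, l: int = 3, min_shared: int = 2) -> bool:
    """Hypothetical filter: keep a candidate only if it shares enough l-tuples."""
    return shared_ltuples(query, candidate, l) >= min_shared


if __name__ == "__main__":
    query = "ACGTACGT"
    for candidate in ("TTACGTTT", "GGGGGGGG"):
        verdict = "keep" if passes_filter(query, candidate) else "reject"
        print(candidate, verdict)
```

    Counting the tuples takes time linear in the sequence lengths, which is what makes such a filter cheap relative to a full alignment. A larger `l` makes chance matches rarer but also makes the filter more likely to miss diverged homologs.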

    The d-arabitol operon of Klebsiella aerogenes


    Biological sequence comparison on a parallel computer
