6 research outputs found

    Kalign – an accurate and fast multiple sequence alignment algorithm

    BACKGROUND: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, and structural properties, and to improving sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods are either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics. RESULTS: We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods. CONCLUSION: Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.
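The abstract names the Wu-Manber string-matching algorithm but does not show how Kalign applies it. As a rough illustration of the underlying idea only, the sketch below implements bit-parallel approximate matching in the Wu-Manber style: it reports every text position where a pattern ends with at most `k` edits. The function name `approx_find` and its return convention are assumptions made for this example, not part of Kalign.

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Return 0-based end positions in `text` where `pattern` matches
    with at most `k` edits (substitution, insertion, deletion).

    Illustrative bit-parallel sketch; not Kalign's implementation.
    """
    m = len(pattern)
    if m == 0:
        return []
    full = (1 << m) - 1          # keep only the m low bits
    accept = 1 << (m - 1)        # bit set when the whole pattern matched

    # mask[c] has bit i set when pattern[i] == c
    mask: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        mask[ch] = mask.get(ch, 0) | (1 << i)

    # state[d] bit i: pattern[0..i] matches a text suffix with <= d edits.
    # Initially the first d pattern characters can be "deleted".
    state = [(1 << d) - 1 for d in range(k + 1)]
    hits = []
    for pos, ch in enumerate(text):
        cm = mask.get(ch, 0)
        prev = state[0]                        # old state[d - 1]
        state[0] = ((state[0] << 1) | 1) & cm & full
        for d in range(1, k + 1):
            old_d = state[d]
            state[d] = (
                (((old_d << 1) | 1) & cm)      # exact match of this char
                | prev                         # extra char in the text
                | ((prev << 1) | 1)            # substituted char
                | ((state[d - 1] << 1) | 1)    # char missing from the text
            ) & full
            prev = old_d
        if state[k] & accept:
            hits.append(pos)
    return hits
```

With `k = 0` the routine reduces to exact shift-and matching; larger `k` tolerates more edits at the cost of one extra bit vector per allowed error.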

    Structural relatedness of lysis proteins from colicinogenic plasmids and icosahedral coliphages.

    The host-lysis-inducing functions of phi X174 protein E and MS2 protein L were recently shown to reside on the N-terminal and C-terminal halves of the two respective lysis proteins. In the present study it is shown that the small lysis proteins encoded in various colicinogenic plasmids share local sequence similarities and certain structural characteristics with the essential peptides of their coliphage-coded counterparts. Despite their dissimilar sizes and origins, it is suggested that the colicinogenic lysis proteins are functionally analogous and evolutionarily related to those of icosahedral single-stranded DNA and RNA phages.

    Pattern recognition in nucleic acid sequences. I. A general method for finding local homologies and symmetries.

    We present an algorithm, a generalization of the Needleman-Wunsch-Sellers algorithm, which finds within longer sequences all subsequences that resemble one another locally. The probability that so close a resemblance would occur by chance alone is calculated and used to classify these local homologies according to statistical significance. Repeats and inverted repeats may also be found. Results for both random and biological nucleic acid sequences are presented. Fourteen complete genomes are analyzed for dyad symmetries.
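The abstract does not give the recurrence it uses, so the sketch below is not the paper's algorithm. It shows a closely related, standard technique, a Smith-Waterman-style local-alignment dynamic program, to illustrate how a Needleman-Wunsch-type table can be adapted to find locally similar subsequences. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the function name are assumptions chosen for the example.

```python
def local_align_score(a: str, b: str,
                      match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, end_in_a, end_in_b) for the best local alignment.

    end_in_a / end_in_b are exclusive end indices of the aligned
    subsequences; they are (0, 0) when no positive-scoring pair exists.
    Smith-Waterman-style sketch with linear gap costs.
    """
    best = (0, 0, 0)
    prev_row = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev_row[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets an alignment start anywhere,
            # which is what makes the comparison local rather than global.
            row[j] = max(0, diag, prev_row[j] + gap, row[j - 1] + gap)
            if row[j] > best[0]:
                best = (row[j], i, j)
        prev_row = row
    return best
```

Inverted repeats of the kind mentioned in the abstract could be searched for in the same way by comparing a sequence against its own reverse complement, although the statistical significance test described there is not reproduced here.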

    Systematic computational analysis of potential RNA interference regulation in Toxoplasma gondii

    Thesis (Master)--Izmir Institute of Technology, Molecular Biology and Genetics, Izmir, 2009. Includes bibliographical references (leaves: 58-73). Text in English; Abstract: Turkish and English. x, 79 leaves.
    RNA-mediated silencing was first described in plants and became famous through studies in Caenorhabditis elegans. RNA interference (RNAi) is the mechanism through which an RNA interferes with the production of other RNAs in a sequence-specific manner. MiRNAs are a type of RNA that originates from the genome, with their active form being ss-RNAs of 21-23 nucleotides in length. They are transcribed as pri-miRNAs and then processed in the nucleus by Drosha into pre-miRNAs with a stem-loop structure and 70 nucleotides in length. These stem-loop-containing pre-miRNAs are then processed in the cytoplasm into ds-RNA, one strand of which will serve as interfering RNA. Toxoplasma gondii is a species of parasitic protozoa which causes several diseases. T. gondii emerges as a good candidate for computational efforts with its small genome size, publicly available genome files, and extensive information about its gene structure, based either on experimental data or on prediction with several gene finders in parallel. Therefore, it seems important to establish the regulatory network composed of RNAi, which may be beneficial for the Toxoplasma community. Within this context, a pool of possible stem-loop-forming transcripts is produced, further analysis of this pool for the desired 2D structure is integrated, and a mapping of possible RNAi regulation to the T. gondii genome is established. In connection with the computational assessment and mapping, the derived information is provided as a database for quick lookup through a convenient web interface for experimental studies of RNAi regulation in Toxoplasma, thereby reducing the time and cost of such studies.

    Biological sequence comparison on a parallel computer


    Sequence analysis of enzymes of the shikimate pathway: Development of a novel multiple sequence alignment algorithm

    The possibility of homology modelling the shikimate pathway enzymes, 3-dehydroquinate synthase (e1), 3-dehydroquinase (e2), shikimate dehydrogenase (e3), shikimate kinase (e4) and 5-enolpyruvylshikimate 3-phosphate (EPSP) synthase (e5), is investigated. The sequences of these enzymes are analysed and the results indicate that for four of these proteins, e1, e2, e3, and e5, no structural homologues exist. Developing a model structure by homology modelling is therefore not possible. For shikimate kinase, statistically significant alignments are found to two proteins with known structures, adenylate kinase and H-ras p21 protein. These are also judged to be biologically significant alignments. However, the alignments obtained show too little sequence identity to permit homology modelling based on primary sequence data alone. An ab initio based methodology is next applied, with the initial step being careful evaluation of multiple sequence alignments of the shikimate pathway enzymes. Altering the parameters of the available multiple sequence alignment algorithms produces a large range of differing alignments, with no objective way to choose a single alignment or construct a composite from the many produced for each shikimate pathway enzyme. This problem with obtaining a reliable alignment for the shikimate pathway enzymes will occur in other low sequence identity protein families, and is addressed by the development of a novel multiple sequence alignment method, Mix'n'Match. Mix'n'Match is based on finding alternating Strongly Conserved Regions (SCRs) and Loosely Conserved Regions (LCRs) in the protein sequences. The SCRs are used as 'anchors' in the alignment and are calculated from analysis of several different multiple alignments, made using varying criteria.
    After dividing the sequences into SCRs and LCRs, the 'best' alignment for each LCR is chosen, independently of the other LCRs, from a selection of possibilities in the multiple alignments. To help make this choice for each LCR, the secondary structure is predicted and shown alongside each different possible alignment. One advantage of this method over automatic, non-interactive methods is that the final alignment is not dependent on the choice of a single set of scoring parameters. Another is that, by allowing interactive choice and by taking account of secondary structural information, the final alignment is based more on biological than on mathematical factors. This method can produce better alignments than any of the initial automatic multiple alignment methods used. The SCRs identified by Mix'n'Match are found to show good correlation with the actual secondary structural elements present in the enzyme families used to test the method. Analysis of the Mix'n'Match alignment and consensus secondary structure predictions for shikimate kinase suggests a closer match with the actual secondary structure of adenylate kinase than is found between their amino acid sequences. These proteins appear to share functional, sequence and secondary structural homology. The proposal is made that a model structure of shikimate kinase, based on the structure of adenylate kinase, could be constructed using homology modelling techniques.
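The abstract says that SCRs are derived from several different multiple alignments but does not state the criterion. One plausible reading, sketched below purely as an assumption, is that a region is strongly conserved when every alternative alignment pairs the same residues there. The helper names `column_map` and `stable_regions`, the `min_len` parameter, and the use of the first sequence as reference are all hypothetical and are not taken from Mix'n'Match.

```python
def column_map(alignment: list[str]) -> dict[int, tuple]:
    """Map each residue index of the first (reference) sequence to the
    residue indices aligned with it in the other sequences (None = gap)."""
    counters = [0] * len(alignment)
    mapping = {}
    for col in range(len(alignment[0])):
        indices = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                indices.append(None)
            else:
                indices.append(counters[s])
                counters[s] += 1
        if indices[0] is not None:
            mapping[indices[0]] = tuple(indices[1:])
    return mapping


def stable_regions(alignments: list[list[str]], min_len: int = 3):
    """Return half-open (start, end) ranges of reference residues whose
    pairings are gap-free and identical in every alternative alignment."""
    maps = [column_map(a) for a in alignments]
    ref_len = len(maps[0])
    stable = []
    for i in range(ref_len):
        first = maps[0][i]
        stable.append(None not in first and all(m.get(i) == first for m in maps))

    regions, start = [], None
    for i, ok in enumerate(stable + [False]):   # sentinel closes the last run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions
```

Under this reading, the ranges returned would act as anchor candidates, and the stretches between them would correspond to the LCRs for which one of the alternative alignments still has to be chosen.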