42 research outputs found

    Fast and sensitive multiple alignment of large genomic sequences.

    BACKGROUND: Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method. RESULTS: Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure. CONCLUSION: We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. 
We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
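The two-step anchored-alignment idea in the abstract above can be illustrated with a minimal sketch. This is not the CHAOS algorithm itself: it assumes that local similarities have already been found and are given as hypothetical `(start_a, start_b, length, score)` tuples, picks the highest-scoring collinear chain with a simple quadratic dynamic program, and then lists the regions between anchors that a slower aligner such as DIALIGN would still need to process. The names `best_chain` and `gaps_between` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical anchor representation: (start_a, start_b, length, score)
Anchor = Tuple[int, int, int, float]


def best_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Return the highest-scoring chain of collinear, non-overlapping anchors."""
    if not anchors:
        return []
    ordered = sorted(anchors, key=lambda a: (a[0], a[1]))
    best = [a[3] for a in ordered]  # best chain score ending at anchor i
    prev = [-1] * len(ordered)      # back-pointer to the previous anchor
    for i, (ai, bi, _, si) in enumerate(ordered):
        for j in range(i):
            aj, bj, lj, _ = ordered[j]
            # Anchor j must end before anchor i starts in BOTH sequences.
            if aj + lj <= ai and bj + lj <= bi and best[j] + si > best[i]:
                best[i] = best[j] + si
                prev[i] = j
    i = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


def gaps_between(chain: List[Anchor], len_a: int, len_b: int):
    """List half-open (a_start, a_end, b_start, b_end) regions between anchors."""
    regions = []
    pos_a = pos_b = 0
    for a, b, length, _ in chain:
        regions.append((pos_a, a, pos_b, b))
        pos_a, pos_b = a + length, b + length
    regions.append((pos_a, len_a, pos_b, len_b))
    return regions


anchors = [(0, 0, 5, 5.0), (10, 10, 5, 5.0), (3, 20, 4, 9.0), (20, 22, 3, 3.0)]
chain = best_chain(anchors)
regions = gaps_between(chain, 30, 30)
```

Only the (usually short) regions returned by `gaps_between` are handed to the sensitive aligner, which is where the running-time savings reported in the abstract would come from.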

    Reconfigurable hardware-software codesign methodology for protein identification


    AUGUSTUS: ab initio prediction of alternative transcripts

    AUGUSTUS is a software tool for gene prediction in eukaryotes based on a Generalized Hidden Markov Model, a probabilistic model of a sequence and its gene structure. Like most existing gene finders, the first version of AUGUSTUS returned one transcript per predicted gene and ignored the phenomenon of alternative splicing. Herein, we present a WWW server for an extended version of AUGUSTUS that is able to predict multiple splice variants. To our knowledge, this is the first ab initio gene finder that can predict multiple transcripts. In addition, we offer a motif searching facility, where user-defined regular expressions can be searched against putative proteins encoded by the predicted genes. The AUGUSTUS web interface and the downloadable open-source stand-alone program are freely available from

    DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    BACKGROUND: Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. RESULTS: Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. CONCLUSIONS: By distributing sub-routines to multiple processors, the running time of DIALIGN can be reduced considerably. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
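Strategy (a) in the abstract above relies on pair-wise alignments being independent of one another. The following is a minimal sketch of that idea only, not DIALIGN P's implementation: the scoring function `nw_score` is a stand-in (a plain Needleman-Wunsch score with assumed parameters match = 1, mismatch = -1, gap = -1), and `all_pairs_scores` is a hypothetical helper that hands each sequence pair to a worker pool. A thread pool is used here for simplicity; for CPU-bound work in Python, `concurrent.futures.ProcessPoolExecutor` could be passed as `executor_cls` instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def nw_score(pair):
    """Global alignment score of two strings (stand-in for a real aligner)."""
    s, t = pair
    gap, match, mismatch = -1, 1, -1  # assumed toy scoring parameters
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def all_pairs_scores(seqs, workers=4, executor_cls=ThreadPoolExecutor):
    """Score every sequence pair; each pair is an independent task."""
    idx_pairs = list(combinations(range(len(seqs)), 2))
    tasks = [(seqs[i], seqs[j]) for i, j in idx_pairs]
    with executor_cls(max_workers=workers) as pool:
        scores = list(pool.map(nw_score, tasks))  # map preserves task order
    return dict(zip(idx_pairs, scores))


seqs = ["ACGT", "ACGT", "AGT"]
parallel = all_pairs_scores(seqs)
sequential = {(i, j): nw_score((seqs[i], seqs[j]))
              for i, j in combinations(range(len(seqs)), 2)}
```

Because no task depends on another, the parallel and sequential results are identical, which mirrors the abstract's point that distributing the pairs has no effect on the output alignments.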

    Biosynthesis of mycobacterial arabinogalactan: identification of a novel α(1→3) arabinofuranosyltransferase

    The cell wall mycolyl-arabinogalactan-peptidoglycan complex is essential in mycobacterial species, such as Mycobacterium tuberculosis and is the target of several anti-tubercular drugs. For instance, ethambutol targets arabinogalactan biosynthesis through inhibition of the arabinofuranosyltransferases Mt-EmbA and Mt-EmbB. A bioinformatics approach identified putative integral membrane proteins, MSMEG2785 in Mycobacterium smegmatis, Rv2673 in Mycobacterium tuberculosis and NCgl1822 in Corynebacterium glutamicum, with 10 predicted transmembrane domains and a glycosyltransferase motif (DDX), features that are common to the GT-C superfamily of glycosyltransferases. Deletion of M. smegmatis MSMEG2785 resulted in altered growth and glycosyl linkage analysis revealed the absence of AG α(1→3)-linked arabinofuranosyl (Araf) residues. Complementation of the M. smegmatis deletion mutant was fully restored to a wild type phenotype by MSMEG2785 and Rv2673, and as a result, we have now termed this previously uncharacterized open reading frame, arabinofuranosyltransferase C (aftC). Enzyme assays using the sugar donor β-D-arabinofuranosyl-1-monophosphoryldecaprenol (DPA) and a newly synthesized linear α(1→5)-linked Ara5 neoglycolipid acceptor together with chemical identification of products formed, clearly identified AftC as a branching α(1→3) arabinofuranosyltransferase. This newly discovered glycosyltransferase sheds further light on the complexities of Mycobacterium cell wall biosynthesis, such as in M. tuberculosis and related species and represents a potential new drug target.

    AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome

    BACKGROUND: A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project. RESULTS: AUGUSTUS can be used as an ab initio program, that is, as a program that uses only one single genomic sequence as input information. In addition, it is able to combine information from the genomic sequence under study with external hints from various sources of information. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs AUGUSTUS predicted significantly more genes correctly than any other ab initio program. At the same time it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS could be further improved when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, was taken into account. CONCLUSION: AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover it is very flexible because it can take information from several sources simultaneously into consideration