117 research outputs found

Segment-based multiple sequence alignment

Author: Emde A.-K.
Notredame C.
Rausch T.
Reinert K.
Weese D.
Publication date: 01/01/2008

Motivation: Many multiple sequence alignment tools have been developed in the past, each advancing either speed or alignment accuracy. Given the importance and widespread use of alignment tools, progress in both categories is a contribution to the community and has driven research in the field so far. Results: We introduce a graph-based extension to the consistency-based, progressive alignment strategy. We apply the consistency notion to segments instead of single characters. The main problem we solve in this context is to define segments of the sequences in such a way that a graph-based alignment is possible. We implemented the algorithm using the SeqAn library and report results on amino acid and DNA sequences. The benefit of our approach is threefold: (1) sequences with conserved blocks can be rapidly aligned, (2) the implementation is conceptually easy, generic and fast and (3) the consistency idea can be extended to align multiple genomic sequences. Availability: The segment-based multiple sequence alignment tool can be downloaded from http://www.seqan.de/projects/msa.html. A novel version of T-Coffee interfaced with the tool is available from http://www.tcoffee.org. The usage of the tool is described in both documentations.
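The abstract says segments must be defined so that a graph-based alignment becomes possible, but it does not give the procedure. One common way to think about the problem is to cut overlapping segment matches at each other's boundaries until every match maps whole segments onto whole segments. The Python sketch below illustrates that idea only; the function name `refine_segments`, the tuple layout of a match, and the naive fixed-point loop are assumptions made for illustration, not the SeqAn implementation.

```python
from collections import defaultdict


def refine_segments(matches):
    """Compute cut positions so that no match partially overlaps a segment.

    matches: list of (seq_a, start_a, seq_b, start_b, length) tuples, each
             describing an ungapped match between two sequences
             (hypothetical layout chosen for this sketch).
    Returns a dict mapping sequence id -> sorted list of cut positions.
    """
    cuts = defaultdict(set)

    # Every match boundary is a cut on both sequences it touches.
    for seq_a, start_a, seq_b, start_b, length in matches:
        cuts[seq_a].update((start_a, start_a + length))
        cuts[seq_b].update((start_b, start_b + length))

    # Project cuts through each match until nothing new appears.
    changed = True
    while changed:
        changed = False
        for seq_a, start_a, seq_b, start_b, length in matches:
            directions = (
                (seq_a, start_a, seq_b, start_b),
                (seq_b, start_b, seq_a, start_a),
            )
            for src, src_start, dst, dst_start in directions:
                for cut in list(cuts[src]):
                    # Only cuts strictly inside the match need projecting.
                    if src_start < cut < src_start + length:
                        projected = dst_start + (cut - src_start)
                        if projected not in cuts[dst]:
                            cuts[dst].add(projected)
                            changed = True

    return {seq: sorted(positions) for seq, positions in cuts.items()}
```

For example, with matches `(0, 0, 1, 0, 10)` and `(1, 5, 2, 0, 10)`, the second match starts in the middle of the first on sequence 1, so a cut at position 5 is propagated to sequences 0 and 2. After refinement each segment can serve as one vertex of an alignment graph.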

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment

Author: Kaufmann Michael
Morgenstern Burkhard
Subramanian Amarendran R
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach. Results: Our new heuristic produces significantly better alignments, especially on globally related sequences, without excessively increasing CPU time and memory consumption. The new method is based on a guide tree; to detect possible spurious sequence similarities, it employs a vertex-cover approximation on a conflict graph. We performed benchmarking tests on a large set of nucleic acid and protein sequences. For protein benchmarks we used the benchmark database BALIBASE 3 and an updated release of the database IRMBASE 2 for assessing the quality on globally and locally related sequences, respectively. For alignment of nucleic acid sequences, we used BRAliBase II for global alignment and a newly developed database of locally related sequences called DIRM-BASE 1. IRMBASE 2 and DIRM-BASE 1 are constructed by implanting highly conserved motifs at random positions in long unalignable sequences. Conclusion: On BALIBASE 3, our new program performs significantly better than the previous program DIALIGN-T and outperforms the popular global aligner CLUSTAL W, though it is still outperformed by programs that focus on global alignment like MAFFT, MUSCLE and T-COFFEE. On the locally related test sets in IRMBASE 2 and DIRM-BASE 1, our method outperforms all other programs, while MAFFT E-INSi is the only method that comes close to the performance of DIALIGN-TX.
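The abstract mentions a vertex-cover approximation on a conflict graph but does not say which approximation is used. As a rough illustration of the general technique, the Python sketch below applies the textbook 2-approximation (take both endpoints of any edge not yet covered) to a graph whose vertices are fragments and whose edges join fragments that cannot both appear in one alignment. The function name and the fragment labels are hypothetical, and DIALIGN-TX may well use a different, weight-aware variant.

```python
def approx_vertex_cover(conflict_edges):
    """Textbook 2-approximation for vertex cover.

    conflict_edges: iterable of (u, v) pairs, each meaning that fragments
                    u and v are mutually inconsistent (assumed input format).
    Returns a set of vertices that touches every edge; its size is at most
    twice that of a minimum vertex cover.
    """
    cover = set()
    for u, v in conflict_edges:
        # An edge is uncovered only if neither endpoint was chosen yet.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


# Hypothetical conflicts between four fragments.
edges = [("f1", "f2"), ("f2", "f3"), ("f3", "f4")]
suspect = approx_vertex_cover(edges)
```

Removing the fragments in `suspect` leaves a conflict-free set, at the cost of possibly discarding more fragments than strictly necessary.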

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DIALIGN-TX and multiple protein alignment using secondary structure information at GOBICS

Author: A. R. Subramanian
B. Morgenstern
E. Corel
P. Meinicke
R. Steinkamp
S. Hiran
Publication venue: Oxford University Press
Publication date: 01/01/2010

We introduce web interfaces for two recent extensions of the multiple-alignment program DIALIGN. DIALIGN-TX combines the greedy heuristic previously used in DIALIGN with a more traditional ‘progressive’ approach for improved performance on locally and globally related sequence sets. In addition, we offer a version of DIALIGN that uses predicted protein secondary structures together with primary sequence information to construct multiple protein alignments. Both programs are available through the ‘Göttingen Bioinformatics Compute Server’ (GOBICS).

CiteSeerX

Crossref

PubMed Central

Quality measures for protein alignment benchmarks

Author: Robert C. Edgar
Publication venue: Oxford University Press
Publication date: 01/01/2010

Multiple protein sequence alignment methods are central to many applications in molecular biology. These methods are typically assessed on benchmark datasets including BALIBASE, OXBENCH, PREFAB and SABMARK, which are important to biologists in making informed choices between programs. In this article, annotations of domain homology and secondary structure are used to define new measures of alignment quality and to make the first systematic, independent evaluation of these benchmarks. These measures indicate sensitivity and specificity while avoiding the ambiguous residue correspondences and arbitrary distance cutoffs inherent to structural superpositions. Alignments by selected methods that indicate high-confidence columns (ALIGN-M, DIALIGN-T, FSA and MUSCLE) are also assessed. Fold space coverage and effective benchmark database sizes are estimated by reference to domain annotations, and significant redundancy is found in all benchmarks except SABMARK. Questionable alignments are found in all benchmarks, especially in BALIBASE where 87% of sequences have unknown structure, 20% of columns contain different folds according to SUPERFAMILY and 30% of ‘core block’ columns have conflicting secondary structure according to DSSP. A careful analysis of current protein multiple alignment benchmarks calls into question their ability to determine reliable algorithm rankings.
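The abstract reports the share of columns with conflicting secondary structure but does not define the measure. To make the idea concrete, the Python sketch below counts a column as conflicting when it aligns at least one helix residue with at least one strand residue. That conflict rule, the reduction to the two labels `H` and `E`, and the function name are all assumptions for illustration and should not be read as the definition used in the article.

```python
def conflicting_column_fraction(alignment, ss_labels, gap="-"):
    """Fraction of scored columns that mix helix (H) and strand (E) residues.

    alignment: list of equal-length gapped sequences (rows of the MSA).
    ss_labels: list of per-residue label strings, one per row, ungapped,
               e.g. 'H' for helix, 'E' for strand, 'C' for anything else
               (a simplified label set assumed for this sketch).
    A column is scored only if it holds at least two residues.
    """
    next_residue = [0] * len(alignment)  # index into each ungapped row
    scored = 0
    conflicting = 0

    for col in range(len(alignment[0])):
        labels = []
        for row_index, row in enumerate(alignment):
            if row[col] != gap:
                labels.append(ss_labels[row_index][next_residue[row_index]])
                next_residue[row_index] += 1
        if len(labels) >= 2:
            scored += 1
            if "H" in labels and "E" in labels:
                conflicting += 1

    return conflicting / scored if scored else 0.0
```

For the toy alignment `["AC-G", "ACTG"]` with labels `["HHE", "HEEE"]`, three columns are scored and one of them pairs a helix residue with a strand residue.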

CiteSeerX

Crossref

PubMed Central

Phylogenetic assessment of alignments reveals neglected tree signal in gaps

Author: Dessimoz Christophe
Gil Manuel
Publication venue: BioMed Central
Publication date: 01/01/2010

Tree-based tests of alignment methods enable the evaluation of the effect of gap placement on the inference of phylogenetic relationships.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

PubMed Central

The M-Coffee web server: a meta-method for computing multiple sequence alignments by combining alternative alignment methods

Author: Armougom Fabrice
Higgins Desmond G.
Jongeneel Cornelius V.
Moretti Sebastien
Notredame Cedric
Wallace Iain M.
Publication venue: Oxford University Press
Publication date: 01/01/2007

The M-Coffee server is a web server that computes multiple sequence alignments (MSAs) by running several MSA methods and combining their output into one single model. This allows users to run all of their preferred methods simultaneously without having to arbitrarily choose one of them. The MSA is delivered along with a local estimation of its consistency with the individual MSAs it was derived from. The computation of the consensus multiple alignment is carried out using a special mode of the T-Coffee package [Notredame, Higgins and Heringa (T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000; 302: 205–217); Wallace, O'Sullivan, Higgins and Notredame (M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 2006; 34: 1692–1699)]. Given a set of sequences (DNA or proteins) in FASTA format, M-Coffee delivers a multiple alignment in the most common formats. M-Coffee is a freeware open source package distributed under a GPL license and it is available either as a standalone package or as a web service from www.tcoffee.org.
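The consistency estimate mentioned in the abstract is computed by T-Coffee itself, and the abstract does not give its formula. A simplified way to picture consistency between a consensus MSA and its inputs is to ask what fraction of the residue pairs aligned in the consensus are also aligned in each input alignment. The Python sketch below does exactly that; the function names and the pair-overlap score are assumptions for illustration, not the M-Coffee scoring scheme.

```python
from itertools import combinations


def aligned_pairs(msa, gap="-"):
    """Return the set of residue pairs placed in the same column.

    Each residue is identified as (row index, position in ungapped row).
    """
    pairs = set()
    next_residue = [0] * len(msa)
    for col in range(len(msa[0])):
        present = []
        for row_index, row in enumerate(msa):
            if row[col] != gap:
                present.append((row_index, next_residue[row_index]))
                next_residue[row_index] += 1
        # Rows are visited in order, so each pair has a canonical orientation.
        pairs.update(combinations(present, 2))
    return pairs


def consistency_with_inputs(consensus, input_msas):
    """For each input MSA, the share of consensus pairs it also contains."""
    consensus_pairs = aligned_pairs(consensus)
    if not consensus_pairs:
        return [0.0] * len(input_msas)
    return [
        len(consensus_pairs & aligned_pairs(msa)) / len(consensus_pairs)
        for msa in input_msas
    ]
```

All alignments are assumed to list the same sequences in the same row order. A value near 1 for every input suggests the methods agree; low values flag a consensus that one or more methods do not support.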

Serveur académique lausannois

PubMed Central