Abstract

Background: Determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. Anchor points can be supplied manually, to incorporate biological expertise, or automatically, usually from pairwise similarity searches. Multiple local similarities are expected to be more suitable, as they are more biologically relevant. However, even good multiple local similarities can prove incompatible with the ordering of an alignment.

Results: We use a recently developed algorithm to detect multiple local similarities, which returns subsets of positions in the sequences sharing similar contexts of appearance. In this paper, we first describe how to obtain, with the help of this method, subsets of positions that could form partial columns in an alignment. We then introduce a graph-theoretic algorithm to detect (and remove) positions in the partial columns that are inconsistent with a multiple alignment. For the time being, partial columns can be used as a guide by only a few MSA programs: ClustalW 2.0, DIALIGN 2 and T-Coffee. We test the effect of introducing these columns on the popular benchmark BAliBASE 3.

Conclusions: We show that the inclusion of our partial alignment columns as anchor points improves, on the whole, the accuracy of the aligner ClustalW on the benchmark BAliBASE 3.
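The notion of partial columns being "inconsistent with a multiple alignment" can be illustrated with a minimal sketch. It is not the authors' algorithm, which also removes the offending positions; it only tests whether a given set of partial columns could coexist in a single alignment. The sketch assumes that each partial column is given as a mapping from a sequence identifier to one position in that sequence, and that the set is consistent exactly when the "must come before" relation between columns has no cycle. The function name `columns_are_consistent` and the data layout are hypothetical.

```python
from collections import defaultdict


def columns_are_consistent(columns):
    """Return True if all partial columns could appear in one alignment.

    `columns` is a list of dicts mapping sequence id -> position index
    (a hypothetical layout chosen for this sketch). A dict holds at most
    one position per sequence, as a column of an alignment must.
    """
    # Collect, for every sequence, the (position, column index) pairs.
    by_seq = defaultdict(list)
    for c, col in enumerate(columns):
        for seq, pos in col.items():
            by_seq[seq].append((pos, c))

    # Column c1 must precede column c2 if some sequence has a position
    # in c1 that lies before its position in c2. Linking neighbours in
    # sorted order is enough: longer-range constraints follow by paths.
    succ = defaultdict(set)
    for sites in by_seq.values():
        sites.sort()
        for (p1, c1), (p2, c2) in zip(sites, sites[1:]):
            if p1 == p2:
                # One position cannot belong to two different columns.
                return False
            succ[c1].add(c2)

    # Kahn's topological sort: every column is processed iff no cycle.
    indegree = [0] * len(columns)
    for targets in succ.values():
        for d in targets:
            indegree[d] += 1
    ready = [c for c in range(len(columns)) if indegree[c] == 0]
    processed = 0
    while ready:
        c = ready.pop()
        processed += 1
        for d in succ[c]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return processed == len(columns)


# Two columns in the same order in both sequences: consistent.
parallel = [{"s1": 2, "s2": 3}, {"s1": 5, "s2": 7}]
# Two columns that cross each other: inconsistent.
crossing = [{"s1": 2, "s2": 7}, {"s1": 5, "s2": 3}]
# Three columns, each pair compatible, but jointly forming a cycle.
cyclic = [{"a": 1, "c": 2}, {"a": 2, "b": 1}, {"b": 2, "c": 1}]
```

The third example shows why good local similarities can still conflict with the ordering of an alignment: no two of its columns cross, yet together they cannot all be placed in one alignment.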

Corel, Eduardo

Devauchelle, Claudine

Pitschi, Florian

English

PubMed

Springer - Publisher Connector

Automatic detection of anchor points for multiple sequence alignment

Crossref

Directory of Open Access Journals

BMC Bioinformatics

doi:10.1186/1471-2105-11-445

Cite this article as: Pitschi et al.: Automatic detection of anchor points for multiple sequence alignment.
