Refining multiple sequence alignments with conserved core regions

Author: Bryant Stephen H.
Chakrabarti Saikat
Lanczycki Christopher J.
Panchenko Anna R.
Przytycka Teresa M.
Thiessen Paul A.
Publication venue: Oxford University Press
Publication date: 01/01/2006

Accurate multiple sequence alignments of proteins are important to several areas of computational biology: they shed light on the phylogenetic history of domain families and support their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iteratively realigning its individual sequences against the predetermined conserved core (block) model of a protein family. Realigning each sequence can correct misalignments between that sequence and the rest of the profile while preserving the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement in alignment quality after refinement, as indicated by increased alignment scores and by enhanced sensitivity in database searches that use sequence profiles derived from the refined alignments rather than the originals. A standalone version of the program is available by ftp distribution () and will be incorporated into the next release of the Cn3D structure/alignment viewer.
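The idea of realigning one sequence at a time against a fixed block model can be illustrated with a minimal Python sketch. This is not the REFINER implementation: the function names, the count-based column score, and the rule that a block may only slide between its neighbouring blocks are all assumptions made for illustration.

```python
from collections import Counter

def column_counts(seqs, block_starts, width, skip):
    """Residue counts per block column, leaving out sequence `skip`."""
    cols = [Counter() for _ in range(width)]
    for i, (seq, start) in enumerate(zip(seqs, block_starts)):
        if i != skip:
            for k in range(width):
                cols[k][seq[start + k]] += 1
    return cols

def block_score(seq, start, cols):
    """Score a block placement by how often each residue occurs in its column."""
    return sum(col[seq[start + k]] for k, col in enumerate(cols))

def refine(seqs, starts, widths, max_rounds=10):
    """starts[b][i] is the offset of block b in sequence i (hypothetical layout).

    Each sequence is held out in turn and each of its blocks is moved to the
    best-scoring offset between the neighbouring blocks, so the number, width
    and order of blocks never change.
    """
    starts = [list(row) for row in starts]  # work on a copy
    for _ in range(max_rounds):
        changed = False
        for i, seq in enumerate(seqs):
            for b, width in enumerate(widths):
                lo = starts[b - 1][i] + widths[b - 1] if b > 0 else 0
                nxt = starts[b + 1][i] if b + 1 < len(widths) else len(seq)
                hi = nxt - width
                cols = column_counts(seqs, starts[b], width, skip=i)
                best = starts[b][i]
                best_score = block_score(seq, best, cols)
                for s in range(lo, hi + 1):
                    score = block_score(seq, s, cols)
                    if score > best_score:  # move only on strict improvement
                        best, best_score = s, score
                if best != starts[b][i]:
                    starts[b][i] = best
                    changed = True
        if not changed:
            break
    return starts

# Toy family with one 4-residue block; the last sequence starts misaligned.
seqs = ["GGWCDKGG", "GWCDKGGG", "GGGWCDKG", "WCDKGGGG"]
refined = refine(seqs, starts=[[2, 1, 3, 1]], widths=[4])
```

In this toy case only the misplaced block in the last sequence moves; the other three placements already agree with the profile and are left alone.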

CiteSeerX

PubMed Central

Genomic multiple sequence alignments: refinement using a genetic algorithm

Author: Chunlin Wang
Elliot J Lefkowitz
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics, the practice of comparing genomic sequences from different species, plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences, as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most use heuristic strategies to identify a series of strong sequence similarities, which then serve as anchors for aligning the regions between the anchor points. The resulting alignment is globally correct but in many cases locally suboptimal. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. RESULTS: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as to refine a multiple alignment of 15 Orthopoxvirus genomic sequences, approximately 260,000 nucleotides in length, that had initially been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only slightly, but, significantly, it did so while the overall alignment length decreased through the removal of approximately 200 gapped regions representing roughly 1,300 gaps. CONCLUSION: We have implemented a genetic algorithm in parallel mode to optimize multiple genomic sequence alignments initially generated by various alignment tools. Benchmarking experiments showed that the refinement algorithm improved genomic sequence alignments within a reasonable period of time.
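A genetic-algorithm refinement of one poorly aligned region can be sketched in a few lines of Python. This is an illustrative toy, not GenAlignRefine: it runs serially, it substitutes a simple sum-of-pairs identity count for the COFFEE score, and the operators (`mutate`, `crossover`) and parameters (`pop_size`, `generations`, `seed`) are assumptions chosen for brevity.

```python
import random
from itertools import combinations

GAP = "-"

def sp_score(rows):
    """Sum-of-pairs identity: matching residue pairs summed over all columns."""
    return sum(
        a == b and a != GAP
        for col in zip(*rows)
        for a, b in combinations(col, 2)
    )

def mutate(rows, rng):
    """Move one gap within one row; residue order and row length are kept."""
    rows = list(rows)
    candidates = [i for i, row in enumerate(rows) if GAP in row]
    if not candidates:
        return rows
    i = rng.choice(candidates)
    chars = list(rows[i])
    gaps = [k for k, c in enumerate(chars) if c == GAP]
    chars.pop(rng.choice(gaps))
    chars.insert(rng.randrange(len(chars) + 1), GAP)
    rows[i] = "".join(chars)
    return rows

def crossover(a, b, rng):
    """Take each row from one of two parents; rows stay valid and equal-length."""
    return [rng.choice(pair) for pair in zip(a, b)]

def refine_region(rows, pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    population = [list(rows) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        # Elitist selection: parents compete with children, so the best
        # score in the population never decreases.
        ranked = sorted(population + children, key=sp_score, reverse=True)
        population = ranked[:pop_size]
    return population[0]

region = ["ACGT-ACGT", "ACGTA-CGT", "ACG-TACGT"]
refined = refine_region(region)
```

Because every individual is just a different gap placement for the same ungapped sequences, any row can be swapped between individuals without invalidating the alignment, which is what makes the row-wise crossover safe here.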

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

State of the art: refinement of multiple sequence alignments

Author: A Marchler-Bauer
AB Robinson
AJ Jennings
Anna R Panchenko
C Notredame
CB Do
Christopher J Lanczycki
GJ Barton
IM Wallace
J Chen
J Heringa
JD Thompson
JF Gibrat
K Katoh
O Gotoh
Paul A Thiessen
RC Edgar
S Chakrabarti
Saikat Chakrabarti
SR Eddy
Stephen H Bryant
T Lassmann
T Madej
Teresa M Przytycka
WR Taylor
WS Valdar
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Accurate multiple sequence alignments of proteins are very important in computational biology today. Despite numerous efforts in this field, all alignment strategies have shortcomings, so the resulting alignments are not always correct. Refining an existing alignment can be a sensible choice given the increasing importance of high-quality alignments in large-scale, high-throughput analysis. RESULTS: We provide an extensive comparison of the performance of alignment refinement algorithms. The accuracy and efficiency of the refinement programs are compared using the 3D structure-based alignments in the BAliBASE benchmark database as well as manually curated, high-quality alignments from the Conserved Domain Database (CDD). CONCLUSION: Comparison of the refined alignments revealed that, despite the absence of dramatic improvements, our refinement method, REFINER, which uses conserved regions as constraints, performs better at improving the alignments generated by different alignment algorithms. In most cases REFINER produces a higher-scoring, modestly improved alignment that does not degrade the well-conserved regions of the original alignment.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central