DNA^+Pro^: an Improved Progressive Multiple Sequence Alignment Algorithm for Evolutionary Analysis Using Combined DNA-Protein Sequences

Abstract

Alignment of DNA and protein sequences is a basic tool in the study of evolutionary, structural and functional relationships among macromolecules. Current sequence alignment methods are somewhat error-prone and often produce systematic bias. Errors in sequence alignments can lead to misinterpretation of evolutionary, structural and functional information in genes, proteins and genomes. In traditional sequence alignment algorithms, DNA and protein sequences are aligned separately. It has long been believed that the phylogenetic signal disappears more rapidly from DNA sequences than from the proteins they encode, so aligning sequences at the amino acid level is generally preferred. Here we present a new method, DNA^+Pro^, which aggregates DNA and protein sequences into combined DNA-protein sequences and aligns them in a combined fashion. We demonstrate that combining sequences improves the quality of multiple sequence alignment and solves practical evolutionary problems in primate immunodeficiency virus proteins and bacterial restriction enzymes. In addition to the increased theoretical information content, distance estimates are more biologically significant in combined alignments than in protein-only or DNA-only alignments. By integrating information buried separately in DNA and protein sequences, DNA^+Pro^ improves the accuracy of multiple sequence alignment of closely related proteins and prevents certain errors that may occur in phylogenetic analysis using protein-only approaches. The DNA^+Pro^ software and the supplementary data can be downloaded free of charge from "our website, http://www.dnapluspro.com":http://www.dnapluspro.com.