
TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations

By Federico Abascal, Rafael Zardoya and Maximilian J. Telford

Abstract

We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
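The general idea described in the abstract (translate each coding sequence, align the proteins, then thread the original codons back through the protein alignment) can be illustrated with a minimal Python sketch. This is not the TranslatorX implementation: the function names `translate_codon`, `translate` and `thread_codons` are hypothetical, only the standard genetic code is included, and the protein alignment is assumed to come from an external alignment program. An ambiguous codon is translated only when every possible expansion of its IUPAC symbols encodes the same amino acid; otherwise it is reported as `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity symbols and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon; return 'X' unless all expansions agree."""
    try:
        options = [IUPAC[base] for base in codon.upper()]
    except KeyError:
        return "X"  # unknown symbol, e.g. a gap character
    residues = {STANDARD_CODE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(nt_seq):
    """Translate an in-frame nucleotide sequence into a protein string."""
    if len(nt_seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(
        translate_codon(nt_seq[i:i + 3]) for i in range(0, len(nt_seq), 3)
    )


def thread_codons(aligned_protein, nt_seq):
    """Rebuild a codon alignment row from an aligned protein row.

    Each gap in the protein row becomes '---'; each residue consumes
    the next codon of the original, unaligned nucleotide sequence.
    """
    n_residues = sum(1 for residue in aligned_protein if residue != "-")
    if n_residues * 3 != len(nt_seq):
        raise ValueError("protein row does not match nucleotide sequence")
    codons = (nt_seq[i:i + 3] for i in range(0, len(nt_seq), 3))
    return "".join(
        "---" if residue == "-" else next(codons) for residue in aligned_protein
    )


if __name__ == "__main__":
    sequences = {"seq1": "ATGGCNAAA", "seq2": "ATGAAA"}
    for name, nt in sequences.items():
        print(name, translate(nt))
    # Hand-written stand-in for the output of a protein alignment program.
    aligned_proteins = {"seq1": "MAK", "seq2": "M-K"}
    for name, row in aligned_proteins.items():
        print(name, thread_codons(row, sequences[name]))
```

Because gaps are inserted only as whole triplets, the reading frame of every sequence is preserved in the resulting nucleotide alignment, which is what makes codon-position-specific analyses possible afterwards.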

Topics: Articles
Publisher: Oxford University Press
OAI identifier: oai:pubmedcentral.nih.gov:2896173
Provided by: PubMed Central


