Sequence similarity is more relevant than species specificity in probabilistic backtranslation
BACKGROUND: Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous, since most amino acids are encoded by multiple codons. The common approach to overcoming this difficulty is to imitate the codon usage of the target species. RESULTS: This paper describes EasyBack, a new parameter-free, fully automated software tool for backtranslation using Hidden Markov Models. EasyBack does not imitate the codon usage of the target species; instead, it uses a sequence-similarity criterion. The model is trained on a set of proteins with known cDNA coding sequences, which is built from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of the prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. CONCLUSION: The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of the most used codon in the same species with a Hidden Markov Model trained on a set of the most similar sequences from all species. Moreover, the proposed method allows the quality of the prediction to be estimated probabilistically.
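To make the idea concrete, here is a minimal Python sketch of HMM-style backtranslation under the simplest topology that fits the description: hidden states are codons, each codon emits its own amino acid with probability 1, and codon-to-codon transitions are estimated, with additive pseudocounts, from the coding sequences of similar proteins. This is not EasyBack's actual model; the function names (`train`, `viterbi_backtranslate`, `identity`), the pseudocount and the toy training sequences are all hypothetical. Viterbi decoding returns the most probable codon path together with its log-probability, which hints at how a probabilistic model can attach a score to its own prediction.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = defaultdict(list)  # amino acid -> list of its codons
for _codon, _aa in CODE.items():
    SYNONYMS[_aa].append(_codon)


def codons_of(cds):
    """Split a coding sequence into complete codons."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def train(training_cds, pseudo=1.0):
    """Count first codons and codon-to-codon transitions in the training CDSs."""
    start, trans = Counter(), defaultdict(Counter)
    for cds in training_cds:
        cs = codons_of(cds.upper())
        if cs:
            start[cs[0]] += 1
        for a, b in zip(cs, cs[1:]):
            trans[a][b] += 1
    return start, trans, pseudo


def log_trans(model, prev, nxt):
    """log P(nxt | prev), with additive pseudocounts over all 64 codons."""
    _, trans, pseudo = model
    row = trans.get(prev, Counter())
    return math.log((row[nxt] + pseudo) / (sum(row.values()) + 64 * pseudo))


def viterbi_backtranslate(protein, model):
    """Most probable codon path whose translation is `protein`, and its log-probability.

    Assumes `protein` uses the standard one-letter amino acid alphabet."""
    if not protein:
        return "", 0.0
    start, _, pseudo = model
    n_start = sum(start.values())
    # score[c]: best log-probability of a codon path ending in codon c
    score = {c: math.log((start[c] + pseudo) / (n_start + 64 * pseudo))
             for c in SYNONYMS[protein[0]]}
    pointers = []
    for aa in protein[1:]:
        new, ptr = {}, {}
        for c in SYNONYMS[aa]:
            # Only codons synonymous for the observed amino acid are allowed states.
            prev = max(score, key=lambda p: score[p] + log_trans(model, p, c))
            new[c] = score[prev] + log_trans(model, prev, c)
            ptr[c] = prev
        score = new
        pointers.append(ptr)
    path = [max(score, key=score.get)]
    best = score[path[0]]
    for ptr in reversed(pointers):  # follow back-pointers to recover the path
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), best


def identity(predicted, true):
    """Fraction of positions where the predicted nucleotide matches the true one."""
    return sum(x == y for x, y in zip(predicted, true)) / max(len(predicted), len(true), 1)


if __name__ == "__main__":
    training = ["ATGAAACTGGAA", "ATGAAGCTGGAG", "ATGAAACTGGAA"]  # toy data
    dna, logp = viterbi_backtranslate("MKLE", train(training))
    print(dna, round(logp, 2), identity(dna, "ATGAAACTGGAA"))
```

With the toy training set above, the decoded sequence for MKLE follows the dominant transitions ATG, AAA, CTG, GAA. Training on similar proteins rather than on a whole-species codon table is what the sequence-similarity criterion amounts to in this simplified setting.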
Back-translation for discovering distant protein homologies in the presence of frameshift mutations
Background: Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, homology detection becomes difficult even at the DNA level.
Results: We developed a novel method to infer distant homology relations between two proteins that accounts for frameshift and point mutations that may have affected the coding sequences. We designed a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences that have the best-scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. Our implementation is freely available at http://bioinfo.lifl.fr/path/.
Conclusions: Our approach makes it possible to uncover evolutionary information that is not captured by traditional alignment methods, as confirmed by biologically significant examples.
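The core computation can be illustrated with a small Python sketch under clearly simplified assumptions: each protein is expanded into a naive directed acyclic graph with one node per nucleotide of every synonymous codon (not the memory-efficient representation the authors describe), and the two graphs are aligned by dynamic programming with a plain match/mismatch/linear-gap score instead of the evolutionary scoring system the abstract refers to. The function names and score values are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = {}
for _codon, _aa in zip(("".join(p) for p in product(BASES, repeat=3)), AMINO):
    SYNONYMS.setdefault(_aa, []).append(_codon)


def backtranslation_graph(protein):
    """DAG spelling every putative coding sequence of `protein`.

    Node 0 is a virtual start; every other node is one nucleotide of one
    synonymous codon. Nodes are created in topological order.
    Returns (labels, predecessors, final_nodes)."""
    labels, preds = [None], [[]]
    prev_ends = [0]  # nodes that close the previous codon layer
    for aa in protein:
        ends = []
        for codon in SYNONYMS[aa]:
            last = prev_ends
            for base in codon:
                labels.append(base)
                preds.append(list(last))
                last = [len(labels) - 1]
            ends.extend(last)
        prev_ends = ends
    return labels, preds, prev_ends


def align_graphs(p1, p2, match=1, mismatch=-1, gap=-2):
    """Best global alignment score between any putative DNA of p1 and any of p2.

    Gaps are per nucleotide, so a gap whose length is not a multiple of 3
    plays the role of a frameshift (with no special penalty in this sketch)."""
    l1, pr1, end1 = backtranslation_graph(p1)
    l2, pr2, end2 = backtranslation_graph(p2)
    neg = float("-inf")
    # D[u][v]: best score aligning a path start->u with a path start->v
    D = [[neg] * len(l2) for _ in l1]
    D[0][0] = 0
    for u in range(len(l1)):
        for v in range(len(l2)):
            if u == 0 and v == 0:
                continue
            best = neg
            if u and v:  # u aligned with v
                s = match if l1[u] == l2[v] else mismatch
                best = max(D[a][b] for a in pr1[u] for b in pr2[v]) + s
            if u:  # u aligned with a gap
                best = max(best, max(D[a][v] for a in pr1[u]) + gap)
            if v:  # v aligned with a gap
                best = max(best, max(D[u][b] for b in pr2[v]) + gap)
            D[u][v] = best
    return max(D[a][b] for a in end1 for b in end2)


if __name__ == "__main__":
    print(align_graphs("MKW", "MKW"), align_graphs("MKLE", "MRLE"))
```

Because gaps are counted per nucleotide, a one- or two-nucleotide gap in the optimal alignment corresponds to a frameshift; a realistic scoring scheme would penalize such gaps separately from codon-sized indels and weight substitutions by their evolutionary likelihood.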
Locally Sensitive Backtranslation Based On Multiple Sequence Alignment
Backtranslation is the process of decoding an amino acid sequence into a corresponding nucleic acid sequence. Classical methods are based on the construction of a codon usage table by clustering and on the detection of the most probable codon used for each amino acid. In this paper we present a new method for backtranslation that is sensitive to the local position of each amino acid in the input sequence. The method makes use of a multiple sequence alignment of the set of proteins under analysis. A local codon usage table stores, for each amino acid X and each position of X in the alignment, the most used codon. We compared our method with EMBOSS, using both ClustalW and AntiClustAl for multiple sequence alignment. Experiments showed that our method outperforms EMBOSS in terms of backtranslation precision: the match between the sequences obtained by our method and the original protein templates is clearly superior to that obtained by EMBOSS. This supports the validity of a locally sensitive approach.
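A minimal Python sketch of a position-sensitive codon usage table, assuming each aligned protein row (with '-' for gaps) comes with its own coding sequence: codons are counted per (alignment column, amino acid), and each residue of an aligned query is backtranslated to the most used codon at its column. Falling back to a global per-amino-acid table when a column has no observation for that residue is an assumption of this sketch, not something stated in the abstract; all function names are hypothetical.

```python
from collections import Counter, defaultdict


def aligned_codons(aligned_protein, cds):
    """Pair each alignment column with the codon encoding it (gap columns get None)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out, k = [], 0
    for ch in aligned_protein:
        if ch == "-":
            out.append(None)
        else:
            out.append(codons[k])
            k += 1
    return out


def build_tables(alignment, cdss):
    """Count codons per (column, amino acid) and, as a fallback, per amino acid."""
    local, glob = defaultdict(Counter), defaultdict(Counter)
    for row, cds in zip(alignment, cdss):
        for col, (aa, codon) in enumerate(zip(row, aligned_codons(row, cds.upper()))):
            if codon is not None:
                local[(col, aa)][codon] += 1
                glob[aa][codon] += 1
    return local, glob


def backtranslate_local(aligned_query, local, glob):
    """Most used codon for each residue at its column, else the globally most used one."""
    out = []
    for col, aa in enumerate(aligned_query):
        if aa == "-":
            continue
        table = local.get((col, aa)) or glob.get(aa)  # .get does not create entries
        if not table:
            raise KeyError(f"no codon observed for {aa!r}")
        out.append(table.most_common(1)[0][0])
    return "".join(out)


if __name__ == "__main__":
    # Toy data: the same amino acid (L) prefers different codons at different columns.
    local, glob = build_tables(["LL", "LL", "LL"], ["CTGTTA", "CTGTTA", "TTACTG"])
    print(backtranslate_local("LL", local, glob))
```

In the toy example, leucine is backtranslated as CTG in the first column and TTA in the second, whereas a single species-wide table would assign the same codon to both positions.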