4 research outputs found

    Sequence similarity is more relevant than species specificity in probabilistic backtranslation

    BACKGROUND: Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous, since most amino acids are encoded by multiple codons. The common approach to overcoming this difficulty is to imitate codon usage within the target species. RESULTS: This paper describes EasyBack, a new parameter-free, fully automated software tool for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. CONCLUSION: The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of the most used codon in the same species with a Hidden Markov Model trained with a set of the most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
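To make the ambiguity concrete, the following minimal sketch counts how many coding sequences are compatible with a short peptide and then applies the common "most used codon" baseline that the abstract contrasts with EasyBack. It is not EasyBack itself (no Hidden Markov Model is involved). The codon lists follow the standard genetic code for five amino acids only, and the `USAGE` counts are invented for illustration, not measured from any species.

```python
from math import prod

# Codons of the standard genetic code for a handful of amino acids.
CODONS = {
    "M": ["ATG"],
    "W": ["TGG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
}

# ASSUMPTION: hypothetical codon usage counts, for illustration only.
USAGE = {
    "ATG": 10, "TGG": 10,
    "AAA": 4, "AAG": 6,
    "TTT": 3, "TTC": 7,
    "TTA": 1, "TTG": 2, "CTT": 2, "CTC": 3, "CTA": 1, "CTG": 8,
}


def count_backtranslations(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(len(CODONS[aa]) for aa in protein)


def backtranslate_most_used(protein):
    """Baseline: pick, for each amino acid, its most used codon."""
    return "".join(max(CODONS[aa], key=USAGE.get) for aa in protein)


if __name__ == "__main__":
    print(count_backtranslations("MKL"))   # 1 * 2 * 6 candidate sequences
    print(backtranslate_most_used("MKL"))
```

Because the baseline always returns the same codon for a given amino acid, it cannot reflect any sequence-specific signal, which is the limitation the similarity-based approach is meant to address.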

    Back-translation for discovering distant protein homologies in the presence of frameshift mutations

    Background: Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, homology detection becomes difficult even at the DNA level. Results: We developed a novel method to infer distant homology relations between two proteins that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. Our implementation is freely available at http://bioinfo.lifl.fr/path/. Conclusions: Our approach makes it possible to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples.
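The effect described in the background can be illustrated with a small sketch: deleting a single nucleotide shifts the reading frame, so every downstream codon, and therefore the translated protein, changes. This only demonstrates the problem; it is not the alignment algorithm from the abstract. The example sequence and the helper name `translate` are made up for illustration, while the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate DNA in frame 0, ignoring a trailing partial codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
    )


if __name__ == "__main__":
    original = "ATGGCTAAAGAACTGTGG"          # hypothetical coding sequence
    shifted = original[:3] + original[4:]    # delete one nucleotide
    print(translate(original))
    print(translate(shifted))
```

After the deletion the two proteins share only their first residue, even though the underlying DNA sequences differ by a single nucleotide, which is why the comparison is carried out over putative DNA sequences rather than over the proteins directly.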

    Locally Sensitive Backtranslation Based On Multiple Sequence Alignment

    Backtranslation is the process of decoding an amino acid sequence into a corresponding nucleic acid sequence. Classical methods are based on the construction of a codon usage table by clustering and detection of the most probable codon used for each amino acid. In this paper we present a new method for backtranslation which is sensitive to the local position of the amino acid in the input sequence. The method makes use of multiple sequence alignment of the set of proteins under analysis. A local codon usage table stores, for each amino acid X and for each position of X in the alignment, the most used codon. We compared our method with EMBOSS using both ClustalW and AntiClustAl for multiple sequence alignment. Experiments showed that our method outperforms EMBOSS in terms of precision of backtranslation: the matching between the proteins obtained by our method and the original protein templates is clearly superior to that obtained by EMBOSS. This supports the validity of a locally sensitive approach.
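A local codon usage table of the kind described above can be sketched as follows. This is one possible reading of the abstract, not the authors' implementation: the input format (gapped protein rows paired with their ungapped coding sequences), the fallback to a global table for unseen (column, amino acid) pairs, and the function names `build_tables` and `backtranslate_local` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_tables(aligned_proteins, coding_seqs):
    """Count codons per (alignment column, amino acid) and per amino acid.

    ASSUMPTION: each aligned protein uses '-' for gaps and is paired with
    its ungapped coding DNA, three nucleotides per residue.
    """
    local = defaultdict(Counter)    # (column, amino acid) -> codon counts
    overall = defaultdict(Counter)  # amino acid -> codon counts
    for protein, dna in zip(aligned_proteins, coding_seqs):
        residue = 0  # index of the next codon in the ungapped DNA
        for column, aa in enumerate(protein):
            if aa == "-":
                continue
            codon = dna[3 * residue:3 * residue + 3]
            residue += 1
            local[(column, aa)][codon] += 1
            overall[aa][codon] += 1
    return local, overall


def backtranslate_local(aligned_query, local, overall):
    """Pick the most used codon at each column, else the global favourite."""
    codons = []
    for column, aa in enumerate(aligned_query):
        if aa == "-":
            continue
        counts = local.get((column, aa)) or overall.get(aa)
        if not counts:
            raise ValueError(f"no codon observed for amino acid {aa!r}")
        codons.append(counts.most_common(1)[0][0])
    return "".join(codons)
```

With a toy alignment in which lysine is mostly encoded by AAA in one column and by AAG in another, the local table returns a different codon for the same amino acid depending on its position, whereas a single global table would return AAG in both places.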