6 research outputs found

    Genetic Algorithms for the Imitation of Genomic Styles in Protein Backtranslation

    Several technological applications require the translation of a protein into a nucleic acid that codes for it ("backtranslation"). The degeneracy of the genetic code makes this translation ambiguous; moreover, not every translation is equally viable. The common answer to this problem is the imitation of the codon usage of the target species. Here we discuss several other features of coding sequences ("coding statistics") that are relevant for the "genomic style" of different species. A genetic algorithm is then used to obtain backtranslations that mimic these styles, by minimizing the difference in the coding statistics. Possible improvements and applications are discussed. Comment: 17 pages, 13 figures. Submitted to Theor. Comp. Science.
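
To make the idea in this abstract concrete, here is a minimal sketch of a genetic algorithm that searches among synonymous backtranslations for one whose coding statistic is close to a target value. It is not the authors' method: the single statistic used here (GC fraction), the function names `gc_fraction`, `cost` and `evolve`, and all parameter values are assumptions chosen for illustration. Only the standard genetic code table is factual.

```python
import random
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (or '*' for stop) to its list of synonymous codons.
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))


def gc_fraction(codons):
    """Fraction of G and C bases in a list of codons (assumed statistic)."""
    seq = "".join(codons)
    return (seq.count("G") + seq.count("C")) / len(seq)


def cost(codons, target_gc):
    """Distance between the candidate's statistic and the target value."""
    return abs(gc_fraction(codons) - target_gc)


def evolve(protein, target_gc, pop_size=30, generations=200, seed=0):
    """Return a backtranslation of `protein` with GC fraction near target_gc."""
    if len(protein) < 2:
        raise ValueError("protein must have at least two residues")
    rng = random.Random(seed)
    # Each individual is a list of codons, one per residue.
    population = [
        [rng.choice(CODONS_FOR[aa]) for aa in protein] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=lambda ind: cost(ind, target_gc))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # One-point crossover keeps every position synonymous.
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(protein))
            child = a[:cut] + b[cut:]
            # Mutation: replace one codon by a synonymous codon.
            pos = rng.randrange(len(protein))
            child[pos] = rng.choice(CODONS_FOR[protein[pos]])
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda ind: cost(ind, target_gc))
    return "".join(best)
```

Because crossover and mutation only ever exchange synonymous codons, every individual in the population still encodes the input protein. A fuller version would replace `cost` with a distance over several coding statistics.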

    DegenRev: Degeneracy-Based Full Backtranslation Algorithm for Oligopeptide

    Full backtranslation of oligopeptides appears to be a promising approach for designing microarray oligonucleotides in the context of discovering new metabolic pathways. Protein-to-DNA reverse translation is a time-consuming task that can produce unreasonably large quantities of data. This is why most current applications use the degenerate genetic code or data-mining-based techniques to select the best reverse translation of a short protein sequence, called an oligopeptide. When the purpose is only to design short oligos, it is particularly useful to have the complete set of sequences in order to solve the design problems of enzyme-specific oligos for microarrays. In this paper, we revisit existing bioinformatics applications that provide reverse translation solutions, and we present a new algorithm, based on the degeneracy of the input oligopeptide, that efficiently computes a full reverse translation. We propose an implementation in the C programming language and report its performance statistics on simulated and real biological datasets.
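
As an illustration of what a full backtranslation involves, the sketch below enumerates every DNA sequence that encodes a short peptide and counts them in advance from the per-residue degeneracy. It is a naive enumeration written in Python, not the DegenRev algorithm or its C implementation; the function names `degeneracy` and `full_backtranslation` are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (or '*' for stop) to its list of synonymous codons.
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))


def degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(CODONS_FOR[aa]) for aa in peptide)


def full_backtranslation(peptide):
    """Lazily yield every DNA sequence that encodes the peptide."""
    for codons in product(*(CODONS_FOR[aa] for aa in peptide)):
        yield "".join(codons)
```

The count grows multiplicatively with peptide length: a peptide of leucine, serine and arginine (six codons each) already has 6 × 6 × 6 = 216 backtranslations, which is why the abstract describes the output as potentially unreasonable in size. Yielding sequences lazily avoids holding the whole set in memory.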

    Sequence similarity is more relevant than species specificity in probabilistic backtranslation

    BACKGROUND: Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. RESULTS: This paper describes EasyBack, a new parameter-free, fully automated software tool for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. CONCLUSION: The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of most used codon in the same species with a Hidden Markov Model trained with a set of most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.

    Conception et analyse des biopuces à ADN en environnements parallèles et distribués

    Microorganisms represent the largest diversity of living beings. They play a crucial role in all biological processes, owing to their huge metabolic potential and their capacity to adapt to different ecological niches. The development of new genomic approaches allows a better knowledge of the microbial communities involved in the functioning of complex environments. In this context, DNA microarrays are high-throughput tools able to study the presence or the expression levels of several thousand genes, combining qualitative and quantitative aspects in a single experiment. However, the design and analysis of DNA microarrays, with their current high-density formats as well as the huge amount of data to process, are complex but crucial steps. To improve the quality and performance of these two steps, we have proposed new bioinformatics approaches for the design and analysis of DNA microarrays in parallel and distributed environments. These multipurpose approaches use high performance computing (HPC) and new software engineering approaches, especially model driven engineering (MDE), to overcome the current limitations. We first developed PhylGrid 2.0, a new distributed approach for the large-scale selection of explorative probes for phylogenetic DNA microarrays using computing grids. This software was used to build PhylOPDb, a comprehensive 16S rRNA oligonucleotide probe database for prokaryotic identification. MetaExploArrays, a parallel software tool for oligonucleotide probe selection on different computing architectures (a PC, a multiprocessor, a cluster or a computing grid) using meta-programming and a model driven engineering approach, was developed to give users flexibility according to their computing resources. We then developed PhylInterpret, a new software tool for analyzing the hybridization results of DNA microarrays. PhylInterpret uses the concepts of propositional logic to determine the prokaryotic composition of metagenomic samples. Finally, a new parallelization method based on model driven engineering (MDE) has been proposed to compute a complete backtranslation of short peptides in order to select probes for functional microarrays.

    Pilot study for subgroup classification for autism spectrum disorder based on dysmorphology and physical measurements in Chinese children

    Poster Sessions: 157 - Comorbid Medical Conditions: abstract 157.058 58. BACKGROUND: Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder affecting individuals along a continuum of severity in communication, social interaction and behaviour. The impact of ASD varies significantly amongst individuals, and the causes of ASD range broadly across genetic and environmental factors. OBJECTIVES: Previous ASD research indicates that early identification combined with a targeted treatment plan involving behavioural interventions and multidisciplinary therapies can provide substantial improvement for ASD patients. Currently there is no cure for ASD, and the clinical variability and uncertainty of the disorder remain. Hence, the search to unravel heterogeneity within ASD by subgroup classification may give clinicians a better understanding of ASD and help them work towards a more definitive course of action. METHODS: In this study, a norm of physical measurements including height, weight, head circumference, ear length, outer and inner canthi, interpupillary distance, philtrum, hand and foot length was collected from 658 Typical Developing (TD) Chinese children aged 1 to 7 years (mean age of 4.19 years). The norm collected was compared against 80 ASD Chinese children aged 1 to 12 years (mean age of 4.36 years). We then further attempted to find subgroups within ASD based on identifying physical abnormalities; individuals were classified as (non) dysmorphic with the Autism Dysmorphology Measure (ADM) from physical examinations of 12 body regions. RESULTS: Our results show that there were significant differences between ASD and TD children for measurements in: head circumference (p=0.009), outer (p=0.021) and inner (p=0.021) canthus, philtrum length (p=0.003), right (p=0.023) and left (p=0.20) foot length. Within the 80 ASD patients, 37 (46%) were classified as dysmorphic (p=0.00). CONCLUSIONS: This study attempts to identify subgroups within ASD based on physical measurements and dysmorphology examinations. The information from this study seeks to benefit the ASD community by identifying possible subtypes of ASD in the Chinese population, in search of a more definitive diagnosis, referral and treatment plan.