    Translation conditional models for protein coding sequences

    A coding sequence is defined as a DNA sequence coding the primary structure of a protein (a polypeptide). Such a sequence must satisfy a specific constraint: it must code for a functional protein. As the genetic code is degenerate, there exists, for a given polypeptide, a set of synonymous sequences that code for the same polypeptide. Translation conditional models are defined on such sets. The aim of this paper is to give a common formalism. Besides the codon bias model, a few other conditional models will be defined. Statistical estimators and comparison methods will be briefly presented. These models can be used for gene classification, or to find remarkable features in a real sequence. An example will be presented on Escherichia coli genes.
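
    The set of synonymous sequences mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard genetic code (NCBI translation table 1); the function names are hypothetical and are not taken from the paper, and the sketch only enumerates the set on which a conditional model would be defined, not the models themselves.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1). Codons are ordered with
# bases T, C, A, G and the third position varying fastest; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(amino_acid: str) -> list[str]:
    """Return all codons that code for the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_synonymous_sequences(peptide: str) -> int:
    """Size of the synonymous set: product of per-residue codon degeneracies."""
    return prod(len(synonymous_codons(aa)) for aa in peptide)


def synonymous_sequences(peptide: str):
    """Lazily enumerate every DNA sequence that translates to `peptide`."""
    choices = [synonymous_codons(aa) for aa in peptide]
    for combo in product(*choices):
        yield "".join(combo)
```

    The size of the set grows multiplicatively with peptide length, so explicit enumeration is only practical for short peptides; for real genes one would work with the count or with per-position codon choices instead.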

    Metabolic constraints on the evolution of genetic codes: Did multiple 'preaerobic' ecosystem transitions entrain richer dialects via Serial Endosymbiosis?

    A mathematical model based on Tlusty's topological deconstruction suggests that multiple punctuated ecosystem shifts in available metabolic free energy, broadly akin to the 'aerobic' transition, enabled a punctuated sequence of increasingly complex genetic codes and protein translators under mechanisms similar to the Serial Endosymbiosis effecting the Eukaryotic transition. These evolved until the ancestor to the present narrow spectrum of nearly maximally robust codes became locked-in by path dependence.

    A Rate Distortion approach to protein symmetry

    A spontaneous symmetry breaking argument is applied to the problem of protein form, via a Rate Distortion analysis of the relation between genome coding and the final condensation of the protein 'molten globule'. The Rate Distortion Function, under coding constraints, serves as a temperature analog, so that low values act to drive proteins to simple symmetries. The Rate Distortion Function itself is significantly constrained by the availability of metabolic free energy. This work extends Tlusty's (2007) elegant exploration of the evolution of the genetic code, suggesting that rate distortion considerations may play a critical role across a broad spectrum of molecular expressions of evolutionary process.

    The glucocorticoid receptor in inflammatory processes : transrepression is not enough

    Glucocorticoids (GCs) are the most commonly used anti-inflammatory agents to treat inflammatory and immune diseases. However, steroid therapies are accompanied by severe side-effects during long-term treatment. The dogma that transrepression of genes, by tethering of the glucocorticoid receptor (GR) to DNA-bound pro-inflammatory transcription factors, is the main anti-inflammatory mechanism, is now challenged. Recent discoveries using conditional GR mutant mice and genomic approaches reveal that transactivation of anti-inflammatory acting genes is essential to suppress many inflammatory disease models. This novel view radically changes the concept to design selective acting GR ligands with a reduced side-effect profile.

    Codon Bias Patterns of E. coli's Interacting Proteins

    Synonymous codons, i.e., DNA nucleotide triplets coding for the same amino acid, are used differently across the variety of living organisms. The biological meaning of this phenomenon, known as codon usage bias, is still controversial. In order to shed light on this point, we propose a new codon bias index, CompAI, that is based on the competition between cognate and near-cognate tRNAs during translation, without being tuned to the usage bias of highly expressed genes. We perform a genome-wide evaluation of codon bias for E. coli, comparing CompAI with other widely used indices: tAI, CAI, and Nc. We show that CompAI and tAI capture similar information by being positively correlated with gene conservation, measured by ERI, and essentiality, whereas CAI and Nc appear to be less sensitive to evolutionary-functional parameters. Notably, the rate of variation of tAI and CompAI with ERI allows one to obtain sets of genes that consistently belong to specific clusters of orthologous genes (COGs). We also investigate the correlation of codon bias at the genomic level with the network features of protein-protein interactions in E. coli. We find that the most densely connected communities of the network share a similar level of codon bias (as measured by CompAI and tAI). Conversely, a small difference in codon bias between two genes is, statistically, a prerequisite for the corresponding proteins to interact. Importantly, among all codon bias indices, CompAI turns out to have the most coherent distribution over the communities of the interactome, pointing to the significance of competition among cognate and near-cognate tRNAs for explaining codon usage adaptation.

    Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome

    The article presents an application of Hidden Markov Models (HMMs) for pattern recognition on genome sequences. We apply HMMs to identify genes encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma brucei (T. brucei) and other African trypanosomes. These parasitic protozoa are the causative agents of sleeping sickness and several diseases in domestic and wild animals. These parasites have a peculiar strategy to evade the host's immune system that consists in periodically changing their predominant cellular surface protein (VSG). The motivation for using pattern recognition methods to identify these genes, instead of traditional homology-based ones, is that the levels of sequence identity (amino acid and DNA sequence) amongst these genes are often below what is considered reliable in these methods. Among pattern recognition approaches, HMMs are particularly suitable to tackle this problem because they can handle more naturally the determination of gene edges. We evaluate the performance of the model using different numbers of states in the Markov model, as well as several performance metrics. The model is applied using public genomic data. Our empirical results show that the VSG genes of T. brucei can be safely identified (high sensitivity and low rate of false positives) using HMMs.
    Comment: Accepted article in July, 2015 in Pattern Analysis and Applications, Springer. The article contains 23 pages, 4 figures, 8 tables and 51 references.

    Bacterial riboproteogenomics : the era of N-terminal proteoform existence revealed

    With the rapid increase in the number of sequenced prokaryotic genomes, relying on automated gene annotation became a necessity. Multiple lines of evidence, however, suggest that current bacterial genome annotations may contain inconsistencies and are incomplete, even for so-called well-annotated genomes. We here discuss underexplored sources of protein diversity and new methodologies for high-throughput genome re-annotation. The expression of multiple molecular forms of proteins (proteoforms) from a single gene, particularly driven by alternative translation initiation, is gaining interest as a prominent contributor to bacterial protein diversity. In consequence, riboproteogenomic pipelines were proposed to comprehensively capture proteoform expression in prokaryotes by the complementary use of (positional) proteomics and the direct readout of translated genomic regions using ribosome profiling. To complement these discoveries, tailored strategies are required for the functional characterization of newly discovered bacterial proteoforms.

    Maximum entropy models capture melodic styles

    We introduce a Maximum Entropy model able to capture the statistics of melodies in music. The model can be used to generate new melodies that emulate the style of the musical corpus which was used to train it. Instead of using the n-body interactions of (n-1)-order Markov models, traditionally used in automatic music generation, we use a k-nearest neighbour model with pairwise interactions only. In that way, we keep the number of parameters low and avoid over-fitting problems typical of Markov models. We show that long-range musical phrases don't need to be explicitly enforced using high-order Markov interactions, but can instead emerge from multiple, competing, pairwise interactions. We validate our Maximum Entropy model by contrasting how much the generated sequences capture the style of the original corpus without plagiarizing it. To this end we use a data-compression approach to discriminate the levels of borrowing and innovation featured by the artificial sequences. The results show that our modelling scheme outperforms both fixed-order and variable-order Markov models. This shows that, despite being based only on pairwise interactions, this Maximum Entropy scheme opens the possibility to generate musically sensible alterations of the original phrases, providing a way to generate innovation.
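
    The claim that pairwise interactions keep the number of parameters low can be made concrete with a rough count. The sketch below is an illustration under stated assumptions, not the paper's own parametrisation: it assumes an alphabet of q symbols, counts free transition probabilities for a fixed-order Markov model, and counts one local field per symbol plus one q-by-q coupling table per interaction distance for the pairwise model (without removing redundant parameters). The function names are hypothetical.

```python
def markov_parameter_count(alphabet_size: int, order: int) -> int:
    """Free parameters of a fixed-order Markov model.

    Each of the alphabet_size**order contexts carries a probability
    distribution over the next symbol, with alphabet_size - 1 free values.
    """
    contexts = alphabet_size ** order
    return contexts * (alphabet_size - 1)


def pairwise_parameter_count(alphabet_size: int, max_distance: int) -> int:
    """Parameters of a pairwise model with interactions up to max_distance.

    Assumed layout: one field per symbol, plus one full coupling table
    (alphabet_size x alphabet_size) for each distance 1..max_distance.
    """
    fields = alphabet_size
    couplings = max_distance * alphabet_size ** 2
    return fields + couplings


if __name__ == "__main__":
    q = 12  # illustrative alphabet size, e.g. twelve pitch classes
    for order in (1, 2, 3, 4):
        print("Markov order", order, "->", markov_parameter_count(q, order))
    for k in (1, 5, 10):
        print("pairwise, distance up to", k, "->", pairwise_parameter_count(q, k))
```

    Under these assumptions the Markov count grows exponentially with the order, while the pairwise count grows only linearly with the interaction range.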