Translation conditional models for protein coding sequences
A coding sequence is defined as a DNA sequence coding the primary structure of a protein (a polypeptide). Such a sequence must satisfy a specific constraint, namely that it codes for a functional protein. As the genetic code is degenerate, there exists, for a given polypeptide, a set of synonymous sequences that code for the same polypeptide. Translation conditional models are defined on such sets. The aim of this paper is to give a common formalism. Besides the codon bias model, a few other conditional models will be defined. Statistical estimators and comparison methods will be briefly presented. These models can be used for gene classification, or to find remarkable features in a real sequence. An example will be presented on Escherichia coli genes
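The idea of a model defined on the set of synonymous sequences can be illustrated with a minimal Python sketch. It assumes the standard genetic code and reads the codon bias model as independent codon choices with probability P(codon | amino acid) estimated from counts; the function names are hypothetical and this is not the paper's own formalism or estimator.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode (the synonymous classes).
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    SYNONYMS[aa].append(codon)


def codons(seq):
    """Split a DNA string into complete codons, ignoring a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def translate(seq):
    """Translate a DNA coding sequence into a one-letter amino acid string."""
    return "".join(CODON_TO_AA[c] for c in codons(seq))


def count_synonymous(peptide):
    """Size of the set of DNA sequences that encode the given peptide."""
    n = 1
    for aa in peptide:
        n *= len(SYNONYMS[aa])
    return n


def conditional_codon_freqs(seqs):
    """Estimate P(codon | amino acid) from codon counts in training sequences."""
    codon_counts = Counter(c for s in seqs for c in codons(s))
    aa_counts = Counter()
    for codon, n in codon_counts.items():
        aa_counts[CODON_TO_AA[codon]] += n
    return {c: n / aa_counts[CODON_TO_AA[c]] for c, n in codon_counts.items()}


def log_likelihood(seq, freqs):
    """Log-probability of seq given its translation, under independent codon choices.

    Raises KeyError for codons absent from the training data; a real
    estimator would need some form of smoothing (not shown here).
    """
    return sum(math.log(freqs[c]) for c in codons(seq))
```

For example, `count_synonymous("LS")` is 36, because leucine and serine each have six codons, and `log_likelihood` can be compared across frequency tables estimated from different gene classes as a simple classification score.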
Metabolic constraints on the evolution of genetic codes: Did multiple 'preaerobic' ecosystem transitions entrain richer dialects via Serial Endosymbiosis?
A mathematical model based on Tlusty's topological deconstruction suggests that multiple punctuated ecosystem shifts in available metabolic free energy, broadly akin to the 'aerobic' transition, enabled a punctuated sequence of increasingly complex genetic codes and protein translators under mechanisms similar to the Serial Endosymbiosis effecting the Eukaryotic transition. These evolved until the ancestor to the present narrow spectrum of nearly maximally robust codes became locked-in by path dependence
A Rate Distortion approach to protein symmetry
A spontaneous symmetry breaking argument is applied to the problem of protein form, via a Rate Distortion analysis of the relation between genome coding and the final condensation of the protein 'molten globule'. The Rate Distortion Function, under coding constraints, serves as a temperature analog, so that low values act to drive proteins to simple symmetries. The Rate Distortion Function itself is significantly constrained by the availability of metabolic free energy. This work extends Tlusty's (2007) elegant exploration of the evolution of the genetic code, suggesting that rate distortion considerations may play a critical role across a broad spectrum of molecular expressions of evolutionary process
The glucocorticoid receptor in inflammatory processes : transrepression is not enough
Glucocorticoids (GCs) are the most commonly used anti-inflammatory agents to treat inflammatory and immune diseases. However, steroid therapies are accompanied by severe side-effects during long-term treatment. The dogma that transrepression of genes, by tethering of the glucocorticoid receptor (GR) to DNA-bound pro-inflammatory transcription factors, is the main anti-inflammatory mechanism is now challenged. Recent discoveries using conditional GR mutant mice and genomic approaches reveal that transactivation of anti-inflammatory acting genes is essential to suppress many inflammatory disease models. This novel view radically changes the concept for designing selectively acting GR ligands with a reduced side-effect profile
Codon Bias Patterns of 's Interacting Proteins
Synonymous codons, i.e., DNA nucleotide triplets coding for the same amino
acid, are used differently across the variety of living organisms. The
biological meaning of this phenomenon, known as codon usage bias, is still
controversial. In order to shed light on this point, we propose a new codon
bias index, , that is based on the competition between cognate and
near-cognate tRNAs during translation, without being tuned to the usage bias of
highly expressed genes. We perform a genome-wide evaluation of codon bias for
, comparing with other widely used indices: , , and
. We show that and capture similar information by being
positively correlated with gene conservation, measured by ERI, and
essentiality, whereas, and appear to be less sensitive to
evolutionary-functional parameters. Notably, the rate of variation of and
with ERI allows one to obtain sets of genes that consistently belong to
specific clusters of orthologous genes (COGs). We also investigate the
correlation of codon bias at the genomic level with the network features of
protein-protein interactions in . We find that the most densely
connected communities of the network share a similar level of codon bias (as
measured by and ). Conversely, a small difference in codon bias
between two genes is, statistically, a prerequisite for the corresponding
proteins to interact. Importantly, among all codon bias indices, turns
out to have the most coherent distribution over the communities of the
interactome, pointing to the significance of competition among cognate and
near-cognate tRNAs for explaining codon usage adaptation
Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome
The article presents an application of Hidden Markov Models (HMMs) for
pattern recognition on genome sequences. We apply HMMs to identify genes
encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma
brucei (T. brucei) and other African trypanosomes. These parasitic protozoa
are the causative agents of sleeping sickness and several diseases in domestic
and wild animals. These parasites have a peculiar strategy to evade the host's
immune system that consists in periodically changing their predominant cellular
surface protein (VSG). The motivation for using pattern recognition methods to
identify these genes, instead of traditional homology-based ones, is that the
levels of sequence identity (amino acid and DNA sequence) amongst these genes
are often below what is considered reliable for these methods. Among pattern
recognition approaches, HMMs are particularly suitable for this problem
because they handle the determination of gene edges more naturally. We
evaluate the performance of the model using different numbers of states in the
Markov model, as well as several performance metrics. The model is applied
using public genomic data. Our empirical results show that the VSG genes of T.
brucei can be safely identified (high sensitivity and low rate of false
positives) using HMMs.
Comment: Accepted article in July, 2015 in Pattern Analysis and Applications,
Springer. The article contains 23 pages, 4 figures, 8 tables and 51
references
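How an HMM can locate the edges of a region may be illustrated with a toy Viterbi decoder in Python. This is only a sketch under stated assumptions: the two states ("G" for a GC-rich region, "B" for background) and all probabilities are hypothetical values chosen for illustration, not the model, state count, or parameters used in the article.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for obs, computed in log space."""
    # Scores for the first observation.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    backptrs = []
    for symbol in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this position.
            prev, best = max(
                ((p, scores[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda t: t[1],
            )
            row[s] = best + math.log(emit_p[s][symbol])
            ptr[s] = prev
        scores.append(row)
        backptrs.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical toy parameters (assumptions, not taken from the article).
STATES = ["G", "B"]
START = {"G": 0.5, "B": 0.5}
TRANS = {"G": {"G": 0.9, "B": 0.1}, "B": {"G": 0.1, "B": 0.9}}
EMIT = {
    "G": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "B": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path = viterbi("AAAAGCGCGCAAAA", STATES, START, TRANS, EMIT)
print("".join(path))
```

The positions where the decoded path switches between "B" and "G" are the estimated region edges; a realistic gene model would use more states and parameters trained on annotated sequences.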
Bacterial riboproteogenomics : the era of N-terminal proteoform existence revealed
With the rapid increase in the number of sequenced prokaryotic genomes, relying on automated gene annotation became a necessity. Multiple lines of evidence, however, suggest that current bacterial genome annotations may contain inconsistencies and are incomplete, even for so-called well-annotated genomes. We here discuss underexplored sources of protein diversity and new methodologies for high-throughput genome re-annotation. The expression of multiple molecular forms of proteins (proteoforms) from a single gene, particularly driven by alternative translation initiation, is gaining interest as a prominent contributor to bacterial protein diversity. In consequence, riboproteogenomic pipelines were proposed to comprehensively capture proteoform expression in prokaryotes by the complementary use of (positional) proteomics and the direct readout of translated genomic regions using ribosome profiling. To complement these discoveries, tailored strategies are required for the functional characterization of newly discovered bacterial proteoforms
Maximum entropy models capture melodic styles
We introduce a Maximum Entropy model able to capture the statistics of
melodies in music. The model can be used to generate new melodies that emulate
the style of the musical corpus which was used to train it. Instead of using
the body interactions of order Markov models, traditionally used in
automatic music generation, we use a nearest neighbour model with pairwise
interactions only. In that way, we keep the number of parameters low and avoid
over-fitting problems typical of Markov models. We show that long-range musical
phrases don't need to be explicitly enforced using high-order Markov
interactions, but can instead emerge from multiple, competing, pairwise
interactions. We validate our Maximum Entropy model by contrasting how much the
generated sequences capture the style of the original corpus without
plagiarizing it. To this end we use a data-compression approach to discriminate
the levels of borrowing and innovation featured by the artificial sequences.
The results show that our modelling scheme outperforms both fixed-order and
variable-order Markov models. This shows that, despite being based only on
pairwise interactions, this Maximum Entropy scheme opens the possibility to
generate musically sensible alterations of the original phrases, providing a
way to generate innovation
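A pairwise Maximum Entropy model of this kind can be sketched in Python as an energy function with local fields and pairwise couplings, sampled one note at a time. The parameter layout (`h` for fields, `J` keyed by distance and note pair), the `max_dist` cutoff, and the Gibbs sampler are assumptions made for illustration; the abstract does not specify the parametrization, the training procedure, or the sampling scheme.

```python
import math
import random


def energy(seq, h, J, max_dist):
    """Energy of a sequence: minus the sum of local fields and pairwise couplings."""
    e = -sum(h.get(x, 0.0) for x in seq)
    for k in range(1, max_dist + 1):
        for i in range(len(seq) - k):
            e -= J.get((k, seq[i], seq[i + k]), 0.0)
    return e


def site_conditional(seq, i, alphabet, h, J, max_dist):
    """P(seq[i] = a | all other sites); depends only on neighbours within max_dist."""
    weights = []
    for a in alphabet:
        w = h.get(a, 0.0)
        for k in range(1, max_dist + 1):
            if i - k >= 0:
                w += J.get((k, seq[i - k], a), 0.0)
            if i + k < len(seq):
                w += J.get((k, a, seq[i + k]), 0.0)
        weights.append(math.exp(w))
    z = sum(weights)
    return [w / z for w in weights]


def gibbs_sweep(seq, alphabet, h, J, max_dist, rng):
    """Resample every site once from its conditional distribution."""
    seq = list(seq)
    for i in range(len(seq)):
        probs = site_conditional(seq, i, alphabet, h, J, max_dist)
        seq[i] = rng.choices(alphabet, weights=probs)[0]
    return seq


# Hypothetical toy parameters: a three-note alphabet and one favoured transition.
ALPHABET = ["C", "D", "E"]
H = {"C": 0.2}
J_PAIR = {(1, "C", "D"): 1.0, (2, "C", "E"): 0.5}
melody = gibbs_sweep(["C", "C", "C", "C"], ALPHABET, H, J_PAIR, 2, random.Random(0))
```

Because each coupling involves only two positions, the number of parameters grows with the square of the alphabet size times the cutoff distance, rather than exponentially with the order as in a high-order Markov chain.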
- …