Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences

Author: Gatherer D.
Publication venue: 'SAGE Publications'
Publication date: 01/01/2007

A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It performs at 60%–70% overall accuracy and greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. intolerant to changes in spelling (mutation), and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies, e.g. some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time.

Directory of Open Access Journals

Enlighten

Lancaster E-Prints

Evolution of foot-and-mouth disease virus intra-sample sequence diversity during serial transmission in bovine hosts

Authors: Haydon D.T.; Juleff N.; King D.P.; Knowles N.J.; Morelli M.J.; Paton D.J.; Wright C.F.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

RNA virus populations within samples are highly heterogeneous, containing a large number of minority sequence variants which can potentially be transmitted to other susceptible hosts. Consequently, consensus genome sequences provide an incomplete picture of the within- and between-host viral evolutionary dynamics during transmission. Foot-and-mouth disease virus (FMDV) is an RNA virus that can spread from primary sites of replication, via the systemic circulation, to found distinct sites of local infection at epithelial surfaces. Viral evolution in these different tissues occurs independently, each of them potentially providing a source of virus to seed subsequent transmission events. This study employed the Illumina Genome Analyzer platform to sequence 18 FMDV samples collected from a chain of sequentially infected cattle. These data generated snapshots of the evolving viral population structures within different animals and tissues. Analyses of the mutation spectra revealed polymorphisms at frequencies >0.5% at between 21 and 146 sites across the genome for these samples, while 13 sites acquired mutations in excess of consensus frequency (50%). Analysis of polymorphism frequency revealed that a number of minority variants were transmitted during host-to-host infection events, while the size of the intra-host founder populations appeared to be smaller. These data indicate that viral population complexity is influenced by small intra-host bottlenecks and relatively large inter-host bottlenecks. The dynamics of minority variants are consistent with the actions of genetic drift rather than strong selection. These results provide novel insights into the evolution of FMDV that can be applied to reconstruct both intra- and inter-host transmission routes.

Crossref

Springer - Publisher Connector

Enlighten
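
The FMDV abstract above reports polymorphic sites at frequencies above 0.5%. The following is a minimal sketch of how such a count could be made from per-site base calls. It is not the authors' pipeline: the function name, the input layout (one string of observed bases per genome position), and the absence of any sequencing-error model are all assumptions made for illustration.

```python
from collections import Counter


def minority_variant_sites(columns, min_freq=0.005):
    """Return (position, minor_frequency) for every site whose combined
    non-consensus base frequency exceeds min_freq.

    columns: iterable of strings, one per genome position, each holding
    the bases observed at that position across reads (hypothetical layout).
    """
    sites = []
    for pos, bases in enumerate(columns):
        counts = Counter(bases)
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage at this position
        consensus_count = counts.most_common(1)[0][1]
        minor_freq = (depth - consensus_count) / depth
        if minor_freq > min_freq:
            sites.append((pos, minor_freq))
    return sites


# Toy data: position 1 carries a 1% minority variant, position 2 only 0.4%.
toy_columns = ["A" * 1000, "A" * 990 + "G" * 10, "C" * 996 + "T" * 4]
polymorphic = minority_variant_sites(toy_columns)
```

With the toy data, only position 1 passes the 0.5% threshold. Real deep-sequencing data would also need quality filtering before a threshold like this is meaningful.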

From in vitro evolution to protein structure

Author: Fantini Marco
Publication venue: 'Scuola Normale Superiore - Edizioni della Normale'
Publication date: 29/05/2020

At the nanoscale, the machinery of life is mainly composed of macromolecules and macromolecular complexes that through their shapes create a network of interconnected mechanisms of biological processes. The relationship between shape and function of a biological molecule is the foundation of structural biology, which aims at studying the structure of a protein or a macromolecular complex to unveil the molecular mechanism through which it exerts its function. What about the reverse: is it possible by exploiting the function for which a protein was naturally selected to deduce the protein structure? To this aim we developed a method, called CAMELS (Coupling Analysis by Molecular Evolution Library Sequencing), able to obtain the structural features of a protein from an artificial selection based on that protein function. With CAMELS we tried to reconstruct the TEM-1 beta lactamase fold exclusively by generating and sequencing large libraries of mutational variants. Theoretically with this method it is possible to reconstruct the structure of a protein regardless of the species of origin or the phylogenetic time of emergence when a functional phenotypic selection of a protein is available. CAMELS allows us to obtain protein structures without needing to purify the protein beforehand.

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Codon Bias Patterns of E. coli's Interacting Proteins

Authors: Cimini Giulio; Deiana Antonio; Dilucca Maddalena; Giansanti Andrea; Semmoloni Andrea
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

Synonymous codons, i.e., DNA nucleotide triplets coding for the same amino acid, are used differently across the variety of living organisms. The biological meaning of this phenomenon, known as codon usage bias, is still controversial. In order to shed light on this point, we propose a new codon bias index, CompAI, that is based on the competition between cognate and near-cognate tRNAs during translation, without being tuned to the usage bias of highly expressed genes. We perform a genome-wide evaluation of codon bias for E. coli, comparing CompAI with other widely used indices: tAI, CAI, and Nc. We show that CompAI and tAI capture similar information by being positively correlated with gene conservation, measured by ERI, and essentiality, whereas CAI and Nc appear to be less sensitive to evolutionary-functional parameters. Notably, the rate of variation of tAI and CompAI with ERI makes it possible to obtain sets of genes that consistently belong to specific clusters of orthologous genes (COGs). We also investigate the correlation of codon bias at the genomic level with the network features of protein-protein interactions in E. coli. We find that the most densely connected communities of the network share a similar level of codon bias (as measured by CompAI and tAI). Conversely, a small difference in codon bias between two genes is, statistically, a prerequisite for the corresponding proteins to interact. Importantly, among all codon bias indices, CompAI turns out to have the most coherent distribution over the communities of the interactome, pointing to the significance of competition among cognate and near-cognate tRNAs for explaining codon usage adaptation.

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Archivio della ricerca della Scuola IMT Alti Studi Lucca

ART

Archivio della ricerca- Università di Roma La Sapienza

IMT Institutional Repository

FigShare

Statistical Complexity Analysis of Turing Machine tapes with Fixed Algorithmic Complexity Using the Best-Order Markov Model

Authors: Matos Sergio; Pinho Eduardo; Pratas Diogo; Silva Jorge M.
Publication date: 01/01/2020

Sources that generate symbolic sequences with algorithmic nature may differ in statistical complexity because they create structures that follow algorithmic schemes, rather than generating symbols from a probabilistic function assuming independence. In the case of Turing machines, this means that machines with the same algorithmic complexity can create tapes with different statistical complexity. In this paper, we use a compression-based approach to measure global and local statistical complexity of specific Turing machine tapes with the same number of states and alphabet. Both measures are estimated using the best-order Markov model. For the global measure, we use the Normalized Compression (NC), while, for the local measures, we define and use normal and dynamic complexity profiles to quantify and localize lower and higher regions of statistical complexity. We assessed the validity of our methodology on synthetic and real genomic data showing that it is tolerant to increasing rates of edits and block permutations. Regarding the analysis of the tapes, we localize patterns of higher statistical complexity in two regions, for a different number of machine states. We show that these patterns are generated by a decrease of the tape's amplitude, given the setting of small rule cycles. Additionally, we performed a comparison with a measure that uses both algorithmic and statistical approaches (BDM) for analysis of the tapes. Naturally, BDM is efficient given the algorithmic nature of the tapes. However, for a higher number of states, BDM is progressively approximated by our methodology. Finally, we provide a simple algorithm to increase the statistical complexity of a Turing machine tape while retaining the same algorithmic complexity. We supply a publicly available implementation of the algorithm in C++ under the GPLv3 license. All results can be reproduced in full with scripts provided at the repository.

Helsingin yliopiston digitaalinen arkisto
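
The Normalized Compression (NC) named in the Turing machine abstract above can be illustrated with a rough sketch. Two things here are assumptions and not taken from the abstract: the definition used, NC(x) = C(x) / (|x| log2 |A|) with C(x) the compressed size in bits and |A| the alphabet size, and the compressor. The paper estimates C(x) with a best-order Markov model; this sketch swaps in the general-purpose `zlib` compressor purely for illustration, so its values will differ from the paper's.

```python
import math
import random
import zlib


def normalized_compression(sequence, alphabet_size):
    """Approximate NC(x) = C(x) / (|x| * log2(|A|)).

    C(x) is estimated as the zlib-compressed size in bits. This is a
    stand-in for the best-order Markov model used in the paper.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    if not sequence:
        raise ValueError("sequence must not be empty")
    compressed_bits = 8 * len(zlib.compress(sequence.encode("ascii"), 9))
    return compressed_bits / (len(sequence) * math.log2(alphabet_size))


# A highly repetitive sequence versus a pseudo-random one over {A, C, G, T}.
repetitive = "ACGT" * 1000
rng = random.Random(0)
scrambled = "".join(rng.choices("ACGT", k=4000))

nc_repetitive = normalized_compression(repetitive, alphabet_size=4)
nc_scrambled = normalized_compression(scrambled, alphabet_size=4)
```

The repetitive sequence yields a much lower value than the pseudo-random one. Because a general-purpose compressor carries header and coding overhead, the estimate can exceed 1 for incompressible input.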

MicroRNA Target Detection and Analysis for Genes Related to Breast Cancer Using MDLcompress

Authors: Conklin Douglas S; Evans Scott C; Kourtidis Antonis; Markham T Stephen; Miller Jonathan; Torres Andrew S
Publication venue: BioMed Central
Publication date: 01/01/2007

Crossref

Springer - Publisher Connector

PubMed Central


Pervasive, conserved secondary structure in highly charged protein regions

Authors: Dinner Aaron R.; Drummond D. Allan; Pan Rosalind Wenshan; Triandafillou Catherine G.
Publication date: 27/10/2023

Understanding how protein sequences confer function remains a defining challenge in molecular biology. Two approaches have yielded enormous insight yet are often pursued separately: structure-based, where sequence-encoded structures mediate function, and disorder-based, where sequences dictate physicochemical and dynamical properties which determine function in the absence of stable structure. Here we study highly charged protein regions (>40% charged residues), which are routinely presumed to be disordered. Using recent advances in structure prediction and experimental structures, we show that roughly 40% of these regions form well-structured helices. Features often used to predict disorder (high charge density, low hydrophobicity, low sequence complexity, and evolutionarily varying length) are also compatible with solvated, variable-length helices. We show that a simple composition classifier predicts the existence of structure far better than well-established heuristics based on charge and hydropathy. We show that helical structure is more prevalent than previously appreciated in highly charged regions of diverse proteomes and characterize the conservation of highly charged regions. Our results underscore the importance of integrating, rather than choosing between, structure- and disorder-based approaches.

Knowledge UChicago
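
The ">40% charged residues" criterion in the abstract above can be sketched as a sliding-window scan. This is only an illustration, not the authors' method: the abstract does not say which residues count as charged or how regions are delimited, so the residue set (D, E, K, R, with histidine excluded), the window length, and the function names are all assumptions.

```python
# Assumed set of charged residues; histidine is left out here by choice.
CHARGED = frozenset("DEKR")


def charged_fraction(seq):
    """Fraction of residues in seq that belong to the assumed charged set."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(residue in CHARGED for residue in seq) / len(seq)


def highly_charged_windows(seq, window=10, threshold=0.4):
    """Start indices of windows whose charged fraction exceeds threshold."""
    return [
        start
        for start in range(len(seq) - window + 1)
        if charged_fraction(seq[start:start + window]) > threshold
    ]


# Toy sequence: a charged stretch followed by an alanine stretch.
toy_protein = "EEKKEEKKEEAAAAAAAAAA"
charged_starts = highly_charged_windows(toy_protein, window=10)
```

In the toy sequence, the windows starting at positions 0 to 5 exceed the threshold. The window starting at position 6 sits at exactly 40% and is excluded, because the comparison is strict.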