5 research outputs found

    Entropy of never born protein sequences

    BACKGROUND: A Never Born protein is a theoretical protein that does not occur in nature. Why some proteins were selected during evolution and others were not is unknown. We applied information theory to find similarities and differences in information content between Never Born and natural proteins. FINDINGS: Block entropies and relative entropies are similar for both sets, which means that both kinds of protein consist of strongly random sequences. An artificially generated Never Born protein sequence is nearly as random as a natural one. CONCLUSIONS: The information-theoretic approach suggests that protein selection during evolution was largely random (non-deterministic). Natural proteins have no noticeable unique features in the information-theoretic sense. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/2193-1801-2-200) contains supplementary material, which is available to authorized users.
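    The abstract does not define the two measures it compares. The sketch below assumes the usual textbook definitions. Block entropy H_n is the Shannon entropy of the overlapping length-n words in a sequence. Relative entropy is the Kullback-Leibler divergence of a sequence's amino acid composition from a background distribution. The function names and the uniform default background are assumptions made for illustration; the paper's exact estimators and reference distributions may differ.

```python
import math
from collections import Counter
from typing import Dict, Optional

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_entropy(seq: str, n: int) -> float:
    """Shannon block entropy H_n in bits: the entropy of the distribution of
    overlapping length-n words ("blocks") observed in seq."""
    words = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    if not words:
        raise ValueError("sequence is shorter than the block length")
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def relative_entropy(seq: str, background: Optional[Dict[str, float]] = None) -> float:
    """Kullback-Leibler divergence D(P || Q) in bits, where P is the amino acid
    composition of seq and Q is a background distribution (assumed uniform
    over the 20 standard amino acids unless one is supplied)."""
    if not seq:
        raise ValueError("empty sequence")
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(seq)
    total = len(seq)
    divergence = 0.0
    for aa, c in counts.items():
        p = c / total
        q = background[aa]  # raises KeyError for residues outside the alphabet
        divergence += p * math.log2(p / q)
    return divergence
```

For example, `block_entropy("ACAC", 2)` counts the blocks AC, CA and AC and returns about 0.918 bits. A sequence that uses all 20 residues equally has a relative entropy of 0 against the uniform background. Values close to the maximum block entropy, together with low relative entropy, are what "strongly random" means in this setting.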

    Massive non-natural proteins structure prediction using grid technologies

    Background: Natural proteins represent a small fraction of all possible protein sequences, and there is an enormous number of proteins never sampled by nature, the so-called "never born proteins" (NBPs). A fundamental question in this regard is whether the ensemble of natural proteins possesses peculiar chemical and physical properties or is just the product of contingency coupled to functional selection. A key feature of natural proteins is their ability to form a well-defined three-dimensional structure. Thus, the structural study of NBPs can help to establish whether natural protein sequences were selected for their peculiar properties or are just one of the possible stable and functional ensembles. Methods: The structural characterization of a huge number of random proteins cannot be approached experimentally, so the problem has been tackled computationally. A large library of random protein sequences (2 × 10^4 sequences) was generated, discarding amino acid sequences with significant similarity to natural proteins, and the corresponding structures were predicted using Rosetta. Given the high computational demand of the problem, Rosetta was ported to the grid and a user-friendly job submission environment was developed within the GENIUS Grid Portal. The generated protein structures were analysed in terms of net charge, secondary structure content, surface/volume ratio, hydrophobic core composition, etc. Results: According to the Rosetta models, the vast majority of NBPs are characterized by a compact three-dimensional structure with a high secondary structure content. Structure compactness and surface polarity are comparable to those of natural proteins, suggesting similar stability and solubility. Deviations are observed in the relative content of α-helices and β-strands and in hydrophobic core composition, as NBPs appear to be richer in helical structure and aromatic amino acids than natural proteins.
    Conclusion: The results obtained suggest that the ability to form a compact, ordered and water-soluble structure is an intrinsic property of polypeptides. The tendency of random sequences to adopt α-helical folds indicates that all-α proteins may have emerged early in pre-biotic evolution. Furthermore, the lower percentage of aromatic residues observed in natural proteins has important evolutionary implications for tolerance to mutations.
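    The study computes net charge and aromatic content from Rosetta models. A much cruder proxy estimates both from sequence composition alone. This is an assumption for illustration, not the authors' structure-based analysis. The hypothetical helpers below count K and R as +1 and D and E as -1, ignoring histidine and the chain termini, and they treat F, W and Y as aromatic.

```python
# Crude, sequence-only proxies (hypothetical helpers, not the authors' method).
# Residues are expected as upper-case one-letter codes.
POSITIVE = set("KR")   # lysine, arginine: counted as +1 near neutral pH
NEGATIVE = set("DE")   # aspartate, glutamate: counted as -1 near neutral pH
AROMATIC = set("FWY")  # phenylalanine, tryptophan, tyrosine


def approx_net_charge(seq: str) -> int:
    """Approximate net charge; histidine and the chain termini are ignored."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def aromatic_fraction(seq: str) -> float:
    """Fraction of residues that are aromatic (F, W or Y)."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in AROMATIC for aa in seq) / len(seq)
```

For example, `approx_net_charge("KRRDA")` returns 2 and `aromatic_fraction("FWYA")` returns 0.75. Comparing such values between a random library and a set of natural sequences is only a first approximation of the structure-level comparison described in the abstract.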

    Do Natural Proteins Differ from Random Sequences Polypeptides? Natural vs. Random Proteins Classification Using an Evolutionary Neural Network

    Are extant proteins the exquisite result of natural selection, or are they random sequences slightly edited by evolution? This question has puzzled biochemists for a long time, and several groups have addressed it by comparing natural protein sequences to completely random ones, reaching contradictory conclusions. Previous work in the literature focused on the analysis of primary structure in an attempt to identify possible signatures of evolutionary editing. Conversely, in this work we compare a set of 762 natural proteins with an average length of 70 amino acids and an equal number of completely random ones of comparable length on the basis of their structural features. We use an ad hoc Evolutionary Neural Network Algorithm (ENNA) to assess whether, and to what extent, natural proteins are edited from random polypeptides. The ENNA employs 11 different structure-related variables: net charge, volume, surface area, coil, alpha helix, beta sheet, percentage of coil, percentage of alpha helix, percentage of beta sheet, percentage of secondary structure and surface hydrophobicity. The ENNA algorithm correctly distinguishes natural proteins from random ones with an accuracy of 94.36%. Furthermore, we study the structural features of 32 random polypeptides misclassified as natural ones to uncover any structural similarity to natural proteins. The results show that random proteins misclassified by the ENNA algorithm exhibit significant fold similarity, at atomic resolution, to portions or subdomains of extant proteins. Altogether, our results suggest that natural proteins are significantly edited from random polypeptides and that evolutionary editing can be readily detected by analyzing structural features. We also show that the ENNA, employing simple structural descriptors, can predict whether a protein chain is natural or random.
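    The abstract does not describe ENNA's internals. The general idea of optimizing a classifier by evolution rather than by gradient descent can still be illustrated with a toy example. The sketch below evolves the weights of a single-layer linear classifier over 11 descriptors with a (1+1) evolution strategy with a fixed step size, using synthetic data. The names `predict`, `accuracy`, `evolve` and `make_toy_data` are hypothetical, as are all parameter values. This is not the authors' ENNA, and its accuracy says nothing about the 94.36% reported for real proteins.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical setting: each protein is a vector of 11 structural descriptors,
# labelled 1 (natural) or 0 (random).
N_FEATURES = 11
Example = Tuple[List[float], int]


def predict(weights: Sequence[float], x: Sequence[float]) -> int:
    """Linear threshold unit; the last weight is the bias term."""
    score = sum(w * xi for w, xi in zip(weights[:-1], x)) + weights[-1]
    return 1 if score > 0 else 0


def accuracy(weights: Sequence[float], data: Sequence[Example]) -> float:
    """Fraction of examples classified correctly (the fitness to maximise)."""
    return sum(predict(weights, x) == y for x, y in data) / len(data)


def evolve(data: Sequence[Example], generations: int = 500,
           sigma: float = 0.5, seed: int = 0) -> Tuple[List[float], float]:
    """(1+1) evolution strategy with a fixed step size: perturb every weight
    with Gaussian noise and keep the child if its accuracy is not worse."""
    rng = random.Random(seed)
    parent = [0.0] * (N_FEATURES + 1)
    parent_fit = accuracy(parent, data)
    for _ in range(generations):
        child = [w + rng.gauss(0.0, sigma) for w in parent]
        child_fit = accuracy(child, data)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit


def make_toy_data(n_per_class: int = 100, seed: int = 1) -> List[Example]:
    """Synthetic, well-separated classes; NOT real protein descriptors."""
    rng = random.Random(seed)
    data: List[Example] = []
    for label, centre in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            data.append(([rng.gauss(centre, 0.5) for _ in range(N_FEATURES)], label))
    return data
```

Calling `evolve(make_toy_data())` returns the evolved weights and their training accuracy. A real comparison would report accuracy on held-out proteins rather than on the training set, and a neural network would replace the single linear unit used here.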

    RANDOMBLAST A TOOL TO GENERATE RANDOM “NEVER BORN PROTEIN” SEQUENCES

    In an accompanying paper by Minervini et al., we deal with the scientific problem of studying the sequence-to-structure relationships in “never born proteins” (NBPs), i.e. protein sequences that have never been observed in nature. Studying the structural and functional properties of “never born proteins” requires a large library of protein sequences with no significant similarity to any known protein sequence. In this paper we describe the implementation of a simple command-line software utility used to generate random amino acid sequences and to filter them against the NCBI non-redundant protein database. The filter uses as its threshold the E-value returned by the well-known sequence comparison software Blast. This utility, named RandomBlast, was written in the C programming language for Windows operating systems. The structural implications of the random amino acid composition of NBPs are discussed in comparison with natural proteins of comparable length.
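    The generate-and-filter loop described in this abstract can be sketched as follows. This is a hedged illustration in Python rather than RandomBlast's C code. Residues are drawn uniformly from the 20 standard amino acids, which is an assumption because RandomBlast's actual composition model is not stated here. `best_evalue` is a hypothetical callback standing in for a Blast search of each candidate against a protein database, returning the E-value of the best hit. The E-value threshold is left to the caller because the abstract does not give the value used.

```python
import random
from typing import Callable, List

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_sequence(length: int, rng: random.Random) -> str:
    """Draw each residue independently and uniformly (an assumption)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def generate_library(
    n: int,
    length: int,
    best_evalue: Callable[[str], float],
    threshold: float,
    seed: int = 0,
    max_tries: int = 1_000_000,
) -> List[str]:
    """Collect n random sequences whose best database hit is not significant.

    best_evalue(seq) is a hypothetical callback that should return the E-value
    of the best hit of seq against a protein database (e.g. via a Blast
    search). Candidates with E-value <= threshold resemble a known protein
    and are discarded.
    """
    rng = random.Random(seed)
    library: List[str] = []
    tries = 0
    while len(library) < n:
        if tries >= max_tries:
            raise RuntimeError("too many candidates rejected; check the threshold")
        tries += 1
        candidate = random_sequence(length, rng)
        if best_evalue(candidate) > threshold:
            library.append(candidate)
    return library
```

In a real pipeline, `best_evalue` would wrap a search such as a BLAST+ `blastp` run against the NCBI non-redundant database. For a quick check of the control flow, you can use a stub such as `lambda s: 1e-10 if s.startswith("M") else 10.0`. It marks every candidate starting with M as a significant hit, so `generate_library(50, 70, stub, threshold=1e-3)` returns 50 sequences of length 70, none of which starts with M.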