On Circular Coding Properties of Gene and Protein Sequences
The algorithms and equations that define and link the circular coding properties of the genetic code and the structural properties of amino acids are derived from a model of a Cantor-dynamics-based automaton. It is shown that the model defines a unifying concept of the genetic code, which incorporates Crick's comma-free code and the evolutionary code concept. Arithmetic for codes is defined via number-theoretic results from coding theory, Smale's horseshoe map and dynamics on fractal lattices. The method, denoted SCA (Symbolic Cantor Algorithm), is defined with respect to the principles of Molecular Recognition Theory, Grafstein's hypothesis of the stereochemical origin of the genetic code and Siemion's mutation ring. The underlying Fibonacci dynamics are extracted and mathematically defined by considering the Cantor set and Farey tree projections of codons and amino acids. The two-digit specification of codon positions, obtained by binary group subdivision, is analysed in particular with respect to octal coding and defined according to the purine-pyrimidine, amino-keto and strong-weak hydrogen-bonding discrimination principles.
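The two-digit specification can be illustrated with a short sketch. It assumes the bit assignment U or T = 00, C = 01, G = 10, A = 11 (the one stated in the "Universal Metric Properties" abstract); the helper names are hypothetical and this is not the published SCA implementation. Under this assignment the first digit separates purines (G, A) from pyrimidines (U/T, C), the second separates amino bases (C, A) from keto bases (G, U/T), and the XOR of the two digits separates strong pairing bases (G, C; three hydrogen bonds) from weak ones (A, U/T; two hydrogen bonds).

```python
from typing import Dict

# Assumed two-digit code per base: (first digit, second digit).
BASE_BITS = {"U": (0, 0), "T": (0, 0), "C": (0, 1), "G": (1, 0), "A": (1, 1)}


def discriminate(base: str) -> Dict[str, str]:
    """Classify one base by the three binary discrimination principles."""
    first, second = BASE_BITS[base.upper()]
    return {
        # First digit: 1 for the purines G and A, 0 for the pyrimidines U/T and C.
        "purine_pyrimidine": "purine" if first else "pyrimidine",
        # Second digit: 1 for the amino bases C and A, 0 for the keto bases G and U/T.
        "amino_keto": "amino" if second else "keto",
        # XOR of the digits: 1 for G and C (strong, 3 H bonds), 0 for A and U/T (weak, 2 H bonds).
        "h_bonding": "strong" if first ^ second else "weak",
    }


def codon_bits(codon: str) -> str:
    """Concatenate the two-digit codes of the three bases into a 6-bit word."""
    return "".join(f"{a}{b}" for a, b in (BASE_BITS[x] for x in codon.upper()))


if __name__ == "__main__":
    for base in "UCGA":
        print(base, discriminate(base))
    print("AUG ->", codon_bits("AUG"))  # prints "AUG -> 110010"
```

The point of the sketch is that a single 2-bit code per base carries all three binary discriminations at once, so a codon becomes a 6-bit word without any extra lookup tables.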
Universal Metric Properties of the Genetic Code
Universal metric properties of the genetic code (i.e. RNA, DNA and protein coding) are defined by means of a nucleotide base representation on the square with vertices U or T = 00, C = 01, G = 10 and A = 11. It is shown that this notation defines the Cantor set and Smale horseshoe map representations of the genetic code, the classic table arrangement and the Siemion one-step mutation ring of the code. Gray code solutions to the problem of defining codon positions on the [0, 1] interval are given, together with an extension to the octal coding system based on the linear block triple check code. This result enables short block (word) decoding of genetic code patterns. The block code is related to the minimization of errors during transcription and translation, which implies that the genetic code is error-correcting and not degenerate. Two algorithms for representing codons on the [0, 1] interval, and the related binary trees, are discussed. It is concluded that the ternary Cantor set algorithm is the method of choice for this type of analysis and coding. This procedure enables the analysis of codon positions on the six-dimensional hypercube by means of a simple time series and/or 'logistic' difference equation. Finally, a unified concept of the genetic code linked to the Cantor set and horseshoe map is introduced in the form of a classic combinatorial four-colour necklace model with three horizontal frames consisting of 64 coloured pearls (bases) and vertically hanging decorations of triplets (codons). The three horizontal necklace frames define Crick's comma-free code, and the vertical necklace decorations define the evolutionary code. Thus, the type of code depends on the level or direction of observation. The exact location of the mRNA and complementary DNA coding groups of triplets within a frame is determined. This enables decoding of long code block (language) patterns within the genetic code.
This method of genetic code analysis is named the Symbolic Cantor Algorithm (SCA). Its validity was confirmed by 94% accurate classification of 50 proteins of known secondary structure (25 α-helical and 25 β-sheet proteins) with the C5.0 machine learning system. The analysis used the nucleotide strings into which SCA transcribes the protein sequences. Spectral Fourier analysis of Pro-opiomelanocortin and Bone Morphogenetic Protein 6 confirmed that the method might also be applied to the analysis of bioactive hormone and cytokine sequences.
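The codon-to-interval representations described above can be sketched as follows, assuming the stated base bits U/T = 00, C = 01, G = 10, A = 11. A codon becomes a 6-bit word (a hypercube vertex); reading it as a binary fraction gives one point on [0, 1], and replacing each binary digit 1 by the ternary digit 2 gives a point of the ternary Cantor set. The per-base Gray decoding is one possible reading of the "Gray code solution": under this assignment it ranks the bases U, C, A, G, which is the order of the classic codon table. The exact digit ordering and scaling used by SCA may differ, and all function names are hypothetical.

```python
from fractions import Fraction

# Assumed base-to-bit assignment from the abstract: U/T = 00, C = 01, G = 10, A = 11.
BASE_BITS = {"U": "00", "T": "00", "C": "01", "G": "10", "A": "11"}


def codon_to_bits(codon: str) -> str:
    """Six-bit word of a codon, i.e. a vertex of the six-dimensional hypercube."""
    return "".join(BASE_BITS[b] for b in codon.upper())


def binary_point(bits: str) -> Fraction:
    """Read the word as the binary fraction 0.b1b2...b6 in [0, 1)."""
    return Fraction(int(bits, 2), 2 ** len(bits))


def cantor_point(bits: str) -> Fraction:
    """Map binary digit 1 to ternary digit 2 (0 stays 0): a point of the Cantor set."""
    n = len(bits)
    return Fraction(sum(2 * int(b) * 3 ** (n - 1 - i) for i, b in enumerate(bits)), 3 ** n)


def gray_rank(two_bits: str) -> int:
    """Decode a 2-bit reflected Gray code: 00 -> 0, 01 -> 1, 11 -> 2, 10 -> 3."""
    g1, g0 = int(two_bits[0]), int(two_bits[1])
    return (g1 << 1) | (g1 ^ g0)


def table_index(codon: str) -> int:
    """Position 0..63 when each base is ranked by its Gray-decoded code (U, C, A, G)."""
    r1, r2, r3 = (gray_rank(BASE_BITS[b]) for b in codon.upper())
    return 16 * r1 + 4 * r2 + r3


if __name__ == "__main__":
    for codon in ("UUU", "AUG", "AAA"):
        bits = codon_to_bits(codon)
        print(codon, bits, binary_point(bits), cantor_point(bits), table_index(codon))
```

For example, AUG maps to the word 110010, the binary point 25/32 and table index 35, its position when rows and columns follow the U, C, A, G order. The Cantor mapping keeps the 64 codons in the same left-to-right order as the binary fractions but separates them by the gaps removed in the Cantor construction.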
On the Genetic Origin of Complementary Protein Coding
The relations between protein coding and hydropathy are investigated considering the principles of molecular recognition theory and Grafstein's hypothesis of the stereochemical origin of the genetic code. It is shown that the coding of RNA and DNA requires 14 distinct groups of codon-anticodon pairs, which define all possible complementary amino acids. Molecular recognition theory is redefined considering the codon-anticodon relations of mRNAs, DNAs and tRNAs and Siemion's mutation ring of the genetic code. A model of DNA, RNA and protein coding (and decoding) is presented, based on two fundamental properties of DNA/RNA, denoted the complementary and stationary principles. Stationary DNA/RNA coding defines the nucleotide relationships within the same (self) DNA/RNA strand, whereas complementary coding defines the nucleotide distribution relative to the other (non-self) strand. Combinations of 2 digits, denoting the primary and secondary characteristics of each nucleotide, specify codon positions according to the group subdivision (discrimination) principle. The coding process is related to hypercube-node representations of codons and the dynamics of their binary tree locations. The relations between binary tree locations and the Cantor set representations of different codon points are discussed in the context of quadratic mappings, Feigenbaum dynamics and signal analysis. Combinations of hypercube nodes and different binary tree positions define the words, sentences and syntax of the DNA, RNA and protein language. Possible applications of this method may lie in network analysis and design, and in gene, protein and drug modelling.
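The codon-anticodon relation behind complementary amino acids can be enumerated with a small sketch using the standard genetic code (NCBI translation table 1). Here the anticodon of a codon is taken to be its reverse complement read 5' to 3', and the grouping by unordered amino acid pair is an illustrative assumption: the abstract does not give the criteria behind its 14 groups, so this sketch does not attempt to reproduce that number. Function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G per position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def anticodon(codon: str) -> str:
    """Complementary triplet read 5'->3' (reverse complement), DNA alphabet."""
    return codon.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def complementary_pairs() -> dict:
    """Map each codon to (its amino acid, amino acid encoded by its anticodon triplet)."""
    return {c: (CODE[c], CODE[anticodon(c)]) for c in CODE}


def pair_groups() -> dict:
    """Group codons by the unordered pair of amino acids they and their anticodons encode."""
    groups = {}
    for codon, (aa, anti_aa) in complementary_pairs().items():
        groups.setdefault(frozenset((aa, anti_aa)), []).append(codon)
    return groups


if __name__ == "__main__":
    print(complementary_pairs()["ATG"])  # prints "('M', 'H')": AUG pairs with CAU (His)
    for pair, codons in sorted(pair_groups().items(), key=lambda kv: sorted(kv[0])):
        print(sorted(pair), codons)
```

Stop codons ("*") are kept in the table so that every one of the 64 codons appears in exactly one group.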
Binary Coding, mRNA Information and Protein Structure
We describe a new binary algorithm for predicting α and β protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of α and β protein folds on a personal computer by means of a few binary patterns of coded amino acid and nucleotide physicochemical properties. The algorithm was tested on a dataset of 140 dissimilar protein folds with the SMO (sequential minimal optimization) support vector machine classifier and with classification trees. Depending on the testing method, the overall classification accuracy was 91.43% to 100%, and the tenfold cross-validation accuracy ranged from 83.57% to over 90%.
Genetic code randomization analysis, based on 100,000 different codes tested for protein fold prediction quality, indicated that: a) there is a very low chance (p = 2.7 × 10^-4) that a code better than the natural one, as specified by the binary coding algorithm, is produced at random; and b) dipeptides represent the basic protein units with respect to how the natural genetic code defines secondary protein structure.
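A randomization test of this kind can be sketched as follows. The abstract does not say how the random codes were generated or how p was computed, so both are assumptions here: random codes are built by shuffling the 20 amino acids among the synonymous-codon blocks of the standard code (stop codons fixed), and p is the simple fraction of random codes that score strictly better than the natural one. The `score` function stands in for the fold-prediction quality measure and is hypothetical.

```python
import random
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G per position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def random_code(rng: random.Random) -> dict:
    """Shuffle the 20 amino acids among the synonymous-codon blocks; stops stay fixed."""
    amino_acids = sorted(set(STANDARD.values()) - {"*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(amino_acids, shuffled))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in STANDARD.items()}


def empirical_p(n_better: int, n_codes: int) -> float:
    """Fraction of random codes that beat the natural code (simple k/N estimate)."""
    return n_better / n_codes


def randomization_test(score, n_codes: int, seed: int = 0) -> float:
    """score(code) -> float is a hypothetical fold-prediction quality measure."""
    rng = random.Random(seed)
    natural = score(STANDARD)
    n_better = sum(score(random_code(rng)) > natural for _ in range(n_codes))
    return empirical_p(n_better, n_codes)
```

Under the k/N reading, p = 2.7 × 10^-4 over 100,000 codes would correspond to 27 random codes outperforming the natural one: `empirical_p(27, 100_000)` returns 0.00027. Some studies prefer (k + 1)/(N + 1), which avoids reporting p = 0; which estimate the authors used is not stated.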
Structural and Functional Modeling of Artificial Bioactive Proteins
A total of 32 synthetic proteins designed by Michael Hecht and co-workers were investigated using standard bioinformatics tools for structure and function modeling. The dataset consisted of 15 artificial α-proteins (Hecht_α) designed to fold into 102-residue four-helix bundles and 17 artificial six-stranded β-sheet proteins (Hecht_β). We compared the experimentally determined properties of the investigated sequences with the results of computational methods for protein structure and bioactivity prediction. We conclude that the dataset of Michael Hecht and co-workers could be used successfully both to test current methods and to develop new ones for the characterization of artificially designed molecules based on specific binary patterns of amino acid polarity. Comparative investigations of bioinformatics methods on datasets of both de novo and natural proteins may lead to: (1) improvement of existing tools for protein structure and function analysis; (2) new algorithms for the construction of de novo protein subsets; and (3) additional information on the complex natural sequence space and its relation to the individual subspaces of de novo sequences. Additional investigations on different and varied datasets are needed to confirm the general applicability of this concept.
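Binary polarity patterning of the Hecht kind places nonpolar residues with a periodicity of about 3.6 residues in helices and in strict alternation in β-strands. One standard way to quantify such periodicity, swapped in here for illustration and not taken from the paper, is the hydrophobic moment evaluated at 100° per residue (helix) and 180° per residue (strand). The nonpolar residue set and the test sequence are assumptions.

```python
import cmath
import math

# Assumed nonpolar residue set; Hecht-style designs use a reduced polar/nonpolar
# alphabet, and the exact split here is illustrative only.
NONPOLAR = set("AVLIMFWC")


def binary_pattern(seq: str) -> list:
    """1 for a nonpolar residue, 0 for a polar one."""
    return [1 if aa in NONPOLAR else 0 for aa in seq.upper()]


def hydrophobic_moment(pattern: list, angle_deg: float) -> float:
    """Moment |sum_n h_n * exp(i * n * delta)| for a per-residue rotation delta."""
    delta = math.radians(angle_deg)
    return abs(sum(h * cmath.exp(1j * n * delta) for n, h in enumerate(pattern)))


def periodicity_profile(seq: str) -> dict:
    """Compare helical (100 deg, about 3.6 residues/turn) and strand (180 deg) periodicity."""
    pattern = binary_pattern(seq)
    return {
        "alpha_100": hydrophobic_moment(pattern, 100.0),
        "beta_180": hydrophobic_moment(pattern, 180.0),
    }


if __name__ == "__main__":
    # Hypothetical strand-like sequence alternating nonpolar (V) and polar (K, E) residues.
    print(periodicity_profile("VKVEVKVEVKVEVK"))
```

For a strictly alternating pattern all nonpolar residues add in phase at 180°, so the strand moment equals the number of nonpolar residues (7 here), while the helical moment stays small.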
Antisense Peptide Technology for Diagnostic Tests and Bioengineering Research
Antisense peptide technology (APT) is based on a useful heuristic algorithm for rational peptide design. It derives from the empirical observation that peptides consisting of complementary (sense and antisense) amino acids interact with higher probability and affinity than randomly selected ones. This phenomenon is closely related to the structure of the standard genetic code table and, at the same time, is unrelated to the direction in which its codon sequence is translated. The concept of complementary peptide interaction is discussed, and its possible applications to diagnostic tests and bioengineering research are summarized. Problems and difficulties that may arise when using APT are discussed, and possible solutions are proposed. The methodology was tested on the example of SARS-CoV-2: it is shown that the CABS-dock server accurately predicts the binding of antisense peptides to the SARS-CoV-2 receptor binding domain without requiring predefinition of the binding site. It is concluded that the benefits of APT outweigh the costs of random peptide screening and could lead to considerable savings in time and resources, especially if combined with other computational and immunochemical methods.
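A minimal sketch of how antisense peptides are commonly derived from a sense coding sequence, using the standard genetic code: the complementary strand is translated either 5' to 3' (the reverse complement, which reverses residue order relative to the sense peptide) or 3' to 5' (codon by codon, aligned with the sense peptide). The function and key names are hypothetical, and the docking step performed with CABS-dock is not modeled here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G per position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate the complete codons of a DNA/RNA string ('*' marks a stop codon)."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def antisense_peptides(sense_dna: str) -> dict:
    """Peptides encoded by the strand complementary to a sense coding sequence."""
    complement = sense_dna.upper().replace("U", "T").translate(COMPLEMENT)
    return {
        # Complementary strand read 5'->3' (reverse complement); residue order is reversed.
        "read_5to3": translate(complement[::-1]),
        # Complementary strand read 3'->5', codon by codon in the sense order.
        "read_3to5": translate(complement),
    }


if __name__ == "__main__":
    print(translate("ATGGCT"), antisense_peptides("ATGGCT"))
```

For the sense sequence ATGGCT (Met-Ala), the 5'->3' reading gives "SH" and the 3'->5' reading gives "YR"; comparing candidates from both readings is one way to probe the claim that complementarity does not depend on the reading direction.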
- …