
23,807 research outputs found

Identification of protein coding genes in genomes with statistical functions based on the circular code

Author: Arquès
Berstel
Blaisdell
Borodovsky
Burge
Burset
Béal
Christian J Michel
Crick
Didier G Arquès
Eigen
Fickett
Jérôme Lacan
Karlin
Krogh
Lukashin
Nirenberg
Pavy
Salzberg
Shepherd
Shmatkov
Shulman
Smith
Staden
Publication venue: 'Elsevier BV'
Publication date: 01/01/2002

A new statistical approach using functions based on the circular code correctly classifies more than 93% of bases in protein (coding) genes and non-coding genes of human sequences. Based on this statistical study, a research software package called "Analysis of Coding Genes" (ACG) has been developed for identifying protein genes in genomes and for determining their frame. The ACG software also allows an evaluation of the length of protein genes, their position in the genome, their position relative to one another, and the prediction of internal frames in protein genes.

CiteSeerX

Crossref

Open Archive Toulouse Archive Ouverte

Analysis of a circular code model

Author: Lacan Jérôme
Michel Christian
Publication venue: 'Elsevier BV'
Publication date: 21/11/2001

A circular code has been identified in the protein (coding) genes of both eukaryotes and prokaryotes by using a statistical method called the Trinucleotide Frequency method (TF method) [Arquès & Michel, (1996) J. Theor. Biol. 182, 45-58]. Recently, a probabilistic model based on nucleotide frequencies, with a hypothesis of no correlation between successive bases on a DNA strand, has been proposed by Koch & Lehmann [(1997) J. Theor. Biol. 189, 171-174] for constructing some particular circular codes. Their interesting method, which we call here the Nucleotide Frequency method (NF method), reveals several limits for constructing the circular code observed with protein genes.
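As a rough illustration of the tallying that a trinucleotide-frequency analysis starts from, the sketch below counts trinucleotides separately in each of the three reading frames of a sequence. It is not the TF method of the cited paper; the function name `trinucleotide_frequencies` and the example sequence are assumptions made for illustration.

```python
from collections import Counter


def trinucleotide_frequencies(seq):
    """Return three dicts (frames 0, 1, 2) mapping trinucleotide -> relative frequency.

    Hypothetical helper: only complete trinucleotides are counted in each frame.
    """
    seq = seq.upper()
    per_frame = []
    for frame in range(3):
        # Step through the sequence three bases at a time, starting at `frame`.
        counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        total = sum(counts.values())
        per_frame.append({t: c / total for t, c in counts.items()} if total else {})
    return per_frame


if __name__ == "__main__":
    for frame, freqs in enumerate(trinucleotide_frequencies("ATGGCCATGGCC")):
        print(frame, freqs)
```

Comparing the three per-frame tables for a given trinucleotide is one simple way to see whether it occurs preferentially in a particular frame.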

Open Archive Toulouse Archive Ouverte

A Realistic Model under which the Genetic Code is Optimal

Author: Buhrman Harry
Klau Gunnar W.
Schaffner Christian
Speijer Dave
Stougie Leen
van der Gulik Peter T. S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The genetic code has a high level of error robustness. Using values of hydrophobicity scales as a proxy for amino acid character, and the Mean Square measure as a function quantifying error robustness, a value can be obtained for a genetic code which reflects the error robustness of that code. By comparing this value with a distribution of values belonging to codes generated by random permutations of amino acid assignments, the level of error robustness of a genetic code can be quantified. We present a calculation in which the standard genetic code is shown to be optimal. We obtain this result by (1) using recently updated values of polar requirement as input; (2) fixing seven assignments (Ile, Trp, His, Phe, Tyr, Arg, and Leu) based on aptamer considerations; and (3) using known biosynthetic relations of the 20 amino acids. This last point is reflected in an approach of subdivision (restricting the random reallocation of assignments to amino acid subgroups, the set of 20 being divided into four such subgroups). The three approaches to explaining the robustness of the code (specific selection for robustness, amino acid-RNA interactions leading to assignments, or a slow growth process of assignment patterns) are reexamined in light of our findings. We offer a comprehensive hypothesis, stressing the importance of biosynthetic relations, with the code evolving from an early stage with just glycine and alanine, via intermediate stages, towards 64 codons carrying today's meaning.
Comment: 22 pages, 3 figures, 4 tables Journal of Molecular Evolution, July 201
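The comparison described above (a mean-square score for one code, set against scores of randomly permuted codes) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's calculation: it uses the Kyte-Doolittle hydropathy scale as a stand-in for polar requirement, permutes all 20 amino acids freely (no fixed assignments, no biosynthetic subgroups), weights all single-base changes equally, and skips changes to or from stop codons. The names `mean_square`, `random_code` and `fraction_better` are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest); '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, AMINO))

# Kyte-Doolittle hydropathy values, used here only as a stand-in property scale.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_square(code, prop):
    """Mean squared property change over all single-base substitutions between sense codons."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other == "*":
                    continue
                total += (prop[aa] - prop[other]) ** 2
                count += 1
    return total / count


def random_code(rng):
    """Permute the 20 amino acids among the standard code's synonymous codon blocks."""
    amino = sorted(set(AMINO) - {"*"})
    shuffled = amino[:]
    rng.shuffle(shuffled)
    swap = dict(zip(amino, shuffled))
    return {codon: swap.get(aa, "*") for codon, aa in STANDARD.items()}


def fraction_better(samples, seed=0):
    """Fraction of sampled random codes with a lower (better) score than the standard code."""
    rng = random.Random(seed)
    reference = mean_square(STANDARD, HYDROPATHY)
    better = sum(mean_square(random_code(rng), HYDROPATHY) < reference for _ in range(samples))
    return better / samples


if __name__ == "__main__":
    print("standard code score:", mean_square(STANDARD, HYDROPATHY))
    print("fraction of random codes that score better:", fraction_better(1000))
```

A smaller fraction means fewer random codes beat the standard code under the chosen scale; the restrictions listed in the abstract would change which permutations are sampled.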

arXiv.org e-Print Archive

CiteSeerX

VU Research Portal

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Mining frequent biological sequences based on bitmap without candidate sequence generation

Author: Davis Darryl N.
Ren Jiadong
Wang Qian
Publication venue: 'Elsevier BV'
Publication date: 30/12/2015

Biological sequences carry a lot of important genetic information about organisms. Furthermore, there is an inheritance law related to protein function and structure which is useful for applications such as disease prediction. Frequent sequence mining is a core technique for association rule discovery, but existing algorithms suffer from low efficiency or a high error rate because biological sequences have more characteristics than general sequences. In this paper, an algorithm for mining Frequent Biological Sequences based on Bitmap, FBSB, is proposed. FBSB uses bitmaps as its simple data structure and transforms each row into a quicksort list (QS-list) for sequence growth. To meet the continuity and accuracy requirements of biological sequence mining, the sequences tested during the mining process of FBSB are real ones instead of generated candidates, and all the frequent sequences can be mined without any errors. Compared with other algorithms, the experimental results show that FBSB achieves better performance in both run time and scalability.
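To make the idea of bitmap-based mining of contiguous frequent subsequences concrete, here is a minimal sketch. It is not FBSB itself (the QS-list structure is not reproduced, and the abstract gives no implementation details); it only shows how per-symbol position bitmaps, a shift and a bitwise AND can grow patterns that actually occur in the data. The names `symbol_bitmaps` and `mine_frequent` are hypothetical, and support is assumed to mean the number of sequences containing the pattern.

```python
def symbol_bitmaps(seq):
    """Map each symbol to an integer bitmap with bit i set where seq[i] is that symbol."""
    maps = {}
    for i, ch in enumerate(seq):
        maps[ch] = maps.get(ch, 0) | (1 << i)
    return maps


def mine_frequent(sequences, min_support):
    """Return {pattern: support} for contiguous patterns found in at least min_support sequences."""
    per_seq = [symbol_bitmaps(s) for s in sequences]
    alphabet = sorted({ch for s in sequences for ch in s})
    frequent = {}
    stack = []

    # Seed with single symbols; occ[k] marks the end positions of the pattern in sequence k.
    for ch in alphabet:
        occ = [m.get(ch, 0) for m in per_seq]
        support = sum(1 for o in occ if o)
        if support >= min_support:
            frequent[ch] = support
            stack.append((ch, occ))

    # Grow each frequent pattern by one symbol: shift end positions by one, AND with the symbol bitmap.
    while stack:
        pattern, occ = stack.pop()
        for ch in alphabet:
            grown = [(o << 1) & m.get(ch, 0) for o, m in zip(occ, per_seq)]
            support = sum(1 for o in grown if o)
            if support >= min_support:
                frequent[pattern + ch] = support
                stack.append((pattern + ch, grown))
    return frequent


if __name__ == "__main__":
    print(mine_frequent(["ACGT", "ACGA", "TACG"], min_support=3))
```

Because an extension survives only if its bitmap is non-zero in enough sequences, every reported pattern is a substring that really occurs in the input.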

Repository@Hull - Worktribe

Optimality of the genetic code with respect to protein stability and amino acid frequencies

Author: Cerf Nicolas
Gilis Dimitri
Massar Serge
Rooman Marianne
Publication date: 01/01/2001

How robust is the natural genetic code with respect to mistranslation errors? It has long been known that the genetic code is very efficient in limiting the effect of point mutations. A misread codon will commonly code either for the same amino acid or for a similar one in terms of its biochemical properties, so the structure and function of the coded protein remain relatively unaltered. Previous studies have attempted to address this question more quantitatively, namely by statistically estimating the fraction of randomly generated codes that do better than the genetic code regarding its overall robustness. In this paper, we extend these results by investigating the role of amino acid frequencies in the optimality of the genetic code. When measuring the relative fitness of the natural code with respect to a random code, it is indeed natural to assume that a translation error affecting a frequent amino acid is less favorable than one affecting a rare amino acid, at equal mutation cost. We find that taking the amino acid frequency into account accordingly decreases the fraction of random codes that beat the natural code, making the latter comparatively even more robust. This effect is particularly pronounced when more refined measures of the amino acid substitution cost are used than hydrophobicity. To show this, we devise a new cost function by evaluating with computer experiments the change in folding free energy caused by all possible single-site mutations in a set of known protein structures. With this cost function, we estimate that on the order of one random code out of 100 million is more fit than the natural code when taking amino acid frequencies into account. The genetic code therefore seems to be structured so as to minimize the consequences of translation errors on the 3D structure and stability of proteins.
Comment: 31 pages, 2 figures, postscript fil

arXiv.org e-Print Archive

PubMed Central

DI-fusion

Enumerating Designing Sequences in the HP Model

Author: Irbäck Anders
Troein Carl
Publication date: 01/01/2001

The hydrophobic/polar (HP) model on the square lattice has been widely used to investigate the basics of protein folding. In the cases where all designing sequences (sequences with unique ground states) were enumerated without restrictions on the number of contacts, the upper limit on the chain length N has been 18-20 because of the rapid exponential growth of the numbers of conformations and sequences. We show how a few optimizations push this limit by about 5 units. Based on these calculations, we study the statistical distribution of hydrophobicity along designing sequences. We find that the average number of hydrophobic and polar clumps along the chains is larger for designing sequences than for random ones, which is in agreement with earlier findings for N up to 18 and with results for real enzymes. We also show that this deviation from randomness disappears if the calculations are restricted to maximally compact structures.
Comment: 18 pages, 4 figure
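For very short chains, the enumeration of designing sequences can be done by brute force, as in the sketch below. It is a naive illustration only, with none of the optimizations the abstract refers to. Assumptions: the energy is minus the number of non-bonded H-H nearest-neighbour contacts, conformations are self-avoiding walks counted once per rotation/reflection class, and a walk and its reversal are treated as distinct when they differ. The function names are hypothetical.

```python
from itertools import product

MOVES = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def conformations(n):
    """Yield self-avoiding walks of n >= 2 monomers on the square lattice, one per symmetry class."""
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if not turned and dy < 0:
                continue  # the first vertical step must go up, which removes mirror images
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # self-avoidance
            path.append(nxt)
            yield from extend(path, turned or dy != 0)
            path.pop()

    # Fixing the first bond along +x removes the four rotations.
    yield from extend([(0, 0), (1, 0)], False)


def contacts(conf):
    """Return index pairs (i, j) of monomers that are lattice neighbours but not chain neighbours."""
    index = {p: i for i, p in enumerate(conf)}
    pairs = []
    for i, (x, y) in enumerate(conf):
        for dx, dy in ((1, 0), (0, 1)):  # looking only right and up counts each pair once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                pairs.append((min(i, j), max(i, j)))
    return pairs


def designing_sequences(n):
    """Return all HP sequences of length n whose lowest-energy conformation is unique."""
    contact_lists = [contacts(c) for c in conformations(n)]
    result = []
    for seq in product("HP", repeat=n):
        energies = [
            -sum(1 for i, j in pairs if seq[i] == "H" and seq[j] == "H")
            for pairs in contact_lists
        ]
        if energies.count(min(energies)) == 1:
            result.append("".join(seq))
    return result


if __name__ == "__main__":
    print(len(list(conformations(4))), designing_sequences(4))
```

Both the number of walks and the number of sequences grow exponentially with the chain length, so this direct approach is practical only for small N.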

arXiv.org e-Print Archive

CiteSeerX

Lund University Publications

PubMed Central