1,497 research outputs found

    Historical Criteria for Structural Classes of Proteins in Percentages: After 20 Years

    Two decades ago, scientists proposed percentage-based criteria for assigning proteins to structural classes. Today, experts at SCOP classify proteins into one of four structural classes manually, by inspection and observation. Nakashima et al. proposed one set of classification criteria. P.Y. Chou proposed another method, which classifies proteins according to the percentage of their residues in three conformations (helix, sheet, and coil), and later revised it. SCOP now lists around 100,000 proteins with their structural classes. In this paper, two datasets are used to reveal the percentages of residues in α-helices, β-sheets, and coils in proteins of the all-α, all-β, α+β, and α/β classes, as classified by the SCOP experts. The first dataset, PDBselect25, contains 1,670 twilight-zone proteins with pairwise sequence similarity below 25%. The second, BF30, consists of 10,294 proteins selected from the PDB with a similarity threshold of 30%. The structural classes of these proteins are taken from the SCOP database. The results show very poor correlation between the historical criteria and the intuition of SCOP's scientists in classifying proteins into structural classes.
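
    To make the idea of percentage criteria concrete, the sketch below computes the helix/strand/coil composition of a protein from a three-state secondary-structure string and applies threshold rules. The function names and default thresholds (40%, 5%, 15%) are illustrative assumptions, not the published values of Nakashima et al. or Chou, and not SCOP's procedure. Composition alone also cannot separate α+β from α/β proteins, since that distinction depends on how the β-strands are arranged (mainly antiparallel versus mainly parallel), so both are reported here as one mixed class.

```python
from collections import Counter

# Hypothetical default thresholds (percent of residues); illustrative only,
# not the published criteria of Nakashima et al., Chou, or SCOP.
MIN_MAJOR = 40.0  # minimum share of the dominant element for all-alpha / all-beta
MAX_MINOR = 5.0   # maximum share of the other element for all-alpha / all-beta
MIN_MIXED = 15.0  # minimum share of both helix and strand for a mixed class


def ss_composition(ss: str) -> dict[str, float]:
    """Return the percentage of residues in helix (H), strand (E) and coil (C).

    Assumes a reduced three-state alphabet, e.g. DSSP states mapped to H/E/C.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    unknown = set(ss) - set("HEC")
    if unknown:
        raise ValueError(f"unexpected states: {sorted(unknown)}")
    counts = Counter(ss)
    return {state: 100.0 * counts[state] / len(ss) for state in "HEC"}


def percentage_class(ss: str,
                     min_major: float = MIN_MAJOR,
                     max_minor: float = MAX_MINOR,
                     min_mixed: float = MIN_MIXED) -> str:
    """Assign a structural class from composition alone (hypothetical rules)."""
    comp = ss_composition(ss)
    helix, strand = comp["H"], comp["E"]
    if helix >= min_major and strand < max_minor:
        return "all-alpha"
    if strand >= min_major and helix < max_minor:
        return "all-beta"
    if helix >= min_mixed and strand >= min_mixed:
        # alpha+beta vs alpha/beta needs strand-arrangement data, not composition.
        return "alpha+beta or alpha/beta"
    return "unclassified"
```

    For example, `percentage_class("HHHHHHHHCC")` returns `"all-alpha"` under these default thresholds (80% helix, 0% strand). Comparing such rule-based labels against curated SCOP labels over a dataset is one way to quantify the kind of disagreement the abstract reports.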

    Where Do You Get Your Protein? Or: Biochemical Realization

    Biochemical kinds such as proteins pose interesting problems for philosophers of science, as they can be studied from the points of view of both biology and chemistry. The key question is how the biological functions of biochemical kinds relate to the microstructures associated with them. This leads to a more general discussion of ontological reductionism, microstructuralism, and multiple realization at the biology-chemistry interface. On the face of it, biochemical kinds seem to challenge ontological reductionism and hence to motivate a dual theory of chemical and biological kinds, a form of pluralism about natural kinds. It will be argued, however, that this challenge, which rests on multiple realization, can be met. The upshot is that there are reasonable prospects for ontological reductionism about biochemical kinds, which corroborates natural kind monism.

    Three-dimensional Structure Databases of Biological Macromolecules

    Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) curated repositories of coordinates of experimentally determined structures, including extensive metadata, for instance information about provenance, details of data collection and interpretation, and validation of results; (b) information-retrieval tools for searching and identifying entries of interest and providing access to them; (c) links among databases, especially to databases of amino-acid and genetic sequences and of protein function, and links to software for analysis of amino-acid sequence and protein structure and for structure prediction; and (d) collections of predicted three-dimensional structures of proteins, which will become increasingly important following the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including computer-graphic representations. Their collective and individual websites serve as hubs for the structural biology community, offering newsletters, reports from Task Forces, training courses, and “helpdesks,” as well as links to external software. Many specialized projects build on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which classify protein domains.
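
    Coordinates in the PDB are distributed in PDBx/mmCIF (the current archive format) and, for many entries, also in the legacy fixed-column PDB format. As a minimal sketch of reading coordinates from the legacy format, the function below extracts ATOM/HETATM records using the standard column positions. The `Atom` class and `parse_atom_records` name are hypothetical; a production tool should prefer an established parser and the mmCIF files.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """One ATOM/HETATM record from a legacy PDB-format file (hypothetical container)."""
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_records(lines):
    """Parse ATOM/HETATM lines using the fixed PDB column layout.

    Python slices are 0-based, so columns 31-38 (x) become line[30:38], etc.
    """
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END and other records
        atoms.append(Atom(
            serial=int(line[6:11]),         # columns 7-11
            name=line[12:16].strip(),       # columns 13-16
            res_name=line[17:20].strip(),   # columns 18-20
            chain=line[21],                 # column 22
            res_seq=int(line[22:26]),       # columns 23-26
            x=float(line[30:38]),           # columns 31-38
            y=float(line[38:46]),           # columns 39-46
            z=float(line[46:54]),           # columns 47-54
        ))
    return atoms
```

    Because a file object iterates over its lines, `parse_atom_records(open(path))` would return one `Atom` per coordinate record, where `path` is a placeholder for a downloaded `.pdb` file.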

    A Balanced Secondary Structure Predictor

    Secondary structure (SS) refers to the local spatial organization of the polypeptide backbone atoms of a protein. Accurate SS prediction provides a vital clue for resolving a protein's 3D structure. SS is commonly divided into three states: helix (H), beta strand (E), and coil (C). Most SS predictors are imbalanced: their accuracy is high for helix and coil but significantly lower for beta strands. The objective of this thesis is to develop a balanced SS predictor that achieves good accuracy on all three SS states. We propose a novel approach to this problem that combines a genetic algorithm (GA) with a support vector machine. We prepared two test datasets (CB471 and N295) to compare the performance of our predictor with SPINE X. Our predictor achieved overall accuracies of 76.4% and 77.2% on CB471 and N295, respectively, whereas SPINE X achieved 76.5% on both test datasets.
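
    Balance across the three states is easiest to see by reporting per-state accuracy alongside overall accuracy. The sketch below computes Q3 (the percentage of residues whose H/E/C state is predicted correctly) and the per-state accuracies Q_H, Q_E, and Q_C (the percentage of residues observed in each state that are predicted in that state). The function name is hypothetical, and the abstract does not say which balance measure the thesis itself uses.

```python
def q3_and_per_state(observed: str, predicted: str) -> tuple[float, dict[str, float]]:
    """Return Q3 and per-state accuracy (Q_H, Q_E, Q_C) as percentages.

    Both strings use the three-state alphabet H/E/C and must align residue by residue.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings differ in length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(o == p for o, p in zip(observed, predicted))
    q3 = 100.0 * correct / len(observed)
    per_state = {}
    for state in "HEC":
        positions = [i for i, o in enumerate(observed) if o == state]
        if positions:  # a state absent from the observed string has no accuracy
            hits = sum(predicted[i] == state for i in positions)
            per_state[state] = 100.0 * hits / len(positions)
    return q3, per_state
```

    For example, `q3_and_per_state("HHHHEEEECC", "HHHHCCCECC")` gives Q3 = 70.0 but Q_E = 25.0 (with Q_H = Q_C = 100.0): a respectable overall score that hides poor strand prediction.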

    Learning biophysically-motivated parameters for alpha helix prediction

    Background: Our goal is to develop a state-of-the-art protein secondary structure predictor with an intuitive and biophysically-motivated energy model. We treat structure prediction as an optimization problem, using parameterizable cost functions representing biological “pseudo-energies”. Machine learning methods are applied to estimate the parameter values so that known protein structures are predicted correctly. Results: Focusing on the prediction of alpha helices in proteins, we show that a model with 302 parameters can achieve a Qα value of 77.6% and an SOVα value of 73.4%. Such performance numbers are among the best for techniques that do not rely on external databases (such as multiple sequence alignments). Further, it is easier to extract biological significance from a model with so few parameters. Conclusion: The method presented shows promise for the prediction of protein secondary structure. Biophysically-motivated elementary free energies can be learned using SVM techniques to construct an energy cost function whose predictive performance rivals the state of the art. This method is general and can be extended beyond the all-alpha case described here.
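
    The idea of treating prediction as minimization of a pseudo-energy can be sketched with a two-state (helix/coil) model solved by dynamic programming. In this hypothetical sketch, each residue placed in a helix adds a per-amino-acid energy, each new helix pays a fixed initiation penalty, and the lowest-energy labeling is recovered by backtracking. The energy values, the penalty, and the function name are illustrative assumptions, far simpler than the paper's 302-parameter model, and the SVM-based learning of the parameters is not shown.

```python
def predict_helices(seq: str,
                    helix_energy: dict[str, float],
                    init_penalty: float,
                    default_energy: float = 0.0) -> str:
    """Return an H/C string minimizing a toy helix pseudo-energy.

    Energy model (hypothetical): coil residues cost 0; a residue in a helix
    costs helix_energy[aa]; every helix that starts pays init_penalty.
    """
    n = len(seq)
    if n == 0:
        return ""
    COIL, HELIX = 0, 1
    # best[i][s]: lowest energy for seq[:i + 1] with residue i in state s.
    best = [[0.0, 0.0] for _ in range(n)]
    # back[i][s]: state of residue i - 1 on that lowest-energy path.
    back = [[COIL, COIL] for _ in range(n)]
    best[0][COIL] = 0.0
    best[0][HELIX] = init_penalty + helix_energy.get(seq[0], default_energy)
    for i in range(1, n):
        e_helix = helix_energy.get(seq[i], default_energy)
        # Coil: reachable from either state at no extra cost (ties favour coil).
        if best[i - 1][COIL] <= best[i - 1][HELIX]:
            best[i][COIL], back[i][COIL] = best[i - 1][COIL], COIL
        else:
            best[i][COIL], back[i][COIL] = best[i - 1][HELIX], HELIX
        # Helix: start a new helix (pay the penalty) or extend the current one.
        start = best[i - 1][COIL] + init_penalty
        extend = best[i - 1][HELIX]
        if start < extend:
            best[i][HELIX], back[i][HELIX] = start + e_helix, COIL
        else:
            best[i][HELIX], back[i][HELIX] = extend + e_helix, HELIX
    # Backtrack from the cheaper final state.
    state = COIL if best[-1][COIL] <= best[-1][HELIX] else HELIX
    labels = []
    for i in range(n - 1, -1, -1):
        labels.append("H" if state == HELIX else "C")
        state = back[i][state]
    return "".join(reversed(labels))
```

    With toy energies `{"A": -1.0, "L": -1.0, "G": 1.0, "P": 2.0}` and `init_penalty=2.0`, `predict_helices("GGAAAALGG", ...)` places a single helix over `AAAAL`, because its total energy (-5 plus the penalty of 2) is lower than the all-coil energy of 0. Learning methods such as the SVM approach in the abstract would instead fit these energies so that known structures come out as the minimum-energy labelings.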
