
14,859 research outputs found

Protein Secondary Structure Prediction Using Support Vector Machines, Neural Networks and Genetic Algorithms

Author: Reyaz-Ahmed Anjum B
Publication venue: ScholarWorks @ Georgia State University
Publication date: 03/05/2007

Bioinformatics techniques for protein secondary structure prediction mostly depend on the information available in the amino acid sequence. Support vector machines (SVM) have shown strong generalization ability in a number of application areas, including protein structure prediction. In this study, a new sliding window scheme with multiple windows is introduced to form the protein data for training and testing the SVM. An orthogonal encoding scheme coupled with the BLOSUM62 matrix is used to make the prediction. First, the predictions of binary classifiers using multiple windows are compared with those of the single window scheme; the results show that a single window is not good in all cases. Two new classifiers are introduced for effective tertiary classification. These new classifiers use neural networks and genetic algorithms to optimize the accuracy of the tertiary classifier. The accuracy of the new architectures is determined and compared with other studies. The tertiary architecture is better than most available techniques.
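The sliding-window idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the 21-dimensional one-hot layout, the padding symbol, and the way features from several windows are concatenated are all assumptions made for illustration, and the BLOSUM62 component is left out.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for positions beyond the chain ends


def one_hot(residue):
    """Return a 21-dimensional orthogonal vector (20 amino acids + 1 spare slot).

    Padding and any unrecognised symbol share the last slot (an assumption).
    """
    vec = [0] * (len(AMINO_ACIDS) + 1)
    idx = AMINO_ACIDS.find(residue)
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1
    return vec


def window_features(sequence, width):
    """Encode each residue by the one-hot vectors of a window centred on it."""
    if width % 2 == 0:
        raise ValueError("window width must be odd")
    half = width // 2
    padded = PAD * half + sequence + PAD * half
    features = []
    for i in range(len(sequence)):
        row = []
        for residue in padded[i:i + width]:
            row.extend(one_hot(residue))
        features.append(row)
    return features


def multi_window_features(sequence, widths):
    """Concatenate, per residue, the features from several window widths."""
    per_width = [window_features(sequence, w) for w in widths]
    return [sum((fw[i] for fw in per_width), []) for i in range(len(sequence))]
```

Each row returned by `multi_window_features` could then serve as one training vector for a classifier, with the secondary structure label of the central residue as the target.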

ScholarWorks @ Georgia State University

A methodology in predicting protein tertiary structure.

Author: Li Leung Wah
Publication venue: Department of Cultural and Religious Studies, The Chinese University of Hong Kong
Publication date: 01/01/1993

by Li Leung Wah.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1993.
Includes bibliographical references (leaves 76-81).

Acknowledgements
Abstract
Chapter 1. --- Protein modeling
Chapter 1.1 --- Genetic Engineering
Chapter 1.2 --- Protein Engineering
Chapter 1.2.1 --- The basic concept
Chapter 1.2.2 --- The importance of protein modeling
Chapter 1.2.3 --- Applications
Chapter 1.2.3.1 --- Industry
Chapter 1.2.3.2 --- Medicine
Chapter 1.3 --- The structure of protein molecule
Chapter 2. --- About this thesis
Chapter 2.1 --- Methods on protein tertiary structure prediction
Chapter 2.1.1 --- Energy minimization method
Chapter 2.1.2 --- Sequence homology method
Chapter 2.1.3 --- Hierarchical assembly method
Chapter 2.2 --- Artificial Intelligence and molecular modeling
Chapter 2.3 --- Computer graphics and molecule display
Chapter 2.3.1 --- Molecular model in computer graphics
Chapter 2.3.2 --- Interactive graphic operations
Chapter 2.4 --- The objective of this thesis
Chapter 3. --- Algorithms for protein secondary structure prediction
Chapter 3.1 --- Hydrophobicity
Chapter 3.2 --- Algorithms for protein secondary structure prediction
Chapter 3.2.1 --- The Chou and Fasman method
Chapter 3.2.1.1 --- Method
Chapter 3.2.1.2 --- Results
Chapter 3.2.2 --- The GOR method
Chapter 3.2.2.1 --- Theory
Chapter 3.2.2.2 --- Method and results
Chapter 3.3 --- A proposed algorithm
Chapter 3.3.1 --- Procedure of our algorithm
Chapter 4. --- A protein tertiary structure prediction method
Chapter 4.1 --- The linkage between two amino acids
Chapter 4.2 --- Rotation angle between two peptide planes
Chapter 4.2.1 --- Helical structure
Chapter 4.2.1.1 --- Concept
Chapter 4.2.1.2 --- Procedure
Chapter 4.2.2 --- Sheet structure
Chapter 4.2.3 --- Turn structure
Chapter 4.2.4 --- Anti-parallel sheet and turn structure
Chapter 4.3 --- Random factor in rotation angle of peptide planes
Chapter 4.4 --- Atomic size
Chapter 4.5 --- Tertiary structure prediction algorithm
Chapter 5. --- Implementation
Chapter 5.1 --- Hardware
Chapter 5.2 --- User-defined data types and data structures
Chapter 5.3 --- Technique in molecule displaying
Chapter 5.4 --- Image processing
Chapter 5.5 --- Options in our program
Chapter 5.6 --- Steps in protein tertiary structure prediction
Chapter 6. --- Results
Chapter 6.1 --- The results of protein secondary structure prediction
Chapter 6.2 --- The results of protein tertiary structure prediction
Chapter 7. --- Conclusion
Chapter 7.1 --- Comments on protein secondary structure prediction algorithm
Chapter 7.1.1 --- Advantages and disadvantages
Chapter 7.1.2 --- Further development
Chapter 7.2 --- Discussion on X-ray crystallographic data
Chapter 7.3 --- Comments on the protein tertiary structure prediction algorithm
Chapter 7.3.1 --- Advantages and disadvantages
Chapter 7.3.2 --- Further development
Chapter 7.3.2.1 --- Rotation angle between two peptide planes
Reference
Glossary
Appendix A An algorithm to determine hydrophobic value
Appendix B Chou and Fasman algorithm
Appendix C GOR algorithm
Appendix D Shading algorithm

CUHK Digital Repository

Sequential Monte Carlo Methods for Protein Folding

Author: Grassberger Peter
Publication venue: NIC Symposium 2004 (NIC, Juelich)
Publication date: 01/01/2004

We describe a class of growth algorithms for finding low energy states of heteropolymers. These polymers form toy models for proteins, and the hope is that similar methods will ultimately be useful for finding native states of real proteins from heuristic or a priori determined force fields. These algorithms share with standard Markov chain Monte Carlo methods that they generate Gibbs-Boltzmann distributions, but they are not based on the strategy that this distribution is obtained as the stationary state of a suitably constructed Markov chain. Rather, they are based on growing the polymer by successively adding individual particles, guiding the growth towards configurations with lower energies, and using "population control" to eliminate bad configurations and increase the number of "good ones". This is not done via a breadth-first implementation as in genetic algorithms, but depth-first via recursive backtracking. As seen from various benchmark tests, the resulting algorithms are extremely efficient for lattice models, and are still competitive with other methods for simple off-lattice models.
Comment: 10 pages; published in NIC Symposium 2004, eds. D. Wolf et al. (NIC, Juelich, 2004)
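The growth-with-population-control idea described in this abstract can be sketched on a toy problem. The code below is an illustrative simplification, not the paper's algorithm: it assumes a two-dimensional HP lattice model (energy of -1 per non-bonded H-H contact), and the pruning and enrichment thresholds (half and twice a running average weight) as well as all function names are hypothetical choices.

```python
import math
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # square-lattice steps


def gained_contacts(seq, coords, pos, i):
    """Count new H-H contacts made by placing monomer i at pos."""
    if seq[i] != "H":
        return 0
    count = 0
    for dx, dy in MOVES:
        j = coords.get((pos[0] + dx, pos[1] + dy))
        # Skip empty sites and the bonded chain neighbour i - 1.
        if j is not None and j != i - 1 and seq[j] == "H":
            count += 1
    return count


def grow(seq, coords, path, energy, weight, beta, stats, best, rng):
    """Add one monomer, then recurse depth-first on each surviving copy."""
    i = len(path)
    if i == len(seq):
        if energy < best["energy"]:
            best["energy"], best["path"] = energy, list(path)
        return
    x, y = path[-1]
    free = [(x + dx, y + dy) for dx, dy in MOVES if (x + dx, y + dy) not in coords]
    if not free:
        return  # dead end: the chain has trapped itself
    pos = rng.choice(free)
    gained = gained_contacts(seq, coords, pos, i)
    # Rosenbluth-style weight: number of free sites times the Boltzmann factor.
    weight *= len(free) * math.exp(beta * gained)
    stats[i][0] += weight
    stats[i][1] += 1
    average = stats[i][0] / stats[i][1]
    copies = 1
    if weight < 0.5 * average:      # prune: kill half, double the survivors
        if rng.random() < 0.5:
            return
        weight *= 2.0
    elif weight > 2.0 * average:    # enrich: two copies with half the weight
        copies, weight = 2, weight / 2.0
    coords[pos] = i
    path.append(pos)
    for _ in range(copies):
        grow(seq, coords, path, energy - gained, weight, beta, stats, best, rng)
    path.pop()       # backtrack so the caller's state is restored
    del coords[pos]


def fold(seq, tours=200, beta=1.0, seed=0):
    """Run several independent growth tours and keep the lowest-energy chain."""
    rng = random.Random(seed)
    best = {"energy": math.inf, "path": None}
    stats = [[0.0, 0] for _ in seq]  # per-length running weight sum and count
    for _ in range(tours):
        grow(seq, {(0, 0): 0}, [(0, 0)], 0, 1.0, beta, stats, best, rng)
    return best
```

Calling `fold("HPPH")` returns a dictionary with the lowest energy seen and the corresponding lattice path. Comparing each weight against a running average per chain length is one simple way to keep the population from exploding or dying out; other threshold rules are possible.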

arXiv.org e-Print Archive

CiteSeerX

Juelich Shared Electronic Resources

Utilizing Protein Structure to Identify Non-Random Somatic Mutations

Author: Cheng Yuwei; Cheung Kei-Hoi; Modis Yorgo; Ryslik Gregory; Zhao Hongyu
Publication venue: Springer Science and Business Media LLC
Publication date: 27/02/2013

Motivation: Human cancer is caused by the accumulation of somatic mutations in tumor suppressors and oncogenes within the genome. In the case of oncogenes, recent theory suggests that there are only a few key "driver" mutations responsible for tumorigenesis. As there have been significant pharmacological successes in developing drugs that treat cancers that carry these driver mutations, several methods that rely on mutational clustering have been developed to identify them. However, these methods consider proteins as a single strand without taking their spatial structures into account. We propose a new methodology that incorporates protein tertiary structure in order to increase our power when identifying mutation clustering. Results: We have developed a novel algorithm, iPAC: identification of Protein Amino acid Clustering, for the identification of non-random somatic mutations in proteins that takes into account the three-dimensional protein structure. By using the tertiary information, we are able to detect both novel clusters in proteins that are known to exhibit mutation clustering as well as identify clusters in proteins without evidence of clustering based on existing methods. For example, by combining the data in the Protein Data Bank (PDB) and the Catalogue of Somatic Mutations in Cancer, our algorithm identifies new mutational clusters in well-known cancer proteins such as KRAS and PI3KCa. Further, by utilizing the tertiary structure, our algorithm also identifies clusters in EGFR, EIF2AK2, and other proteins that are not identified by current methodology.
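The contrast between treating a protein as a single strand and using its spatial structure can be shown with a small sketch. This is not the iPAC algorithm, whose details are not given in the abstract; it only assumes that C-alpha coordinates are available per residue number and reports mutated residue pairs that are close in space, alongside their separation along the sequence. The function name, the coordinate mapping, and the distance cutoff are hypothetical.

```python
import math


def spatial_pairs(ca_coords, mutated, max_dist):
    """Return mutated residue pairs whose C-alpha atoms lie within max_dist.

    ca_coords maps residue number -> (x, y, z); each result is
    (residue_a, residue_b, sequence_separation, spatial_distance).
    """
    pairs = []
    positions = sorted(mutated)
    for a_idx, a in enumerate(positions):
        for b in positions[a_idx + 1:]:
            d = math.dist(ca_coords[a], ca_coords[b])
            if d <= max_dist:
                pairs.append((a, b, b - a, d))
    return pairs
```

A pair with a large sequence separation but a small spatial distance is the kind of signal that a purely linear clustering method would miss.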

arXiv.org e-Print Archive

Springer - Publisher Connector