13,549 research outputs found

    A dynamic Bayesian network approach to protein secondary structure prediction

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein secondary structure prediction method based on probabilistic models such as hidden Markov model (HMM) appeals to many because it provides meaningful information relevant to sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).</p> <p>Results</p> <p>In this paper, we report a new method of probabilistic nature for protein secondary structure prediction, based on dynamic Bayesian networks (DBN). The new method models the PSI-BLAST profile of a protein sequence using a multivariate Gaussian distribution, and simultaneously takes into account the dependency between the profile and secondary structure and the dependency between profiles of neighboring residues. In addition, a segment length distribution is introduced for each secondary structure state. Tests show that the DBN method has made a significant improvement in the accuracy compared to other pure HMM-type methods. Further improvement is achieved by combining the DBN with an NN, a method called DBNN, which shows better <it>Q</it><sub>3 </sub>accuracy than many popular methods and is competitive to the current state-of-the-arts. The most interesting feature of DBN/DBNN is that a significant improvement in the prediction accuracy is achieved when combined with other methods by a simple consensus.</p> <p>Conclusion</p> <p>The DBN method using a Gaussian distribution for the PSI-BLAST profile and a high-ordered dependency between profiles of neighboring residues produces significantly better prediction accuracy than other HMM-type probabilistic methods. Owing to their different nature, the DBN and NN combine to form a more accurate method DBNN. Future improvement may be achieved by combining DBNN with a method of SVM type.</p

    Learning sparse models for a dynamic Bayesian network classifier of protein secondary structure

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight.</p> <p>Results</p> <p>In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our per-residue accuracy on a well established and difficult benchmark (CB513) is 80.3%, which is comparable to the state-of-the-art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN. Using this algorithm, we can automatically remove up to 70-95% of the parameters of a DBN while maintaining the same level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times faster than a fully dense model evaluated on the SD576 set. We also demonstrate, using simulated data, that the algorithm is able to recover true sparse structures with high accuracy, and using real data, that the sparse model identifies known correlation structure (local and non-local) related to different classes of secondary structure elements.</p> <p>Conclusions</p> <p>We present a secondary structure prediction method that employs dynamic Bayesian networks and support vector machines. We also introduce an algorithm for sparsifying the parameters of the dynamic Bayesian network. The sparsification approach yields a significant speed-up in generating predictions, and we demonstrate that the amino acid correlations identified by the algorithm correspond to several known features of protein secondary structure. Datasets and source code used in this study are available at <url>http://noble.gs.washington.edu/proj/pssp</url>.</p

    Bayesian models and algorithms for protein beta-sheet prediction

    Get PDF
    Prediction of the three-dimensional structure greatly benefits from the information related to secondary structure, solvent accessibility, and non-local contacts that stabilize a protein's structure. Prediction of such components is vital to our understanding of the structure and function of a protein. In this paper, we address the problem of beta-sheet prediction. We introduce a Bayesian approach for proteins with six or less beta-strands, in which we model the conformational features in a probabilistic framework. To select the optimum architecture, we analyze the space of possible conformations by efficient heuristics. Furthermore, we employ an algorithm that finds the optimum pairwise alignment between beta-strands using dynamic programming. Allowing any number of gaps in an alignment enables us to model beta-bulges more effectively. Though our main focus is proteins with six or less beta-strands, we are also able to perform predictions for proteins with more than six beta-strands by combining the predictions of BetaPro with the gapped alignment algorithm. We evaluated the accuracy of our method and BetaPro. We performed a 10-fold cross validation experiment on the BetaSheet916 set and we obtained significant improvements in the prediction accuracy

    Assessing Protein Conformational Sampling Methods Based on Bivariate Lag-Distributions of Backbone Angles

    Get PDF
    Despite considerable progress in the past decades, protein structure prediction remains one of the major unsolved problems in computational biology. Angular-sampling-based methods have been extensively studied recently due to their ability to capture the continuous conformational space of protein structures. The literature has focused on using a variety of parametric models of the sequential dependencies between angle pairs along the protein chains. In this article, we present a thorough review of angular-sampling-based methods by assessing three main questions: What is the best distribution type to model the protein angles? What is a reasonable number of components in a mixture model that should be considered to accurately parameterize the joint distribution of the angles? and What is the order of the local sequence–structure dependency that should be considered by a prediction method? We assess the model fits for different methods using bivariate lag-distributions of the dihedral/planar angles. Moreover, the main information across the lags can be extracted using a technique called Lag singular value decomposition (LagSVD), which considers the joint distribution of the dihedral/planar angles over different lags using a nonparametric approach and monitors the behavior of the lag-distribution of the angles using singular value decomposition. As a result, we developed graphical tools and numerical measurements to compare and evaluate the performance of different model fits. Furthermore, we developed a web-tool (http://www.stat.tamu.edu/∼madoliat/LagSVD) that can be used to produce informative animations

    Bayesian models and algorithms for protein beta-sheet prediction

    Get PDF
    Prediction of the three-dimensional structure greatly benefits from the information related to secondary structure, solvent accessibility, and non-local contacts that stabilize a protein's structure. Prediction of such components is vital to our understanding of the structure and function of a protein. In this paper, we address the problem of beta-sheet prediction. We introduce a Bayesian approach for proteins with six or less beta-strands, in which we model the conformational features in a probabilistic framework. To select the optimum architecture, we analyze the space of possible conformations by efficient heuristics. Furthermore, we employ an algorithm that finds the optimum pairwise alignment between beta-strands using dynamic programming. Allowing any number of gaps in an alignment enables us to model beta-bulges more effectively. Though our main focus is proteins with six or less beta-strands, we are also able to perform predictions for proteins with more than six beta-strands by combining the predictions of BetaPro with the gapped alignment algorithm. We evaluated the accuracy of our method and BetaPro. We performed a 10-fold cross validation experiment on the BetaSheet916 set and we obtained significant improvements in the prediction accuracy

    Analysis of Three-Dimensional Protein Images

    Full text link
    A fundamental goal of research in molecular biology is to understand protein structure. Protein crystallography is currently the most successful method for determining the three-dimensional (3D) conformation of a protein, yet it remains labor intensive and relies on an expert's ability to derive and evaluate a protein scene model. In this paper, the problem of protein structure determination is formulated as an exercise in scene analysis. A computational methodology is presented in which a 3D image of a protein is segmented into a graph of critical points. Bayesian and certainty factor approaches are described and used to analyze critical point graphs and identify meaningful substructures, such as alpha-helices and beta-sheets. Results of applying the methodologies to protein images at low and medium resolution are reported. The research is related to approaches to representation, segmentation and classification in vision, as well as to top-down approaches to protein structure prediction.Comment: See http://www.jair.org/ for any accompanying file
    corecore