21,099 research outputs found

    Jeeva: Enterprise Grid-enabled Web Portal for Protein Secondary Structure Prediction

    This paper presents a Grid portal for protein secondary structure prediction developed by using services of Aneka, a .NET-based enterprise Grid technology. The portal is used by research scientists to discover new prediction structures in a parallel manner. An SVM (Support Vector Machine)-based prediction algorithm is used with 64 sample protein sequences as a case study to demonstrate the potential of enterprise Grids. Comment: 7 pages

    Machine learning-guided directed evolution for protein engineering

    Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function. Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.

    Predicting Secondary Structures, Contact Numbers, and Residue-wise Contact Orders of Native Protein Structure from Amino Acid Sequence by Critical Random Networks

    Prediction of one-dimensional protein structures such as secondary structures and contact numbers is useful for three-dimensional structure prediction and important for the understanding of the sequence-structure relationship. Here we present a new machine-learning method, critical random networks (CRNs), for predicting one-dimensional structures, and apply it, with position-specific scoring matrices, to the prediction of secondary structures (SS), contact numbers (CN), and residue-wise contact orders (RWCO). The present method achieves, on average, a $Q_3$ accuracy of 77.8% for SS, and correlation coefficients of 0.726 and 0.601 for CN and RWCO, respectively. The accuracy of the SS prediction is comparable to other state-of-the-art methods, and that of the CN prediction is a significant improvement over previous methods. We give a detailed formulation of the critical random networks-based prediction scheme, and examine the context-dependence of prediction accuracies. In order to study the nonlinear and multi-body effects, we compare the CRNs-based method with a purely linear method based on position-specific scoring matrices. Although not superior to the CRNs-based method, the surprisingly good accuracy achieved by the linear method highlights the difficulty in extracting structural features of higher order from amino acid sequence beyond that provided by the position-specific scoring matrices. Comment: 20 pages, 1 figure, 5 tables; minor revision; accepted for publication in BIOPHYSICS
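The two figures of merit quoted in this abstract can be illustrated with a minimal sketch. It assumes that Q3 is the fraction of residues whose three-state label (H, E, C) is predicted correctly, and that the correlation coefficient is the ordinary Pearson coefficient between predicted and observed per-residue values. The function names and toy inputs are hypothetical, and the paper's own averaging scheme (for example, per protein versus per residue) is not specified in the abstract.

```python
from math import sqrt


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the 3-state labels (H/E/C) agree."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(q3_accuracy("HHHEEC", "HHHECC"))
    print(pearson([10, 12, 15, 9], [11, 13, 14, 8]))
```

Under these assumptions, a Q3 of 77.8% would mean that roughly 78 of every 100 residues receive the correct helix, strand, or coil label.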

    The evaluation of protein folding rate constant is improved by predicting the folding kinetic order with a SVM-based method

    Protein folding is a problem of great interest since it concerns the mechanism by which the genetic information is translated into proteins with well-defined three-dimensional (3D) structures and functions. Recently, theoretical models have been developed to predict the protein folding rate considering the relationships of the process with topological parameters derived from the native (atomic-solved) protein structures. Previous works classified proteins into two different groups exhibiting either single-exponential or multi-exponential folding kinetics. It is well known that these two classes of proteins are related to different protein structural features. The increasing number of available experimental kinetic data allows the application of a machine learning approach to the problem, in order to predict the kinetic order of the folding process starting from the experimental data so far collected. This information can be used to improve the prediction of the folding rate. In this work we first describe a support vector machine-based method (SVM-KO) to predict the kinetic order of the folding process for a given protein. Using this method we can correctly classify 78% of the folding mechanisms over a set of 63 experimental data. Secondly, we focus on the prediction of the logarithm of the folding rate. This value can be obtained as a linear regression task with an SVM-based method. In this paper we show that the linear correlation of the predicted with the experimental data can improve when the regression task is computed over two different sets, instead of one, each of them composed of the proteins with a correctly predicted two-state or multistate kinetic order. Comment: The paper will be published in WSEAS Transaction on Biology and Biomedicine

    Abundance of intrinsic disorder in SV-IV, a multifunctional androgen-dependent protein secreted from rat seminal vesicle

    The potent immunomodulatory, anti-inflammatory and procoagulant properties of the protein no. 4 secreted from the rat seminal vesicle epithelium (SV-IV) have been previously found to be modulated by a supramolecular monomer-trimer equilibrium. More structural details that integrate experimental data into a predictive framework have recently been reported. Unfortunately, homology modelling and fold-recognition strategies were not successful in creating a theoretical model of the structural organization of SV-IV. It was inferred that the global structure of SV-IV is not similar to any protein of known three-dimensional structure. Reversing the classical approach to the sequence-structure-function paradigm, in this paper we report on novel information obtained by comparing physicochemical parameters of SV-IV with two datasets made of intrinsically unfolded and ideally globular proteins. In addition, we have analysed the SV-IV sequence by several publicly available disorder-oriented predictors. Overall, disorder predictions and a re-examination of existing experimental data strongly suggest that SV-IV needs large plasticity to efficiently interact with the different targets that characterize its multifaceted biological function and should be therefore better classified as an intrinsically disordered protein.