14,347 research outputs found
The evaluation of protein folding rate constant is improved by predicting the folding kinetic order with a SVM-based method
Protein folding is a problem of large interest since it concerns the
mechanism by which the genetic information is translated into proteins with
well defined three-dimensional (3D) structures and functions. Recently
theoretical models have been developed to predict the protein folding rate
considering the relationships of the process with tolopological parameters
derived from the native (atomic-solved) protein structures. Previous works
classified proteins in two different groups exhibiting either a
single-exponential or a multi-exponential folding kinetics. It is well known
that these two classes of proteins are related to different protein structural
features. The increasing number of available experimental kinetic data allows
the application to the problem of a machine learning approach, in order to
predict the kinetic order of the folding process starting from the experimental
data so far collected. This information can be used to improve the prediction
of the folding rate. In this work first we describe a support vector
machine-based method (SVM-KO) to predict for a given protein the kinetic order
of the folding process. Using this method we can classify correctly 78% of the
folding mechanisms over a set of 63 experimental data. Secondly we focus on the
prediction of the logarithm of the folding rate. This value can be obtained as
a linear regression task with a SVM-based method. In this paper we show that
linear correlation of the predicted with experimental data can improve when the
regression task is computed over two different sets, instead of one, each of
them composed by the proteins with a correctly predicted two state or
multistate kinetic order.Comment: The paper will be published on WSEAS Transaction on Biology and
Biomedicin
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolutio
Inductive machine learning of optimal modular structures: Estimating solutions using support vector machines
Structural optimization is usually handled by iterative methods requiring repeated samples of a physics-based model, but this process can be computationally demanding. Given a set of previously optimized structures of the same topology, this paper uses inductive learning to replace this optimization process entirely by deriving a function that directly maps any given load to an optimal geometry. A support vector machine is trained to determine the optimal geometry of individual modules of a space frame structure given a specified load condition. Structures produced by learning are compared against those found by a standard gradient descent optimization, both as individual modules and then as a composite structure. The primary motivation for this is speed, and results show the process is highly efficient for cases in which similar optimizations must be performed repeatedly. The function learned by the algorithm can approximate the result of optimization very closely after sufficient training, and has also been found effective at generalizing the underlying optima to produce structures that perform better than those found by standard iterative methods
Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction
Proteins are the fundamental macromolecules within a cell that carry out most of the biological functions. The computational study of protein structure and its functions, using machine learning and data analytics, is elemental in advancing the life-science research due to the fast-growing biological data and the extensive complexities involved in their analyses towards discovering meaningful insights. Mapping of protein’s primary sequence is not only limited to its structure, we extend that to its disordered component known as Intrinsically Disordered Proteins or Regions in proteins (IDPs/IDRs), and hence the involved dynamics, which help us explain complex interaction within a cell that is otherwise obscured. The objective of this dissertation is to develop machine learning based effective tools to predict disordered protein, its properties and dynamics, and interaction paradigm by systematically mining and analyzing large-scale biological data.
In this dissertation, we propose a robust framework to predict disordered proteins given only sequence information, using an optimized SVM with RBF kernel. Through appropriate reasoning, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of Accessible Surface Area (ASA) of protein residues, a useful structural property that defines protein’s exposure to partners, using regularized regression with 3rd-degree polynomial kernel function and genetic algorithm. As a key outcome of this research, we then introduce a novel method to extract position specific energy (PSEE) of protein residues by modeling the pairwise thermodynamic interactions and hydrophobic effect. PSEE is found to be an effective feature in identifying the enthalpy-gain of the folded state of a protein and otherwise the neutral state of the unstructured proteins. Moreover, we study the peptide-protein transient interactions that involve the induced folding of short peptides through disorder-to-order conformational changes to bind to an appropriate partner. A suite of predictors is developed to identify the residue-patterns of Peptide-Recognition Domains from protein sequence that can recognize and bind to the peptide-motifs and phospho-peptides with post-translational-modifications (PTMs) of amino acid, responsible for critical human diseases, using the stacked generalization ensemble technique. The involved biologically relevant case-studies demonstrate possibilities of discovering new knowledge using the developed tools
- …