Jeeva: Enterprise Grid-enabled Web Portal for Protein Secondary Structure Prediction
This paper presents a Grid portal for protein secondary structure prediction
developed by using services of Aneka, a .NET-based enterprise Grid technology.
The portal is used by research scientists to discover new prediction structures
in a parallel manner. An SVM (Support Vector Machine)-based prediction
algorithm is used with 64 sample protein sequences as a case study to
demonstrate the potential of enterprise Grids.
Comment: 7 pages
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.
Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolution
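The abstract above describes a loop: learn a sequence-function model from all measured variants, then use it to choose which sequences to test next. A minimal sketch of one round follows. It is not the review's method: the additive per-position model, the function names, and the measurements are all assumptions made up for illustration.

```python
from collections import defaultdict


def fit_additive_model(measured):
    """Fit a crude additive sequence-function model (illustrative assumption).

    measured: dict mapping equal-length sequences to measured function values.
    Returns (baseline, effects), where effects[(pos, residue)] is the mean
    deviation from the baseline among variants carrying that residue there.
    """
    baseline = sum(measured.values()) / len(measured)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, value in measured.items():
        for pos, residue in enumerate(seq):
            sums[(pos, residue)] += value - baseline
            counts[(pos, residue)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return baseline, effects


def predict(model, seq):
    """Predict function as baseline plus per-position effects."""
    baseline, effects = model
    # (position, residue) pairs never observed contribute nothing.
    return baseline + sum(effects.get((pos, r), 0.0) for pos, r in enumerate(seq))


def select_next_round(model, candidates, measured, n):
    """Rank unmeasured candidates by predicted function and keep the top n."""
    unmeasured = [s for s in candidates if s not in measured]
    return sorted(unmeasured, key=lambda s: predict(model, s), reverse=True)[:n]


# Made-up measurements for a hypothetical three-residue library.
measured = {"AAA": 1.0, "GAA": 2.0, "AAG": 1.5, "ACA": 0.5}
candidates = ["GAG", "GCA", "ACG", "AAA"]

model = fit_additive_model(measured)
picks = select_next_round(model, candidates, measured, n=2)
print(picks)
```

In a real campaign the selected sequences would be measured, added to `measured`, and the model refit before the next round. Any regression model that can rank sequences could take the place of the additive model here.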
Kernel methods in genomics and computational biology
Support vector machines and kernel methods are increasingly popular in
genomics and computational biology, due to their good performance in real-world
applications and strong modularity that makes them suitable to a wide range of
problems, from the classification of tumors to the automatic annotation of
proteins. Their ability to work in high dimension, to process non-vectorial
data, and the natural framework they provide to integrate heterogeneous data
are particularly relevant to various problems arising in computational biology.
In this chapter we survey some of the most prominent applications published so
far, highlighting the particular developments in kernel methods triggered by
problems in biology, and mention a few promising research directions likely to
expand in the future.
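The claim that kernel methods can process non-vectorial data can be made concrete with a string kernel. The sketch below uses the k-mer spectrum kernel, one well-known string kernel chosen here as an example; the abstract does not name it. The function names and the toy sequences are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Inner product of the k-mer count vectors of two sequences."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for k-mers it has not seen.
    return sum(count * cb[kmer] for kmer, count in ca.items())


def gram_matrix(seqs, k=2):
    """Pairwise kernel values, the only input a kernel method needs."""
    return [[spectrum_kernel(s, t, k) for t in seqs] for s in seqs]


# Toy sequences, invented for illustration.
seqs = ["MKVL", "MKIL", "ABAB"]
for row in gram_matrix(seqs):
    print(row)
```

The sequences are never turned into fixed-length vectors by hand. A kernel classifier such as an SVM only needs the Gram matrix, which is why the same learning algorithm can be reused across strings, graphs, and other structured data.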
Predicting Secondary Structures, Contact Numbers, and Residue-wise Contact Orders of Native Protein Structure from Amino Acid Sequence by Critical Random Networks
Prediction of one-dimensional protein structures such as secondary structures
and contact numbers is useful for the three-dimensional structure prediction
and important for the understanding of sequence-structure relationship. Here we
present a new machine-learning method, critical random networks (CRNs), for
predicting one-dimensional structures, and apply it, with position-specific
scoring matrices, to the prediction of secondary structures (SS), contact
numbers (CN), and residue-wise contact orders (RWCO). The present method
achieves, on average, accuracy of 77.8% for SS, correlation coefficients
of 0.726 and 0.601 for CN and RWCO, respectively. The accuracy of the SS
prediction is comparable to other state-of-the-art methods, and that of the CN
prediction is a significant improvement over previous methods. We give a
detailed formulation of critical random networks-based prediction scheme, and
examine the context-dependence of prediction accuracies. In order to study the
nonlinear and multi-body effects, we compare the CRNs-based method with a
purely linear method based on position-specific scoring matrices. Although not
superior to the CRNs-based method, the surprisingly good accuracy achieved by
the linear method highlights the difficulty in extracting structural features
of higher order from amino acid sequence beyond that provided by the
position-specific scoring matrices.
Comment: 20 pages, 1 figure, 5 tables; minor revision; accepted for
publication in BIOPHYSICS
The evaluation of protein folding rate constant is improved by predicting the folding kinetic order with a SVM-based method
Protein folding is a problem of large interest since it concerns the
mechanism by which the genetic information is translated into proteins with
well defined three-dimensional (3D) structures and functions. Recently
theoretical models have been developed to predict the protein folding rate
considering the relationships of the process with topological parameters
derived from the native (atomic-solved) protein structures. Previous works
classified proteins in two different groups exhibiting either a
single-exponential or a multi-exponential folding kinetics. It is well known
that these two classes of proteins are related to different protein structural
features. The increasing number of available experimental kinetic data allows
the application to the problem of a machine learning approach, in order to
predict the kinetic order of the folding process starting from the experimental
data so far collected. This information can be used to improve the prediction
of the folding rate. In this work first we describe a support vector
machine-based method (SVM-KO) to predict for a given protein the kinetic order
of the folding process. Using this method we can classify correctly 78% of the
folding mechanisms over a set of 63 experimental data. Secondly we focus on the
prediction of the logarithm of the folding rate. This value can be obtained as
a linear regression task with a SVM-based method. In this paper we show that
linear correlation of the predicted with experimental data can improve when the
regression task is computed over two different sets, instead of one, each of
them composed by the proteins with a correctly predicted two state or
multistate kinetic order.
Comment: The paper will be published on WSEAS Transaction on Biology and
Biomedicine
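The last point of the abstract above, that regressing separately over the two kinetic classes can correlate better with experiment than one pooled regression, can be sketched on toy numbers. This is only an illustration of the idea: ordinary least squares stands in for the paper's SVM-based regression, the single feature `x` is hypothetical, the class labels are assumed to be already predicted, and every value is made up.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson linear correlation coefficient between two lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def predict_pooled(data):
    """One regression fitted over all proteins, ignoring kinetic class."""
    slope, intercept = fit_line([x for x, _, _ in data], [y for _, _, y in data])
    return [slope * x + intercept for x, _, _ in data]


def predict_by_class(data):
    """One regression per kinetic class, applied to that class only."""
    lines = {}
    for label in {c for _, c, _ in data}:
        subset = [(x, y) for x, c, y in data if c == label]
        lines[label] = fit_line([x for x, _ in subset], [y for _, y in subset])
    return [lines[c][0] * x + lines[c][1] for x, c, _ in data]


# Made-up (feature, predicted kinetic class, log folding rate) triples.
data = [
    (1.0, "two-state", 5.0), (2.0, "two-state", 4.0), (3.0, "two-state", 3.0),
    (1.0, "multistate", 1.0), (2.0, "multistate", 0.5), (3.0, "multistate", 0.0),
]
observed = [y for _, _, y in data]

r_pooled = pearson(predict_pooled(data), observed)
r_split = pearson(predict_by_class(data), observed)
print(f"pooled r = {r_pooled:.3f}, class-wise r = {r_split:.3f}")
```

The toy data were constructed so that each class follows its own line, which is why the class-wise fit wins here. The correlations are also computed on the training points themselves, so the sketch says nothing about how large the gain would be on real, held-out data.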
Abundance of intrinsic disorder in SV-IV, a multifunctional androgen-dependent protein secreted from rat seminal vesicle
The potent immunomodulatory, anti-inflammatory and procoagulant properties of the
protein no. 4 secreted from the rat seminal vesicle epithelium (SV-IV) have been
previously found to be modulated by a supramolecular monomer-trimer equilibrium.
More structural details that integrate experimental data into a predictive framework
have recently been reported. Unfortunately, homology modelling and fold-recognition
strategies were not successful in creating a theoretical model of the structural
organization of SV-IV. It was inferred that the global structure of SV-IV is not similar
to any protein of known three-dimensional structure. Reversing the classical approach
to the sequence-structure-function paradigm, in this paper we report on novel
information obtained by comparing physicochemical parameters of SV-IV with two
datasets made of intrinsically unfolded and ideally globular proteins. In addition, we
have analysed the SV-IV sequence by several publicly available disorder-oriented
predictors. Overall, disorder predictions and a re-examination of existing experimental
data strongly suggest that SV-IV needs large plasticity to efficiently interact with the
different targets that characterize its multifaceted biological function and should be
therefore better classified as an intrinsically disordered protein.