

The evaluation of protein folding rate constant is improved by predicting the folding kinetic order with a SVM-based method

Authors: Capriotti Emidio; Casadio Rita
Publication date: 01/01/2006

Protein folding is a problem of great interest, since it concerns the mechanism by which genetic information is translated into proteins with well-defined three-dimensional (3D) structures and functions. Recently, theoretical models have been developed to predict the protein folding rate by relating the process to topological parameters derived from the native (atomic-solved) protein structures. Previous works classified proteins into two groups exhibiting either single-exponential or multi-exponential folding kinetics. It is well known that these two classes of proteins are related to different protein structural features. The increasing number of available experimental kinetic data allows a machine learning approach to be applied to the problem, in order to predict the kinetic order of the folding process from the experimental data collected so far. This information can be used to improve the prediction of the folding rate. In this work we first describe a support vector machine-based method (SVM-KO) to predict the kinetic order of the folding process for a given protein. Using this method we correctly classify 78% of the folding mechanisms over a set of 63 experimental data. Second, we focus on the prediction of the logarithm of the folding rate. This value can be obtained as a linear regression task with an SVM-based method. In this paper we show that the linear correlation between predicted and experimental data can improve when the regression task is computed over two different sets, instead of one, each composed of the proteins with a correctly predicted two-state or multistate kinetic order.
Comment: The paper will be published in WSEAS Transactions on Biology and Biomedicine
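The claim that fitting one regression per kinetic class can raise the correlation between predicted and experimental values can be illustrated with a minimal sketch. It is not the authors' method: ordinary least squares on a single made-up feature stands in for the SVM regression, the kinetic class labels are taken as given rather than predicted by SVM-KO, and every data point is invented for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b for a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def pearson(us, vs):
    """Pearson linear correlation coefficient between two sequences."""
    mu, mv = mean(us), mean(vs)
    num = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    den = (sum((u - mu) ** 2 for u in us) * sum((v - mv) ** 2 for v in vs)) ** 0.5
    return num / den


# Invented toy data: (structural feature, log folding rate, kinetic class).
# The values are NOT experimental; they only mimic two offset linear trends.
data = [
    (1.0, 5.0, "two-state"), (2.0, 4.1, "two-state"),
    (3.0, 2.9, "two-state"), (4.0, 2.0, "two-state"),
    (1.0, 1.0, "multi-state"), (2.0, 0.2, "multi-state"),
    (3.0, -1.1, "multi-state"), (4.0, -2.0, "multi-state"),
]


def pooled_predictions(rows):
    """Fit one line to all proteins, ignoring the kinetic class."""
    a, b = fit_line([x for x, _, _ in rows], [y for _, y, _ in rows])
    return [a * x + b for x, _, _ in rows]


def per_class_predictions(rows):
    """Fit one line per kinetic class and predict each protein with its own line."""
    models = {}
    for label in {c for _, _, c in rows}:
        xs = [x for x, _, c in rows if c == label]
        ys = [y for _, y, c in rows if c == label]
        models[label] = fit_line(xs, ys)
    return [models[c][0] * x + models[c][1] for x, _, c in rows]


observed = [y for _, y, _ in data]
r_pooled = pearson(pooled_predictions(data), observed)
r_split = pearson(per_class_predictions(data), observed)
print(f"pooled r = {r_pooled:.3f}, per-class r = {r_split:.3f}")
```

On this toy set the per-class fit correlates far better with the observed values than the pooled fit, because the two classes follow the same slope with different offsets. Real data need not behave this way, and in practice the gain depends on how accurately the kinetic class is predicted beforehand.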

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Machine learning-guided directed evolution for protein engineering

Authors: Arnold Frances H.; Wu Zachary; Yang Kevin K.
Publication date: 19/04/2019

Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.
Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution

arXiv.org e-Print Archive

Caltech Authors

Inductive Machine Learning of Structures: Estimating a Finite Element Optimisation Using Support Vector Machines

Authors: Hanna S; Haroun Mahdavi S
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2006

UCL Discovery

Inductive machine learning of optimal modular structures: Estimating solutions using support vector machines

Author: Hanna S.
Publication date: 19/09/2007

Structural optimization is usually handled by iterative methods requiring repeated samples of a physics-based model, but this process can be computationally demanding. Given a set of previously optimized structures of the same topology, this paper uses inductive learning to replace this optimization process entirely by deriving a function that directly maps any given load to an optimal geometry. A support vector machine is trained to determine the optimal geometry of individual modules of a space frame structure given a specified load condition. Structures produced by learning are compared against those found by a standard gradient descent optimization, both as individual modules and then as a composite structure. The primary motivation for this is speed, and results show the process is highly efficient for cases in which similar optimizations must be performed repeatedly. The function learned by the algorithm can approximate the result of optimization very closely after sufficient training, and has also been found effective at generalizing the underlying optima to produce structures that perform better than those found by standard iterative methods.
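The idea of replacing a repeated iterative optimization with a function learned from previously optimized examples can be sketched as follows. This is an illustration under loud assumptions, not the paper's method: the one-parameter cost function is hypothetical, and piecewise linear interpolation over stored optima stands in for the support vector machine.

```python
from bisect import bisect_left


def cost_gradient(geometry, load):
    """Derivative of a hypothetical toy cost: load / geometry + geometry."""
    return -load / geometry ** 2 + 1.0


def optimise(load, start=1.0, step=0.1, iterations=500):
    """Iterative baseline: plain gradient descent on the toy cost."""
    geometry = start
    for _ in range(iterations):
        geometry -= step * cost_gradient(geometry, load)
    return geometry


# "Previously optimized structures": optima computed once for known loads.
train_loads = [float(k) for k in range(1, 10)]
train_optima = [optimise(load) for load in train_loads]


def learned_geometry(load):
    """Map a load directly to a geometry by interpolating stored optima.

    Stand-in for the trained learner; no optimization loop runs here.
    """
    i = bisect_left(train_loads, load)
    i = min(max(i, 1), len(train_loads) - 1)  # clamp to a valid segment
    x0, x1 = train_loads[i - 1], train_loads[i]
    y0, y1 = train_optima[i - 1], train_optima[i]
    return y0 + (y1 - y0) * (load - x0) / (x1 - x0)


new_load = 5.5  # a load that was not in the training set
predicted = learned_geometry(new_load)
reference = optimise(new_load)
print(f"learned: {predicted:.4f}, gradient descent: {reference:.4f}")
```

The learned mapping answers with a single lookup and interpolation, whereas the baseline runs hundreds of gradient steps per query, which mirrors the speed motivation stated in the abstract. The sketch does not reproduce the reported finding that learned structures can outperform the iterative optimum.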

UCL Discovery

7th German Conference on Chemoinformatics: 25 CIC-Workshop: Goslar, Germany, 6-8 November 2011; meeting abstracts / Edited by Frank Oellien, Uli Fechner and Thomas Engel

Authors: Engel Thomas; Fechner Uli; Oellien Frank
Publication date: 01/05/2012

Hochschulschriftenserver - Universität Frankfurt am Main

Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction

Author: Iqbal Sumaiya
Publication venue: ScholarWorks@UNO
Publication date: 09/08/2017

Proteins are the fundamental macromolecules within a cell that carry out most biological functions. The computational study of protein structure and function, using machine learning and data analytics, is essential to advancing life-science research, given the fast-growing volume of biological data and the complexity of analyzing it to discover meaningful insights. Mapping a protein’s primary sequence is not limited to its structure: we extend it to the disordered component, known as Intrinsically Disordered Proteins or Regions (IDPs/IDRs), and hence to the dynamics involved, which help explain complex interactions within a cell that are otherwise obscured. The objective of this dissertation is to develop effective machine learning-based tools to predict disordered proteins, their properties and dynamics, and their interaction paradigm by systematically mining and analyzing large-scale biological data. In this dissertation, we propose a robust framework to predict disordered proteins given only sequence information, using an optimized SVM with an RBF kernel. Through appropriate reasoning, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of the Accessible Surface Area (ASA) of protein residues, a useful structural property that defines a protein’s exposure to partners, using regularized regression with a 3rd-degree polynomial kernel function and a genetic algorithm. As a key outcome of this research, we then introduce a novel method to extract the position specific energy (PSEE) of protein residues by modeling the pairwise thermodynamic interactions and the hydrophobic effect. PSEE is found to be an effective feature in identifying the enthalpy gain of the folded state of a protein and otherwise the neutral state of unstructured proteins.
Moreover, we study the peptide-protein transient interactions that involve the induced folding of short peptides through disorder-to-order conformational changes to bind to an appropriate partner. A suite of predictors is developed to identify the residue patterns of Peptide-Recognition Domains from protein sequence that can recognize and bind to the peptide motifs and phospho-peptides with post-translational modifications (PTMs) of amino acids, responsible for critical human diseases, using the stacked generalization ensemble technique. The biologically relevant case studies involved demonstrate the possibility of discovering new knowledge using the developed tools.

University of New Orleans