11,551 research outputs found

Machine learning-guided directed evolution for protein engineering

Author: Arnold Frances H.
Wu Zachary
Yang Kevin K.
Publication date: 19/04/2019

Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function. Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution
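
As a rough illustration of the workflow this abstract describes (learn from all measured variants, then select sequences that are likely to be improved), the sketch below fits a very simple per-site additive model to toy data and ranks the unmeasured sequences. The model choice, the function names, the two-letter alphabet and every fitness value are assumptions made for illustration; the review does not prescribe this particular model.

```python
from collections import defaultdict
from itertools import product


def fit_additive_model(measured):
    """Fit fitness ~ mean + sum of per-(position, residue) effects.

    measured: dict mapping sequence -> measured fitness (toy values here).
    """
    mean = sum(measured.values()) / len(measured)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, fitness in measured.items():
        for pos, residue in enumerate(seq):
            sums[(pos, residue)] += fitness - mean
            counts[(pos, residue)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return mean, effects


def predict(model, seq):
    """Predict fitness; unseen (position, residue) pairs contribute 0."""
    mean, effects = model
    return mean + sum(effects.get((pos, r), 0.0) for pos, r in enumerate(seq))


def propose(model, measured, alphabet, length, n_top):
    """Rank all unmeasured sequences by predicted fitness, best first."""
    candidates = ("".join(p) for p in product(alphabet, repeat=length))
    unmeasured = [s for s in candidates if s not in measured]
    return sorted(unmeasured, key=lambda s: predict(model, s), reverse=True)[:n_top]


# Hypothetical measurements from a first round of screening.
measured = {"AAA": 1.0, "GAA": 2.0, "AGA": 0.5, "AAG": 1.5}
model = fit_additive_model(measured)
top = propose(model, measured, alphabet="AG", length=3, n_top=2)
print(top)
```

In a real campaign the proposed sequences would be synthesized and measured, added to `measured`, and the model refit for the next round.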

arXiv.org e-Print Archive

Caltech Authors

TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

Author: Cang Zixuan
Wei Guo-Wei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/03/2017

Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e., element specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations to deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in the predictions of protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts. Comment: 20 pages, 8 figures, 5 tables

arXiv.org e-Print Archive

Directory of Open Access Journals

RosettaBackrub--a web server for flexible backbone protein structure modeling and design.

Author: Friedland Gregory F
Humphris Elisabeth L
Kortemme Tanja
Lauck Florian
Smith Colin A
Publication venue: eScholarship, University of California
Publication date: 12/05/2010

The RosettaBackrub server (http://kortemmelab.ucsf.edu/backrub) implements the Backrub method, derived from observations of alternative conformations in high-resolution protein crystal structures, for flexible backbone protein modeling. Backrub modeling is applied to three related applications using the Rosetta program for structure prediction and design: (I) modeling of structures of point mutations, (II) generating protein conformational ensembles and designing sequences consistent with these conformations and (III) predicting tolerated sequences at protein-protein interfaces. The three protocols have been validated on experimental data. Starting from a user-provided single input protein structure in PDB format, the server generates near-native conformational ensembles. The predicted conformations and sequences can be used for different applications, such as to guide mutagenesis experiments, for ensemble-docking approaches or to generate sequence libraries for protein design.

PubMed Central

eScholarship - University of California

PROTS-RF: A Robust Model for Predicting Mutation-Induced Protein Stability Changes

Author: Fang Jianwen
Li Yunqi
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

The ability to improve protein thermostability via protein engineering is of great scientific interest and also has significant practical value. In this report we present PROTS-RF, a robust model based on the Random Forest algorithm capable of predicting thermostability changes induced by not only single-, but also double- or multiple-point mutations. The model is built using 41 features including evolutionary information, secondary structure, solvent accessibility and a set of fragment-based features. It achieves accuracies of 0.799, 0.782 and 0.787, and areas under receiver operating characteristic (ROC) curves of 0.873, 0.868 and 0.862 for single-, double- and multiple-point mutation datasets, respectively. Contrary to previous suggestions, our results clearly demonstrate that a robust predictive model trained for predicting single point mutation induced thermostability changes can be capable of predicting double and multiple point mutations. It also shows high levels of robustness in the tests using hypothetical reverse mutations. We demonstrate that testing datasets created based on physical principles can be highly useful for testing the robustness of predictive models.
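
For readers unfamiliar with the two metrics quoted in this abstract, the sketch below computes accuracy and the area under the ROC curve for a binary classifier using only the standard library. It assumes binary labels (1 for a stabilizing mutation, 0 otherwise) and a 0.5 decision threshold; the labels and scores are invented toy values, not results from PROTS-RF.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = stabilizing, 0 = not stabilizing (hypothetical values).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(accuracy(labels, scores), roc_auc(labels, scores))
```

Unlike accuracy, the AUC does not depend on the chosen threshold, which is one reason the two figures are often reported together.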

CiteSeerX

Public Library of Science (PLOS)

KU ScholarWorks

Directory of Open Access Journals

PubMed Central

FigShare

Prots: A fragment based protein thermo‐stability potential

Author: Altschul
Bae
Becktel
Berezovsky
Berezovsky
Brych
Capriotti
Chan
Cheng
Chennamsetty
Culajay
Dahiyat
Dalluge
Deutsch
Finn
Frokjaer
Gianese
Glyakina
Gromiha
Gromiha
Gu
Gu
Guerois
Haney
Huang
Kabsch
Kim
Korkegian
Kryshtafovych
Kumar
Lazar
Li
Li
Li
Li
Liang
Lippow
Mandrich
Masso
Masso
McDonald
Menendez-Arias
Metpally
Montanucci
Moult
Pokala
Potapov
Razvi
Schoemaker
Schweiker
Singh
Sterner
Unsworth
Wu
Wu
Wu
Yin
Zeldovich
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhou
Zhou
Publication venue: 'Wiley'
Publication date: 01/01/2012

Designing proteins with enhanced thermo‐stability has been a main focus of protein engineering because of its theoretical and practical significance. Despite extensive studies in the past years, a general strategy for stabilizing proteins still remains elusive. Thus effective and robust computational algorithms for designing thermo‐stable proteins are in critical demand. Here we report PROTS, a sequential and structural four‐residue fragment based protein thermo‐stability potential. PROTS is derived from a nonredundant representative collection of thousands of thermophilic and mesophilic protein structures and a large set of point mutations with experimentally determined changes of melting temperatures. To the best of our knowledge, PROTS is the first protein stability predictor based on integrated analysis and mining of these two types of data. Besides conventional cross validation and blind testing, we introduce hypothetical reverse mutations as a means of testing the robustness of protein thermo‐stability predictors. In all tests, PROTS demonstrates the ability to reliably predict mutation induced thermo‐stability changes as well as classify thermophilic and mesophilic proteins. In addition, this white‐box predictor allows easy interpretation of the factors that influence mutation induced protein stability changes at the residue level. Proteins 2012; © 2011 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/89526/1/23163_ftp.pd
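
The hypothetical reverse mutation test mentioned in this abstract can be sketched as follows, under the assumption (not spelled out in the abstract) that reversing a mutation should flip the sign of the stability change: if wild type to mutant changes the melting temperature by d, then mutant to wild type should change it by -d. The predictors, sequences and values below are toy placeholders, not PROTS itself.

```python
def reverse_mutation_dataset(forward):
    """Build hypothetical reverse mutations from forward ones.

    forward: list of (wild_type_seq, mutant_seq, delta_tm) tuples.
    Assumption: the reverse mutation has the opposite stability change.
    """
    return [(mut, wt, -delta) for wt, mut, delta in forward]


def antisymmetry_error(predictor, forward):
    """Mean |f(wt -> mut) + f(mut -> wt)|; 0 for a consistent predictor."""
    errors = [abs(predictor(wt, mut) + predictor(mut, wt)) for wt, mut, _ in forward]
    return sum(errors) / len(errors)


def hydrophobic_count(seq):
    # Toy stability score: number of hydrophobic residues (illustrative only).
    return float(sum(residue in "AILMFVW" for residue in seq))


def consistent_predictor(wt, mut):
    # Difference of a per-sequence score is antisymmetric by construction.
    return hydrophobic_count(mut) - hydrophobic_count(wt)


def biased_predictor(wt, mut):
    # A constant offset breaks antisymmetry; the test should expose it.
    return consistent_predictor(wt, mut) + 0.5


# Hypothetical forward mutations with made-up melting temperature changes.
forward = [("GAS", "GLS", 1.0), ("KVT", "KST", -2.0)]
reverse = reverse_mutation_dataset(forward)
print(antisymmetry_error(consistent_predictor, forward))
print(antisymmetry_error(biased_predictor, forward))
```

A predictor that scores well on forward mutations but shows a large antisymmetry error is likely fitting a bias in the training data rather than the underlying physics.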

Crossref

KU ScholarWorks

PubMed Central

Deep Blue Documents at the University of Michigan

Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine

Author: Birolo G.
Capriotti E.
Fariselli P.
Montanucci L.
Sanavia T.
Turina P.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

Institutional Research Information System University of Turin